Re: Normalizing URLs with anchors

2006-01-06 Thread Doug Cutting
Ken Krugler wrote: I'm wondering whether it would also make sense to remove anchor text from URLs. For example, currently these two URLs are treated as different:
http://www.dina.kvl.dk/~sestoft/gcsharp/index.html#wordindex
and
http://www.dina.kvl.dk/~sestoft/gcsharp/index.html
Is it

Normalizing URLs with anchors

2006-01-05 Thread Ken Krugler
Hi all, The default regex-normalize.xml currently strips out PHP session ids. I'm wondering whether it would also make sense to remove anchor text from URLs. For example, currently these two URLs are treated as different:

Re: Normalizing URLs with anchors

2006-01-05 Thread ogjunk-nutch
@lucene.apache.org Sent: Thu 05 Jan 2006 04:40:07 PM EST Subject: Normalizing URLs with anchors Hi all, The default regex-normalize.xml currently strips out PHP session ids. I'm wondering whether it would also make sense to remove anchor text from URLs. For example, currently these two URLs
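
For readers skimming the thread, the normalization Ken asks about (dropping the "#anchor" part so both example URLs compare equal) can be sketched roughly as below. This is an illustrative Python sketch only, not the contents of Nutch's regex-normalize.xml; the function names and the regex pattern are assumptions made for the example.

```python
import re
from urllib.parse import urldefrag

# Hypothetical regex rule: drop everything from the first '#' to the end.
# This is NOT copied from regex-normalize.xml; it only illustrates the idea.
FRAGMENT_RE = re.compile(r"#.*$")


def strip_anchor(url: str) -> str:
    """Return the URL without its fragment, using the standard library."""
    return urldefrag(url).url


def strip_anchor_regex(url: str) -> str:
    """Same idea expressed as a single regex substitution."""
    return FRAGMENT_RE.sub("", url)


# The two URLs from the original post.
with_anchor = "http://www.dina.kvl.dk/~sestoft/gcsharp/index.html#wordindex"
without_anchor = "http://www.dina.kvl.dk/~sestoft/gcsharp/index.html"

# After normalization both forms compare equal.
same_page = strip_anchor(with_anchor) == strip_anchor(without_anchor)
```

Whether stripping is always safe is a separate question: the sketch assumes the fragment only identifies a position within the same document.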