Re: Terminating slashes in URL normalization

2006-08-05 Thread Chris Schneider
Jukka,
>On 8/5/06, Chris Schneider <[EMAIL PROTECTED]> wrote:
>>Given this, shouldn't the default URL normalizer just add a slash to
>>the end of a URL that doesn't have a file extension?
At 8:41 AM +0300 8/5/06, Jukka Zitting wrote:
>Section 6.2.4 of RFC 3986 suggests that a crawler could do suc

Re: Terminating slashes in URL normalization

2006-08-04 Thread Jukka Zitting
Hi, On 8/5/06, Jukka Zitting <[EMAIL PROTECTED]> wrote: Section 6.2.4 of RFC 3986 suggests that a crawler could do such a normalization if it detects that http://mail.python.org/mailman/listinfo redirects to http://mail.python.org/mailman/listinfo/. Which it of course doesn't... :-) Another re

Re: Terminating slashes in URL normalization

2006-08-04 Thread Sami Siren
Chris Schneider wrote: Gang, Pardon my ignorance, but I noticed recently that some URLs were duplicated in my crawldb, once with a terminating slash and once without it. For example, both of the following URLs were found in the same crawldb: http://mail.python.org/mailman/listinfo/ http://ma

Re: Terminating slashes in URL normalization

2006-08-04 Thread Jukka Zitting
Hi, On 8/5/06, Chris Schneider <[EMAIL PROTECTED]> wrote: Given this, shouldn't the default URL normalizer just add a slash to the end of a URL that doesn't have a file extension? Section 6.2.4 of RFC 3986 suggests that a crawler could do such a normalization if it detects that http://mail.pyt

Terminating slashes in URL normalization

2006-08-04 Thread Chris Schneider
Gang, Pardon my ignorance, but I noticed recently that some URLs were duplicated in my crawldb, once with a terminating slash and once without it. For example, both of the following URLs were found in the same crawldb: http://mail.python.org/mailman/listinfo/ http://mail.python.org/mailman/l
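
The two approaches raised in this thread can be sketched roughly as follows. This is a minimal illustration in Python, not the crawler's actual normalizer: the function names `add_trailing_slash` and `normalize_with_redirects`, and the `observed_redirects` mapping, are hypothetical. The first function is the heuristic proposed in the original question (append a slash when the last path segment has no file extension). The second is the more conservative reading of RFC 3986 Section 6.2.4, which rewrites a URL only when a redirect to the slash-terminated form has actually been observed.

```python
from posixpath import splitext
from urllib.parse import urlsplit, urlunsplit


def add_trailing_slash(url: str) -> str:
    """Heuristic: append '/' when the last path segment has no file extension.

    This is a guess about the server's behavior, so it can merge URLs
    that the server treats as distinct resources.
    """
    parts = urlsplit(url)
    path = parts.path or "/"          # an empty path is equivalent to "/"
    last_segment = path.rsplit("/", 1)[-1]
    if last_segment and not splitext(last_segment)[1]:
        path += "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def normalize_with_redirects(url: str, observed_redirects: dict) -> str:
    """Conservative variant: add the slash only if a redirect to it was seen.

    observed_redirects is an assumed mapping of source URL -> redirect target
    collected during fetching.
    """
    target = observed_redirects.get(url)
    if target == url + "/":
        return target
    return url
```

With the URL from the thread, `add_trailing_slash("http://mail.python.org/mailman/listinfo")` yields the slash-terminated form unconditionally, whereas `normalize_with_redirects` leaves the URL unchanged unless the mapping records that redirect. The second behavior avoids collapsing two URLs on a site that serves them without redirecting one to the other.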