Bugs item #988325, was opened at 2004-07-09 18:41
Message generated for change (Comment added) made by cutting
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=491356&aid=988325&group_id=59548

Category: None
Group: None
>Status: Open
>Resolution: None
Priority: 2
Submitted By: Fabio Gasparetti (magnum74)
Assigned to: Nobody/Anonymous (nobody)
Summary: case insensitive hostname

Initial Comment:
The RegexURLFilter does not treat hostnames as
case-insensitive. If your site contains two links,
mysite.net/ and MySite.net/, you need to specify
something like [Mm][Yy][Ss]... in the urlfilter.txt
file to catch both of them.
Perhaps a simple reminder in the accept-host
comment would be useful.
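
A minimal sketch of the behaviour described above, assuming the filter
rule is an ordinary case-sensitive regular expression. It uses
java.util.regex for illustration only; the class name, the two patterns
and the accepts() helper are hypothetical and are not taken from the
Nutch source or from urlfilter.txt.

```java
import java.util.regex.Pattern;

// Illustration only: not Nutch code. Shows why a lowercase-only host
// pattern misses a mixed-case hostname.
class HostCaseDemo {
    // Hypothetical accept rule written with a lowercase host.
    static final Pattern LOWER_ONLY =
        Pattern.compile("^http://([a-z0-9]*\\.)*mysite\\.net/");

    // The workaround from the report: spell out both cases per letter.
    static final Pattern BOTH_CASES = Pattern.compile(
        "^http://([a-zA-Z0-9]*\\.)*[Mm][Yy][Ss][Ii][Tt][Ee]\\.[Nn][Ee][Tt]/");

    // Returns true when the pattern is found in the URL.
    static boolean accepts(Pattern rule, String url) {
        return rule.matcher(url).find();
    }

    public static void main(String[] args) {
        System.out.println(accepts(LOWER_ONLY, "http://mysite.net/"));  // true
        System.out.println(accepts(LOWER_ONLY, "http://MySite.net/"));  // false
        System.out.println(accepts(BOTH_CASES, "http://MySite.net/"));  // true
    }
}
```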



----------------------------------------------------------------------

>Comment By: Doug Cutting (cutting)
Date: 2004-07-10 13:08

Message:
Logged In: YES 
user_id=21778

You're right, I spoke too soon.  The URL has not yet been
normalized at this point.  I think the best fix is to
normalize the link in the Outlink constructor, as is done in
Page.java.
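
A rough sketch of the "normalize first, then filter" order proposed
above. This is not the Outlink or Page.java code, which is not shown in
this thread; the class, the lowercaseHost() helper and the accept
pattern are hypothetical, and only the hostname and scheme are
lowercased (a real normalizer may do more).

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.regex.Pattern;

// Illustration only: lowercases the host before the filter sees the URL.
class NormalizeThenFilter {
    // Hypothetical accept rule with a lowercase host.
    static final Pattern ACCEPT =
        Pattern.compile("^http://([a-z0-9]*\\.)*mysite\\.net/");

    // Rebuilds the URL with a lowercased scheme and host, leaving the
    // path, query and fragment untouched. Returns the input unchanged
    // if it cannot be parsed or has no scheme or host.
    static String lowercaseHost(String url) {
        URI uri;
        try {
            uri = new URI(url);
        } catch (URISyntaxException e) {
            return url;
        }
        if (uri.getScheme() == null || uri.getHost() == null) {
            return url;
        }
        StringBuilder sb = new StringBuilder();
        sb.append(uri.getScheme().toLowerCase(Locale.ROOT)).append("://");
        if (uri.getRawUserInfo() != null) {
            sb.append(uri.getRawUserInfo()).append('@');
        }
        sb.append(uri.getHost().toLowerCase(Locale.ROOT));
        if (uri.getPort() != -1) {
            sb.append(':').append(uri.getPort());
        }
        if (uri.getRawPath() != null) {
            sb.append(uri.getRawPath());
        }
        if (uri.getRawQuery() != null) {
            sb.append('?').append(uri.getRawQuery());
        }
        if (uri.getRawFragment() != null) {
            sb.append('#').append(uri.getRawFragment());
        }
        return sb.toString();
    }

    // Normalizes first, then applies the case-sensitive rule.
    static boolean accepts(String url) {
        return ACCEPT.matcher(lowercaseHost(url)).find();
    }

    public static void main(String[] args) {
        System.out.println(lowercaseHost("http://MySite.net/Path?Q=1"));
        System.out.println(accepts("http://MySite.net/"));
    }
}
```

With this ordering a single lowercase rule is enough, because the
filter never sees a mixed-case hostname.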

----------------------------------------------------------------------

Comment By: Fabio Gasparetti (magnum74)
Date: 2004-07-10 10:38

Message:
Logged In: YES 
user_id=666942

Yes, but as far as I can see in the source, the
normalization happens in the Link constructor,
after the URL has already been filtered by the call:

URLFilterFactory.getFilter().filter(url);

in the UpdateDatabaseTool.pageContentsChanged() method.


----------------------------------------------------------------------

Comment By: Doug Cutting (cutting)
Date: 2004-07-10 08:41

Message:
Logged In: YES 
user_id=21778

Nutch always normalizes hostnames to lowercase before
filtering them, so checking is already case-insensitive.

----------------------------------------------------------------------

