Bugs item #988325, was opened at 2004-07-10 03:41
Message generated for change (Comment added) made by magnum74
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=491356&aid=988325&group_id=59548
Category: None
Group: None
Status: Closed
Resolution: Invalid
Priority: 2
Submitted By: Fabio Gasparetti (magnum74)
Assigned to: Nobody/Anonymous (nobody)
Summary: case insensitive hostname

Initial Comment:
The RegexURLFilter does not treat hostnames as case-insensitive. If you
have two links in your site, mysite.net/ and MySite.net/, you need to
specify something like [Mm][Yy][Ss]... in the urlfilter.txt file to catch
both of them.

Perhaps a simple reminder in the accept-host comment would be useful.

----------------------------------------------------------------------

>Comment By: Fabio Gasparetti (magnum74)
Date: 2004-07-10 19:38

Message:
Logged In: YES
user_id=666942

Yeah, but as far as I can see in the source, the normalization happens in
the Link constructor, after the URL has already been filtered by the call

URLFilterFactory.getFilter().filter(url);

in the UpdateDatabaseTool.pageContentsChanged() method.

----------------------------------------------------------------------

Comment By: Doug Cutting (cutting)
Date: 2004-07-10 17:41

Message:
Logged In: YES
user_id=21778

Nutch always normalizes hostnames to lowercase before filtering them, so
checking is already case-insensitive.

----------------------------------------------------------------------
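
The point under discussion is one of ordering: a lowercase-only host pattern catches MySite.net/ only if the hostname is lowercased before the regex filter runs. The following is a minimal standalone sketch of that effect, not Nutch's actual code. The class HostCaseSketch, the methods lowercaseHost and accepts, and the accept pattern are all hypothetical names invented for illustration; only java.net.URI and java.util.regex from the standard library are used.

```java
import java.net.URI;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical illustration only; this is not Nutch's RegexURLFilter.
final class HostCaseSketch {

    // Assumed accept rule, written with a lowercase host only.
    private static final Pattern ACCEPT =
        Pattern.compile("^http://([a-z0-9]*\\.)*mysite\\.net/");

    // Returns the URL with its host lowercased; the other parts are kept as-is.
    // URLs without a parsable host are returned unchanged.
    static String lowercaseHost(String url) {
        URI u = URI.create(url);
        String host = u.getHost();
        if (host == null) {
            return url;
        }
        StringBuilder sb = new StringBuilder();
        sb.append(u.getScheme()).append("://");
        if (u.getRawUserInfo() != null) {
            sb.append(u.getRawUserInfo()).append('@');
        }
        sb.append(host.toLowerCase(Locale.ROOT));
        if (u.getPort() != -1) {
            sb.append(':').append(u.getPort());
        }
        if (u.getRawPath() != null) {
            sb.append(u.getRawPath());
        }
        if (u.getRawQuery() != null) {
            sb.append('?').append(u.getRawQuery());
        }
        if (u.getRawFragment() != null) {
            sb.append('#').append(u.getRawFragment());
        }
        return sb.toString();
    }

    // True if the accept pattern matches at the start of the URL.
    static boolean accepts(String url) {
        return ACCEPT.matcher(url).find();
    }

    public static void main(String[] args) {
        String url = "http://MySite.net/";
        // Filter applied to the raw URL, before any normalization.
        System.out.println("filter first:    " + accepts(url));
        // Host lowercased first, then the same filter applied.
        System.out.println("normalize first: " + accepts(lowercaseHost(url)));
    }
}
```

If the filter sees the raw URL, the mixed-case host is rejected by the lowercase pattern; if the host is lowercased first, the same pattern accepts it. Compiling the pattern with Pattern.CASE_INSENSITIVE would be another option, but that also relaxes case matching on the path, which may not be wanted.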
