Maybe you can contribute this as a plugin for Nutch?
thanks,
Stefan

On 26.02.2005 at 16:27, Phoebe Miller wrote:


I had the same problem, and my list of hosts ran into the thousands, so a
regex was a little inefficient for that.
I subclassed the regex_urlfilter, built a list of hosts based on domain
names, and implemented the lookup with a hashmap. It runs pretty well, since
lookups are cheap.
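
A rough sketch of the idea, not the actual code: the class name
HostUrlFilter is hypothetical, it is standalone rather than a subclass of
the regex filter, and it uses a HashSet instead of a hashmap since only
membership matters. It assumes the usual filter convention of returning the
URL when it is accepted and null when it is rejected.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Collection;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical class for illustration; not part of Nutch.
final class HostUrlFilter {
    // Allowed domain names, stored lower-cased for case-insensitive matching.
    private final Set<String> allowedDomains = new HashSet<>();

    HostUrlFilter(Collection<String> domains) {
        for (String d : domains) {
            allowedDomains.add(d.trim().toLowerCase(Locale.ROOT));
        }
    }

    // Returns the URL if its host, or any parent domain of it, is listed;
    // returns null otherwise (assumed accept/reject convention).
    String filter(String url) {
        String host;
        try {
            host = new URI(url).getHost();
        } catch (URISyntaxException e) {
            return null; // unparsable URLs are rejected
        }
        if (host == null) {
            return null;
        }
        host = host.toLowerCase(Locale.ROOT);
        // Check "www.example.com", then "example.com", then "com".
        while (true) {
            if (allowedDomains.contains(host)) {
                return url;
            }
            int dot = host.indexOf('.');
            if (dot < 0) {
                return null;
            }
            host = host.substring(dot + 1);
        }
    }
}
```

Each URL costs a few hash lookups (one per label in the host name) no matter
how many thousands of domains are listed, whereas a regex list has to be
tried pattern by pattern.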



P.



-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click
_______________________________________________
Nutch-general mailing list
Nutch-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-general



---------------------------------------------------------------
company:                http://www.media-style.com
forum:          http://www.text-mining.org
blog:                   http://www.find23.net



