Hi,

I have checked and rechecked, and I see that nutch-site.xml has the regex-urlfilter.txt part. I started Fetch
and see that regex-urlfilter.txt is not loaded. When I start the fetch I see:
040519 193606 loading file:/d1/nutch/conf/nutch-default.xml
040519 193606 loading file:/d1/nutch/conf/nutch-site.xml
......
Informational Robots.txt entries we'll obey (in order):
...
040519 193606 found resource banned-hosts.txt at file:/d1/nutch/conf/banned-hosts.txt
040519 193606 found resource mime.types at file:/d1/nutch/conf/mime.types


But there is no mention of regex-urlfilter.txt.

Is this file loaded or not? I use the latest CVS. I followed the Fetcher log and see that banned URLs (those ending with ?, for example) are crawled. So I suspect that the URL filter doesn't work.
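
To rule out the pattern itself as the cause, the rules can be checked outside Nutch. The following is only a minimal standalone sketch, not Nutch's own filter code: the class UrlFilterCheck and its methods are hypothetical, and it assumes the usual regex-urlfilter.txt conventions (each rule is '+' or '-' followed by a regular expression, the first matching rule decides, and a URL that matches no rule is rejected). The rule -[?*!@=] and the example URLs are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of Nutch: applies '+'/'-' regex rules
// in order, the way regex-urlfilter.txt is assumed to be interpreted.
public class UrlFilterCheck {

    // One rule: accept ('+') or reject ('-') plus the compiled pattern.
    static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    // Parse rule lines, skipping blank lines and '#' comments.
    static List<Rule> parse(String[] lines) {
        List<Rule> rules = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
        return rules;
    }

    // The first rule whose pattern is found anywhere in the URL decides.
    // A URL that matches no rule is rejected (assumed default).
    static boolean accepts(String[] ruleLines, String url) {
        for (Rule rule : parse(ruleLines)) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] rules = {
            "# skip URLs containing characters typical of queries",
            "-[?*!@=]",
            "# accept anything else",
            "+."
        };
        String[] urls = {
            "http://example.com/page?",
            "http://example.com/page.html"
        };
        for (String url : urls) {
            System.out.println(url + " accepted=" + accepts(rules, url));
        }
    }
}
```

Under these assumed rules the first URL is rejected and the second is accepted, so if such URLs still appear in the Fetcher log, the rules file is probably not being applied at all.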

Thanks



_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
