I had a look at the code.
The filtering is definitely there ;)

In the end I found out that the order of the regexes in regex-urlfilter.txt does matter, and mine was wrong.

As the comment in regex-urlfilter.txt says: "The first matching pattern in the file determines whether a URL is included or ignored. If no pattern matches, the URL is ignored."
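
For anyone hitting the same thing, here is a rough sketch of that first-match rule. It is not the Nutch code, just a small Python imitation under some assumptions: each rule line starts with `+` (include) or `-` (ignore) followed by a regex, lines starting with `#` are comments, and a pattern counts as matching if it is found anywhere in the URL. The function names, rule sets and example.com URLs are made up for illustration.

```python
import re

def parse_rules(text):
    """Parse '+regex' / '-regex' lines; skip blank lines and '#' comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError("rule must start with '+' or '-': " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules

def accept(url, rules):
    """Return the verdict of the first matching rule; ignore the URL if none match."""
    for include, regex in rules:
        # Assumption: a partial match anywhere in the URL counts as a match.
        if regex.search(url):
            return include
    return False

# Hypothetical rule sets: same two patterns, different order.
WRONG_ORDER = r"""
# the broad include comes first, so the exclude below is never reached
+^http://example\.com/
-\.(gif|jpg)$
"""

RIGHT_ORDER = r"""
# the specific exclude comes first, then the broad include
-\.(gif|jpg)$
+^http://example\.com/
"""

if __name__ == "__main__":
    url = "http://example.com/logo.gif"
    print(accept(url, parse_rules(WRONG_ORDER)))  # True: the include matched first
    print(accept(url, parse_rules(RIGHT_ORDER)))  # False: the exclude matched first
```

With the broad include first, the image URL slips through because the exclude pattern is never consulted. Putting the more specific exclude patterns above the general include is what fixed it in my case.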

Thanks for your input.

ud



Andrzej Bialecki wrote:

Mr. Udatny wrote:

Is it correct that URLs which return a redirect to another URL are no longer filtered?
Is there a way to solve this?



It's not true. In each case the new URL is passed to URLFilters, and if it comes back empty it is skipped.




_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general