I had a look at the code.
The filtering is definitely there ;)
In the end I found out that the order of the regexes in
regex-urlfilter.txt does matter, and mine was wrong.
As the comment in regex-urlfilter.txt says: "The first matching pattern
in the file determines whether a URL is included or ignored. If no
pattern matches, the URL is ignored."
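
To illustrate why the order matters, here is a minimal Python sketch of the first-match rule quoted above. It is not Nutch's actual Java code: the function names, the example.org URLs, and the two rule sets are made up for illustration, and it assumes each rule line is a `+` (include) or `-` (ignore) followed by a regex that may match anywhere in the URL.

```python
import re

# Hypothetical rule sets in the regex-urlfilter.txt style.
# '+' means include, '-' means ignore, '#' starts a comment line.
RULES_WRONG_ORDER = r"""
# broad accept rule listed first, so it always wins
+^http://example\.org/
# meant to skip the private area, but never reached
-^http://example\.org/private/
"""

RULES_RIGHT_ORDER = r"""
# specific ignore rule first
-^http://example\.org/private/
# broad accept rule last
+^http://example\.org/
"""


def parse_rules(text):
    """Turn rule text into an ordered list of (include, compiled_regex)."""
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with '+' or '-': " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def url_filter(url, rules):
    """Return the URL if it is included, or None if it is ignored.

    The first rule whose regex matches decides the outcome;
    if no rule matches at all, the URL is ignored.
    """
    for include, regex in rules:
        if regex.search(url):  # assumption: partial match is enough
            return url if include else None
    return None


if __name__ == "__main__":
    private = "http://example.org/private/a.html"
    print(url_filter(private, parse_rules(RULES_WRONG_ORDER)))
    print(url_filter(private, parse_rules(RULES_RIGHT_ORDER)))
```

With the first rule set the private URL is still accepted, because the broad `+` rule matches before the `-` rule is ever tried. Swapping the two lines makes the same URL come back as None.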
Thanks for your input.
ud
Andrzej Bialecki wrote:
> Mr. Udatny wrote:
>> is it correct that urls which return a redirect to another url are
>> not filtered anymore?
>> possible to solve?
>
> It's not true. In each case the new URL is passed to URLFilters, and
> if it comes back empty it is skipped.
_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general