Insert this line at the top of regex-urlfilter.txt:
-http://.*(/.+?)/.*?\1/.*?\1.*?/
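Below is a rough Java sketch for checking which URLs the rule above would reject. It is not Nutch's own RegexURLFilter code. The class name RecursionFilterSketch and the example.com URLs are hypothetical. It assumes the rules are tried in order, the first matching line decides ('-' rejects, '+' accepts), and patterns are matched unanchored (Matcher.find).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not Nutch's RegexURLFilter) of evaluating a
 * regex-urlfilter.txt style rule list: first matching rule wins.
 */
public class RecursionFilterSketch {

    /** One parsed rule: accept (+) or reject (-) plus its compiled pattern. */
    static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    RecursionFilterSketch(String... lines) {
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
    }

    /** Assumption: unanchored find-style matching; no matching rule means reject. */
    boolean accept(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        RecursionFilterSketch filter = new RecursionFilterSketch(
            "-http://.*(/.+?)/.*?\\1/.*?\\1.*?/", // recursion rule, placed first
            "+."                                   // accept everything else
        );
        String[] urls = {
            "http://example.com/docs/guide/index.html",
            "http://example.com/a/b/a/b/a/b/",
            "http://example.com/foo/foo/foo/foo/foo/"
        };
        for (String url : urls) {
            System.out.println((filter.accept(url) ? "accept " : "reject ") + url);
        }
    }
}
```

Under these assumptions the two URLs that keep repeating a path segment are rejected and the ordinary one is accepted. The rule has to sit above any broad '+' line, because the first matching line decides.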

EM wrote:

What should I do when I encounter sites where Nutch falls into recursion mode?

Currently I'm handling this by excluding these sites with the regex filter,
but is anything currently under development for this?

By recursion I mean Nutch fetching <sfdsdf>.com/<sth>/<sth>/<sth>/<sth>/<sth>/<sth> and on and on...

Are there any tricks to limit the folder depth when fetching?

E.

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers