There are some regexes in the URL normalizers, and there is some code in DomContentUtils for recursion.
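
As a rough illustration of the regex idea (a sketch only, not Nutch's actual code), a check for URLs whose path keeps repeating the same segment might look like the following. The class name TrapCheck, the method looksLikeTrap, and the choice of "three repeats" as the threshold are all assumptions made for this example.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: flags URLs whose path repeats
// the same slash-delimited segment, a common symptom of a spider trap.
class TrapCheck {

    // A segment (group 1) that shows up three times, with exactly one
    // other segment between consecutive occurrences, e.g. /a/b/a/b/a/
    private static final Pattern REPEATED_SEGMENT =
            Pattern.compile("(/[^/]+)/[^/]+\\1/[^/]+\\1/");

    // Returns true when the URL contains the repeating pattern anywhere.
    static boolean looksLikeTrap(String url) {
        return REPEATED_SEGMENT.matcher(url).find();
    }
}
```

A crawler could call TrapCheck.looksLikeTrap(url) before queueing a link and skip the URL when it returns true. A pattern like this only catches one kind of loop, so it would complement a crawl depth limit rather than replace it.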

Dennis

brainstorm wrote:
Hi!

I guess it is implemented, but I cannot find it myself in the Nutch API
docs or on the wiki :-/ ... Is there any mechanism implemented in Nutch to
detect spider traps[1]?

Thanks,
Roman

[1] http://en.wikipedia.org/wiki/Spider_trap
