Hi,

I was wondering whether it's possible to get a Nutch crawl to go through a
website and report only the links that return a specific HTTP response code
(e.g. 404). I'm looking to automate basic site testing of rather huge
websites, and inevitably one ends up in the world of crawlers (and, being a
Java guy myself, that means Nutch).
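
To make the goal concrete, here is a rough sketch of the kind of report I mean, written in plain Java 11+ (java.net.http) rather than anything Nutch-specific. It does no crawling at all: it only checks the URLs passed on the command line and prints those that answer with 404. The class and method names (StatusReport, linksWithStatus, httpStatus) are made up for illustration and are not part of Nutch.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical helper, not a Nutch class: filters URLs by HTTP status code.
public class StatusReport {

    // Returns, in input order, the URLs whose status (as reported by
    // statusOf) equals the wanted code.
    public static List<String> linksWithStatus(List<String> urls,
                                               ToIntFunction<String> statusOf,
                                               int wanted) {
        List<String> hits = new ArrayList<>();
        for (String url : urls) {
            if (statusOf.applyAsInt(url) == wanted) {
                hits.add(url);
            }
        }
        return hits;
    }

    // Sends a HEAD request and returns the response status code,
    // or -1 if the URL is malformed or the request fails.
    public static int httpStatus(HttpClient client, String url) {
        try {
            HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                    .method("HEAD", HttpRequest.BodyPublishers.noBody())
                    .build();
            return client.send(request, HttpResponse.BodyHandlers.discarding())
                    .statusCode();
        } catch (IOException | IllegalArgumentException e) {
            return -1;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
    }

    public static void main(String[] args) {
        HttpClient client = HttpClient.newHttpClient();
        // Assumption: the URLs to check are given as command-line arguments.
        for (String url : linksWithStatus(List.of(args),
                                          u -> httpStatus(client, u), 404)) {
            System.out.println(url);
        }
    }
}
```

What I'm hoping Nutch can add on top of this is the actual crawl, i.e. discovering the links to check in the first place.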

I'm still going through the FAQ and the first basic steps, so apologies if
what I'm asking is the most basic Nutch thing ever :)

Thanks
Jorg
