Hi,

Following up on my problem of having to crawl a site that redirects to itself, I am now thinking about writing a Nutch plugin that allows certain URLs to be crawled twice. Since I'm not too familiar with the Nutch code, I would appreciate any pointers on where to start. Or is there already an option in Nutch that I missed?

A more thorough explanation of why the page redirects to itself can be found in an earlier thread [1].

Thanks,
Elisabeth

[1] http://markmail.org/search/?q=list%3Aorg.apache.lucene.nutch-user+crawling+and+redirects#query:list%3Aorg.apache.lucene.nutch-user%20crawling%20and%20redirects+page:1+mid:urds3zg2kp7n6o46+state:results
