Hi,
Following up on my problem of having to crawl a site that redirects to
itself, I am now thinking about creating a Nutch plugin that allows
certain URLs to be crawled twice.
Since I'm not too familiar with the Nutch code, I would appreciate any
pointers on where to start - or is there already an option available in
Nutch which I missed?
A more thorough explanation of why the page is redirecting to itself
can be found in an earlier thread [1].
Thanks,
Elisabeth
[1]
http://markmail.org/search/?q=list%3Aorg.apache.lucene.nutch-user+crawling+and+redirects#query:list%3Aorg.apache.lucene.nutch-user%20crawling%20and%20redirects+page:1+mid:urds3zg2kp7n6o46+state:results