We're running a crawl using Nutch, and the last crawl seemed to be taking
a long time. Looking at the output, it seems to have gone into AOL's
search and is actually crawling search results (it's also crawling a
cgi-bin search results page on another site). It looks like this
could go on forever.
Admittedly, we haven't looked at this very deeply yet (I'm not sure why
it has so many AOL search pages to crawl), but if Nutch behaves this way
by default, this is likely a common occurrence. Is there
something we should be doing to prevent this situation?
Thanks.