Re: How do I find out what is going wrong?

Andrzej Bialecki Sun, 06 Aug 2006 10:36:11 -0700

Iain wrote:

I'm testing Nutch with a view to exhaustive scraping (using version 0.8).


But some sites don't get scraped at all, and I have no idea why. A case in point is
http://www.idc.com.

This is a HUGE site, but I get nothing from it in Nutch.

Check http://www.idc.com/robots.txt: it specifically disallows all other robots (*) from accessing this site.

(And I agree that we should produce a message in the logs when this happens.)
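One way to confirm this kind of block outside the crawler is to run the rules through a robots.txt parser. The sketch below uses Python's standard `urllib.robotparser`, which is independent of Nutch and is not how Nutch itself evaluates robots.txt. The robots.txt text is hypothetical (one named robot allowed, everyone else matched by `*` and shut out), not a copy of the real idc.com file, and the agent names `MyNutchCrawler` and `FavoredBot` and the `allowed()` helper are likewise made up for illustration. The `log.info` call only illustrates the kind of message a crawler could emit instead of silently fetching nothing.

```python
import logging
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt shaped like the case above: one named robot
# ("FavoredBot", made up) is let in, every other robot ("*") is disallowed.
# This is NOT the actual content of http://www.idc.com/robots.txt.
ROBOTS_TXT = """\
User-agent: FavoredBot
Disallow:

User-agent: *
Disallow: /
"""

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("robots-check")


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    parser.modified()  # record the rules as loaded so can_fetch() consults them
    ok = parser.can_fetch(agent, url)
    if not ok:
        # Illustrative log line: say why the URL was skipped instead of
        # failing silently.
        log.info("robots.txt denies %s access to %s", agent, url)
    return ok


if __name__ == "__main__":
    for agent in ("MyNutchCrawler", "FavoredBot"):
        print(agent, allowed(ROBOTS_TXT, agent, "http://www.idc.com/"))
```

With these hypothetical rules, the generic crawler name falls through to the `*` entry and is refused, while the explicitly named robot matches its own entry, whose empty `Disallow:` allows everything.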


--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com