Why do a few HTTP sites not get crawled?

David Philip Sat, 02 Aug 2014 04:28:33 -0700

Hi,

   This may be a naive question; apologies for that.


I was trying to crawl the Quora Q&A site. The seed file contains these two
URLs:
http://www.quora.com/Data-Visualization/
http://www.quora.com/

But the crawl didn't pick up either of these pages. Why?

When I give "http://nutch.apache.org/" as the seed instead, that site gets crawled.

Note that I have not put any restriction in the regex filter; the only rule is "+.".
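
To illustrate the point about the filter, here is a minimal Python sketch of how a rule list containing only "+." would treat the seed URLs. It assumes that rules are applied in order, that the first matching rule decides, and that the pattern may match anywhere in the URL. The names RULES and accepts are hypothetical and this is not Nutch code.

```python
import re

# Hypothetical stand-in for a regex filter file whose only rule is "+.".
RULES = ["+."]

def accepts(url, rules=RULES):
    """Return True if the first rule whose pattern matches starts with '+'."""
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+"
    # Assumption: a URL matched by no rule is rejected.
    return False

seeds = [
    "http://www.quora.com/Data-Visualization/",
    "http://www.quora.com/",
    "http://nutch.apache.org/",
]
for seed in seeds:
    print(seed, accepts(seed))
```

Under these assumptions all three URLs pass, so the filter by itself would not explain why only one of the sites gets crawled.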

Thanks - David
