I am using Nutch 1.1 for crawling.
I am able to crawl many sites without any issue, but when I crawl
www.magicbricks.com
the crawl stops at depth=1.
I am using "bin/nutch crawl urls/magicbricks/url.txt -dir crawl/magicbricks
-threads 10 -depth 3 -topN 10"
But if I put links like "http://www.magicbricks.com/bricks/cityIndex.html"
or "http://www.magicbricks.com/bricks/propertySearch.html" in
urls/magicbricks/url.txt, it crawls without any issue.

In robots.txt I have allowed my crawler, named Propertybot, full access
to crawl; this can be verified at http://magicbricks.com/robots.txt
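For reference, a minimal robots.txt granting that access might look like the following (the agent name Propertybot is from the post; the exact file contents on the site are an assumption):

```text
# Hypothetical robots.txt: allow the Propertybot crawler everywhere
User-agent: Propertybot
Disallow:
```

Note that Nutch matches the robots.txt User-agent line against the agent names configured in its own properties (http.agent.name / http.robots.agents in nutch-site.xml), so those values must agree with "Propertybot" for the rule to apply.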

Please suggest what the reason might be and why this is happening.

Thanks in advance
Hemant Verma

-- 
Sent from the Nutch - User mailing list archive at Nabble.com.
