For some reason, Nutch only starts to crawl inner links at depth 4 for domains with redirects, so a crawl run with -depth 3 would stop before reaching them.
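One possible explanation, offered as a guess rather than a diagnosis: if the fetcher does not follow redirects immediately (I believe this is what happens when http.redirect.max is 0 in the Nutch configuration, but please check your own nutch-default.xml / nutch-site.xml), each redirect hop uses up one crawl round. The toy model below is not Nutch code. The function crawl_rounds, the example.com URLs, and the two-hop redirect chain are all made up for illustration; I have not checked what www.magicbricks.com actually redirects to.

```python
# Toy model of a depth-limited crawl (NOT Nutch code).
# Assumption: a redirect target is only queued for the NEXT round,
# so every redirect hop costs one unit of depth.

def crawl_rounds(seed, redirects, outlinks, depth):
    """Return the list of URLs fetched in each round."""
    frontier = {seed}
    seen = set()
    rounds = []
    for _ in range(depth):
        fetched = sorted(frontier - seen)
        if not fetched:
            break  # nothing left to fetch
        seen.update(fetched)
        next_frontier = set()
        for url in fetched:
            if url in redirects:
                # Redirect: record the target for later instead of following it now.
                next_frontier.add(redirects[url])
            else:
                # Normal page: queue its outlinks for the next round.
                next_frontier.update(outlinks.get(url, []))
        rounds.append(fetched)
        frontier = next_frontier
    return rounds


# Hypothetical site: the home page redirects twice before serving content.
SEED = "http://www.example.com/"
REDIRECTS = {
    "http://www.example.com/": "http://www.example.com/home",
    "http://www.example.com/home": "http://www.example.com/home/index.html",
}
OUTLINKS = {
    "http://www.example.com/home/index.html": [
        "http://www.example.com/a.html",
        "http://www.example.com/b.html",
    ],
}

if __name__ == "__main__":
    for depth in (3, 4):
        print(depth, crawl_rounds(SEED, REDIRECTS, OUTLINKS, depth))
```

In this model, depth 3 is spent on the seed, the intermediate redirect, and the final landing page, and the inner pages are first fetched in round 4. If something similar is going on in your crawl, rerunning the same command with a larger -depth value would be a cheap thing to try.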
-----Original Message-----
From: hemantverma09 <[email protected]>
To: nutch-user <[email protected]>
Sent: Tue, Mar 1, 2011 6:17 am
Subject: Can't Crawl Through Home Page, but crawling through inner page

I am using Nutch 1.1 for crawling. I am able to crawl many sites without any issue, but when I crawl www.magicbricks.com the crawl stops at depth=1.

I am using "bin/nutch crawl urls/magicbricks/url.txt -dir crawl/magicbricks -threads 10 -depth 3 -topN 10"

But if I put links like "http://www.magicbricks.com/bricks/cityIndex.html" or "http://www.magicbricks.com/bricks/propertySearch.html" in urls/magicbricks/url.txt, it crawls without any issue.

In robots.txt I have allowed my crawler, named Propertybot, full access to crawl, as can be seen at http://magicbricks.com/robots.txt

Please suggest what the reasons for this could be.

Thanks in advance,
Hemant Verma

