The root page redirects to http://www.m.magicbricks.com/mbs/wapmb. Does your URLFilter configuration allow that host?
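
One way to test this outside Nutch is the rough sketch below. It imitates how the regex URL filter is generally understood to work: each rule is a `+` or `-` followed by a regular expression, the first rule that matches decides, and a URL that matches no rule is dropped. The rules in `RULES` are hypothetical, not your actual configuration (for the crawl command in Nutch 1.x that is probably conf/crawl-urlfilter.txt, but check your setup), and `is_accepted` is an illustrative helper, not a Nutch API.

```python
import re

# Hypothetical rules in the style of a Nutch regex URL filter file.
# '+' accepts, '-' rejects; these are NOT taken from the poster's configuration.
RULES = [
    r"-\.(gif|jpg|png|css|js)$",        # skip common static resources
    r"+^http://www\.magicbricks\.com/",  # accept only the www host
]


def is_accepted(url, rules):
    """Return True if the first matching rule is a '+' rule.

    URLs that match no rule are rejected, mirroring the assumed
    first-match-wins behaviour of the regex URL filter.
    """
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+"
    return False


seed = "http://www.magicbricks.com/"
redirect_target = "http://www.m.magicbricks.com/mbs/wapmb"
inner_page = "http://www.magicbricks.com/bricks/cityIndex.html"

for url in (seed, redirect_target, inner_page):
    print(url, "accepted" if is_accepted(url, RULES) else "rejected")
```

Under these assumed rules the seed and the inner page pass but the redirect target does not, which would match a crawl that stops at depth 1 from the home page yet works from inner pages. A broader rule such as `+^http://([a-z0-9]*\.)*magicbricks\.com/` would accept the www.m host as well.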

On 1 March 2011 09:44, [email protected] <[email protected]> wrote:
>
> I am using nutch 1.1 for crawling.
> I am able to crawl so many site without any issue but when I am crawling
> www.magicbricks.com
> it is stopping at depth=1.
> I am using "bin/nutch crawl urls/magicbricks/url.txt -dir crawl/magicbricks
> -threads 10 -depth 3 -topN 10"
> But if I put links like "http://www.magicbricks.com/bricks/cityIndex.html"
> or "http://www.magicbricks.com/bricks/propertySearch.html" in
> urls/magicbricks/url.txt it crawls without any issue.
>
> In robots.txt I have allowed my crawler named Propertybot all access to
> crawl, which can be seen by using http://magicbricks.com/robots.txt
>
> Please suggest what can be the reasons, why it is happening.
>
> Thanks in advance
> Hemant Verma
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Can-t-Crawl-Through-Home-Page-but-crawling-through-inner-page-tp2601843p2601843.html
> Sent from the Nutch - User mailing list archive at Nabble.com.

--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com

