Re: Can't Crawl Through Home Page, but crawling through inner page

Julien Nioche Tue, 01 Mar 2011 06:50:12 -0800

The root page redirects to http://www.m.magicbricks.com/mbs/wapmb
Does your URLFilter configuration allow that host?
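
One way to check this offline is to run the redirect target through the same kind of rules the filter file holds. The sketch below is a minimal stand-in, not Nutch code. It assumes the usual regex URL filter behaviour: rules are read top to bottom, each starts with "+" (accept) or "-" (reject), the first pattern found in the URL decides, and a URL that matches no rule is rejected. The rule sets NARROW_RULES and BROAD_RULES are hypothetical; replace them with the contents of your own filter file.

```python
import re


def parse_rules(text):
    """Parse '+regex' / '-regex' lines into (accept, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with '+' or '-': " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def is_allowed(url, rules):
    """First rule whose pattern is found in the URL decides; no match rejects."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False


# Hypothetical rule set that accepts only the www host.
NARROW_RULES = parse_rules(r"""
-\.(gif|jpg|png|css|js)$
+^http://www\.magicbricks\.com/
-.
""")

# Hypothetical rule set that accepts any subdomain of magicbricks.com.
BROAD_RULES = parse_rules(r"""
-\.(gif|jpg|png|css|js)$
+^http://([a-z0-9]*\.)*magicbricks\.com/
-.
""")

if __name__ == "__main__":
    redirect_target = "http://www.m.magicbricks.com/mbs/wapmb"
    inner_page = "http://www.magicbricks.com/bricks/cityIndex.html"
    for name, rules in (("narrow", NARROW_RULES), ("broad", BROAD_RULES)):
        print(name, "redirect target allowed:", is_allowed(redirect_target, rules))
        print(name, "inner page allowed:", is_allowed(inner_page, rules))
```

If your rules behave like the narrow set, the inner pages pass but the redirect target on www.m.magicbricks.com is filtered out, which would match a crawl that stops at depth 1 when seeded with the home page.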


On 1 March 2011 09:44, [email protected] <[email protected]> wrote:
>
> I am using Nutch 1.1 for crawling.
> I am able to crawl many sites without any issue, but when I crawl
> www.magicbricks.com
> it stops at depth=1.
> I am using "bin/nutch crawl urls/magicbricks/url.txt -dir crawl/magicbricks
> -threads 10 -depth 3 -topN 10"
> But if I put links like "http://www.magicbricks.com/bricks/cityIndex.html"
> or "http://www.magicbricks.com/bricks/propertySearch.html" in
> urls/magicbricks/url.txt, it crawls without any issue.
>
> In robots.txt I have allowed my crawler, named Propertybot, full access to
> crawl, as can be seen at http://magicbricks.com/robots.txt
>
> Please suggest what the reasons for this could be.
>
> Thanks in advance
> Hemant Verma
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Can-t-Crawl-Through-Home-Page-but-crawling-through-inner-page-tp2601843p2601843.html
> Sent from the Nutch - User mailing list archive at Nabble.com.



--

Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
