Re: Nutch not crawling jabong

2012-09-24 Thread blunderboy
limit to -1. Initially a length limit was specified, so the whole page was not actually being parsed; only content up to that length was parsed. That's why we missed some of the links to the next pages.
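To illustrate the effect described above, here is a minimal Python sketch (not Nutch code) of how a content length limit can hide links that sit near the end of a page. The names `LinkCollector`, `extract_links` and `content_limit`, and the sample page, are hypothetical and only for illustration. The snippet above does not name the setting involved; in Nutch it is presumably the `http.content.limit` property in nutch-site.xml, but that is an assumption.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html, content_limit=-1):
    """Parse at most content_limit characters; -1 means no truncation."""
    if content_limit >= 0:
        html = html[:content_limit]
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up page: lots of filler first, the "next page" link at the very end.
page = (
    "<html><body>"
    + "<p>filler</p>" * 50
    + '<a href="/page/2">next</a></body></html>'
)

print(extract_links(page, content_limit=100))  # truncated: link is cut off
print(extract_links(page, content_limit=-1))   # whole page parsed
```

With the limit in place the parser never sees the link to the next page, which matches the symptom reported in this thread; with -1 the whole page is parsed.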

Re: Nutch not crawling jabong

2012-09-24 Thread Sebastian Nagel
maybe this is the reason it is not crawling product pages. > > Can somebody please tell me how to solve this and make Nutch crawl such > pages too. > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Nutch-not-crawling-jabong-tp3857630p3857630.html > Sent from the Nutch - User mailing list archive at Nabble.com. >

Re: Nutch not crawling jabong

2012-09-22 Thread Mansur
The same thing is happening to me with the sites below: www.linenclub.com, www.linenore.com, www.zovi.com

Re: Nutch not crawling jabong

2012-03-26 Thread blunderboy
Can somebody please help? Why are some sites not being crawled? E.g., Nutch failed to crawl http://www.myntra.com, http://www.jabong.com and http://www.youtube.com, while successfully crawling some other sites.

Re: Nutch not crawling jabong

2012-03-26 Thread blunderboy
Observe the URL of the product page: it is in the same directory where the index.html of jabong.com is located. I hope I am clear :) -- View this message in context: http://lucene.472066.n3.nabble.com/Nutch-not-crawling-jabong-tp3857630p3857632.html Sent from the Nutch - User mailing list archive at

Nutch not crawling jabong

2012-03-26 Thread blunderboy
l such pages too. -- View this message in context: http://lucene.472066.n3.nabble.com/Nutch-not-crawling-jabong-tp3857630p3857630.html Sent from the Nutch - User mailing list archive at Nabble.com.