limit to -1. Initially a length parameter was specified, so it was not
actually parsing the whole page; only that much of the content was parsed.
That's why we missed some of the links to the next pages.
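To illustrate the effect described above, here is a minimal Python sketch (not Nutch code) of how a content-length limit can hide links that sit near the end of a page. The page contents, the `extract_links` helper and the `content_limit` argument are all made up for illustration; the post does not name the actual setting, which in Nutch is presumably the `http.content.limit` property, but that is an assumption.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html, content_limit=-1):
    """Return the links found in the first content_limit characters of html.

    A negative content_limit means "no truncation". This is a hypothetical
    convention chosen to mirror the "-1" setting mentioned in the post.
    """
    if content_limit >= 0:
        html = html[:content_limit]
    parser = LinkCollector()
    # feed() handles every complete tag; an incomplete trailing tag
    # (cut off by the truncation) simply stays in the parser's buffer.
    parser.feed(html)
    return parser.links


# Hypothetical listing page: product links first, "next page" link last.
page = (
    "<html><body>"
    + "".join(f'<a href="/product/{i}">Item {i}</a>' for i in range(50))
    + '<a href="/page/2">Next</a>'
    + "</body></html>"
)

truncated = extract_links(page, content_limit=500)  # only the first 500 chars
full = extract_links(page, content_limit=-1)        # whole page

print("/page/2" in truncated)  # pagination link lost when truncated
print("/page/2" in full)       # pagination link found without a limit
```

With the limit in place the parser never sees the "Next" link, so a crawler working from that output would not reach the following pages.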
--
View this message in context:
http://lucene.472066.n3.nabble.com/Nutch-not-crawling-j
maybe this is the reason it is not crawling product pages.
>
> Can somebody please tell me how to solve this and make Nutch crawl such
> pages too.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Nutch-not-crawling-jabong-tp3857630p3857630.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>
The same thing is happening to me for the sites below:
www.linenclub.com <http://www.linenclub.com>
www.linenore.com <http://www.linenore.com>
www.zovi.com <http://www.zovi.com>
--
Can somebody please help?
Why are some sites not being crawled?
E.g., Nutch failed to crawl:
http://www.myntra.com
http://www.jabong.com
http://www.youtube.com
It is successfully crawling some other sites.
--
Observe the URL of the product page.
It is in the same directory as the index.html of jabong.com.
I hope I am clear :)
--
View this message in context:
http://lucene.472066.n3.nabble.com/Nutch-not-crawling-jabong-tp3857630p3857632.html
Sent from the Nutch - User mailing list archive at