What is your Nutch version? If I remember correctly, this is a bug in Nutch
2.2.1. It is fixed in 2.x.
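
Independently of the bug, here is a minimal sketch for double-checking the content-limit overrides. It generates and reads back a nutch-site.xml <configuration> block. It assumes site-specific overrides belong in nutch-site.xml rather than nutch-default.xml, and that the relevant properties are http.content.limit and file.content.limit, both documented in nutch-default.xml. The helper names build_nutch_site and read_limits are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_nutch_site(overrides):
    """Hypothetical helper: build a nutch-site.xml <configuration> string.

    Each (name, value) pair becomes a <property> element. For the Nutch
    content-limit properties, -1 means "do not truncate content".
    """
    root = ET.Element("configuration")
    for name, value in overrides.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


def read_limits(xml_text, names):
    """Hypothetical helper: return {name: value} for the requested properties."""
    root = ET.fromstring(xml_text)
    found = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name in names:
            found[name] = prop.findtext("value")
    return found


if __name__ == "__main__":
    limits = {"http.content.limit": -1, "file.content.limit": -1}
    xml_text = build_nutch_site(limits)
    print(xml_text)
    print(read_limits(xml_text, set(limits)))
```

If read_limits does not report the property you expect, Nutch is most likely still using the default value for it.
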
On 21 Mar 2014 13:54, "reddibabu" <[email protected]> wrote:

> Nutch/Solr - The PDF is not getting indexed if the PDF is large. I am
> not getting any exceptions, but the content of the PDF is not getting
> indexed.
>
> If I use a small PDF link that does not contain any images or URLs, the
> content gets indexed and shows up in Solr. But when I use PDF links that
> contain more content, the data is not indexed.
> I have changed file.content.limit in nutch-default.xml to -1 and
> http.content.size in nutch-site.xml to -1, but it did not help.
>
> I have followed the links below to get this working, but they did not
> help. Any further help would be much appreciated:
>
> http://grokbase.com/t/nutch/user/129ef77wa7/nutch-solr-pdf-getting-indexed-but-content-is-not-showing-in-solr
> http://grokbase.com/t/nutch/user/131apskpxq/crawling-pdfs
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Nutch-Solr-Pdf-content-is-not-getting-indexed-tp4125992.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>
