Nutch/Solr - A PDF is not getting indexed if it is large. I am not
getting any exceptions, but the content of the PDF does not show up in
the index.

If I use a link to a small PDF that does not contain any images or URLs,
the content is indexed and shows up in Solr. But when I use links to PDFs
with more content, the data is not indexed.
I have changed file.content.limit in nutch-default.xml to -1 and
http.content.size in nutch-site.xml to -1, but it did not help.
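
For reference, the overrides would look roughly like the sketch below. It
assumes both properties go in nutch-site.xml, and it uses
http.content.limit, which is the name the stock nutch-default.xml uses for
the HTTP download size limit. If the property is spelled differently in
the file, the default limit would presumably still apply and large PDFs
would still be truncated.

```xml
<!-- nutch-site.xml: overrides for values defined in nutch-default.xml -->
<configuration>
  <!-- Size limit for content fetched via the file protocol; -1 disables truncation -->
  <property>
    <name>file.content.limit</name>
    <value>-1</value>
  </property>
  <!-- Size limit for content fetched via HTTP; -1 disables truncation -->
  <property>
    <name>http.content.limit</name>
    <value>-1</value>
  </property>
</configuration>
```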

I have followed the links below to try to get this working, but they did
not help. Any further help would be much appreciated:
http://grokbase.com/t/nutch/user/129ef77wa7/nutch-solr-pdf-getting-indexed-but-content-is-not-showing-in-solr
http://grokbase.com/t/nutch/user/131apskpxq/crawling-pdfs



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Nutch-Solr-Pdf-content-is-not-getting-indexed-tp4125992.html
Sent from the Nutch - User mailing list archive at Nabble.com.