What is your Nutch version? If I remember correctly, this is a bug in Nutch 2.2.1. It is fixed in 2.x.

On 21 Mar 2014 at 13:54, "reddibabu" <[email protected]> wrote:
> Nutch/Solr - The PDF is not getting indexed if the PDF is large. I am
> not getting any exceptions, but the content of the PDF is not getting
> indexed.
>
> If I use a link to a small PDF that does not contain any images or URLs,
> the content is indexed and shows up in Solr. But when I use links to
> PDFs with more content, the data is not indexed.
> I have changed file.content.limit in nutch-default.xml to -1 and
> http.content.size in nutch-site.xml to -1, but it did not help.
>
> I have followed the links below to get this working, but they did not
> help. Any further help would be much appreciated:
>
> http://grokbase.com/t/nutch/user/129ef77wa7/nutch-solr-pdf-getting-indexed-but-content-is-not-showing-in-solr
> http://grokbase.com/t/nutch/user/131apskpxq/crawling-pdfs
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Nutch-Solr-Pdf-content-is-not-getting-indexed-tp4125992.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
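
For reference, here is a minimal sketch of the content-limit overrides described in the quoted message, assuming they are placed in conf/nutch-site.xml (values there override nutch-default.xml, so the default file does not need to be edited). Note that the property shipped in nutch-default.xml is named http.content.limit. I am not aware of a property called http.content.size, so that setting may have had no effect. This is only a sketch: if the 2.2.1 bug is the cause, these settings alone will not fix it.

```xml
<!-- conf/nutch-site.xml (sketch; file location is an assumption) -->
<configuration>
  <!-- Max bytes fetched over HTTP; a negative value disables truncation -->
  <property>
    <name>http.content.limit</name>
    <value>-1</value>
  </property>
  <!-- Same limit for content fetched via the file protocol -->
  <property>
    <name>file.content.limit</name>
    <value>-1</value>
  </property>
</configuration>
```

The pages have to be fetched again after changing these values, because content that was already truncated during an earlier fetch stays truncated in storage.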

