> I have using nutch 1.7 version. Is it possible to crawl and index data into
> Solr below nutch 2.x versions.
Yes, of course!

> I have changed the file.content.limit in nutch-default.xml to -1 and
> http.content.size in nutch-site.xml to -1 but it did not helped.

1. http.content.limit (not http.content.size) needs to be set to -1.
2. It's recommended to set all customized properties in nutch-site.xml.

It's best to check the configuration via

  % $NUTCH_HOME/bin/nutch parsechecker -dumpText http://.../abc.pdf

(Only for Nutch 1.8:) If content is truncated, this is shown by parsechecker.

Sebastian

On 04/01/2014 08:33 AM, reddibabu wrote:
> Hi Talat,
>
> Thanks for reply.
>
> I have using nutch 1.7 version. Is it possible to crawl and index data into
> Solr below nutch 2.x versions.
> If possible then let me the specific configurations for crawling pdf files.
>
>
> Thanks,
> Reddi Babu
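
For reference, the two points in the reply might translate into something like the following conf/nutch-site.xml. This is only a sketch: it assumes the PDFs are fetched over HTTP and that no other overrides are needed. The file.content.limit entry only matters for file:// URLs and is included because the original question mentioned it; the description texts are paraphrased, not copied from nutch-default.xml.

```xml
<?xml version="1.0"?>
<!-- conf/nutch-site.xml: local overrides of nutch-default.xml (sketch) -->
<configuration>

  <!-- Content fetched over HTTP: -1 disables truncation -->
  <property>
    <name>http.content.limit</name>
    <value>-1</value>
    <description>No length limit for content downloaded via HTTP.</description>
  </property>

  <!-- Content read via file:// URLs: -1 disables truncation -->
  <property>
    <name>file.content.limit</name>
    <value>-1</value>
    <description>No length limit for content read from the local file system.</description>
  </property>

</configuration>
```

After changing the file, the parsechecker command from the reply can be re-run against one of the PDF URLs to see whether the extracted text is now complete.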

