> I have using nutch 1.7 version. Is it possible to crawl and index data into
> Solr below nutch 2.x versions.

Yes, of course!

> I have changed the file.content.limit in nutch-default.xml to -1 and
> http.content.size in nutch-site.xml to -1 but it did not helped.

1. The property is named http.content.limit (not http.content.size), and it
   needs to be set to -1.
2. It's recommended to set all customized properties in nutch-site.xml
   rather than editing nutch-default.xml.
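
As a rough sketch, the overrides in conf/nutch-site.xml could look like the
fragment below. It assumes you fetch over both the http and file protocols;
drop the property you do not need. A negative value disables truncation for
that protocol.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Do not truncate content fetched via HTTP (the default limit is 65536 bytes) -->
  <property>
    <name>http.content.limit</name>
    <value>-1</value>
  </property>
  <!-- Same for content read via the file protocol -->
  <property>
    <name>file.content.limit</name>
    <value>-1</value>
  </property>
</configuration>
```

Values in nutch-site.xml take precedence over nutch-default.xml, so the
defaults file can stay untouched.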

It's best to check the configuration via

% $NUTCH_HOME/bin/nutch parsechecker -dumpText http://.../abc.pdf

(Nutch 1.8 only:) If the content is truncated, parsechecker reports it.

Sebastian

On 04/01/2014 08:33 AM, reddibabu wrote:
> Hi Talat,
> 
> Thanks for reply.
> 
> I have using nutch 1.7 version. Is it possible to crawl and index data into
> Solr below nutch 2.x versions.
> If possible then let me the specific configurations for crawling pdf files.
> 
> 
> Thanks,
> Reddi Babu
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Nutch-Solr-Pdf-content-is-not-getting-indexed-tp4125992p4128347.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
> 
