Unable to crawl and index pdf metadata into Solr from Nutch

reddibabu Fri, 21 Mar 2014 01:53:22 -0700

Hi,

I am using Nutch 1.7 and Solr 4.5.


I am able to crawl any PDF from the Nutch side, and some of its metadata is
displayed on the terminal when I run "bin/nutch indexchecker
http://www.master.netseven.it/files/262-Nutch.pdf". However, I am not able to
index the same PDF's details into Solr.

I get the following message on the terminal: "INFO:parse.ParseSegment -
http://master.netseven.it/files/262-Nutch.pdf skipped. Content of size
371452 was truncated to 62630".

Is there a size limit for PDF content, and if so, how can I set it to
unlimited (-1)?

Could anyone please help me with this?


Thanks in advance.
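
For reference, a hedged sketch of the configuration the question seems to be
about; it is not part of the original message. The truncation reported in the
log line quoted above is most likely governed by Nutch's http.content.limit
property (65536 bytes by default), and the "skipped" part by
parser.skip.truncated. The fragment below assumes the PDF is fetched over HTTP
and that the overrides go in conf/nutch-site.xml; the values are illustrative.

```xml
<!-- conf/nutch-site.xml: overrides of nutch-default.xml (assumed location) -->
<configuration>
  <!-- Maximum number of bytes downloaded per document over HTTP.
       A negative value is understood to disable truncation. -->
  <property>
    <name>http.content.limit</name>
    <value>-1</value>
  </property>
  <!-- Optional: when true, documents whose content was truncated are
       skipped by the parser instead of being parsed partially. -->
  <property>
    <name>parser.skip.truncated</name>
    <value>true</value>
  </property>
</configuration>
```

The PDF would presumably need to be fetched again after the change, since the
segment already stored holds only the truncated content.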




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Unable-to-crawl-and-index-pdf-metadata-into-Solr-from-Nutch-tp4125941.html
Sent from the Nutch - User mailing list archive at Nabble.com.
