Re: Unable to crawl and index pdf metadata into Solr from Nutch

remi tassing Fri, 21 Mar 2014 02:04:33 -0700

Hi,

Increase the default value of http.content.limit and/or ftp.content.limit
as needed.
This problem has nothing to do with the file format; it is caused by the
content size.
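
As a minimal sketch, assuming the override goes in conf/nutch-site.xml (the
usual place for site-specific overrides of nutch-default.xml), the limits
could be raised like this. The value -1 is the one asked about in the
question; a negative value turns truncation off, and any byte count larger
than the biggest expected document should work as well:

```xml
<!-- conf/nutch-site.xml (assumed location; place inside the existing <configuration> element) -->
<property>
  <name>http.content.limit</name>
  <!-- negative value: do not truncate content fetched over http:// -->
  <value>-1</value>
</property>
<property>
  <name>ftp.content.limit</name>
  <!-- same setting for ftp://; only needed if FTP URLs are crawled -->
  <value>-1</value>
</property>
```

Since the content is truncated at fetch time, the PDF most likely has to be
fetched again before it can be parsed and indexed in full.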


Remi


On Fri, Mar 21, 2014 at 4:52 PM, reddibabu <[email protected]> wrote:

> Hi,
>
> I am using Nutch 1.7 and Solr 4.5
>
> I am able to crawl any PDF from the Nutch side, and some metadata is
> displayed on the terminal when I run "bin/nutch indexchecker
> http://www.master.netseven.it/files/262-Nutch.pdf". But I am not able to
> index the same PDF details into Solr.
>
> The terminal shows this message: "INFO:parse.ParseSegment -
> http://master.netseven.it/files/262-Nutch.pdf skipped. Content of size
> 371452 was truncated to 62630"
>
> Is there a size limit for PDFs, and how do I set it to unlimited (-1) for
> PDF content?
>
> Could anyone please assist me with this?
>
>
> Thanks in advance.
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Unable-to-crawl-and-index-pdf-metadata-into-Solr-from-Nutch-tp4125941.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>
