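Hi, change the default value of http.content.limit and/or ftp.content.limit accordingly. This problem has nothing to do with the PDF format. It is caused by the content size: the document is larger than the configured limit, so it gets truncated and the parser then skips it.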
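For example, a minimal override in conf/nutch-site.xml that disables truncation might look like the following. This is a sketch that assumes the standard Nutch 1.x setup, where nutch-site.xml overrides nutch-default.xml and a negative value means "no truncation". If you would rather keep some cap, a large positive byte count works too. Segments that were already fetched still hold the truncated content, so you will most likely need to re-fetch those URLs after changing the setting.

```xml
<?xml version="1.0"?>
<!-- conf/nutch-site.xml: local overrides for nutch-default.xml -->
<configuration>
  <!-- Negative value: do not truncate content fetched over http:// -->
  <property>
    <name>http.content.limit</name>
    <value>-1</value>
  </property>
  <!-- Same for ftp://, only needed if you also fetch over FTP -->
  <property>
    <name>ftp.content.limit</name>
    <value>-1</value>
  </property>
</configuration>
```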
Remi

On Fri, Mar 21, 2014 at 4:52 PM, reddibabu <[email protected]> wrote:
> Hi,
>
> I am using Nutch 1.7 and Solr 4.5.
>
> I am able to crawl any PDF from the Nutch side, and it displays some
> metadata on the terminal when I run "bin/nutch indexchecker
> http://www.master.netseven.it/files/262-Nutch.pdf". But I am not able to
> index the same PDF's details into Solr.
>
> I got "INFO:parse.ParseSegment -
> http://master.netseven.it/files/262-Nutch.pdf skipped. Content of size
> 371452 was truncated to 62630" on the terminal.
>
> Is there a size limit for PDFs? Please let me know how to set it to
> unlimited (-1) for PDF content.
>
> Could anyone please help me with this?
>
> Thanks in advance.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Unable-to-crawl-and-index-pdf-metadata-into-Solr-from-Nutch-tp4125941.html
> Sent from the Nutch - User mailing list archive at Nabble.com.

