Frank Huang wrote:
>
> Thanks for your help.
>
> I crawl over HTTP and set http.content.limit as follows in
> nutch-default:
> <property>
>   <name>http.content.limit</name>
>   <value>16777216</value>
>   <description>The length limit for downloaded content, in bytes.
>   If this value is nonnegative (>=0), content longer than it will be
>   truncated;
>   otherwise, no truncation at all.
>   </description>
> </property>
>
> but it still shows the same error:
> fetch okay, but can't parse http://(omit...).pdf, reason: failed
> <omit..> content truncated at 70709 bytes. Parse can't handle incomplete pdf file.
>
> What did I do wrong? Thanks.
>

You must set http.content.limit=-1.

--
View this message in context: http://www.nabble.com/Could-anyone-teache-me-how-to-index--the-title-or-content-of-PDF--tf2203822.html#a6120073
Sent from the Nutch - User forum at Nabble.com.
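For reference, here is a minimal sketch of that override. It assumes the usual Nutch practice of putting local settings in conf/nutch-site.xml, which is read after nutch-default.xml and takes precedence over it; the file location and the `<configuration>` root element (used by Hadoop-based Nutch releases; older releases may use a different root element) are assumptions about your setup, not something stated in this thread. Per the property description quoted above, any negative value disables truncation, so the whole PDF should reach the parser.

```xml
<?xml version="1.0"?>
<!-- Assumed file: conf/nutch-site.xml (overrides values from nutch-default.xml) -->
<configuration>
  <property>
    <name>http.content.limit</name>
    <!-- Negative value: downloaded content is never truncated -->
    <value>-1</value>
  </property>
</configuration>
```

Since 16777216 bytes is already far larger than the 70709 bytes reported, it may also be worth confirming that the crawl actually reads the configuration file you edited.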
