Re: Could anyone teach me how to index the title or content of a PDF?

King Kong Sun, 03 Sep 2006 01:04:42 -0700


Frank Huang wrote:
> 
> Thanks for your help.
> 
> I crawl over HTTP and set http.content.limit as follows in
> nutch-default.xml:
> <property>
>   <name>http.content.limit</name>
>   <value>16777216</value>
>   <description>The length limit for downloaded content, in bytes.
>   If this value is nonnegative (>=0), content longer than it will be
>   truncated; otherwise, no truncation at all.
>   </description>
> </property>
> 
> but it still shows the same error:
> fetch okay, but can't parse http://(omit...).pdf " reason:failed
> <omit..>content truncated at 70709 bytes. Parse can't handle incomplete pdf file.
> 
> What did I do wrong? Thanks.
> 
> 


You must set http.content.limit to -1. As the property description says, a negative value means no truncation at all.
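For reference, a minimal sketch of that override, assuming a standard Nutch install: instead of editing nutch-default.xml, put the property in conf/nutch-site.xml, whose values take precedence over the defaults. The root element shown is an assumption; use whichever root element your release's nutch-default.xml uses.

```xml
<?xml version="1.0"?>
<!-- Hypothetical minimal conf/nutch-site.xml; match the root element of your nutch-default.xml -->
<configuration>
  <property>
    <name>http.content.limit</name>
    <!-- Any negative value disables truncation of downloaded content -->
    <value>-1</value>
  </property>
</configuration>
```

PDFs fetched before the change were stored already truncated, so they presumably need to be fetched again before the parser can handle them.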

-- 
View this message in context: 
http://www.nabble.com/Could-anyone-teache-me-how-to-index--the-title-or-content-of-PDF--tf2203822.html#a6120073
Sent from the Nutch - User forum at Nabble.com.
