I bet this is the same old problem under a new name: http://www.mail-archive.com/nutch-user%40lucene.apache.org/msg02673.html I think it's caused by http.content.limit being set to -1. I ran some tests with the default value and the fetch was fine.
Does anyone know whether the PDF parser in 0.8-dev works with any value other than -1?
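
As a quick test, one could cap the download size instead of disabling truncation entirely. The 1 MB value below is only an example I picked for illustration, not a recommended setting:

```xml
<!-- Hypothetical test setting for nutch-site.xml: cap downloads at 1 MB
     instead of disabling truncation with -1. Very large PDFs will be
     truncated and may then fail to parse, but this should show whether
     the hung-thread problem goes away once a limit is in place. -->
<property>
 <name>http.content.limit</name>
 <value>1048576</value>
 <description>The length limit for downloaded content, in bytes.</description>
</property>
```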



Michael Nebel wrote:

Hi,

I can reproduce the problem with the latest version out of svn. :-( I played around with it a bit (most of the day, in fact :-) and after increasing the following parameters

    <property>
      <name>mapred.task.timeout</name>
      <value>6000000</value>
      <description>The number of milliseconds before a task will be
      terminated if it neither reads an input, writes an output, nor
      updates its status string.
      </description>
    </property>


    <property>
      <name>mapred.child.heap.size</name>
      <value>2000m</value>
      <description>The heap size (-Xmx) that will be used for task
      tracker child processes.</description>
    </property>

the error seems to disappear. But I don't understand why. It's just some "guessing in the dark".

    Michael



Håvard W. Kongsgård wrote:

Hi, I have a problem with last Friday's nightly build. When I try to fetch my segment, the fetch process freezes with "Aborting with 10 hung threads". After failing, Nutch tries to run the same URLs on another tasktracker, but it fails again.

I have tried turning fetcher.parse off and switching between protocol-httpclient and protocol-http.

nutch-site.xml

<property>
 <name>fs.default.name</name>
 <value>linux3:50000</value>
 <description>The name of the default file system.  Either the
 literal string "local" or a host:port for NDFS.</description>
</property>

<property>
 <name>mapred.job.tracker</name>
 <value>linux3:50020</value>
 <description>The host and port that the MapReduce job tracker runs
 at.  If "local", then jobs are run in-process as a single map
 and reduce task.
 </description>
</property>

<property>
 <name>plugin.includes</name>
 <value>protocol-httpclient|urlfilter-regex|parse-(text|html|js|pdf|msword)|index-basic|query-(basic|site|url)</value>
 <description>Regular expression naming plugin directory names to
 include.  Any plugin not matching this expression is excluded.
In any case you need at least include the nutch-extensionpoints plugin. By
 default Nutch includes crawling just HTML and plain text via HTTP,
 and basic indexing and search plugins.
 </description>
</property>

<property>
 <name>http.content.limit</name>
 <value>-1</value>
 <description>The length limit for downloaded content, in bytes.
If this value is nonnegative (>=0), content longer than it will be truncated;
 otherwise, no truncation at all.
 </description>
</property>

<property>
 <name>fetcher.parse</name>
 <value>false</value>
 <description>If true, fetcher will parse content.</description>
</property>



