Hi, I have a problem with last Friday's nightly build. When I try to fetch
my segment, the fetch process freezes with "Aborting with 10 hung threads".
After failing, Nutch tries to run the same URLs on another tasktracker,
but that fails again.
I have tried turning fetcher.parse off, and I have tried both protocol-httpclient and protocol-http.
nutch-site.xml:
<property>
<name>fs.default.name</name>
<value>linux3:50000</value>
<description>The name of the default file system. Either the
literal string "local" or a host:port for NDFS.</description>
</property>
<property>
<name>mapred.job.tracker</name>
<value>linux3:50020</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
<property>
<name>plugin.includes</name>
<value>protocol-httpclient|urlfilter-regex|parse-(text|html|js|pdf|msword)|index-basic|query-(basic|site|url)</value>
<description>Regular expression naming plugin directory names to
include. Any plugin not matching this expression is excluded.
In any case you need to include at least the nutch-extensionpoints plugin. By
default Nutch includes crawling just HTML and plain text via HTTP,
and basic indexing and search plugins.
</description>
</property>
<property>
<name>http.content.limit</name>
<value>-1</value>
<description>The length limit for downloaded content, in bytes.
If this value is nonnegative (>=0), content longer than it will be
truncated; otherwise, no truncation at all.
</description>
</property>
<property>
<name>fetcher.parse</name>
<value>false</value>
<description>If true, fetcher will parse content.</description>
</property>
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general