Pretty sure the same thing is happening with Hadoop 1.0...

On 10/01/2012 14:11, Dean Pullen wrote:
Upgraded to Hadoop 0.20.205.0 and the DiskErrorException disappears, but the result is the same, i.e. only the crawl_fetch and crawl_data directories get merged, and no parse_data directory exists.
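
For reference, this is roughly what I'm doing; the segment paths below are just examples from my setup, so adjust as needed:

    # a fetched and parsed segment normally contains these subdirectories:
    #   content  crawl_fetch  crawl_generate  crawl_parse  parse_data  parse_text
    ls crawl/segments/20120109160227

    # merge everything under crawl/segments into a single segment
    bin/nutch mergesegs crawl/merged_segments -dir crawl/segments

    # the merged segment here only ends up with crawl_fetch and crawl_data
    ls crawl/merged_segments/*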

Arghhhhhhhhh.


Dean.

On 10/01/2012 11:33, Dean Pullen wrote:
I'm running in local mode (I believe) and using Hadoop 0.20.2, as this is the lib version shipped with Nutch 1.4.

Dean.

On 09/01/2012 16:41, Lewis John Mcgibbney wrote:
How are you running Nutch, in local or deploy mode? Which Hadoop version
are you using, 0.20.2? This appears to be an open issue with that
version [1].
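
If you're not sure which mode you're in, the mapred.job.tracker property is a quick check (just a sketch; the conf/ path depends on where your Hadoop *.xml files live):

    # "local" (or no value at all) means local mode, everything runs in one JVM;
    # a host:port value means deploy mode against a real cluster
    grep -r -A1 "mapred.job.tracker" conf/ 2>/dev/null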

Also please have a look here [2] for a similar frustrating situation.

[1] https://issues.apache.org/jira/browse/HADOOP-6958
[2] http://lucene.472066.n3.nabble.com/org-apache-hadoop-util-DiskChecker-DiskErrorException-td1792797.html

On Mon, Jan 9, 2012 at 4:14 PM, Dean Pullen <[email protected]> wrote:
This is interesting, and something I've only just noticed in the logs:

2012-01-09 16:02:27,257 INFO org.apache.hadoop.mapred.TaskTracker:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
taskTracker/jobcache/job_201201091558_0008/attempt_201201091558_0008_m_000006_0/output/file.out
in any of the configured local directories

This is during the mergesegs job (and previous jobs)... but I'm not sure
what it means or if it's actually a problem.

mapred.local.dir is set to /opt/nutch_1_4/data/local - which exists.
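
To double-check, I'm verifying the property and the permissions on that directory roughly like this (conf/ is where my Hadoop *.xml files live, so adjust the path for your setup):

    # confirm the configured value
    grep -r -A1 "mapred.local.dir" conf/ 2>/dev/null

    # confirm the directory exists and the user running the job can write to it
    ls -ld /opt/nutch_1_4/data/local
    touch /opt/nutch_1_4/data/local/.write_test && rm /opt/nutch_1_4/data/local/.write_test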

It suggests that either the map part of the Hadoop job has not produced an output
file, or that it's looking for it in the wrong place?

Dean




