This is interesting, and something I've only just noticed in the logs:
2012-01-09 16:02:27,257 INFO org.apache.hadoop.mapred.TaskTracker:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
taskTracker/jobcache/job_201201091558_0008/attempt_201201091558_0008_m_000006_0/output/file.out
in any of the configured local directories
This appears during the mergesegs job (and during previous jobs), but
I'm not sure what it means or whether it's actually a problem.
mapred.local.dir is set to /opt/nutch_1_4/data/local, which exists.
Does it suggest that the map part of the Hadoop job has not produced
an output file, or that it's looking in the wrong place?
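
One way to tell those two cases apart would be to check by hand whether
the file exists under each directory listed in mapred.local.dir while
the job is still running. Below is a minimal sketch in Python. It
assumes mapred.local.dir is a comma-separated list and that the path in
the message is relative to each entry. find_in_local_dirs is a made-up
helper name, not part of Hadoop, and this only imitates the lookup
rather than reproducing Hadoop's own code.

```python
import os


def find_in_local_dirs(local_dirs, relative_path):
    """Return the full path of every match for relative_path.

    local_dirs is the raw mapred.local.dir value (assumed to be a
    comma-separated list of directories).
    """
    hits = []
    for base in local_dirs.split(","):
        base = base.strip()
        if not base:
            continue  # skip empty entries such as a trailing comma
        candidate = os.path.join(base, relative_path)
        if os.path.isfile(candidate):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    # Values copied from the log message and config above.
    local_dirs = "/opt/nutch_1_4/data/local"
    relative_path = (
        "taskTracker/jobcache/job_201201091558_0008/"
        "attempt_201201091558_0008_m_000006_0/output/file.out"
    )
    matches = find_in_local_dirs(local_dirs, relative_path)
    if matches:
        for path in matches:
            print("found:", path)
    else:
        print("not found under any configured local directory")
```

If the file turns up under a different base directory, the lookup is
probably pointed at the wrong place. If it is nowhere, the map attempt
may not have written it, or it may already have been cleaned up, so the
check is probably only informative while the job is running.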
Dean