I upgraded to Hadoop 0.20.205.0 and the DiskErrorException disappears,
but the result is the same, i.e. only the crawl_fetch and crawl_data
directories get merged, and no parse_data directory is produced.
Arghhhhhhhhh.
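(For reference, this is roughly how I've been checking whether the input
segments themselves were ever parsed; the segment name below is just a
placeholder from my crawl directory:)

  # A fully parsed segment should contain parse_data and parse_text
  # alongside crawl_fetch; if they're missing from the inputs,
  # mergesegs has no parse data to merge in the first place
  ls data/segments/20120109160227/
  # content  crawl_fetch  crawl_generate  crawl_parse  parse_data  parse_text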
Dean.
On 10/01/2012 11:33, Dean Pullen wrote:
I'm running in local mode (I believe) and using Hadoop 0.20.2, as that
is the library version shipped with Nutch 1.4.
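(A rough check, assuming the stock conf/ layout: Hadoop runs in local
mode when mapred.job.tracker is unset or set to "local".)

  # Look for an explicit job tracker in the site configs;
  # no match (or a value of "local") means local mode
  grep -r -A1 'mapred.job.tracker' conf/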
Dean.
On 09/01/2012 16:41, Lewis John Mcgibbney wrote:
How are you running Nutch, in local or deploy mode? Which Hadoop
version are you using, 0.20.2? This appears to be an open issue with
that version [1].
Please also have a look here [2] for a similarly frustrating situation.
[1] https://issues.apache.org/jira/browse/HADOOP-6958
[2] http://lucene.472066.n3.nabble.com/org-apache-hadoop-util-DiskChecker-DiskErrorException-td1792797.html
On Mon, Jan 9, 2012 at 4:14 PM, Dean Pullen <[email protected]> wrote:
This is interesting, and something I've only just noticed in the logs:
2012-01-09 16:02:27,257 INFO org.apache.hadoop.mapred.TaskTracker:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
taskTracker/jobcache/job_201201091558_0008/attempt_201201091558_0008_m_000006_0/output/file.out
in any of the configured local directories
This happens during the mergesegs job (and during previous jobs), but
I'm not sure what it means or whether it's actually a problem.
mapred.local.dir is set to /opt/nutch_1_4/data/local, which exists.
It suggests that the map part of the Hadoop job has not produced an
output file, or that it's looking in the wrong place?
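A rough way to check which it is, using the paths from my setup:

  # Confirm the configured local dir is writable by the user running the job
  ls -ld /opt/nutch_1_4/data/local
  touch /opt/nutch_1_4/data/local/.write_test && rm /opt/nutch_1_4/data/local/.write_test

  # See whether the map output landed anywhere under the local dir; the
  # TaskTracker resolves taskTracker/jobcache/... against each entry in
  # mapred.local.dir
  find /opt/nutch_1_4/data/local -name 'file.out' 2>/dev/null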
Dean