Pretty sure the same thing is happening with Hadoop 1.0...

On 10/01/2012 14:11, Dean Pullen wrote:
Upgraded to Hadoop 0.20.205.0 and the DiskErrorException disappears, but the result is the same, i.e. only the crawl_fetch and crawl_data directories get merged, and no parse_data directory exists.
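
For reference, this is roughly what I'm doing; the segment paths below are just examples from my setup, so adjust as needed:

    # a fetched and parsed segment normally contains these subdirectories:
    #   content  crawl_fetch  crawl_generate  crawl_parse  parse_data  parse_text
    ls crawl/segments/20120109160227

    # merge everything under crawl/segments into a single segment
    bin/nutch mergesegs crawl/merged_segments -dir crawl/segments

    # the merged segment here only ends up with crawl_fetch and crawl_data
    ls crawl/merged_segments/*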

Arghhhhhhhhh.


Dean.

On 10/01/2012 11:33, Dean Pullen wrote:
I'm running in local mode (I believe) and using Hadoop 0.20.2, as this is the lib version shipped with Nutch 1.4.

Dean.

On 09/01/2012 16:41, Lewis John Mcgibbney wrote:
How are you running Nutch, in local or deploy mode? Which Hadoop version
are you using, 0.20.2? This appears to be an open issue with that
version [1].
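
If you're not sure which mode you're in, the mapred.job.tracker property is a quick check (just a sketch; the conf/ path depends on where your Hadoop *.xml files live):

    # "local" (or no value at all) means local mode, everything runs in one JVM;
    # a host:port value means deploy mode against a real cluster
    grep -r -A1 "mapred.job.tracker" conf/ 2>/dev/null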

Also please have a look here [2] for a similar frustrating situation.

[1] https://issues.apache.org/jira/browse/HADOOP-6958
[2] http://lucene.472066.n3.nabble.com/org-apache-hadoop-util-DiskChecker-DiskErrorException-td1792797.html

On Mon, Jan 9, 2012 at 4:14 PM, Dean Pullen <[email protected]> wrote:
This is interesting, and something I've only just noticed in the logs:

2012-01-09 16:02:27,257 INFO org.apache.hadoop.mapred.TaskTracker:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
taskTracker/jobcache/job_201201091558_0008/attempt_201201091558_0008_m_000006_0/output/file.out
in any of the configured local directories

This is during the mergesegs job (and previous jobs)... but I'm not sure
what it means or if it's actually a problem.

mapred.local.dir is set to /opt/nutch_1_4/data/local - which exists.
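
To double-check, I'm verifying the property and the permissions on that directory roughly like this (conf/ is where my Hadoop *.xml files live, so adjust the path for your setup):

    # confirm the configured value
    grep -r -A1 "mapred.local.dir" conf/ 2>/dev/null

    # confirm the directory exists and the user running the job can write to it
    ls -ld /opt/nutch_1_4/data/local
    touch /opt/nutch_1_4/data/local/.write_test && rm /opt/nutch_1_4/data/local/.write_test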

It suggests that either the map part of the Hadoop job has not produced an output
file, or that it's looking for it in the wrong place?

Dean




