Hi Adriana, Sebastian,
We have been struggling with this for days - the problem is that it crawls for
a few days and then breaks with the same exception. At first it seemed that
Adriana was right - that we were having a problem with disk space - but the
last two failures occurred with 9 GB still left on the disk. We have also moved
to hadoop-core-1.0.2.jar. One more thing - it seems that it always fails on
job_local_0015 (not 100% sure, though):
2012-05-09 15:55:35,534 WARN mapred.LocalJobRunner - job_local_0015
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/file.out
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
        at org.apache.hadoop.mapred.MapOutputFile.getOutputFileForWrite(MapOutputFile.java:69)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1640)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1323)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:437)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Do you know what it could mean?
@Sebastian: we are running only one instance of Nutch.
We're talking about roughly 300,000 - 400,000 documents. Should we start
considering crawling in parallel?
Thanks in advance.
All the best,
Igor
On Tue, May 1, 2012 at 11:15 PM, Sebastian Nagel <[email protected]> wrote:
> Hi Igor,
>
> no disk space on /tmp is one possible reason.
>
> The other is:
> > (working in local mode).
> Are you running multiple instances of Nutch in parallel?
> If yes, these instances must use disjoint temp directories
> (hadoop.tmp.dir). There are multiple posts on this list
> about this topic.
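>
> For example, a minimal sketch, assuming each instance has its own config
> directory (the path below is just a placeholder): override hadoop.tmp.dir
> in that instance's conf/nutch-site.xml, with a different value per instance:
>
>   <property>
>     <name>hadoop.tmp.dir</name>
>     <value>/data/nutch-instance-1/tmp</value>
>   </property>
>
> That way the instances' local job files can never collide.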
>
> Sebastian
>
>
> On 04/30/2012 03:33 PM, Adriana Farina wrote:
>
>> Hello!
>>
>> I had the same kind of problem. In my case it was caused by one of the
>> nodes of my cluster running out of memory, so to solve the problem I simply
>> freed up memory on that node. Check whether all of the nodes of your cluster
>> have free memory.
>>
>> As for the second error, it seems you're missing some library: try adding
>> it to Hadoop's classpath.
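>>
>> The class in that second stack trace, org.apache.commons.configuration.Configuration,
>> comes from Apache Commons Configuration, so that is most likely the missing jar.
>> A minimal sketch, if you build Nutch from source with Ivy (the version number is
>> only a guess, match it to your Hadoop release), would be to add to ivy/ivy.xml:
>>
>>   <dependency org="commons-configuration" name="commons-configuration" rev="1.6"/>
>>
>> Otherwise, dropping the corresponding commons-configuration jar into the same lib
>> directory as the hadoop-core jar you swapped in should have the same effect.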
>>
>>
>> Sent from my iPhone
>>
>> On 30 Apr 2012, at 15:15, Igor Salma <[email protected]> wrote:
>>
>>> Hi to all,
>>>
>>> We're having trouble with Nutch when trying to crawl. Nutch version 1.4,
>>> Hadoop 0.20.2 (working in local mode). After 2 days of crawling we got:
>>> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
>>> taskTracker/jobcache/job_local_0015/attempt_local_0015_m_000000_0/output/spill0.out
>>> in any of the configured local directories
>>>         at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:389)
>>>         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
>>>         at org.apache.hadoop.mapred.MapOutputFile.getSpillFile(MapOutputFile.java:94)
>>>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1443)
>>>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1154)
>>>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:359)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>>>
>>> We've looked at the mailing list archives but I'm not sure whether this
>>> exact issue is mentioned. We tried to upgrade to hadoop-core-0.20.203.0.jar,
>>> but then this was thrown:
>>> Exception in thread "main" java.lang.NoClassDefFoundError:
>>> org/apache/commons/configuration/Configuration
>>>
>>> Can someone please shed some light on this?
>>>
>>> Thanks.
>>> Igor
>>>
>>
>