Plenty of disk space does not mean you have enough room in your hadoop.tmp.dir, which is /tmp by default.
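For example, pointing it at a partition with enough room via conf/core-site.xml (or nutch-site.xml in local mode) looks roughly like this - the path is only an illustration, use whatever volume actually has the space:

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop/tmp</value>
    <description>Base for Hadoop's temporary files; keep it on a partition with plenty of free space.</description>
  </property>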

On Thu, 10 May 2012 10:26:00 +0200, Igor Salma <[email protected]> wrote:
Hi, Adriana, Sebastian,

We have been struggling with this for days - the crawl runs for a few days and then breaks with the same exception. At first it seemed that Adriana was right and that we had a disk space problem, but the last two
breaks occurred with 9 GB still left on disk. We have also moved to
hadoop-core-1.0.2.jar. One more thing - it seems that it always fails on
job_local_0015 (not 100% sure, though):

2012-05-09 15:55:35,534 WARN  mapred.LocalJobRunner - job_local_0015
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/file.out
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
        at org.apache.hadoop.mapred.MapOutputFile.getOutputFileForWrite(MapOutputFile.java:69)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1640)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1323)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:437)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)

Do you know what this could mean?

@Sebastian: we are running only one instance of Nutch

We're talking about roughly 300,000 - 400,000 documents. Should we start
considering crawling in parallel?

Thanks in advance.

All the best,
Igor



On Tue, May 1, 2012 at 11:15 PM, Sebastian Nagel <[email protected]> wrote:

Hi Igor,

no disk space on /tmp is one possible reason.

The other is:
> (working in local mode).
Are you running multiple instances of Nutch in parallel?
If yes, these instances must use disjoint temp directories
(hadoop.tmp.dir). There are multiple posts on this list
about this topic.
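For example (hypothetical paths), giving each instance its own value in that instance's conf/nutch-site.xml keeps them apart:

  <!-- instance 1 -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/nutch-1/tmp</value>
  </property>

  <!-- instance 2 -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/nutch-2/tmp</value>
  </property>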

Sebastian


On 04/30/2012 03:33 PM, Adriana Farina wrote:

Hello!

I had the same kind of problem. In my case it was caused by one of the nodes of my cluster having its memory full, so to solve the problem I simply freed up memory on that node. Check whether all of the nodes of your cluster have free
memory.

As for the second error, it seems you're missing some library: try adding
it to hadoop.
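The missing class (org/apache/commons/configuration/Configuration) suggests commons-configuration is not on the classpath; the newer hadoop-core jars depend on it. Assuming you build Nutch with Ivy, a dependency roughly like the one below in ivy/ivy.xml should pull it in (the revision is a guess - match it to whatever your hadoop-core expects):

  <dependency org="commons-configuration" name="commons-configuration" rev="1.6" conf="*->default"/>

Dropping the matching commons-configuration jar next to the hadoop jar in the lib directory should work just as well.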


Sent from my iPhone

On 30 Apr 2012, at 15:15, Igor Salma <[email protected]> wrote:

 Hi to all,

We're having trouble with Nutch when trying to crawl. Nutch version 1.4, Hadoop 0.20.2 (working in local mode). After 2 days of crawling we
got:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_local_0015/attempt_local_0015_m_000000_0/output/spill0.out in any of the configured local directories
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:389)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
        at org.apache.hadoop.mapred.MapOutputFile.getSpillFile(MapOutputFile.java:94)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1443)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1154)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:359)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)

We've looked at the mailing list archives but I'm not sure whether this exact problem is mentioned. We tried to upgrade to hadoop-core-0.20.203.0.jar, but then this is
thrown:
Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/commons/configuration/Configuration

Can someone, please, shed some light on this?

Thanks.
Igor




--
Markus Jelsma - CTO - Openindex
