Plenty of disk space does not mean you have enough room in your hadoop.tmp.dir, which is /tmp by default.
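For example, pointing it at a partition with enough room via conf/core-site.xml (or nutch-site.xml in local mode) looks roughly like this - the path is only an illustration, use whatever volume actually has the space:

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop/tmp</value>
    <description>Base for Hadoop's temporary files; keep it on a partition with plenty of free space.</description>
  </property>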

On Thu, 10 May 2012 10:26:00 +0200, Igor Salma <[email protected]> wrote:
Hi, Adriana, Sebastian,

We have been struggling with this for days - the crawl runs for a few days and then breaks with the same exception. At first it seemed that Adriana was right and that we had a disk space problem, but the last two
breaks occurred with 9 GB still left on disk. We have also moved to
hadoop-core-1.0.2.jar. One more thing - it seems that it always fails on
job_local_0015 (not 100% sure, though):

2012-05-09 15:55:35,534 WARN  mapred.LocalJobRunner - job_local_0015
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/file.out
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
        at org.apache.hadoop.mapred.MapOutputFile.getOutputFileForWrite(MapOutputFile.java:69)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1640)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1323)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:437)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)

Do you know what this could mean?

@Sebastian: we are running only one instance of Nutch

We're talking about roughly 300,000 - 400,000 documents. Should we start
considering crawling in parallel?

Thanks in advance.

All the best,
Igor



On Tue, May 1, 2012 at 11:15 PM, Sebastian Nagel <[email protected]> wrote:

Hi Igor,

no disk space on /tmp is one possible reason.

The other is:
> (working in local mode).
Are you running multiple instances of Nutch in parallel?
If yes, these instances must use disjoint temp directories
(hadoop.tmp.dir). There are multiple posts on this list
about this topic.
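For example (hypothetical paths), giving each instance its own value in that instance's conf/nutch-site.xml keeps them apart:

  <!-- instance 1 -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/nutch-1/tmp</value>
  </property>

  <!-- instance 2 -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/nutch-2/tmp</value>
  </property>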

Sebastian


On 04/30/2012 03:33 PM, Adriana Farina wrote:

Hello!

I had the same kind of problem. In my case it was caused by one of the nodes of my cluster having its memory full, so to solve the problem I simply freed up memory on that node. Check whether all of the nodes of your cluster have free
memory.

As for the second error, it seems you're missing some library: try adding
it to hadoop.
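The missing class (org/apache/commons/configuration/Configuration) suggests commons-configuration is not on the classpath; the newer hadoop-core jars depend on it. Assuming you build Nutch with Ivy, a dependency roughly like the one below in ivy/ivy.xml should pull it in (the revision is a guess - match it to whatever your hadoop-core expects):

  <dependency org="commons-configuration" name="commons-configuration" rev="1.6" conf="*->default"/>

Dropping the matching commons-configuration jar next to the hadoop jar in the lib directory should work just as well.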


Sent from my iPhone

On 30 Apr 2012, at 15:15, Igor Salma <[email protected]> wrote:

 Hi to all,

We're having trouble with Nutch when trying to crawl. Nutch version 1.4, Hadoop 0.20.2 (working in local mode). After 2 days of crawling we
got:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_local_0015/attempt_local_0015_m_000000_0/output/spill0.out in any of the configured local directories
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:389)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
        at org.apache.hadoop.mapred.MapOutputFile.getSpillFile(MapOutputFile.java:94)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1443)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1154)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:359)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)

We've looked at the mailing list archives but I'm not sure whether this exact problem is mentioned. We tried to upgrade to hadoop-core-0.20.203.0.jar, but then this is
thrown:
Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/commons/configuration/Configuration

Can someone, please, shed some light on this?

Thanks.
Igor




--
Markus Jelsma - CTO - Openindex
