Thanks Markus. Yes, we've already changed hadoop.tmp.dir and there is plenty of free space.
All the best,
Igor

On Thu, May 10, 2012 at 10:35 AM, Markus Jelsma <[email protected]> wrote:

> Plenty of disk space does not mean you have enough room in your
> hadoop.tmp.dir, which is /tmp by default.
>
>
> On Thu, 10 May 2012 10:26:00 +0200, Igor Salma <[email protected]> wrote:
>
>> Hi Adriana, Sebastian,
>>
>> We have been struggling with this for days - the problem is that it
>> crawls for a few days and then breaks with the same exception. At first
>> it seemed that Adriana was right - that we had a problem with disk
>> space - but the last two breaks occurred with 9 GB still left on disk.
>> We have also moved to hadoop-core-1.0.2.jar. One thing more - it seems
>> that it always fails on job_local_0015 (not 100% sure, though):
>>
>> 2012-05-09 15:55:35,534 WARN mapred.LocalJobRunner - job_local_0015
>> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
>> any valid local directory for output/file.out
>>     at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
>>     at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
>>     at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
>>     at org.apache.hadoop.mapred.MapOutputFile.getOutputFileForWrite(MapOutputFile.java:69)
>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1640)
>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1323)
>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:437)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
>>
>> Do you know what it could mean?
>>
>> @Sebastian: we are running only one instance of Nutch.
>>
>> We're speaking about ~300,000 - 400,000 documents. Should we start
>> considering crawling in parallel?
>>
>> Thanks in advance.
>>
>> All the best,
>> Igor
>>
>>
>> On Tue, May 1, 2012 at 11:15 PM, Sebastian Nagel <[email protected]> wrote:
>>
>>> Hi Igor,
>>>
>>> no disk space on /tmp is one possible reason.
>>>
>>> The other is:
>>> > (working in local mode).
>>> Are you running multiple instances of Nutch in parallel?
>>> If yes, these instances must use disjoint temp directories
>>> (hadoop.tmp.dir). There are multiple posts on this list
>>> about this topic.
>>>
>>> Sebastian
>>>
>>>
>>> On 04/30/2012 03:33 PM, Adriana Farina wrote:
>>>
>>>> Hello!
>>>>
>>>> I had the same kind of problem. In my case it was caused by one of the
>>>> nodes of my cluster running out of memory, so to solve the problem I
>>>> simply freed up memory on that node. Check whether all of the nodes of
>>>> your cluster have free memory.
>>>>
>>>> As for the second error, it seems you're missing a library: try adding
>>>> it to Hadoop.
>>>>
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On 30 Apr 2012, at 15:15, Igor Salma <[email protected]> wrote:
>>>>
>>>>> Hi to all,
>>>>>
>>>>> We're having trouble with Nutch when trying to crawl. Nutch version 1.4,
>>>>> Hadoop 0.20.2 (working in local mode).
>>>>> After 2 days of crawling we got:
>>>>> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
>>>>> taskTracker/jobcache/job_local_0015/attempt_local_0015_m_000000_0/output/spill0.out
>>>>> in any of the configured local directories
>>>>>     at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:389)
>>>>>     at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
>>>>>     at org.apache.hadoop.mapred.MapOutputFile.getSpillFile(MapOutputFile.java:94)
>>>>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1443)
>>>>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1154)
>>>>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:359)
>>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>>>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>>>>>
>>>>> We've looked at the mailing list archives but I'm not sure whether this
>>>>> exact issue is mentioned. We tried to upgrade to hadoop-core-0.20.203.0.jar
>>>>> but then this is thrown:
>>>>> Exception in thread "main" java.lang.NoClassDefFoundError:
>>>>> org/apache/commons/configuration/Configuration
>>>>>
>>>>> Can someone please shed some light on this?
>>>>>
>>>>> Thanks.
>>>>> Igor
>>>>>
>>>>
>>>
>
> --
> Markus Jelsma - CTO - Openindex
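
Following Markus's and Sebastian's suggestions, the usual fix is to point hadoop.tmp.dir at a partition with plenty of room instead of /tmp (many systems also clean /tmp periodically, which can remove spill files mid-job and produce exactly the "Could not find ... in any of the configured local directories" error). A minimal sketch, assuming a Nutch 1.x local-mode setup that reads Hadoop properties from conf/nutch-site.xml, and a hypothetical /data/nutch-tmp directory:

    <!-- conf/nutch-site.xml (or Hadoop's core-site.xml) -->
    <property>
      <name>hadoop.tmp.dir</name>
      <!-- hypothetical path; pick a partition with plenty of free space -->
      <value>/data/nutch-tmp</value>
    </property>

The map-side spill and merge files (output/spill0.out and output/file.out in the traces above) live under mapred.local.dir, which defaults to ${hadoop.tmp.dir}/mapred/local, so moving hadoop.tmp.dir moves them as well. If several Nutch instances ever run in parallel, each one needs its own value (e.g. /data/nutch-tmp-1, /data/nutch-tmp-2), as Sebastian points out.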
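Since the last failures happened with 9 GB reportedly still free, free space may only dip below the required amount briefly: the merge step writes a combined copy of all spill files before deleting them, so a task's temp usage can roughly double at the peak. One rough way to check, assuming the hypothetical /data/nutch-tmp location from the sketch above:

    # Log free space and temp-dir usage once a minute while the crawl runs
    while true; do
      date
      df -h /data/nutch-tmp        # free space on the partition
      du -sh /data/nutch-tmp       # how much the Hadoop temp dir itself holds
      sleep 60
    done >> tmpdir-usage.log 2>&1

If the log shows the partition filling up right before the DiskErrorException, it is a space problem after all, just a transient one.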
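As for the NoClassDefFoundError, Adriana's guess about a missing library fits: hadoop-core-0.20.203.0 (and the 1.0.x line the thread later moves to) depends on Apache Commons Configuration, which the stock Nutch 1.4 lib directory does not ship. A sketch of the fix, assuming the binary Nutch runtime layout with a lib/ directory and the commons-configuration 1.6 jar that Hadoop 1.x normally bundles (adjust names and paths to your setup):

    # Copy the dependency next to the swapped-in hadoop-core jar;
    # $NUTCH_HOME is assumed to point at the Nutch 1.4 installation
    cp commons-configuration-1.6.jar "$NUTCH_HOME/lib/"

The class has to be on the classpath of whatever launches the crawl (bin/nutch normally builds its classpath from lib/ in local mode); otherwise the job dies at startup with exactly the error quoted above.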

