In that case I'm not sure 9GB is enough for 400,000 documents. This is most certainly not enough if you store the content in the segments (default).
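
For reference, whether raw content is kept in the segments is controlled by
the fetcher.store.content property. A minimal nutch-site.xml sketch to turn
it off (property name as in Nutch 1.x; verify against your nutch-default.xml):

  <property>
    <name>fetcher.store.content</name>
    <value>false</value>
    <description>Do not store raw page content in the segments,
    which keeps the segments much smaller on disk.</description>
  </property>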

On Thu, 10 May 2012 10:43:14 +0200, Igor Salma <[email protected]> wrote:
Thanks Markus,

Yes, we've already changed hadoop.tmp.dir and there is plenty of free
space.

All the best,
Igor

On Thu, May 10, 2012 at 10:35 AM, Markus Jelsma  wrote:
 Plenty of disk space does not mean you have enough room in your
hadoop.tmp.dir, which is /tmp by default.
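
 As an illustration, hadoop.tmp.dir can be pointed at a partition with
 more room, e.g. in conf/nutch-site.xml (the /data/nutch-tmp path below
 is only an example):

   <property>
     <name>hadoop.tmp.dir</name>
     <value>/data/nutch-tmp</value>
     <description>Base for temporary files; use a location with
     plenty of free space instead of the default under /tmp.</description>
   </property>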

 On Thu, 10 May 2012 10:26:00 +0200, Igor Salma  wrote:

 Hi Adriana, Sebastian,

 We have been struggling with this for days - the problem is that it
 crawls for a few days and then breaks with the same exception. At first
 it seemed that Adriana was right - that we were having a problem with
 disk space - but the last two breaks occurred with 9GB still left on
 disk. We have also moved to hadoop-core-1.0.2.jar. One more thing - it
 seems that it always fails on job_local_0015 (not 100% sure, though):

 2012-05-09 15:55:35,534 WARN  mapred.LocalJobRunner - job_local_0015
 org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/file.out
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
        at org.apache.hadoop.mapred.MapOutputFile.getOutputFileForWrite(MapOutputFile.java:69)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1640)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1323)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:437)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)

 Do you know what it could mean?

 @Sebastian: we are running only one instance of Nutch

 We're talking about ~300,000-400,000 documents. Should we start
 considering crawling in parallel?

 Thanks in advance.

 All the best,
 Igor

 On Tue, May 1, 2012 at 11:15 PM, Sebastian Nagel wrote:

 Are you running multiple instances of Nutch in parallel?
 If yes, these instances must use disjoint temp directories
 (hadoop.tmp.dir). There are multiple posts on this list
 about this topic.

 Sebastian
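
 As an illustration of disjoint temp directories, each instance would
 get its own hadoop.tmp.dir in its conf/nutch-site.xml, for example
 (paths are only examples):

   <!-- instance 1 -->
   <property>
     <name>hadoop.tmp.dir</name>
     <value>/data/nutch-tmp-instance1</value>
   </property>

   <!-- instance 2 -->
   <property>
     <name>hadoop.tmp.dir</name>
     <value>/data/nutch-tmp-instance2</value>
   </property>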

 On 04/30/2012 03:33 PM, Adriana Farina wrote:

 Hello!

 I had the same kind of problem. In my case it was caused by one of the
 nodes of my cluster having its memory full, so to solve the problem I
 simply freed up memory on that node. Check whether all of the nodes of
 your cluster have free memory.

 As for the second error, it seems you're missing a library: try adding
 it to Hadoop.

 Sent from my iPhone

 On 30 Apr 2012, at 15:15, Igor Salma wrote:

 Hi all,

 We're having trouble with Nutch when trying to crawl. Nutch version 1.4,
 Hadoop 0.20.2 (working in local mode). After 2 days of crawling we got:
 org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
 taskTracker/jobcache/job_local_0015/attempt_local_0015_m_000000_0/output/spill0.out
 in any of the configured local directories
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:389)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
        at org.apache.hadoop.mapred.MapOutputFile.getSpillFile(MapOutputFile.java:94)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1443)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1154)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:359)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)

 We've looked at the mailing list archives but I'm not sure if this exact
 issue is mentioned there. We tried to upgrade to hadoop-core-0.20.203.0.jar
 but then this is thrown:

 Exception in thread "main" java.lang.NoClassDefFoundError:
 org/apache/commons/configuration/Configuration

 Can someone please shed some light on this?

 Thanks.
 Igor

 --
 Markus Jelsma - CTO - Openindex




--
Markus Jelsma - CTO - Openindex
