Use the hadoop.tmp.dir setting in nutch-site.xml to point to a disk where plenty of space is available.
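As a sketch, that override could look like the fragment below in conf/nutch-site.xml (the path /data/hadoop-tmp is only an example; substitute any partition with enough free room for the spill files written during segment merging):

```xml
<!-- conf/nutch-site.xml: local overrides of the Hadoop/Nutch defaults -->
<property>
  <name>hadoop.tmp.dir</name>
  <!-- example path; point this at a partition with plenty of free space -->
  <value>/data/hadoop-tmp</value>
</property>
```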
> Other users have previously reported similar problems, which were due to a
> lack of space on disk, as suggested by this:
>
>   Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could
>   not find any valid local directory for
>   taskTracker/jobcache/job_local_0001/attempt_local_0001_m_000032_0/output/spill0.out
>
> Make sure that the temporary directory used by Hadoop is on a partition
> with enough space.
>
> HTH
>
> Julien
>
> On 4 January 2011 18:19, <alx...@aim.com> wrote:
> > Which command did you use? Merging segments is very expensive in
> > resources, so I try to avoid merging them.
> >
> > -----Original Message-----
> > From: Marseld Dedgjonaj <marseld.dedgjo...@ikubinfo.com>
> > To: user <user@nutch.apache.org>
> > Sent: Tue, Jan 4, 2011 7:12 am
> > Subject: FW: Exception on segment merging
> >
> > I see some more details about the exception in the Hadoop log.
> > Please help me figure out what to check for this error.
> >
> > Here are the details:
> >
> > 2011-01-04 07:40:23,999 INFO segment.SegmentMerger - Slice size: 50000 URLs.
> > 2011-01-04 07:40:36,563 INFO segment.SegmentMerger - Slice size: 50000 URLs.
> > 2011-01-04 07:40:36,563 INFO segment.SegmentMerger - Slice size: 50000 URLs.
> > 2011-01-04 07:40:43,685 INFO segment.SegmentMerger - Slice size: 50000 URLs.
> > 2011-01-04 07:40:43,686 INFO segment.SegmentMerger - Slice size: 50000 URLs.
> > 2011-01-04 07:40:47,316 WARN mapred.LocalJobRunner - job_local_0001
> > java.io.IOException: Spill failed
> >     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1044)
> >     at java.io.DataOutputStream.write(DataOutputStream.java:90)
> >     at org.apache.hadoop.io.Text.writeString(Text.java:412)
> >     at org.apache.nutch.metadata.Metadata.write(Metadata.java:220)
> >     at org.apache.nutch.protocol.Content.write(Content.java:170)
> >     at org.apache.hadoop.io.GenericWritable.write(GenericWritable.java:135)
> >     at org.apache.nutch.metadata.MetaWrapper.write(MetaWrapper.java:107)
> >     at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
> >     at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
> >     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:900)
> >     at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:466)
> >     at org.apache.nutch.segment.SegmentMerger.map(SegmentMerger.java:361)
> >     at org.apache.nutch.segment.SegmentMerger.map(SegmentMerger.java:113)
> >     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> >     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> >     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> > Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not
> > find any valid local directory for
> > taskTracker/jobcache/job_local_0001/attempt_local_0001_m_000032_0/output/spill0.out
> >     at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:343)
> >     at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
> >     at org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:107)
> >     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1221)
> >     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:686)
> >     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1173)
> >
> > -----Original Message-----
> > From: Marseld Dedgjonaj [mailto:marseld.dedgjo...@ikubinfo.com]
> > Sent: Tuesday, January 04, 2011 1:28 PM
> > To: user@nutch.apache.org
> > Subject: Exception on segment merging
> >
> > Hello everybody,
> >
> > I have configured Nutch 1.2 to crawl all URLs of a specific website.
> > It ran fine for a while, but now that the number of indexed URLs has grown
> > to more than 30,000, I get an exception on segment merging.
> >
> > Has anybody seen this kind of error?
> >
> > The exception is shown below:
> >
> > Slice size: 50000 URLs.
> > Slice size: 50000 URLs.
> > Slice size: 50000 URLs.
> > Slice size: 50000 URLs.
> > Slice size: 50000 URLs.
> > Exception in thread "main" java.io.IOException: Job failed!
> >     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
> >     at org.apache.nutch.segment.SegmentMerger.merge(SegmentMerger.java:638)
> >     at org.apache.nutch.segment.SegmentMerger.main(SegmentMerger.java:683)
> >
> > Merge Segments - End at: 04-01-2011 07:40:48
> >
> > Thanks in advance & Best Regards,
> >
> > Marseldi
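If the DiskErrorException in the thread above really is a full partition, a quick check before retrying the merge might look like the sketch below. The paths are examples only, and the mergesegs invocation (commented out, since it needs a Nutch install and an existing crawl directory) follows the Nutch 1.x command-line usage; -slice 50000 matches the "Slice size: 50000 URLs" lines in the log.

```shell
# Show free space on the partition backing hadoop.tmp.dir
# (replace /tmp with your actual hadoop.tmp.dir value)
df -h /tmp

# Retry the merge once free space is confirmed. Example invocation,
# assuming a standard Nutch 1.x layout under crawl/:
# bin/nutch mergesegs crawl/MERGEDsegments -dir crawl/segments -slice 50000
```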