Other users have previously reported similar problems which were due to a lack of space on disk, as suggested by this:

*Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for taskTracker/jobcache/job_local_0001/attempt_local_0001_m_000032_0/output/spill0.out*

Make sure that the temporary directory used by Hadoop is on a partition with enough space.

HTH

Julien

On 4 January 2011 18:19, <[email protected]> wrote:
> Which command did you use? Merging segments is very expensive in resources,
> so I try to avoid merging them.
>
> -----Original Message-----
> From: Marseld Dedgjonaj <[email protected]>
> To: user <[email protected]>
> Sent: Tue, Jan 4, 2011 7:12 am
> Subject: FW: Exception on segment merging
>
> > I see in the Hadoop log that some more details about the exception are there.
> >
> > Please help me understand what to check for this error.
> >
> > Here are the details:
> >
> > 2011-01-04 07:40:23,999 INFO segment.SegmentMerger - Slice size: 50000 URLs.
> > 2011-01-04 07:40:36,563 INFO segment.SegmentMerger - Slice size: 50000 URLs.
> > 2011-01-04 07:40:36,563 INFO segment.SegmentMerger - Slice size: 50000 URLs.
> > 2011-01-04 07:40:43,685 INFO segment.SegmentMerger - Slice size: 50000 URLs.
> > 2011-01-04 07:40:43,686 INFO segment.SegmentMerger - Slice size: 50000 URLs.
> > 2011-01-04 07:40:47,316 WARN mapred.LocalJobRunner - job_local_0001
> > java.io.IOException: Spill failed
> >     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1044)
> >     at java.io.DataOutputStream.write(DataOutputStream.java:90)
> >     at org.apache.hadoop.io.Text.writeString(Text.java:412)
> >     at org.apache.nutch.metadata.Metadata.write(Metadata.java:220)
> >     at org.apache.nutch.protocol.Content.write(Content.java:170)
> >     at org.apache.hadoop.io.GenericWritable.write(GenericWritable.java:135)
> >     at org.apache.nutch.metadata.MetaWrapper.write(MetaWrapper.java:107)
> >     at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
> >     at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
> >     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:900)
> >     at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:466)
> >     at org.apache.nutch.segment.SegmentMerger.map(SegmentMerger.java:361)
> >     at org.apache.nutch.segment.SegmentMerger.map(SegmentMerger.java:113)
> >     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> >     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> >     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> > Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for taskTracker/jobcache/job_local_0001/attempt_local_0001_m_000032_0/output/spill0.out
> >     at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:343)
> >     at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
> >     at org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:107)
> >     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1221)
> >     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:686)
> >     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1173)
> >
> > -----Original Message-----
> > From: Marseld Dedgjonaj [mailto:[email protected]]
> > Sent: Tuesday, January 04, 2011 1:28 PM
> > To: [email protected]
> > Subject: Exception on segment merging
> >
> > Hello everybody,
> >
> > I have configured nutch-1.2 to crawl all URLs of a specific website.
> >
> > It ran fine for a while, but now that the number of indexed URLs has grown
> > to more than 30,000, I get an exception on segment merging.
> >
> > Has anybody seen this kind of error?
> >
> > The exception is shown below:
> >
> > Slice size: 50000 URLs.
> > Slice size: 50000 URLs.
> > Slice size: 50000 URLs.
> > Slice size: 50000 URLs.
> > Slice size: 50000 URLs.
> >
> > Exception in thread "main" java.io.IOException: Job failed!
> >     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
> >     at org.apache.nutch.segment.SegmentMerger.merge(SegmentMerger.java:638)
> >     at org.apache.nutch.segment.SegmentMerger.main(SegmentMerger.java:683)
> >
> > Merge Segments - End at: 04-01-2011 07:40:48
> >
> > Thanks in advance & Best Regards,
> >
> > Marseldi
--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
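Julien's advice about the temporary directory can be sketched as a quick check. This is a minimal illustration only: the paths and the property value below are assumptions (based on the Hadoop 0.20.x defaults that Nutch 1.2 ships with), not settings taken from the thread.

```shell
# In local mode, map-side spill files land under hadoop.tmp.dir
# (default: /tmp/hadoop-${USER}); the DiskErrorException above means
# no configured local directory had room for spill0.out.
# Check how full the partition holding it is:
df -k /tmp

# If the partition is nearly full, point hadoop.tmp.dir at a larger
# one in conf/nutch-site.xml (the value here is hypothetical):
#   <property>
#     <name>hadoop.tmp.dir</name>
#     <value>/data/hadoop-tmp</value>
#   </property>
```

Note that `mapred.local.dir` (where the spill directories actually live) defaults to a path under `hadoop.tmp.dir`, so moving the latter is usually enough unless the former has been set explicitly.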

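On the question "Which command did you use?": in Nutch 1.2, segment merging is driven by the `mergesegs` command, and the repeated "Slice size: 50000 URLs" lines in the log correspond to its `-slice` option. A hedged example invocation (the directory names are assumptions, not taken from the thread):

```shell
# Merge all segments under crawl/segments into crawl/MERGEDsegments,
# slicing the merged output into segments of at most 50000 URLs each.
# Directory names are hypothetical; -slice 50000 matches the log above.
bin/nutch mergesegs crawl/MERGEDsegments -dir crawl/segments -slice 50000
```

As Julien notes, the merge re-reads and re-writes every record in the input segments, so both the temporary partition and the output partition need enough headroom for the job to complete.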
