Use the hadoop.tmp.dir setting in nutch-site.xml to point to a disk where
plenty of space is available.
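
For example, a minimal override in a Nutch 1.x local-mode setup might look
like the sketch below (the /mnt/bigdisk path is a placeholder, not something
taken from the thread):

<?xml version="1.0"?>
<!-- conf/nutch-site.xml: point Hadoop's scratch space at a large disk.
     In local mode the map-side spill files (such as the spill0.out in
     the trace below) land under ${hadoop.tmp.dir}/mapred/local, so this
     one property moves them onto the bigger partition. -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <!-- Placeholder path; use any partition with enough free space to
         hold the intermediate output of the segment merge. -->
    <value>/mnt/bigdisk/hadoop-tmp</value>
  </property>
</configuration>

After changing the property, re-run the merge; the DiskChecker error quoted
below just means Hadoop's LocalDirAllocator could not find a configured local
directory with enough room for a spill file.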

> Other users have previously reported similar problems which were due to a
> lack of space on disk, as suggested by this:
> 
> *Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could
> not find any valid local directory for
> taskTracker/jobcache/job_local_0001/attempt_local_0001_m_000032_0/output/spill0.out*
> 
> Make sure that the temporary directory used by Hadoop is on a partition
> with enough space.
> 
> HTH
> 
> Julien
> 
> On 4 January 2011 18:19, <alx...@aim.com> wrote:
> > Which command did you use? Merging segments is very resource-intensive,
> > so I try to avoid merging them.
> > 
> > -----Original Message-----
> > From: Marseld Dedgjonaj <marseld.dedgjo...@ikubinfo.com>
> > To: user <user@nutch.apache.org>
> > Sent: Tue, Jan 4, 2011 7:12 am
> > Subject: FW: Exception on segment merging
> > 
> > 
> > I looked in the Hadoop log, and some more details about the exception
> > are there.
> > 
> > Please help me figure out what to check for this error.
> > 
> > Here are the details:
> > 
> > 2011-01-04 07:40:23,999 INFO  segment.SegmentMerger - Slice size: 50000 URLs.
> > 2011-01-04 07:40:36,563 INFO  segment.SegmentMerger - Slice size: 50000 URLs.
> > 2011-01-04 07:40:36,563 INFO  segment.SegmentMerger - Slice size: 50000 URLs.
> > 2011-01-04 07:40:43,685 INFO  segment.SegmentMerger - Slice size: 50000 URLs.
> > 2011-01-04 07:40:43,686 INFO  segment.SegmentMerger - Slice size: 50000 URLs.
> > 2011-01-04 07:40:47,316 WARN  mapred.LocalJobRunner - job_local_0001
> > java.io.IOException: Spill failed
> >        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1044)
> >        at java.io.DataOutputStream.write(DataOutputStream.java:90)
> >        at org.apache.hadoop.io.Text.writeString(Text.java:412)
> >        at org.apache.nutch.metadata.Metadata.write(Metadata.java:220)
> >        at org.apache.nutch.protocol.Content.write(Content.java:170)
> >        at org.apache.hadoop.io.GenericWritable.write(GenericWritable.java:135)
> >        at org.apache.nutch.metadata.MetaWrapper.write(MetaWrapper.java:107)
> >        at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
> >        at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
> >        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:900)
> >        at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:466)
> >        at org.apache.nutch.segment.SegmentMerger.map(SegmentMerger.java:361)
> >        at org.apache.nutch.segment.SegmentMerger.map(SegmentMerger.java:113)
> >        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> >        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> >        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> >        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> > Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not
> > find any valid local directory for
> > taskTracker/jobcache/job_local_0001/attempt_local_0001_m_000032_0/output/spill0.out
> >        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:343)
> >        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
> >        at org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:107)
> >        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1221)
> >        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:686)
> >        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1173)
> > 
> > 
> > -----Original Message-----
> > From: Marseld Dedgjonaj [mailto:marseld.dedgjo...@ikubinfo.com]
> > Sent: Tuesday, January 04, 2011 1:28 PM
> > To: user@nutch.apache.org
> > Subject: Exception on segment merging
> > 
> > Hello everybody,
> > 
> > I have configured nutch-1.2 to crawl all urls of a specific website.
> > 
> > It ran fine for a while, but now that the number of indexed urls has
> > grown to more than 30,000, I get an exception on segment merging.
> > 
> > Has anybody seen this kind of error?
> > 
> > The exception is shown below:
> > 
> > Slice size: 50000 URLs.
> > Slice size: 50000 URLs.
> > Slice size: 50000 URLs.
> > Slice size: 50000 URLs.
> > Slice size: 50000 URLs.
> > 
> > Exception in thread "main" java.io.IOException: Job failed!
> >        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
> >        at org.apache.nutch.segment.SegmentMerger.merge(SegmentMerger.java:638)
> >        at org.apache.nutch.segment.SegmentMerger.main(SegmentMerger.java:683)
> > 
> > Merge Segments-  End at:   04-01-2011 07:40:48
> > 
> > Thanks in advance & Best Regards,
> > Marseldi
