Hi,
I was wondering if anyone on the list has been facing issues with
the segment merge phase, with all the nodes on their Hadoop cluster
eventually running out of disk space? The error I'm getting looks
like this:
java.io.IOException: Task: attempt_201003241258_0005_r_000001_0 - The reduce copier failed
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:375)
        at org.apache.hadoop.mapred.Child.main(Child.java:158)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for file:/home/snoothbot/nutch/hadoop_tmp/mapred/local/taskTracker/jobcache/job_201003241258_0005/attempt_201003241258_0005_r_000001_0/output/map_115.out
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:335)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$LocalFSMerger.run(ReduceTask.java:2384)
FSError: java.io.IOException: No space left on device
To give a little background, I'm currently running this on a 3-node
cluster, each node having a 500GB drive, which is mostly empty at the
beginning of the process (~400 GB available on each node). The
replication factor is set to 2 and I also enabled Hadoop block
compression. Now, the Nutch crawl takes up around 20 GB of disk (with 7
segments to merge, one of them 9 GB, the others ranging from 1 to
3 GB in size), so intuitively there should be plenty of space available
for the merge operation, but we still end up running out of space during
the reduce phase (7 reduce tasks). I'm currently trying to increase the
number of reduce tasks to limit the disk consumption of any
single task, but I'm wondering if anyone has experienced this kind of
issue before and whether there is a better way of approaching it. For
instance, would using the multiple output segments option help in
decreasing the amount of temporary disk space needed at any given time?
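For what it's worth, here is roughly what I'm running and what I was planning to try. The paths are from my setup, and I'm going from what I understand of the mergesegs usage and the mapred.reduce.tasks property, so please correct me if I've misread the docs:

```shell
# Current invocation: merge all 7 segments into a single output segment.
bin/nutch mergesegs crawl/merged_segments -dir crawl/segments

# What I was going to try next: the "multiple output segments" option,
# where -slice caps each output segment at N URLs, so the merge writes
# several smaller segments instead of one large one.
bin/nutch mergesegs crawl/merged_segments -dir crawl/segments -slice 50000
```

For the reduce-task count, I was planning to raise mapred.reduce.tasks in mapred-site.xml from 7 to a multiple of the number of nodes, on the theory that smaller per-task spills would keep any single local dir from filling up. Does that sound like the right lever, or is the local merge going to need the same total scratch space regardless?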
Many thanks in advance,
-yp