Hi All,

I am trying to create a single segment by merging several segments generated 
during one cycle. I am using the SegmentMerger utility that ships with Nutch, 
with the number of slices set to 1.
Below is the configuration of my environment:

Hadoop cluster (v 0.20) : 10 node cluster
Cluster Node configuration detail : Quad core cpu with 8 GB RAM.
Memory allotted to child processes as well as hadoop daemons = 2 GB
Number of input segments = 25
Total input size  ~ 286 GB
Total number of maps = 12371
Total number of reduces = 10
Replication factor = 1
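
To give a rough sense of scale, here is a back-of-the-envelope calculation 
based only on the figures above. It assumes that the map output is about as 
large as the input and is spread evenly across the reducers; I have not 
measured either, and the variable names are just for illustration.

```python
# Rough per-reducer load, using only the figures listed above.
total_input_gb = 286   # total input size (approximate)
num_maps = 12371       # number of map tasks
num_reduces = 10       # number of reduce tasks
child_heap_gb = 2      # heap given to each child task

# Assumption (not measured): map output is roughly the same size as the
# input and is distributed evenly over the reducers.
gb_per_reducer = total_input_gb / num_reduces

# Each reducer fetches one map-output segment from every map task.
segments_per_reducer = num_maps

# How many times larger the shuffled data is than the child heap.
heap_ratio = gb_per_reducer / child_heap_gb

print(f"~{gb_per_reducer:.1f} GB of shuffle data per reducer")
print(f"{segments_per_reducer} map-output segments to copy and merge per reducer")
print(f"shuffle data is ~{heap_ratio:.1f}x the child heap")
```

So, under these assumptions, each reducer has to copy and merge far more data 
than fits in its heap, and most of that merging has to go through local disk.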

I have observed that all mappers run to completion successfully. After that, 
I get the error below during the copy/merge phase, which causes the job to fail.

Error: java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.mapred.IFile$Reader.readNextBlock(IFile.java:342)
        at org.apache.hadoop.mapred.IFile$Reader.next(IFile.java:404)
        at org.apache.hadoop.mapred.Merger$Segment.next(Merger.java:220)
        at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:330)
        at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
        at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:156)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$LocalFSMerger.run(ReduceTask.java:2454)

I have tried running the same process with 4 GB of memory allocated to child 
processes, but no luck.
I have also tried setting io.sort.factor = 3 in mapred-site.xml, but the 
issue remains the same.

Any pointers for resolving this issue will help a lot.

Thanks for the help in advance,
Vishal.



