Hi All,

While recursively reading input from a directory of 30 MB files, using 
WholeFileInputFormat and WholeFileRecordReader, I am running into a Java heap 
space error even for a single small 30 MB file. By default 
mapred.child.java.opts is set to -Xmx200m, which should be sufficient to 
handle at least the 30 MB files present in the directory.
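
In case it helps, my WholeFileInputFormat is essentially the standard 
whole-file pattern on the new mapreduce API (sketched below from memory, so 
the exact code may differ slightly): isSplitable() returns false so each file 
becomes exactly one split.

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class WholeFileInputFormat
        extends FileInputFormat<NullWritable, BytesWritable> {

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false; // never split: each map gets one whole file
    }

    @Override
    public RecordReader<NullWritable, BytesWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context)
            throws IOException, InterruptedException {
        WholeFileRecordReader reader = new WholeFileRecordReader();
        reader.initialize(split, context);
        return reader;
    }
}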

The input files contain ordinary random words. Each map is given a single 
30 MB file, I read the value as the content of the whole file, and then run a 
normal word count on it.
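
The record reader is essentially the usual whole-file sketch, reading the 
entire file into a single BytesWritable (again written from memory, so the 
method bodies may differ slightly from my actual code):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class WholeFileRecordReader
        extends RecordReader<NullWritable, BytesWritable> {

    private FileSplit fileSplit;
    private Configuration conf;
    private final BytesWritable value = new BytesWritable();
    private boolean processed = false;

    @Override
    public void initialize(InputSplit split, TaskAttemptContext context) {
        this.fileSplit = (FileSplit) split;
        this.conf = context.getConfiguration();
    }

    @Override
    public boolean nextKeyValue() throws IOException {
        if (!processed) {
            byte[] contents = new byte[(int) fileSplit.getLength()];
            Path file = fileSplit.getPath();
            FileSystem fs = file.getFileSystem(conf);
            FSDataInputStream in = null;
            try {
                in = fs.open(file);
                IOUtils.readFully(in, contents, 0, contents.length);
                // set() copies the bytes, so the file contents briefly
                // exist twice on the heap at this point
                value.set(contents, 0, contents.length);
            } finally {
                IOUtils.closeStream(in);
            }
            processed = true;
            return true;
        }
        return false;
    }

    @Override
    public NullWritable getCurrentKey() { return NullWritable.get(); }

    @Override
    public BytesWritable getCurrentValue() { return value; }

    @Override
    public float getProgress() { return processed ? 1.0f : 0.0f; }

    @Override
    public void close() { // stream is already closed in nextKeyValue()
    }
}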

If I increase mapred.child.java.opts to a higher value, the application runs 
successfully. But it would be great if anyone could explain why the current 
200 MB default per task is not sufficient for a 30 MB file. Does this mean 
Hadoop MapReduce itself consumes so much heap that, out of 200 MB, there 
isn't even 30 MB left to process the task? Also, is there any other way to 
read a large whole file as input to a single map, i.e. so that every map 
gets one whole file to process?
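
For completeness, this is roughly how I raise the per-task heap when the job 
does succeed (the class name HeapOverrideExample and the -Xmx512m value are 
just illustrative; any value comfortably above the default works):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class HeapOverrideExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // override the child task JVM heap just for this job
        conf.set("mapred.child.java.opts", "-Xmx512m");
        Job job = new Job(conf, "whole-file wordcount");
        job.setInputFormatClass(WholeFileInputFormat.class);
        // ... set mapper, reducer, input/output paths as usual ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}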

-Shubh 
