Child heap size can also be increased by passing command-line options. See the example below:
-Dmapred.map.child.java.opts=-Xmx6100m -Dmapred.reduce.child.java.opts=-Xmx6100m

Thanks,
Vinod
http://blog.vinodsingh.com/

On Wed, Jun 6, 2012 at 3:20 PM, Sean Owen <[email protected]> wrote:

> You need to increase the size of the children's heap.
> mapred.child.java.opts can be set to -Xmx4g for example. This is
> usually put in mapred-site.xml.
>
> Sampling does decrease the size of the intermediate outputs; probably
> not the final output so much. But this is not your problem. You are
> running out of heap on the workers.
>
> You should definitely use more than one reducer! It's really up to
> you, says Hadoop, to specify this, use -Dmapred.reduce.tasks=10 or
> whatever is appropriate.
>
> The name of the jobs kind of says what they do, and the javadoc says a
> little more. If you have specific questions I bet people can explain
> here.
>
> Sean
>
>
> On Wed, Jun 6, 2012 at 7:39 AM, Something Something
> <[email protected]> wrote:
> > Hello,
> >
> > I am running this job with a file containing 791,732,411 lines.
> >
> > Step 1 (PreparePreferenceMatrixJob-ItemIDIndexMapper-Reducer) completed in
> > 3 minutes.
> >
> > Step 2 (PreparePreferenceMatrixJob-ToItemPrefsMapper-Reducer) took 2 hours
> > but completed successfully. It used only 1 Reducer so I am assuming the
> > output is sorted, right?
> >
> > Step 3 (PreparePreferenceMatrixJob-ToItemVectorsMapper-Reducer) failed
> > after running for 54 minutes with 'Error: Java heap space' error & it was
> > all downhill from there.
> >
> >
> > Question: Are there any configuration parameters I can use to cut down
> > size of output? I noticed this in ToItemVectorsMapper:
> >
> > public static final String SAMPLE_SIZE = ToItemVectorsMapper.class +
> > ".sampleSize";
> >
> > How do I cut down this sample size?
> >
> > Also, is there any documentation available that shows what each of these
> > steps does? If not, I will just debug. Please let me know. Thanks.
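
P.S. For reference, a minimal sketch of how the mapred-site.xml settings Sean mentions might look. It assumes the classic (MRv1) property names used in this thread. The values are simply the ones quoted above (-Xmx4g and 10 reducers), not tuned recommendations, so adjust them to fit the memory available on your worker nodes.

```xml
<!-- mapred-site.xml fragment (illustrative sketch; values taken from the thread, not tuned) -->
<configuration>
  <!-- JVM options for every map and reduce child task -->
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx4g</value>
  </property>
  <!-- Default number of reduce tasks per job -->
  <property>
    <name>mapred.reduce.tasks</name>
    <value>10</value>
  </property>
</configuration>
```

The same properties can be set for a single job with -D options on the command line, as in the example at the top of this message.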
