Several issues could be at play here; since only you know your data, we can
only guess:
1) The mapred.child.java.opts=-Xmx2g setting only works IF you did not also set
"mapred.map.child.java.opts" or "mapred.reduce.child.java.opts"; otherwise, the
more specific one overrides "mapred.child.java.opts". So double-check the
settings and make sure the reducers really do get the 2G heap you want (see the
driver-side sketch after these two points).
2) In your implementation, you could OOM as you store more and more data in the
"TrainingWeights result" accumulator. So the question is: for each reducer
group, i.e. each key, how much data can there be? If a key can carry very large
values, all of those values end up accumulated into the in-memory "result"
instance, which requires a lot of memory. If so, either give the reducers that
much memory, or redesign your key to be finer-grained so that each group needs
less memory.
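For point 1, a minimal driver-side sketch (CheckChildOpts is just an
illustrative class name; it assumes an MR1-style JobConf, which picks up your
mapred-site.xml) to print the effective child JVM options and pin the reducer
heap explicitly:

    import org.apache.hadoop.mapred.JobConf;

    public class CheckChildOpts {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            // The map/reduce-specific keys, when set, win over the generic one.
            System.out.println("mapred.child.java.opts        = " + conf.get("mapred.child.java.opts"));
            System.out.println("mapred.map.child.java.opts    = " + conf.get("mapred.map.child.java.opts"));
            System.out.println("mapred.reduce.child.java.opts = " + conf.get("mapred.reduce.child.java.opts"));
            // To be explicit about the reducer heap, set the reduce-specific key yourself:
            conf.set("mapred.reduce.child.java.opts", "-Xmx2g");
        }
    }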
Yong
Date: Thu, 3 Apr 2014 17:53:57 +0800
Subject: Re: how to solve reducer memory problem?
From: [email protected]
To: [email protected]
you can think of each TrainingWeights as a very large double[] whose length is
about 10,000,000

    TrainingWeights result = null;
    int total = 0;
    for (TrainingWeights weights : values) {
        if (result == null) {
            result = weights;
        } else {
            addWeights(result, weights);
        }
        total++;
    }
    if (total > 1) {
        divideWeights(result, total);
    }
    context.write(NullWritable.get(), result);
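addWeights and divideWeights are basically element-wise operations, roughly
like this (shown against raw double[] for simplicity; the real methods take
TrainingWeights):

    // Accumulate "other" into "result" in place, so only one extra array is live at a time.
    static void addWeights(double[] result, double[] other) {
        for (int i = 0; i < result.length; i++) {
            result[i] += other[i];
        }
    }

    // Turn the accumulated sum into an average over "total" matrices.
    static void divideWeights(double[] result, int total) {
        for (int i = 0; i < result.length; i++) {
            result[i] /= total;
        }
    }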
On Thu, Apr 3, 2014 at 5:49 PM, Gordon Wang <[email protected]> wrote:
What is the work in the reducer? Do you have any memory-intensive work in the
reducer (e.g. caching a lot of data in memory)? I guess the OOM error comes
from your code in the reducer.
On Thu, Apr 3, 2014 at 5:10 PM, Li Li <[email protected]> wrote:
mapred.child.java.opts=-Xmx2g
On Thu, Apr 3, 2014 at 5:10 PM, Li Li <[email protected]> wrote:
2g
On Thu, Apr 3, 2014 at 1:30 PM, Stanley Shi <[email protected]> wrote:
This doesn't seem to be related to the data size.
How much memory do you use for the reducer?
Regards,
Stanley Shi,
On Thu, Apr 3, 2014 at 8:04 AM, Li Li <[email protected]> wrote:
I have a map reduce program that does some matrix operations. In the
reducer, it averages many large matrices (each matrix takes up 400+ MB
according to "Map output bytes"). So if 50 matrices go to one reducer,
the total memory usage is 20GB, and the reduce task got this exception:
FATAL org.apache.hadoop.mapred.Child: Error running child :
java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.IFile$Reader.readNextBlock(IFile.java:344)
at org.apache.hadoop.mapred.IFile$Reader.next(IFile.java:406)
at org.apache.hadoop.mapred.Merger$Segment.next(Merger.java:238)
at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:438)
at org.apache.hadoop.mapred.Merger.merge(Merger.java:142)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.createKVIterator(ReduceTask.java:2539)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.access$400(ReduceTask.java:661)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:399)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
One method I can come up with is to use a Combiner to save the sums of some
matrices together with their counts, but it still can't fully solve the
problem because the combiner is not fully controlled by me.
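A rough sketch of that combiner idea (the WeightsWithCount Writable, its
methods, and the Text key type are hypothetical placeholders, not in my
current code):

    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Combiner that collapses the map output for one key into a single
    // partial sum plus the number of matrices that went into it.
    public class SumCombiner extends Reducer<Text, WeightsWithCount, Text, WeightsWithCount> {
        @Override
        protected void reduce(Text key, Iterable<WeightsWithCount> values, Context context)
                throws IOException, InterruptedException {
            WeightsWithCount partial = null;
            for (WeightsWithCount value : values) {
                if (partial == null) {
                    // keep a copy, since Hadoop may reuse the "value" object
                    partial = new WeightsWithCount(value);
                } else {
                    // element-wise sum plus count accumulation (hypothetical method)
                    partial.add(value);
                }
            }
            if (partial != null) {
                context.write(key, partial);
            }
        }
    }

The final reducer would then add up the partial sums and their counts, and
divide once at the end.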
--
Regards
Gordon Wang