It shouldn't need 20 GB of memory. The reducer doesn't load all of the data into memory at once; instead it spills to disk, since it performs a merge sort.
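Also, for the averaging itself you don't have to buffer all 50 matrices in the reduce() call: keep one running elementwise sum and a count, which bounds the reducer's working set to a single matrix. A minimal sketch (plain Java, no Hadoop types; MatrixAverager is a hypothetical helper name, with matrices flattened to double[]):

```java
// Sketch: accumulate a running elementwise sum instead of buffering all matrices.
// Memory stays O(one matrix) no matter how many values arrive for a key.
public class MatrixAverager {
    private double[] sum;  // running elementwise sum, allocated on first add
    private long count;    // number of matrices accumulated so far

    // Add one matrix (flattened); assumes all matrices have the same shape.
    public void add(double[] matrix) {
        if (sum == null) sum = new double[matrix.length];
        for (int i = 0; i < matrix.length; i++) sum[i] += matrix[i];
        count++;
    }

    // Compute the average once, after all values for the key are consumed.
    public double[] average() {
        double[] avg = new double[sum.length];
        for (int i = 0; i < sum.length; i++) avg[i] = sum[i] / count;
        return avg;
    }

    public static void main(String[] args) {
        MatrixAverager a = new MatrixAverager();
        a.add(new double[]{1.0, 2.0});
        a.add(new double[]{3.0, 4.0});
        double[] avg = a.average();
        System.out.println(avg[0] + " " + avg[1]);
    }
}
```

Inside reduce() you would call add() per value while iterating, then emit average(); the same (sum, count) pair is also what a Combiner would emit partially, so partial combining only helps and can't change the result.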
2014-04-03 8:04 GMT+08:00 Li Li <[email protected]>:
> I have a MapReduce program that does some matrix operations. In the
> reducer, it averages many large matrices (each matrix takes up
> 400+ MB, per "Map output bytes"), so if 50 matrices go to a reducer,
> the total memory usage is 20 GB. The reduce task got this exception:
>
> FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space
>     at org.apache.hadoop.mapred.IFile$Reader.readNextBlock(IFile.java:344)
>     at org.apache.hadoop.mapred.IFile$Reader.next(IFile.java:406)
>     at org.apache.hadoop.mapred.Merger$Segment.next(Merger.java:238)
>     at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:438)
>     at org.apache.hadoop.mapred.Merger.merge(Merger.java:142)
>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.createKVIterator(ReduceTask.java:2539)
>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.access$400(ReduceTask.java:661)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:399)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
>
> One method I can come up with is to use a Combiner to save partial sums
> of some matrices and their counts, but it still can't solve the problem
> because the combiner is not fully controlled by me.
