There are two issues here: a) giving more memory to your reducers (have you tried specifying -Dmapred.child.java.opts=-Xmx1024m, or something like that, on the command line?), and b) https://issues.apache.org/jira/browse/MAHOUT-639, which I should really have gotten cleaned up and committed.
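For concreteness, the command-line override would look something like this (the jar, driver class, and paths here are placeholders; only the -D flag is the point, and it only takes effect if the job actually propagates its Configuration to the child JVMs, which is what MAHOUT-639 is about):

```shell
# Pass a larger child-JVM heap via Hadoop's generic -D option.
# Jar name, driver class, and paths below are hypothetical placeholders.
hadoop jar your-mahout-job.jar your.Driver \
    -Dmapred.child.java.opts=-Xmx1024m \
    --input /path/to/matrix --output /path/to/output
```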
-jake

On Wed, Apr 27, 2011 at 12:43 PM, Paul Mahon <[email protected]> wrote:

> I'm having trouble using Mahout's (0.4) DistributedRowMatrix transpose
> method. The matrix I'm transposing is about 12 million rows by 2.5 million
> columns. It's quite sparse (no more than 10 non-zero elements per row), so
> memory shouldn't be a problem. However, running transpose always runs out of
> memory in the reduce step:
>
> 2011-04-27 10:51:29,910 FATAL org.apache.hadoop.mapred.TaskTracker: Error running child : java.lang.OutOfMemoryError: Java heap space
>     at org.apache.mahout.math.map.OpenIntDoubleHashMap.rehash(OpenIntDoubleHashMap.java:434)
>     at org.apache.mahout.math.map.OpenIntDoubleHashMap.put(OpenIntDoubleHashMap.java:387)
>     at org.apache.mahout.math.RandomAccessSparseVector.setQuick(RandomAccessSparseVector.java:134)
>     at org.apache.mahout.math.hadoop.TransposeJob$TransposeReducer.reduce(TransposeJob.java:142)
>     at org.apache.mahout.math.hadoop.TransposeJob$TransposeReducer.reduce(TransposeJob.java:122)
>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
> In digging into the problem I found out that the reduce task is being run
> with -Xmx=200m. That is the default hadoop mapred.child.java.opts, since I
> didn't override it in the mapred conf on the machine running the job. It
> should be possible to set parameters which are used by TransposeJob when
> called from the transpose method, but it seems there isn't a way.
>
> Did I miss some other way of transposing the matrix or some way to
> configure the transpose job?
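As a sketch of the programmatic route being asked about (hedged: whether the transpose job actually picks up a Configuration set on the matrix is exactly what MAHOUT-639 concerns, and the setConf call below is assumed rather than confirmed for 0.4; the paths and dimensions are placeholders):

```java
// Sketch only: assumes DistributedRowMatrix propagates a Configuration
// set on it to the jobs it launches (the behavior MAHOUT-639 addresses).
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.mahout.math.hadoop.DistributedRowMatrix;

public class TransposeSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Give the reducer child JVMs more heap than the 200m default.
    conf.set("mapred.child.java.opts", "-Xmx1024m");

    // Placeholder paths and dimensions from the problem description:
    // ~12M rows by ~2.5M columns, at most 10 non-zeros per row.
    DistributedRowMatrix matrix = new DistributedRowMatrix(
        new Path("/path/to/matrix"), new Path("/tmp/transpose-work"),
        12_000_000, 2_500_000);
    matrix.setConf(conf);  // assumed entry point for passing the config

    DistributedRowMatrix transposed = matrix.transpose();
  }
}
```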
