There are two issues here: a) giving more memory to your reducers (have you tried specifying -Dmapred.child.java.opts=-Xmx1024m, or something like that, on the command line?), and b) https://issues.apache.org/jira/browse/MAHOUT-639, which I should really have gotten cleaned up and committed.
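For concreteness, the command-line override would look something like this (the jar, driver class, and paths here are placeholders; only the -D flag is the point, and it only takes effect if the job actually propagates its Configuration to the child JVMs, which is what MAHOUT-639 is about):

```shell
# Pass a larger child-JVM heap via Hadoop's generic -D option.
# Jar name, driver class, and paths below are hypothetical placeholders.
hadoop jar your-mahout-job.jar your.Driver \
    -Dmapred.child.java.opts=-Xmx1024m \
    --input /path/to/matrix --output /path/to/output
```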
-jake

On Wed, Apr 27, 2011 at 12:43 PM, Paul Mahon <[email protected]> wrote:

> I'm having trouble using Mahout's (0.4) DistributedRowMatrix transpose
> method. The matrix I'm transposing is about 12 million rows by 2.5 million
> columns. It's quite sparse (no more than 10 non-zero elements per row), so
> memory shouldn't be a problem. However, running transpose always runs out of
> memory in the reduce step:
>
> 2011-04-27 10:51:29,910 FATAL org.apache.hadoop.mapred.TaskTracker: Error running child : java.lang.OutOfMemoryError: Java heap space
>     at org.apache.mahout.math.map.OpenIntDoubleHashMap.rehash(OpenIntDoubleHashMap.java:434)
>     at org.apache.mahout.math.map.OpenIntDoubleHashMap.put(OpenIntDoubleHashMap.java:387)
>     at org.apache.mahout.math.RandomAccessSparseVector.setQuick(RandomAccessSparseVector.java:134)
>     at org.apache.mahout.math.hadoop.TransposeJob$TransposeReducer.reduce(TransposeJob.java:142)
>     at org.apache.mahout.math.hadoop.TransposeJob$TransposeReducer.reduce(TransposeJob.java:122)
>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
> In digging into the problem I found out that the reduce task is being run
> with -Xmx=200m. That is the default hadoop mapred.child.java.opts, since I
> didn't override it in the mapred conf on the machine running the job. It
> should be possible to set parameters which are used by TransposeJob when
> called from the transpose method, but it seems there isn't a way.
>
> Did I miss some other way of transposing the matrix or some way to
> configure the transpose job?
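As a sketch of the programmatic route being asked about (hedged: whether the transpose job actually picks up a Configuration set on the matrix is exactly what MAHOUT-639 concerns, and the setConf call below is assumed rather than confirmed for 0.4; the paths and dimensions are placeholders):

```java
// Sketch only: assumes DistributedRowMatrix propagates a Configuration
// set on it to the jobs it launches (the behavior MAHOUT-639 addresses).
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.mahout.math.hadoop.DistributedRowMatrix;

public class TransposeSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Give the reducer child JVMs more heap than the 200m default.
    conf.set("mapred.child.java.opts", "-Xmx1024m");

    // Placeholder paths and dimensions from the problem description:
    // ~12M rows by ~2.5M columns, at most 10 non-zeros per row.
    DistributedRowMatrix matrix = new DistributedRowMatrix(
        new Path("/path/to/matrix"), new Path("/tmp/transpose-work"),
        12_000_000, 2_500_000);
    matrix.setConf(conf);  // assumed entry point for passing the config

    DistributedRowMatrix transposed = matrix.transpose();
  }
}
```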
