In general, the current SVD implementation requires at least
2 * rank * numCols * 8 bytes of heap on the driving machine (i.e. not on
the HDFS cluster).  In your case this is still a fairly modest value:
2 * 1000 * 8 bytes = 16k per column, times ~62k columns, or about 1GB.
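
For anyone wanting to plug in their own numbers, here is a minimal sketch of
that arithmetic.  The function name and its parameters are made up for
illustration and are not part of Mahout; the result is only the lower bound
from the formula above, not a measurement of what the job actually uses.

```python
def svd_driver_heap_bytes(rank, num_cols, bytes_per_double=8, copies=2):
    """Lower-bound heap estimate for the driving machine, in bytes.

    Hypothetical helper: it only evaluates the rule of thumb
    2 * rank * numCols * 8 bytes quoted in this message.
    """
    return copies * rank * num_cols * bytes_per_double


# Values taken from the command line in the quoted message below.
estimate = svd_driver_heap_bytes(rank=1000, num_cols=61892)
print(f"{estimate} bytes (~{estimate / 1e9:.2f} GB)")
```

Note that numRows does not appear in the formula, so the 129444 rows do not
change this lower bound.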

  -jake

On Tue, Jul 6, 2010 at 3:09 PM, Grant Ingersoll <[email protected]> wrote:

> Anyone have guidelines on needed heap size when running SVD?  I've done a
> couple of fairly long runs on my single machine and keep running out of memory
> fairly deep into the run.  Before I increase the heap size for the 4th time,
> I figured I'd see if it is even going to fit into memory at all.
>
> My matrix is ~ 130,000 x 62,000 and I have 4GB total on my machine.  I'm
> running this locally for now as a first step in scaling it out.
>
> Here's my command:  ./mahout svd
> -Dmapred.input.dir=/tmp/solr-clust-n2/part-out.vec --numCols 61892 --tempDir
> /tmp/solr-clust-n2-svd --rank 1000 --numRows 129444
>
> Thanks,
> Grant
