Re: SVD Memory Reqs

Ted Dunning Tue, 06 Jul 2010 09:48:21 -0700

Computing 1000 singular vectors is generally neither necessary nor helpful.
After just a few dozen, the noise in the system dominates and you are
essentially just generating very fancy random numbers.  Also, the total
memory required in the last steps of the SVD is proportional to either the
number of columns or the number of rows in your original matrix, times the
number of singular vectors you are producing.
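To put numbers on that rule of thumb, here is a rough back-of-the-envelope sketch. It assumes every vector entry is a dense 8-byte double and ignores JVM overhead, the input matrix, and any intermediate buffers, so treat the result as a lower bound rather than what Mahout will actually allocate. The helper names are hypothetical; the row, column, and rank values come from Grant's command.

```python
# Rough lower bound on the memory needed to hold the singular vectors
# produced in the last steps of the SVD:
#   (rows or columns of the input) x (number of singular vectors) x 8 bytes.
# Assumption: dense 8-byte doubles; JVM object overhead, the input matrix,
# and intermediate buffers are NOT counted.

BYTES_PER_DOUBLE = 8


def dense_vectors_bytes(dimension: int, rank: int) -> int:
    """Bytes needed for `rank` dense vectors of length `dimension`."""
    return dimension * rank * BYTES_PER_DOUBLE


def to_mib(num_bytes: int) -> float:
    """Convert a byte count to mebibytes."""
    return num_bytes / (1024 * 1024)


if __name__ == "__main__":
    # Values taken from the command line quoted in this thread.
    num_rows, num_cols, rank = 129444, 61892, 1000
    for label, dim in (("rows", num_rows), ("columns", num_cols)):
        size = dense_vectors_bytes(dim, rank)
        print(f"rank {rank} over {label} ({dim}): {to_mib(size):,.0f} MiB")
```

At --rank 1000 this works out to roughly 1 GB when the row dimension dominates and about 0.5 GB for the column dimension, before any overhead. At a rank of 50, the row-dimension figure drops to about 50 MB.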
Try scaling up the rank option from a small number first before blowing out
your memory requirements.
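One way to follow that advice is sketched below: double the rank on each trial and stop before the estimated footprint of the singular vectors would exceed a chosen heap budget. The function `rank_schedule`, the starting rank of 10, and the 512 MiB budget in the example are hypothetical, illustrative choices, not Mahout defaults. The footprint uses the same dense 8-byte-double estimate as above, so real runs still need extra headroom.

```python
# Sketch of "scale the rank up from a small number first": each trial
# doubles the rank, stopping once the estimated footprint of the singular
# vectors (dimension x rank x 8 bytes, dense doubles assumed) would exceed
# the heap budget, or once max_rank is passed.
# Assumption: the budget is an illustrative figure, not a Mahout setting.

from typing import List


def rank_schedule(dimension: int, heap_budget_bytes: int,
                  start: int = 10, max_rank: int = 1000) -> List[int]:
    """Ranks to try in order, each double the previous, within the budget."""
    ranks = []
    rank = start
    while rank <= max_rank and dimension * rank * 8 <= heap_budget_bytes:
        ranks.append(rank)
        rank *= 2
    return ranks


if __name__ == "__main__":
    # 129444 rows from the thread; 512 MiB is an arbitrary example budget.
    print(rank_schedule(129444, 512 * 1024 * 1024))
```

With 129,444 rows and a 512 MiB budget, the schedule is 10, 20, 40, 80, 160, 320. A rank of 640 would need about 630 MiB for the vectors alone. In practice you would inspect the results at each step and stop once additional singular values look like noise, well before the memory limit.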

On Tue, Jul 6, 2010 at 6:09 AM, Grant Ingersoll <[email protected]> wrote:

> Anyone have guidelines on needed heap size when running SVD?  I've done a
> couple of fairly long runs on my single machine and keep running out of
> memory fairly deep into the run.  Before I increase the heap size for the
> 4th time, I figured I'd see if it is even going to fit into memory at all.
>
> My matrix is ~ 130,000 x 62,000 and I have 4GB total on my machine.  I'm
> running this locally for now as a first step in scaling it out.
>
> Here's my command:
>
>   ./mahout svd -Dmapred.input.dir=/tmp/solr-clust-n2/part-out.vec --numCols 61892 --tempDir /tmp/solr-clust-n2-svd --rank 1000 --numRows 129444
>
> Thanks,
> Grant
