On Jul 6, 2010, at 12:46 PM, Ted Dunning wrote:

> Computing 1000 singular vectors is generally neither necessary nor helpful.

OK, good to know.  This is my first time ever running SVD, so I have no clue 
what a useful number is for the rank value.  Advice welcome here.  

Question: What exactly is the rank, anyway?  It's the number of singular 
values, right?

> After just a few dozen, the noise in the system dominates and you are
> essentially just generating very fancy random numbers.  Also, the total
> memory required in the last steps of the SVD is proportional to either
> number of columns or number of rows in your original matrix times the number
> of singular vectors you are producing.
> 
> Try scaling up the rank option from a small number first before blowing out
> your memory requirements.

OK, will do.
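
As a rough back-of-envelope check on the memory point above, here is a minimal sketch of the "dimension times number of singular vectors" estimate for this matrix. It assumes the vectors are held as dense 8-byte doubles. That storage layout is an assumption for illustration, not something confirmed about Mahout's solver, and the helper name `basis_bytes` is hypothetical.

```python
# Rough estimate only: assumes dense double-precision storage (8 bytes per
# entry) for the singular vectors. The real solver may use more or less.

def basis_bytes(dimension, rank, bytes_per_entry=8):
    """Bytes needed to hold `rank` dense vectors of length `dimension`."""
    return dimension * rank * bytes_per_entry

NUM_ROWS = 129444   # --numRows from the command in this thread
NUM_COLS = 61892    # --numCols from the command in this thread

for rank in (50, 100, 1000):
    row_side = basis_bytes(NUM_ROWS, rank)
    col_side = basis_bytes(NUM_COLS, rank)
    print("rank %4d: rows x rank = %7.1f MB, cols x rank = %7.1f MB"
          % (rank, row_side / 1e6, col_side / 1e6))
```

Under these assumptions, rank 1000 needs roughly 1 GB on the row side for the vectors alone, while a few dozen vectors need only tens of megabytes, which is in line with the advice to start with a small rank and scale up.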

> 
> On Tue, Jul 6, 2010 at 6:09 AM, Grant Ingersoll <[email protected]> wrote:
> 
>> Anyone have guidelines on needed heap size when running SVD?  I've done a
>> couple of fairly long runs on my single machine and keep running out of memory
>> fairly deep into the run.  Before I increase the heap size for the 4th time,
>> I figured I'd see if it is even going to fit into memory at all.
>> 
>> My matrix is ~ 130,000 x 62,000 and I have 4GB total on my machine.  I'm
>> running this locally for now as a first step in scaling it out.
>> 
>> Here's my command:  ./mahout svd
>> -Dmapred.input.dir=/tmp/solr-clust-n2/part-out.vec --numCols 61892 --tempDir
>> /tmp/solr-clust-n2-svd --rank 1000 --numRows 129444
>> 
>> Thanks,
>> Grant

