Computing 1000 singular vectors is generally neither necessary nor helpful. After just a few dozen, the noise in the system dominates and you are essentially just generating very fancy random numbers. Also, the total memory required in the last steps of the SVD is proportional to either the number of columns or the number of rows in your original matrix, times the number of singular vectors you are producing.
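To put rough numbers on that proportionality, here is a back-of-the-envelope sketch. It assumes (this is an assumption, not a description of Mahout's internals) that each singular vector is held as one dense array of 8-byte doubles, and it ignores JVM overhead and any intermediate vectors the solver keeps around. The helper name `singular_vector_bytes` is hypothetical; the matrix sizes are the ones from the command quoted below.

```python
# Back-of-the-envelope memory estimate for the final singular vectors.
# Assumption: each vector is a single dense array of 8-byte doubles;
# JVM object overhead and intermediate solver state are ignored.
BYTES_PER_DOUBLE = 8


def singular_vector_bytes(dimension: int, rank: int) -> int:
    """Bytes needed for `rank` dense vectors of length `dimension` (hypothetical helper)."""
    return dimension * rank * BYTES_PER_DOUBLE


# Matrix sizes taken from the --numRows / --numCols values in the quoted command.
num_rows, num_cols = 129_444, 61_892

for rank in (50, 100, 1000):
    by_rows = singular_vector_bytes(num_rows, rank)
    by_cols = singular_vector_bytes(num_cols, rank)
    print(f"rank {rank:>4}: {by_rows / 2**30:.2f} GiB (row-length vectors), "
          f"{by_cols / 2**30:.2f} GiB (column-length vectors)")
```

Under these assumptions, rank 1000 needs close to 1 GiB for a single copy of row-length vectors alone, while rank 50 needs only a few tens of MiB. The real footprint of the job is likely some multiple of this estimate, which is why starting with a small rank is the cheaper experiment.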
Try scaling up the rank option from a small number first, before blowing out your memory requirements.

On Tue, Jul 6, 2010 at 6:09 AM, Grant Ingersoll wrote:
> Anyone have guidelines on needed heap size when running SVD? I've done a
> couple of fairly long runs on my single machine and keep running out of mem.
> fairly deep into the run. Before I increase the heap size for the 4th time,
> I figured I'd see if it is even going to fit into memory at all.
>
> My matrix is ~ 130,000 x 62,000 and I have 4GB total on my machine. I'm
> running this locally for now as a first step in scaling it out.
>
> Here's my command:
> ./mahout svd -Dmapred.input.dir=/tmp/solr-clust-n2/part-out.vec --numCols 61892 --tempDir /tmp/solr-clust-n2-svd --rank 1000 --numRows 129444
>
> Thanks,
> Grant
