On Jul 6, 2010, at 12:46 PM, Ted Dunning wrote:

> Computing 1000 singular vectors is generally neither necessary nor helpful.

OK, good to know. This is my first time ever running SVD, so I have no clue what a useful number is for the rank value. Advice welcome here. Question: What exactly is the rank, anyway? It's the number of singular values, right?

> After just a few dozen, the noise in the system dominates and you are
> essentially just generating very fancy random numbers. Also, the total
> memory required in the last steps of the SVD is proportional to either
> number of columns or number of rows in your original matrix times the number
> of singular vectors you are producing.
>
> Try scaling up the rank option from a small number first before blowing out
> your memory requirements.

OK, will do.

>
> On Tue, Jul 6, 2010 at 6:09 AM, Grant Ingersoll <[email protected]> wrote:
>
>> Anyone have guidelines on needed heap size when running SVD? I've done a
>> couple of fairly long runs on my single machine and keep running out of mem.
>> fairly deep into the run. Before I increase the heap size for the 4th time,
>> I figured I'd see if it is even going to fit into memory at all.
>>
>> My matrix is ~ 130,000 x 62,000 and I have 4GB total on my machine. I'm
>> running this locally for now as a first step in scaling it out.
>>
>> Here's my command: ./mahout svd
>> -Dmapred.input.dir=/tmp/solr-clust-n2/part-out.vec --numCols 61892 --tempDir
>> /tmp/solr-clust-n2-svd --rank 1000 --numRows 129444
>>
>> Thanks,
>> Grant
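
For reference, here is a rough back-of-the-envelope sketch of the scaling Ted describes (memory proportional to the number of rows or columns times the number of singular vectors). It assumes the vectors are held as dense 8-byte doubles and ignores JVM and object overhead, so treat the result as a lower bound rather than what Mahout actually allocates. The function name and the list of trial ranks are made up for illustration; the dimensions come from the command above.

```python
# Rough estimate only: assumes dense double-precision storage (8 bytes/value)
# and ignores all JVM/object overhead. Not Mahout's actual allocation logic.

NUM_ROWS = 129444  # --numRows from the command in this thread
NUM_COLS = 61892   # --numCols from the command in this thread


def dense_basis_bytes(dim, rank, bytes_per_value=8):
    """Bytes needed to hold `rank` dense vectors of length `dim`."""
    return dim * rank * bytes_per_value


def estimate_table(ranks):
    """Map each rank to (row-sized estimate, column-sized estimate) in bytes."""
    return {
        r: (dense_basis_bytes(NUM_ROWS, r), dense_basis_bytes(NUM_COLS, r))
        for r in ranks
    }


if __name__ == "__main__":
    # Hypothetical trial ranks, scaling up from small values as suggested.
    for rank, (row_b, col_b) in estimate_table([10, 50, 100, 1000]).items():
        print(f"rank={rank:5d}  rows-sized: {row_b / 1e6:9.1f} MB  "
              f"cols-sized: {col_b / 1e6:9.1f} MB")
```

Under these assumptions, rank 1000 with row-length vectors already comes to roughly 1 GB before any overhead, which is a big chunk of a 4GB machine, while a rank of a few dozen stays in the tens of MB.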
