In general, the current SVD implementation requires, on the driving machine (i.e. not on the HDFS cluster), at least 2 * rank * numCols * 8 bytes of memory. In your case, with rank = 1000 and numCols = 61892, this would still be a fairly modest value: roughly 62k * 16k bytes = 1GB.
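
To make the arithmetic concrete, here is a small Python sketch of that rule of thumb. The function name and its parameters are made up for illustration and are not part of Mahout; it only encodes the 2 * rank * numCols * 8 bytes estimate as a lower bound, and says nothing about what else the JVM needs on top of that.

```python
def svd_driver_heap_bytes(rank, num_cols, bytes_per_double=8, copies=2):
    """Rough lower bound on driver-side memory for the SVD run.

    Encodes the rule of thumb 2 * rank * numCols * 8 bytes.
    'copies' and 'bytes_per_double' are illustrative parameter names,
    not Mahout options.
    """
    return copies * rank * num_cols * bytes_per_double


if __name__ == "__main__":
    # Values taken from the command line in the quoted message:
    # --rank 1000 --numCols 61892
    needed = svd_driver_heap_bytes(rank=1000, num_cols=61892)
    print(f"{needed} bytes ~ {needed / 1e9:.2f} GB")
```

Note that numRows does not appear in the estimate, so by this rule of thumb the 129444 rows do not affect the driver-side figure.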

-jake

On Tue, Jul 6, 2010 at 3:09 PM, Grant Ingersoll <[email protected]> wrote:

> Anyone have guidelines on needed heap size when running SVD? I've done a
> couple of fairly long runs on my single machine and keep running out of mem.
> fairly deep into the run. Before I increase the heap size for the 4th time,
> I figured I'd see if it is even going to fit into memory at all.
>
> My matrix is ~ 130,000 x 62,000 and I have 4GB total on my machine. I'm
> running this locally for now as a first step in scaling it out.
>
> Here's my command: ./mahout svd
> -Dmapred.input.dir=/tmp/solr-clust-n2/part-out.vec --numCols 61892 --tempDir
> /tmp/solr-clust-n2-svd --rank 1000 --numRows 129444
>
> Thanks,
> Grant
