It should also be noted that MAHOUT-308 should lower this requirement by quite a bit.

  -jake

On Tue, Jul 6, 2010 at 11:23 PM, Jake Mannix <[email protected]> wrote:

> In general, the current SVD impl requires, on the driving machine (ie not
> on the HDFS cluster), at least 2 * rank * numCols * 8 bytes. In your case,
> this would be still a fairly modest value, like 62k * 16k = 1GB.
>
>   -jake
>
> On Tue, Jul 6, 2010 at 3:09 PM, Grant Ingersoll <[email protected]> wrote:
>
>> Anyone have guidelines on needed heap size when running SVD? I've done a
>> couple of fairly long runs on my single machine and keep running out of mem.
>> fairly deep into the run. Before I increase the heap size for the 4th time,
>> I figured I'd see if it is even going to fit into memory at all.
>>
>> My matrix is ~ 130,000 x 62,000 and I have 4GB total on my machine. I'm
>> running this locally for now as a first step in scaling it out.
>>
>> Here's my command: ./mahout svd
>> -Dmapred.input.dir=/tmp/solr-clust-n2/part-out.vec --numCols 61892 --tempDir
>> /tmp/solr-clust-n2-svd --rank 1000 --numRows 129444
>>
>> Thanks,
>> Grant
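
For anyone who wants to plug in their own numbers, here is a minimal sketch of the rule of thumb quoted in this thread (at least 2 * rank * numCols * 8 bytes on the driving machine). The function and constant names are made up for illustration and are not part of Mahout. Reading the 8-byte factor as one double per entry is an assumption, and the result is only a lower bound, not a measured heap size.

```python
# Rough lower bound on driver-side heap for the SVD job, using the rule of
# thumb from this thread: 2 * rank * numCols * 8 bytes.
# Names here are hypothetical helpers, not Mahout APIs.

BYTES_PER_ENTRY = 8  # assumption: the "8 bytes" factor is one double per entry


def min_driver_heap_bytes(rank: int, num_cols: int) -> int:
    """Return the thread's lower-bound estimate, in bytes."""
    return 2 * rank * num_cols * BYTES_PER_ENTRY


if __name__ == "__main__":
    # --rank and --numCols as given in the command in the original question.
    needed = min_driver_heap_bytes(rank=1000, num_cols=61892)
    print(f"{needed} bytes, about {needed / 1e9:.2f} GB")
```

With the values from the command in the question this comes to 990,272,000 bytes, which matches the "62k * 16k = 1GB" figure. The actual heap setting needs headroom above this for everything else the JVM holds.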
