Even though SVD is meant to reduce dimensionality, that does not mean your results will be smaller in terms of memory, since U, S and V are dense matrices. They only come out smaller if you keep very few eigenvectors. Your input matrix is sparse; had it been stored as a dense matrix, it would be far larger.
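As a rough back-of-the-envelope check, here is a small Python sketch of the sizes involved. It assumes the output holds one dense vector of length numCols per requested rank, stored as 8-byte doubles, and it ignores any file-format overhead; the helper name `dense_bytes` is made up for illustration and is not part of Mahout. The numRows, numCols and rank values are the ones from the command quoted below.

```python
# Rough size estimate for dense storage, assuming 8-byte doubles
# and no file-format overhead (both are assumptions, not Mahout facts).
BYTES_PER_DOUBLE = 8


def dense_bytes(rows, cols, bytes_per_value=BYTES_PER_DOUBLE):
    """Bytes needed to store a rows x cols matrix densely."""
    return rows * cols * bytes_per_value


num_rows, num_cols = 130103, 65458  # values from the quoted command

# One dense vector of length num_cols per requested rank.
for rank in (200, 20):
    size = dense_bytes(rank, num_cols)
    print(f"rank {rank}: about {size / 1e6:.1f} MB of dense vectors")

# The same input stored densely instead of sparsely.
full = dense_bytes(num_rows, num_cols)
print(f"dense input: about {full / 1e9:.1f} GB")
```

Under these assumptions, rank 200 works out to roughly 105 MB, which is in line with the 104 MB svdOut reported, and rank 20 to roughly a tenth of that. The dense form of the input would be on the order of tens of gigabytes, so the 52 MB input file is small only because it is sparse.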

On Sun, Aug 29, 2010 at 5:13 PM, Grant Ingersoll <[email protected]> wrote:

> Should be noted, that cranking the rank down to 20 produces a significantly
> smaller result.
>
>
> On Aug 29, 2010, at 4:38 PM, Grant Ingersoll wrote:
>
> > I'm running SVD as:
> > ./mahout svd --input /tmp/solr-clust-n2/part-out.vec --tempDir /tmp/solr-clust-n2/svdTemp --output /tmp/solr-clust-n2/svdOut --rank 200 --numCols 65458 --numRows 130103
> > ./mahout cleansvd --eigenInput /tmp/solr-clust-n2/svdOut --corpusInput /tmp/solr-clust-n2/part-out.vec --output /tmp/solr-clust-n2/svdFinal --maxError 0.1 --minEigenvalue 10.0
> >
> > part-out.vec is 52 MB. The output from SVD (svdOut) is 104 MB and
> > largestCleanEigens is 88 MB. For some reason, this really doesn't feel
> > right.
> >
> > Is there a guide on interpreting the output of SVD anywhere?
> > Intuitively, I believe the output should be a lot smaller? I mean that's
> > the point, right?
> >
> > I can share the vector if you want.
> >
> > -Grant
> >
> > --------------------------
> > Grant Ingersoll
> > http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8
> >
>
> --------------------------
> Grant Ingersoll
> http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8
>

--
Akshay Uday Bhat.
Graduate Student, Computer Science, Cornell University
Website: http://www.akshaybhat.com
