It should be noted that cranking the rank down to 20 produces a significantly smaller result.
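A back-of-the-envelope estimate suggests why the output size follows the rank rather than the size of the input. It assumes that each eigenvector written by the svd job is stored densely, with one 8-byte double per column (--numCols 65458), and it ignores SequenceFile/Writable overhead. The function name below is hypothetical and only for illustration.

```python
# Rough size estimate of the SVD output (sketch, not Mahout code).
# Assumption: each of the `rank` eigenvectors is stored densely,
# one 8-byte double per column; serialization overhead is ignored.

BYTES_PER_DOUBLE = 8
NUM_COLS = 65458  # --numCols from the svd invocation


def estimated_eigenvector_bytes(rank: int, num_cols: int = NUM_COLS) -> int:
    """Approximate bytes for `rank` dense eigenvectors of length `num_cols`."""
    return rank * num_cols * BYTES_PER_DOUBLE


for rank in (200, 20):
    size = estimated_eigenvector_bytes(rank)
    print(f"rank {rank:>3}: {size:,} bytes (~{size / 1e6:.1f} MB)")
```

Under that assumption, rank 200 comes to about 104.7 MB, which is close to the 104 MB svdOut. Rank 20 comes to about a tenth of that. The input is evidently stored sparsely, since a dense 130103 x 65458 matrix of doubles would be tens of gigabytes. The dense eigenvectors get no such benefit, so with a high rank they can easily outweigh the 52 MB input.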
On Aug 29, 2010, at 4:38 PM, Grant Ingersoll wrote:

> I'm running SVD as:
> ./mahout svd --input /tmp/solr-clust-n2/part-out.vec --tempDir /tmp/solr-clust-n2/svdTemp --output /tmp/solr-clust-n2/svdOut --rank 200 --numCols 65458 --numRows 130103
> ./mahout cleansvd --eigenInput /tmp/solr-clust-n2/svdOut --corpusInput /tmp/solr-clust-n2/part-out.vec --output /tmp/solr-clust-n2/svdFinal --maxError 0.1 --minEigenvalue 10.0
>
> part-out.vec is 52 MB. The output from SVD (svdOut) is 104 MB and largestCleanEigens is 88 MB. For some reason, this really doesn't feel right.
>
> Is there a guide on interpreting the output of SVD anywhere? Intuitively, I believe the output should be a lot smaller? I mean that's the point, right?
>
>
> I can share the vector if you want.
>
> -Grant
>
> --------------------------
> Grant Ingersoll
> http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8
> --------------------------

Grant Ingersoll
http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8
