It should be noted that cranking the rank down to 20 produces a significantly 
smaller result.
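
As a rough sanity check on these sizes, here is a back-of-the-envelope sketch. 
It assumes each of the `rank` eigenvectors is written out as a dense vector of 
`numCols` 8-byte doubles and ignores any file-format overhead; that storage 
layout is an assumption, not something confirmed in this thread, and the 
helper name `dense_eigen_bytes` is made up for illustration.

```python
# Back-of-the-envelope size estimate for the eigenvector output.
# ASSUMPTION: each eigenvector is stored densely as numCols 8-byte doubles;
# per-record and file-format overhead is ignored.
BYTES_PER_DOUBLE = 8


def dense_eigen_bytes(rank: int, num_cols: int) -> int:
    """Bytes needed for `rank` dense vectors of `num_cols` doubles each."""
    return rank * num_cols * BYTES_PER_DOUBLE


NUM_COLS = 65458  # --numCols value from the quoted command

for rank in (200, 20):
    size = dense_eigen_bytes(rank, NUM_COLS)
    print(f"rank={rank}: {size / 1e6:.1f} MB")
```

Under that assumption, rank 200 comes to roughly 104.7 MB, which is close to 
the 104 MB reported for svdOut in the quoted message, and rank 20 comes to 
about a tenth of that. If the assumption holds, the output size would scale 
with rank times numCols regardless of how sparse the input vectors are.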


On Aug 29, 2010, at 4:38 PM, Grant Ingersoll wrote:

> I'm running SVD as:
> ./mahout svd --input /tmp/solr-clust-n2/part-out.vec --tempDir 
> /tmp/solr-clust-n2/svdTemp --output /tmp/solr-clust-n2/svdOut --rank 200 
> --numCols 65458 --numRows  130103
>  ./mahout cleansvd --eigenInput /tmp/solr-clust-n2/svdOut --corpusInput 
> /tmp/solr-clust-n2/part-out.vec --output /tmp/solr-clust-n2/svdFinal 
> --maxError 0.1 --minEigenvalue 10.0
> 
> part-out.vec is 52 MB.  The output from SVD  (svdOut) is 104 MB and 
> largestCleanEigens is 88 MB.  For some reason, this really doesn't feel right.
> 
> Is there a guide on interpreting the output of SVD anywhere?  Intuitively, I 
> believe the output should be a lot smaller?   I mean that's the point, right? 
>  
> 
> I can share the vector if you want.
> 
> -Grant
> 
> --------------------------
> Grant Ingersoll
> http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8
> 

--------------------------
Grant Ingersoll
http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8
