Thank you for your clarifications; now it is clear.
2011/11/8 Jake Mannix <[email protected]>
> The output from the LanczosSolver is not the final set of results. The
> fact that you passed --cleansvd "true" to the system means that you want it
> to do some cleanup and remove any spurious singular vector/value pairs
> (like the zero-eigenvalue case here).
>
> Do you also see log output which looks like "appending {some vector info}
> to {some path}" on your console output? The vectors printed here should
> include their eigenvalue and relative approximation error.
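To make "relative approximation error" concrete, here is a minimal sketch of one way such an error could be measured for a candidate eigenpair. The definition used (the residual norm ||Bv - lambda*v|| divided by |lambda|, with B = A^T A) and the helper names `gram_times` and `relative_residual` are assumptions made for illustration; they are not taken from Mahout's cleanup code, which may define the error differently.

```python
import math

# The 5 x 5 matrix from the original question.
A = [
    [2, 0, 8, 6, 0],
    [1, 6, 0, 1, 7],
    [5, 0, 7, 4, 0],
    [7, 0, 8, 5, 0],
    [0, 10, 0, 0, 7],
]

def gram_times(M, x):
    """Return (M^T M) x without forming M^T M explicitly."""
    y = [sum(m * xi for m, xi in zip(row, x)) for row in M]  # y = M x
    return [sum(M[i][j] * y[i] for i in range(len(M))) for j in range(len(x))]

def normalize(x):
    n = math.sqrt(sum(xi * xi for xi in x))
    return [xi / n for xi in x]

def relative_residual(M, v):
    """Rayleigh quotient lambda of v and ||Bv - lambda v|| / |lambda|, B = M^T M."""
    v = normalize(v)
    Bv = gram_times(M, v)
    lam = sum(b * vi for b, vi in zip(Bv, v))
    res = math.sqrt(sum((b - lam * vi) ** 2 for b, vi in zip(Bv, v)))
    return lam, res / abs(lam)

# A well-converged candidate: 200 power-iteration steps from the all-ones vector.
v = normalize([1.0] * 5)
for _ in range(200):
    v = normalize(gram_times(A, v))

good_lam, good_err = relative_residual(A, v)
bad_lam, bad_err = relative_residual(A, [1.0] * 5)  # deliberately unconverged

print("converged:   sigma = %.6f, relative residual = %.2e"
      % (math.sqrt(good_lam), good_err))
print("unconverged: sigma = %.6f, relative residual = %.2e"
      % (math.sqrt(bad_lam), bad_err))
```

A cleanup pass in this spirit would keep pairs whose error is small and drop the rest; the threshold to use is a separate choice that this sketch does not make.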
>
> Your results also highlight a common fact about doing Lanczos on tiny
> matrices: Lanczos is iterative, and can only iterate up to the overall
> dimension of your input matrix, but only finds an approximation to the
> singular vectors / values which gets better as the iterations continue.
> This is why you are getting a very accurate measure of the top singular
> value, but progressively worse for lower ones.
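The iteration described here can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not Mahout's implementation: it runs Lanczos on the symmetric operator A^T A, starts from the all-ones vector, reorthogonalizes fully against earlier basis vectors, and diagonalizes the small tridiagonal matrix with cyclic Jacobi rotations. The function names are hypothetical.

```python
import math

# The 5 x 5 matrix from the original question.
A = [
    [2, 0, 8, 6, 0],
    [1, 6, 0, 1, 7],
    [5, 0, 7, 4, 0],
    [7, 0, 8, 5, 0],
    [0, 10, 0, 0, 7],
]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def gram_times(M, x):
    """Return (M^T M) x without forming M^T M explicitly."""
    y = [dot(row, x) for row in M]
    return [sum(M[i][j] * y[i] for i in range(len(M))) for j in range(len(x))]

def jacobi_eigenvalues(S, sweeps=50):
    """Eigenvalues of a small symmetric matrix via cyclic Jacobi rotations."""
    n = len(S)
    S = [row[:] for row in S]
    for _ in range(sweeps):
        off = sum(S[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-22:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if S[p][q] == 0.0:
                    continue
                diff = S[q][q] - S[p][p]
                # Rotation angle that zeroes the (p, q) entry.
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * S[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    skp, skq = S[k][p], S[k][q]
                    S[k][p], S[k][q] = c * skp - s * skq, s * skp + c * skq
                for k in range(n):  # rotate rows p and q
                    spk, sqk = S[p][k], S[q][k]
                    S[p][k], S[q][k] = c * spk - s * sqk, s * spk + c * sqk
                S[p][q] = S[q][p] = 0.0
    return sorted((S[i][i] for i in range(n)), reverse=True)

def lanczos_ritz_values(M, k):
    """Run up to k Lanczos steps on M^T M; return Ritz values, largest first."""
    n = len(M[0])
    basis = [[1.0 / math.sqrt(n)] * n]  # normalized all-ones start vector
    alphas, betas = [], []
    for j in range(k):
        w = gram_times(M, basis[j])
        alphas.append(dot(w, basis[j]))
        # Full reorthogonalization, done twice for numerical safety.
        for _ in range(2):
            for v in basis:
                proj = dot(w, v)
                w = [wi - proj * vi for wi, vi in zip(w, v)]
        beta = math.sqrt(dot(w, w))
        if j == k - 1 or beta < 1e-10:
            break
        betas.append(beta)
        basis.append([wi / beta for wi in w])
    m = len(alphas)
    T = [[0.0] * m for _ in range(m)]  # tridiagonal auxiliary matrix
    for i in range(m):
        T[i][i] = alphas[i]
    for i in range(len(betas)):
        T[i][i + 1] = T[i + 1][i] = betas[i]
    return jacobi_eigenvalues(T)

# Singular-value estimates (square roots of the Ritz values) after k steps.
for k in range(1, 6):
    sigmas = [math.sqrt(max(r, 0.0)) for r in lanczos_ritz_values(A, k)]
    print(k, ["%.4f" % s for s in sigmas])
```

Printing the estimates for k = 1 through 5 shows how many values are available at each step and how the leading estimate moves toward the top singular value as k grows.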
>
> Try running this on a larger matrix (try 100 x 100), and look at, say, the
> top 50 singular vector/value pairs. Those should be significantly more
> accurate, and the only reason you would want to do a *distributed* SVD is
> if a) your data is HUGE, and b) you're only wanting to look at the top few
> (up to maybe hundreds) singular vector/value pairs. Point b) is a point of
> practicality if you have point a).
>
> -jake
>
> On Tue, Nov 8, 2011 at 8:56 AM, Ed Fine <[email protected]> wrote:
>
> > I am a Mahout newbie, so I might be wrong, but I strongly suspect it
> > has to do with one of your eigenvalues being 0. That implies a singular
> > matrix. You will see that your two largest eigenvalues are equal to the
> > two largest singular values. Resolving the structure in the smaller
> > eigenvalues gets numerically unstable in a near-singular matrix. I bet
> > that is your issue. I think you can find a description of this issue in
> > Numerical Linear Algebra by Trefethen and Bau.
> >
> > On Nov 8, 2011, at 4:11 AM, motta <[email protected]> wrote:
> >
> > > Hi everybody,
> > > I have completed my first Mahout experiment with a Hadoop local
> > > installation (single machine) and I obtained different results from
> > > Scilab and the Mahout Distributed Lanczos Solver. Could someone explain
> > > why this happens? Am I doing something wrong?
> > >
> > > This is my matrix
> > > 2,0,8,6,0
> > > 1,6,0,1,7
> > > 5,0,7,4,0
> > > 7,0,8,5,0
> > > 0,10,0,0,7
> > >
> > > This is my Mahout invocation
> > > ./hadoop jar
> > > /home/hadoop-user/mahout/mahout-distribution-0.5/mahout-examples-0.5-job.jar
> > > org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver
> > > --input /user/hadoop-user/mahout-input
> > > --output /user/hadoop-user/mahout-output
> > > --numCols 5 --numRows 5 --cleansvd "true" --rank 5
> > >
> > > These are the Mahout results
> > > 11/11/08 12:45:04 INFO lanczos.LanczosSolver: 4 passes through the
> > > corpus so far...
> > > 11/11/08 12:45:04 INFO lanczos.LanczosSolver: Lanczos iteration
> > > complete - now to diagonalize the tri-diagonal auxiliary matrix.
> > > 11/11/08 12:45:04 INFO lanczos.LanczosSolver: Eigenvector 0 found with
> > > eigenvalue 0.0
> > > 11/11/08 12:45:04 INFO lanczos.LanczosSolver: Eigenvector 1 found with
> > > eigenvalue 1.0869992925693057
> > > 11/11/08 12:45:04 INFO lanczos.LanczosSolver: Eigenvector 2 found with
> > > eigenvalue 3.4305998309907
> > > 11/11/08 12:45:04 INFO lanczos.LanczosSolver: Eigenvector 3 found with
> > > eigenvalue 15.171371217397603
> > > 11/11/08 12:45:04 INFO lanczos.LanczosSolver: Eigenvector 4 found with
> > > eigenvalue 17.918370809987454
> > > 11/11/08 12:45:04 INFO lanczos.LanczosSolver: LanczosSolver finished.
> > >
> > > And these are the results from Scilab (svd(X))
> > > -->[U,S,V]=svd(X);
> > > -->S
> > > S =
> > >
> > >     17.918371    0.           0.          0.           0.
> > >     0.           15.171372    0.          0.           0.
> > >     0.           0.           3.564002    0.           0.
> > >     0.           0.           0.          1.9842282    0.
> > >     0.           0.           0.          0.           0.3495557
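To quantify the mismatch, the two result sets can be compared pair by pair. This sketch uses only the numbers posted in this message (the Mahout log values reordered largest first, and the diagonal of Scilab's S) and treats the Scilab values as the reference, which is an assumption of the comparison rather than something the thread proves.

```python
# Values copied from the Mahout log and the Scilab output, largest first.
mahout = [17.918370809987454, 15.171371217397603, 3.4305998309907,
          1.0869992925693057, 0.0]
scilab = [17.918371, 15.171372, 3.564002, 1.9842282, 0.3495557]

# Relative difference of each Mahout value against the Scilab reference.
rel_err = [abs(m - s) / s for m, s in zip(mahout, scilab)]

for i, (m, s, e) in enumerate(zip(mahout, scilab, rel_err), start=1):
    print("pair %d: mahout=%.6f  scilab=%.6f  relative difference=%.2e"
          % (i, m, s, e))
```

The relative difference grows from the first pair to the last: the two largest values agree to about seven significant digits, while the smallest Mahout value (0.0) differs from Scilab's by 100%.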
> > >
> > > thank you,
> > > Alfredo
> > >
> > >
> > > --
> > > View this message in context:
> > > http://lucene.472066.n3.nabble.com/Comparing-results-of-Mahout-SVD-and-Scilab-tp3490066p3490066.html
> > > Sent from the Mahout User List mailing list archive at Nabble.com.
> >
>