Also, there are a few random-projection single-machine implementations about
to be committed.  These will provide a middle ground for scalability.

As Dmitriy pointed out, with p = 0 and k = full rank these should work on a
matrix of any size.  That isn't very interesting, of course, since in that
case they devolve to doing a full-size in-memory SVD.

When computing less than a full SVD, the approximation is much better with
higher dimensions.
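
For anyone curious what these will look like, here is a minimal sketch of
the standard randomized-SVD recipe (rank k, oversampling p) in numpy.  It
is purely illustrative, not the implementation about to be committed:

    import numpy as np

    def randomized_svd(A, k, p=0, seed=0):
        # Approximate rank-k SVD of A via a random projection with
        # oversampling p (the usual Halko/Martinsson/Tropp recipe).
        rng = np.random.default_rng(seed)
        m, n = A.shape
        Omega = rng.standard_normal((n, k + p))  # random test matrix
        Y = A @ Omega                            # m x (k+p) sample of A's range
        Q, _ = np.linalg.qr(Y)                   # orthonormal basis for that range
        B = Q.T @ A                              # (k+p) x n projected matrix
        Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
        return (Q @ Ub)[:, :k], s[:k], Vt[:k, :]

With p = 0 and k = full rank, the projection keeps every dimension, so the
final np.linalg.svd call is a full-size in-memory decomposition and nothing
is saved.  With k + p well below min(m, n), the expensive step shrinks to a
QR plus a small SVD, which is the middle ground for scalability mentioned
above.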

On Sat, Sep 24, 2011 at 2:51 PM, Lance Norskog <goks...@gmail.com> wrote:

> As a side note, there are also a few in-memory SVD implementations.  There
> is a SingularValueDecomposition class which uses "pre-Mahout" data
> structures.  There are also a few Factorizer classes which are apparently
> SVD-based but supply only the left and right matrices, with no singular
> values.
>
> What are the minimum sizes expected to "work" in these algorithms? Are they
> intended to be canonical implementations that are correct from "2x2" to
> "out
> of memory" or "numerical instability"?
>
> Lance
>
> On Fri, Sep 23, 2011 at 6:34 PM, Dan Brickley <dan...@danbri.org> wrote:
>
> > On 23 September 2011 16:03, Lance Norskog <goks...@gmail.com> wrote:
> > > Markus-
> > >
> > > Probably the best approach is to cross-check your results against the R
> > > statistical system, using live data of various sizes.  (You will often
> > > get results with opposing signs.)
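> >
> > (A quick note on the sign flips: each left/right singular vector pair is
> > only determined up to a common sign, so a literal element-by-element diff
> > against R or Octave output can "fail" even when both answers are right.
> > A tiny illustrative numpy-style check, not anything in Mahout, would be:
> >
> >     import numpy as np
> >
> >     def same_up_to_sign(U1, U2, tol=1e-8):
> >         # Flip each column of U1 onto U2's sign convention, then compare.
> >         signs = np.sign(np.sum(U1 * U2, axis=0))
> >         return np.allclose(U1 * signs, U2, atol=tol)
> >
> > Or simply compare the reconstructions U * diag(s) * V', which are
> > sign-free.)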
> >
> > So, that's exactly where I was, with Ruby and Matlab (<cheapskate>GNU
> > Octave</cheapskate>) taking the place of R there.
> >
> > It didn't help me that my grasp of the relevant linear algebra was
> > somewhat journalistic, for sure.  But precisely because it was shaky,
> > I thought "right, let's stay sane since I'm not an expert either in
> > the maths, or in hadoop, or in mahout, so ... I'll take a simple, tiny
> > test case, make sure I can run it in Octave and Ruby, ... and use that
> > to build out my understanding of Mahout's SVD".
> >
> > That turned out to be a disappointing learning experience, for reasons
> > recently summarised here.  I was using a tiny example taken from
> > http://www.igvita.com/2007/01/15/svd-recommendation-system-in-ruby/
> > because I thought that was a nice way of re-using a helpful writeup as
> > Mahout documentation. Bad idea due to dataset size.
> >
> > Looking again at
> > https://cwiki.apache.org/MAHOUT/dimensional-reduction.html I see that
> > there is in fact a good sample dataset now; the mailing list stuff.
> > Maybe I'd missed it at the time. It deserves more attention as a
> > common hub for documentation, user education, and comparison testing
> > and sanity-checking against non-Mahout environments like R etc.
> > (Perhaps the EC2 aspect is an issue for non-Amazon users?) I'm not
> > sure whether "Overall, there are 6,094,444 key-value pairs in 283
> > files taking around 5.7GB of disk." makes it too big for many
> > non-Mahout environments. But the sooner there's a single dataset
> > people use to get started experimenting with Mahout SVD, the sooner
> > we'll avoid everyone revisiting the "I don't understand what Lanczos
> > has done..." thread.
> >
> > Should there be a FAQ on the Lanczos page?
> >
> > Q: Will this work with a test matrix of e.g. 5x8 size?
> > A: No, ... it needs to be substantially bigger,...
> >
> > Q: How much bigger?
> > A: <... somebody write something here ... >
> >
> > cheers,
> >
> > Dan
> >
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>
