2011/12/21 Radim Rehurek <[email protected]>

> > From: Ted Dunning <[email protected]>
> > Subject: Re: SVD in Mahout (was: Mahout Lanczos SVD complexity)
> > Date: 19.12.2011 16:58:57
> > ----------------------------------------
> > The users of SVD on Mahout are more interested in their own application
> > level metrics than the metrics associated with the decomposition itself.
> >  Moreover, at the scale people have been working with, getting any
> > decomposition at all is refreshing.
>
>
> Heh. Is that the official Mahout philosophy for other algorithms as well?
> "Who cares about correctness, we are happy we can run it on some data at
> all, so shut up?" I hope you're not serious, Ted.
>

Approximate algorithms are used all the time, for all kinds of purposes.
Stochastic gradient descent, for example, has only first-order convergence
and is rarely run until the model reaches the precise optimum; instead, it
is run to the point where the practical benefit is maximized.
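
To make that concrete, here is a minimal sketch of the early-stopping
pattern in plain NumPy. Everything in it is illustrative: a hypothetical
least-squares problem stands in for a real model, and the learning rate
and tolerance are placeholders, not tuned values.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical problem: minimize |Xw - y|^2 by per-example SGD.
    X = rng.standard_normal((1000, 20))
    y = X @ rng.standard_normal(20) + 0.1 * rng.standard_normal(1000)
    X_val, y_val, X_tr, y_tr = X[:200], y[:200], X[200:], y[200:]

    w = np.zeros(20)
    best = np.inf
    for epoch in range(100):
        for i in rng.permutation(len(y_tr)):
            w -= 0.001 * 2 * (X_tr[i] @ w - y_tr[i]) * X_tr[i]
        val = np.mean((X_val @ w - y_val) ** 2)
        if val > best - 1e-6:   # stop when the practical benefit plateaus,
            break               # not when the optimizer has truly converged
        best = val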

Practical considerations have a major impact on large-scale data analysis.

> Aren't you afraid people will draw wrong conclusions about an SVD
> application, using your (possibly wildly inaccurate) SVD implementation?
> Retract publications?
>

Radim, I think you are overblowing the risk here.  I expect anybody using
any software to determine whether the level of accuracy provided is
sufficient for their needs.


> By all means, use whatever decomposition suits you. But SVD already has a
> well-established meaning in linear algebra and using that acronym comes
> with certain expectations.


And if you look at the recommendation literature you will find that the
meaning is considerably relaxed.


> People unfamiliar with the pitfalls of your implementation may assume
> they're really getting SVD (or at least a version that's "reasonably close"
> -- in the numerical computing sense). A big fat accuracy warning is in
> order here. Nobody expects more or less random vectors, even if these
> happen to perform better than the real truncated SVD in your app [citation
> needed].
>

Anybody who thinks that a "real SVD" produces anything but random numbers
for the more extreme singular vectors when applied to small count data
needs to have their head examined as well.
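
That claim is easy to check empirically. Below is a minimal sketch with
synthetic data (Poisson counts drawn twice from the same hypothetical
rank-5 rate matrix; not any particular corpus): the leading singular
subspace is stable across the two samples, while a deeper block of
"exact" singular vectors barely agrees with itself.

    import numpy as np

    rng = np.random.default_rng(0)

    # Exactly rank-5 nonnegative rates: all the signal is in the top 5 factors.
    rates = rng.gamma(2.0, 1.0, (200, 5)) @ rng.gamma(2.0, 1.0, (5, 300)) / 5.0

    def subspace_overlap(U1, U2):
        # Mean squared canonical correlation between column spaces (1.0 = identical).
        s = np.linalg.svd(U1.T @ U2, compute_uv=False)
        return np.mean(s ** 2)

    # Two independent small-count samples of the same underlying matrix.
    Ua = np.linalg.svd(rng.poisson(rates).astype(float), full_matrices=False)[0]
    Ub = np.linalg.svd(rng.poisson(rates).astype(float), full_matrices=False)[0]

    print("vectors 1-5:  ", subspace_overlap(Ua[:, :5], Ub[:, :5]))        # high
    print("vectors 51-55:", subspace_overlap(Ua[:, 50:55], Ub[:, 50:55]))  # near chance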


> > The examples that you gave in your thread involved walking *way* down the
> > spectrum to the smaller singular values.  That is absolutely not the
> > interest with most Mahout users because that would involve fairly massive
> > over-fitting.
>
> Too many opinions, too little data. Instead, I decided to run the English
> Wikipedia experiments with factors=10 and oversampling=5, as per your
> concerns.
>

Too much overheated rhetoric as well.  I wouldn't mind if you toned it down
a bit.


> (cross-posting to the gensim mailing list, as this might be of interest to
> gensim users as well)
>
> Data: English Wikipedia as term-document matrix (0.42B non-zeroes, 3.5M
> documents, 10K features).
> Requesting top 10 factors (1% of the total singular value mass), not 500
> factors like before (15% total mass). Accuracy is evaluated by comparing
> reconstruction error against LAPACK's DSYEV in-core routine on A*A^T. Error =
> |A*A^T - U*S^2*U^T| / |A*A^T - U_lapack*S_lapack^2*U_lapack^T|.
>
> batch algo     error
> ---------------+------
> baseline*      1.986
> 0 power iters  1.877
> 1 power iter   1.094
> 2 power iters  1.009
> 4 power iters  1.0005
> 6 power iters  1.00009
>
> The results are completely in line with Martinsson et al.'s [1], as well
> as with my previous experiments: no power iteration steps combined with
> massive truncation = rubbish output. Accuracy improves exponentially with
> the number of iteration steps (but see my initial warning re. numerical
> issues with a higher number of steps if implemented naively).
>
> So, your worry that the SVD inaccuracy is somehow due to asking too many
> factors and irrelevant for thinner SVDs is without substance.
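
For readers who want to reproduce the shape of that table without the
Wikipedia corpus, here is a minimal sketch of the randomized
eigendecomposition of A*A^T in the style of Halko/Martinsson et al., with
oversampling p and q power iteration steps, evaluated with the error ratio
Radim defines above. Everything is illustrative: dense synthetic data
stands in for the term-document matrix, the Frobenius norm stands in for
|.|, np.linalg.eigh stands in for DSYEV, and the sizes are placeholders.

    import numpy as np

    rng = np.random.default_rng(0)
    m, n, k, p = 500, 2000, 10, 5        # rows, cols, factors, oversampling
    A = rng.standard_normal((m, n)) * np.exp(-0.05 * np.arange(n))  # decaying spectrum

    def randomized_range(A, k, p, q):
        # Approximate range basis Q of A: random projection, then q power iterations.
        Q, _ = np.linalg.qr(A @ rng.standard_normal((A.shape[1], k + p)))
        for _ in range(q):
            # Naive (A*A^T)*Q step; a careful implementation re-orthogonalizes
            # between the two products to avoid the numerical issues noted earlier.
            Q, _ = np.linalg.qr(A @ (A.T @ Q))
        return Q

    def error(A, U, S2):
        # Frobenius-norm reconstruction error of A*A^T from the top k factors.
        AAT = A @ A.T
        return np.linalg.norm(AAT - U[:, :k] * S2[:k] @ U[:, :k].T)

    # Exact truncated eigendecomposition of A*A^T (the in-core baseline).
    evals, evecs = np.linalg.eigh(A @ A.T)        # ascending order
    baseline = error(A, evecs[:, ::-1], evals[::-1])

    for q in (0, 1, 2, 4, 6):
        Q = randomized_range(A, k, p, q)
        B = Q.T @ A                               # small (k+p) x n matrix
        evals_b, evecs_b = np.linalg.eigh(B @ B.T)
        err = error(A, Q @ evecs_b[:, ::-1], evals_b[::-1])
        print(f"{q} power iters: error ratio {err / baseline:.5f}")

On a matrix with a reasonably decaying spectrum, the ratio should walk
down toward 1.0 as q grows, mirroring the table above.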


Radim, you might go back to my original comment.  You are inventing a
statement here.

My original comment was that it is not well known whether increasing p
would compensate for lack of a power iteration.  You are using a smaller
value of p than the default and have not examined the question that I posed.
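
For what it's worth, that question is directly testable with the sketch
above. Reusing randomized_range, error, and baseline from it, hold q = 0
and sweep the oversampling p (the values below are arbitrary), then
compare against the q > 0 rows:

    # Does extra oversampling alone compensate for skipping the power iterations?
    for p_try in (5, 25, 100, 250):
        Q = randomized_range(A, k, p_try, q=0)
        evals_b, evecs_b = np.linalg.eigh(Q.T @ A @ A.T @ Q)
        err = error(A, Q @ evecs_b[:, ::-1], evals_b[::-1])
        print(f"q=0, p={p_try:3d}: error ratio {err / baseline:.5f}")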

I never said that a power iteration was a bad idea.


> ...
> From all the dev replies here -- no users actually replied -- I get the
> vibe that the accuracy discussion annoys you.


No.  It doesn't.

But I think that it can be done in a less confrontational way.


> Now, I dropped by to give a friendly hint about possible serious accuracy
> concerns, based on experience with mid-scale (billions of non-zeroes) SVD
> computations in one specific domain (term-document matrices in NLP), and
> possibly to learn about your issues on tera-feature-scale datasets in
> return, which I'm very interested in. Apparently neither of us is getting
> anything out of this, so I'll stop here.
>

As you like.
