Re: [Scikit-learn-general] Randomized PCA

Olivier Grisel Wed, 02 Nov 2011 15:05:27 -0700

2011/11/2 Radim Rehurek <[email protected]>:
> Hi guys,
>
>> From: Olivier Grisel <[email protected]>
>> 2011/11/2 Stéfan van der Walt <[email protected]>:
>> > Hi all,
>> >
>> > Maybe this paper, from the current issue from SIAM Journal on
>> > Scientific Computing is of some interest:
>> >
>> > http://epubs.siam.org/sisc/resource/1/sjoce3/v33/i5/p2580_s1?view=print
>>
>> AFAIK, Radim Rehurek  in CC has already implemented this algorithm in
>> gensim. I will read the paper though. Thanks for the link.
>
>
> yes, I implemented a version of this algo that runs streamed (no random 
> access to observations) and in O(mk) memory -- unlike the original Halko et 
> al. that requires O((m+n)k).
>
> If you decide to implement the randomized PCA, I can offer some observations:
>
> 1. oversampling does little, accuracy comes mostly from the extra power 
> iteration steps
> 2. no power iterations result in miserable accuracy
> 3. extra power iteration steps quickly lead to numerical overflows; but QR is 
> pretty fast, so in gensim, I orthonormalize the intermediate matrices H after 
> each power iteration step. That's exactly the same method that remark 3.3 
> refers to.


Interesting. The current implementation in scikit-learn (which is
neither streamed nor parallel) does quite a bit of oversampling (if k
components are required, 2 * k random vectors are used) and uses 3
power iterations by default, but does not do QR inside the power
iteration steps:

https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/decomposition/pca.py#L346
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/extmath.py#L126

It would be interesting to experiment with reducing the oversampling
and using orthonormalization after each power iteration.
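
A rough NumPy sketch of that experiment, for discussion only. This is
not the scikit-learn or gensim code: the function names
(randomized_range, randomized_svd_sketch) and the parameters
(n_oversamples, n_iter, renormalize, seed) are made up for
illustration. With renormalize=True the intermediate matrix is
re-orthonormalized with QR after each multiplication by A or A.T;
with renormalize=False the plain power iterations are used.

```python
import numpy as np


def randomized_range(A, k, n_oversamples, n_iter, renormalize, seed=0):
    """Orthonormal basis Q that approximately spans the range of A."""
    rng = np.random.default_rng(seed)
    # Gaussian test matrix with k + n_oversamples columns.
    omega = rng.standard_normal((A.shape[1], k + n_oversamples))
    Y = A @ omega
    for _ in range(n_iter):
        if renormalize:
            # QR after each product keeps the columns well conditioned.
            Y, _ = np.linalg.qr(Y)
            Z, _ = np.linalg.qr(A.T @ Y)
            Y = A @ Z
        else:
            # Plain power iteration: Y <- (A A^T) Y with no renormalization.
            Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)
    return Q


def randomized_svd_sketch(A, k, n_oversamples=None, n_iter=3,
                          renormalize=True, seed=0):
    """Approximate rank-k SVD of A from a randomized range basis."""
    if n_oversamples is None:
        # Mirrors the "2 * k random vectors" setting described above.
        n_oversamples = k
    Q = randomized_range(A, k, n_oversamples, n_iter, renormalize, seed)
    # Project A onto the small subspace and take an exact SVD there.
    B = Q.T @ A
    U_hat, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_hat
    return U[:, :k], s[:k], Vt[:k]
```

Comparing the returned singular values against np.linalg.svd on a
matrix that fits in memory, while varying n_oversamples, n_iter and
renormalize, would be one way to check the observations above.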

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

