Dear Luca,
If I understand correctly, your approach is deflationary PCA that uses
the l1 prox to enforce sparsity.
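Just to make sure we are talking about the same thing, here is a rough sketch of the kind of update I have in mind (soft_threshold stands in for the l1 prox; the exact initialization, normalization, and stopping rule in your code may well differ):

import numpy as np

def soft_threshold(v, alpha):
    # l1 prox: shrink entries towards zero and set small ones exactly to zero
    return np.sign(v) * np.maximum(np.abs(v) - alpha, 0.0)

def deflationary_sparse_pca(X, n_components, alpha, n_iter=200, tol=1e-6):
    X = np.array(X, dtype=float)
    components = []
    for _ in range(n_components):
        v = X[0] / (np.linalg.norm(X[0]) + 1e-12)        # crude initialization
        for _ in range(n_iter):
            u = np.dot(X, v)
            u /= np.linalg.norm(u) + 1e-12
            v_new = soft_threshold(np.dot(X.T, u), alpha)
            norm = np.linalg.norm(v_new)
            if norm == 0.0:                              # alpha zeroed the whole vector
                break
            v_new /= norm
            if np.linalg.norm(v_new - v) < tol:
                v = v_new
                break
            v = v_new
        components.append(v)
        X -= np.outer(np.dot(X, v), v)                   # deflate before the next component
    return np.array(components)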
I am not sure how this compares to the LARS-based implementation in the
scikit (the non-convexity of the problem makes it hard to compare
algorithms).
Moreover, I have run your benchmark on a larger, yet arguably still small,
dataset with shape = (10000, 200).
Two things happened:
- your implementation did not converge within the predefined number of
iterations;
- it is slower than SparsePCA.
In [4]: %run pca.py
start
No Convergence. Error!!!
('time iterative SPCA=', 104.21303296089172, 'time SparsePCA=', 46.90740513801575)
finish!
I'm not sure what to conclude at the moment, but since there is a huge
amount of work needed to reach skl's quality standards, I would
focus first on the motivation for such an implementation:
in the relevant parameter space (n_samples, n_features, n_components),
what is the region in which the sparse NIPALS is expected to perform
better than the lasso-based SparsePCA? Note that there are probably
publications on that topic.
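For instance, a rough sketch of the kind of benchmark I have in mind (sparse_nipals here is just a placeholder for your implementation, whatever its final signature ends up being):

import time
import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.RandomState(0)
for n_samples in (100, 1000, 5000):
    for n_features in (50, 200, 500):
        for n_components in (2, 5, 10):
            X = rng.randn(n_samples, n_features)
            t0 = time.time()
            SparsePCA(n_components=n_components, alpha=1.0).fit(X)
            t_lasso = time.time() - t0
            # t0 = time.time()
            # sparse_nipals(X, n_components=n_components, alpha=1.0)  # placeholder
            # t_nipals = time.time() - t0
            print((n_samples, n_features, n_components, t_lasso))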
Best,
Bertrand
On 31/05/2014 01:03, Luca Puggini wrote:
Hi Bertrand,
I am not familiar with RandomizedPCA, so I do not know if NIPALS is
faster than RandomizedPCA. It is certainly faster than SVD when we are
interested in only a few components. My impression is that
RandomizedPCA is an approximation of PCA, while NIPALS should be an
algorithm that theoretically converges to the same result as SVD.
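For reference, the plain (non-sparse) NIPALS iteration I have in mind is roughly the following (a minimal sketch, one component at a time with deflation; a real implementation would need better initialization and convergence checks):

import numpy as np

def nipals_pca(X, n_components, n_iter=500, tol=1e-9):
    X = X - X.mean(axis=0)                      # center, as in ordinary PCA
    scores, loadings = [], []
    for _ in range(n_components):
        t = X[:, 0].copy()                      # initial score vector (a column of X)
        for _ in range(n_iter):
            p = np.dot(X.T, t) / np.dot(t, t)   # loading vector
            p /= np.linalg.norm(p)
            t_new = np.dot(X, p)                # updated score vector
            if np.linalg.norm(t_new - t) < tol:
                t = t_new
                break
            t = t_new
        scores.append(t)
        loadings.append(p)
        X = X - np.outer(t, p)                  # deflate before the next component
    return np.array(scores).T, np.array(loadings)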
Regarding SparsePCA, I don't think there is an equivalent of
RandomizedPCA in sklearn. Here http://justpaste.it/fobq I have
implemented the algorithm described in "Sparse PCA through Low-rank
Approximations". The linked file includes a speed comparison
with SparsePCA. The value of the penalty alpha seems to have a
slightly different effect, but the overall result is almost the same.
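For example, one quick way to look at the alpha discrepancy is to count the nonzero loadings for the same alpha in both methods (a rough sketch; iterative_spca stands for the function in the linked file):

import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.RandomState(0)
X = rng.randn(500, 50)
for alpha in (0.1, 1.0, 10.0):
    comps = SparsePCA(n_components=5, alpha=alpha).fit(X).components_
    print((alpha, np.mean(comps != 0)))                        # fraction of nonzero loadings
    # comps_it = iterative_spca(X, n_components=5, alpha=alpha)  # from the linked pca.py
    # print((alpha, np.mean(comps_it != 0)))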
Let me know about that.
Best,
Luca
Dear Luca,
In terms of efficiency, do you think that NIPALS outperforms
RandomizedPCA? I'm not an expert in these methods, but it sounds like
they rely on similar tricks.
My suggestion would be to run a benchmark on one of the scikit's
datasets to compare the accuracy/computation time tradeoffs.
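Something along these lines, for instance (a rough sketch on the digits dataset, using the explained variance ratio as a crude accuracy measure):

import time
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA, RandomizedPCA

X = load_digits().data
for Model in (PCA, RandomizedPCA):
    t0 = time.time()
    model = Model(n_components=10).fit(X)
    # report fit time and total explained variance of the 10 components
    print((Model.__name__, time.time() - t0,
           np.sum(model.explained_variance_ratio_)))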
Best,
Bertrand
On 28/05/2014 18:45, Luca Puggini wrote:
> Hi,
> I was looking at the PCA and SparsePCA implementations in sklearn.
> They are both based on SVD, but I think that the NIPALS implementation
> of the same algorithm can really increase the speed in some situations.
>
> In particular, with sparse PCA we usually use a small number of
> components, so its speed could be increased by using NIPALS to compute
> the initial values of u, v (in the dictionary learning class).
>
> Along the same lines, there is a NIPALS-like algorithm for Sparse PCA.
> I have already written a Python implementation of it, and it should not
> be a problem for me to integrate it into sklearn.
>
> Is this considered useful for the community, or is it off topic?
> Are there other people already working on it?
>
> Thanks,
> Luca
>