Hi Andreas,
I have an implementation of the ALM method for Robust PCA from Candès
et al., using Jake Vanderplas' PyPROPACK. It's in a private Bitbucket
repo, but I will move it to GitHub and send you the link if you like. I
actually really wanted to contribute RPCA to sklearn.
I don't know about a PR, but a while back I came across someone wishing
to add RPCA, possibly as a GSoC project.
Also, there's a slight variant of Robust PCA that is apparently more
scalable. The paper is here:
http://www.icml-2011.org/papers/41_icmlpaper.pdf
I intend to explore some of the different methods for the low rank
plus sparse problem.
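For concreteness, the basic ALM iteration for the low rank plus sparse problem (minimize ||L||_* + lambda*||S||_1 subject to L + S = M) can be sketched in plain NumPy. This is only a minimal sketch, not the PyPROPACK-backed implementation mentioned above; the defaults (lambda = 1/sqrt(max(m, n)), the mu schedule) follow common choices from the inexact-ALM literature and a dense SVD stands in for PROPACK's partial SVD:

```python
import numpy as np

def robust_pca(M, lam=None, tol=1e-7, max_iter=500):
    """Principal component pursuit via inexact ALM: M ~ L (low rank) + S (sparse)."""
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))          # default from Candes et al.
    norm_M = np.linalg.norm(M)
    spec = np.linalg.norm(M, 2)                 # spectral norm
    Y = M / max(spec, np.abs(M).max() / lam)    # dual variable init
    mu = 1.25 / spec                            # penalty, increased each step
    mu_bar, rho = mu * 1e7, 1.5
    S = np.zeros_like(M)
    for _ in range(max_iter):
        # L-update: singular value thresholding of M - S + Y/mu
        U, s, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(s - 1.0 / mu, 0.0)) @ Vt
        # S-update: entrywise soft-thresholding of M - L + Y/mu
        T = M - L + Y / mu
        S = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0.0)
        # dual ascent and penalty increase
        Y += mu * (M - L - S)
        mu = min(mu * rho, mu_bar)
        if np.linalg.norm(M - L - S) < tol * norm_M:
            break
    return L, S
```

Swapping the dense SVD for a truncated one (e.g. via PyPROPACK) is what makes this scale, since only the singular values above 1/mu matter.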
Alex (no longer a lurker, it seems)
------------------------------------------------------------------------
From: scikit-learn-general-requ...@lists.sourceforge.net
Sent: 4/15/2015 8:24 AM
To: scikit-learn-general@lists.sourceforge.net
Subject: Scikit-learn-general Digest, Vol 63, Issue 28
Send Scikit-learn-general mailing list submissions to
scikit-learn-general@lists.sourceforge.net
To subscribe or unsubscribe via the World Wide Web, visit
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
or, via email, send a message with subject or body 'help' to
scikit-learn-general-requ...@lists.sourceforge.net
You can reach the person managing the list at
scikit-learn-general-ow...@lists.sourceforge.net
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Scikit-learn-general digest..."
Today's Topics:
1. Re: pydata (Andreas Mueller)
2. Robust PCA (Andreas Mueller)
3. Re: Robust PCA (Kyle Kastner)
4. Performance of LSHForest (Miroslav Batchkarov)
5. Re: Contributing to scikit-learn with a re-implementation of
a Random Forest based iterative feature selection method
(Andreas Mueller)
----------------------------------------------------------------------
Message: 1
Date: Wed, 15 Apr 2015 10:15:56 -0400
From: Andreas Mueller <t3k...@gmail.com>
Subject: Re: [Scikit-learn-general] pydata
To: scikit-learn-general@lists.sourceforge.net
Message-ID: <552e729c.1080...@gmail.com>
Content-Type: text/plain; charset="windows-1252"
PyData London is coming up soon; I'm not sure the date is official, but
I think it's at the end of June.
In NYC, I think I'm talking at a Python meetup on April 23rd.
On 04/14/2015 06:05 PM, Pagliari, Roberto wrote:
> Is there a pydata or sklearn workshop coming up in NYC or London?
>
> Thank you,
>
>
>
------------------------------------------------------------------------------
> BPM Camp - Free Virtual Workshop May 6th at 10am PDT/1PM EDT
> Develop your own process in accordance with the BPMN 2 standard
> Learn Process modeling best practices with Bonita BPM through live
exercises
> http://www.bonitasoft.com/be-part-of-it/events/bpm-camp-virtual-
event?utm_
> source=Sourceforge_BPM_Camp_5_6_15&utm_medium=email&utm_campaign=VA_SF
>
>
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
------------------------------
Message: 2
Date: Wed, 15 Apr 2015 10:33:59 -0400
From: Andreas Mueller <t3k...@gmail.com>
Subject: [Scikit-learn-general] Robust PCA
To: scikit-learn-general@lists.sourceforge.net
Message-ID: <552e76d7.3000...@gmail.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Hey all.
Was there some plan to add Robust PCA at some point? I vaguely remember
a PR, but maybe I'm making things up.
It sounds like a pretty cool model and is widely used:
http://statweb.stanford.edu/~candes/papers/RobustPCA.pdf
[and I was just promised a good implementation]
Andy
------------------------------
Message: 3
Date: Wed, 15 Apr 2015 11:04:21 -0400
From: Kyle Kastner <kastnerk...@gmail.com>
Subject: Re: [Scikit-learn-general] Robust PCA
To: scikit-learn-general@lists.sourceforge.net
Message-ID:
<CAGNZ19C-_70uNq49_T+Rmey6=0dsh1sbrqvej2eypcepp4d...@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
Robust PCA is awesome - I would definitely like to see a good and fast
version. I had a version once upon a time, but it was neither good
*nor* fast :)
On Wed, Apr 15, 2015 at 10:33 AM, Andreas Mueller <t3k...@gmail.com>
wrote:
> Hey all.
> Was there some plan to add Robust PCA at some point? I vaguely remember
> a PR, but maybe I'm making things up.
> It sounds like a pretty cool model and is widely used:
> Sparse
> http://statweb.stanford.edu/~candes/papers/RobustPCA.pdf
>
> [and I was just promised a good implementation]
>
> Andy
------------------------------
Message: 4
Date: Wed, 15 Apr 2015 16:12:26 +0100
From: Miroslav Batchkarov <mbatchka...@gmail.com>
Subject: [Scikit-learn-general] Performance of LSHForest
To: scikit-learn-general@lists.sourceforge.net
Message-ID: <640c3bb8-ae05-402e-9d44-f96fd2488...@gmail.com>
Content-Type: text/plain; charset="utf-8"
Hi everyone,
I was really impressed by the speedups provided by LSHForest compared
to brute-force search. Out of curiosity, I compared LSHForest to the
existing ball tree implementation. The approximate algorithm is
consistently slower (see below). Is this normal, and should it be
mentioned in the documentation? Does approximate search offer any
benefits in terms of memory usage?
I ran the same example
<http://scikit-learn.org/stable/auto_examples/neighbors/plot_approximate_nearest_neighbors_scalability.html#example-neighbors-plot-approximate-nearest-neighbors-scalability-py>
with algorithm='ball_tree'. I also had to set metric='euclidean' (this
may affect the results). The output is:
Index size: 1000, exact: 0.000s, LSHF: 0.007s, speedup: 0.0, accuracy: 1.00 +/-0.00
Index size: 2511, exact: 0.001s, LSHF: 0.007s, speedup: 0.1, accuracy: 0.94 +/-0.05
Index size: 6309, exact: 0.001s, LSHF: 0.008s, speedup: 0.2, accuracy: 0.92 +/-0.07
Index size: 15848, exact: 0.002s, LSHF: 0.008s, speedup: 0.3, accuracy: 0.92 +/-0.07
Index size: 39810, exact: 0.005s, LSHF: 0.010s, speedup: 0.5, accuracy: 0.84 +/-0.10
Index size: 100000, exact: 0.008s, LSHF: 0.016s, speedup: 0.5, accuracy: 0.80 +/-0.06
With n_candidates=100, the output is
Index size: 1000, exact: 0.000s, LSHF: 0.006s, speedup: 0.0, accuracy: 1.00 +/-0.00
Index size: 2511, exact: 0.001s, LSHF: 0.006s, speedup: 0.1, accuracy: 0.94 +/-0.05
Index size: 6309, exact: 0.001s, LSHF: 0.005s, speedup: 0.2, accuracy: 0.92 +/-0.07
Index size: 15848, exact: 0.002s, LSHF: 0.007s, speedup: 0.4, accuracy: 0.90 +/-0.11
Index size: 39810, exact: 0.005s, LSHF: 0.008s, speedup: 0.7, accuracy: 0.82 +/-0.13
Index size: 100000, exact: 0.007s, LSHF: 0.013s, speedup: 0.6, accuracy: 0.78 +/-0.04
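For reference, the exact ball-tree baseline in the numbers above can be timed in isolation with something like the following. This is a minimal sketch: the data sizes, n_neighbors, and the random data are placeholders, not the settings of the scalability example:

```python
import time
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.RandomState(42)
X_index = rng.randn(10000, 64)   # indexed points
X_query = rng.randn(10, 64)      # query points

# Exact search with a ball tree (the baseline above); an approximate
# index such as LSHForest would be swapped in here for comparison.
nn = NearestNeighbors(n_neighbors=10, algorithm='ball_tree',
                      metric='euclidean')
nn.fit(X_index)

t0 = time.time()
dist, ind = nn.kneighbors(X_query)
print('exact: %.3fs' % (time.time() - t0))
```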
---
Miroslav Batchkarov
PhD Student,
Text Analysis Group,
Department of Informatics,
University of Sussex
------------------------------
Message: 5
Date: Wed, 15 Apr 2015 11:23:32 -0400
From: Andreas Mueller <t3k...@gmail.com>
Subject: Re: [Scikit-learn-general] Contributing to scikit-learn with
a re-implementation of a Random Forest based iterative feature
selection method
To: scikit-learn-general@lists.sourceforge.net
Message-ID: <552e8274.8080...@gmail.com>
Content-Type: text/plain; charset="windows-1252"
Hi Daniel.
That sounds potentially interesting.
Is there a widely cited paper for this?
I didn't read the paper, but it looks very similar to
RFE(RandomForestClassifier()).
Is it qualitatively different from that? Does it use a different feature
importance?
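For comparison, the RFE(RandomForestClassifier()) approach mentioned above looks roughly like this. A minimal sketch with made-up data and arbitrary parameter values, not a claim about what the Boruta reimplementation does:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# An n < p setting: 100 samples, 200 features, few informative ones.
X, y = make_classification(n_samples=100, n_features=200,
                           n_informative=5, random_state=0)

# RFE repeatedly drops the features ranked least important by the
# forest's impurity-based feature_importances_.
selector = RFE(RandomForestClassifier(n_estimators=50, random_state=0),
               n_features_to_select=10, step=0.1)
selector.fit(X, y)
print(np.flatnonzero(selector.support_))  # indices of the kept features
```

Boruta differs in that it compares real features against shuffled "shadow" copies rather than simply ranking by importance, which is presumably where the qualitative difference would lie.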
BTW: your mail was flagged as spam, as your link is broken and points
to some Imperial College internal page.
Cheers,
Andy
On 04/15/2015 05:03 AM, Daniel Homola wrote:
> Hi all,
>
> I needed a multivariate feature selection method for my work. As I'm
> working with biological/medical data, where n < p or even n << p, I
> started to read up on Random Forest based methods, as in my limited
> understanding RF copes pretty well with this suboptimal situation.
>
> I came across an R package called Boruta: https://m2.icm.edu.pl/boruta/
>
> After reading the paper and checking some of the pretty impressive
> citations I thought I'd try it, but it was really slow. So I thought
> I'd reimplement it in Python, because I hoped (based on this:
> http://www.slideshare.net/glouppe/accelerating-random-forests-in-scikitlearn)
> that it would be faster. And it is :) I mean a LOT faster.
>
> I was wondering if this would be something that you would consider
> incorporating into the feature selection module of scikit-learn?
>
> If so, do you have a tutorial or some sort of guidance on how I
> should prepare the code, which conventions I should follow, etc.?
>
> Cheers,
>
> Daniel Homola
>
> STRATiGRAD PhD Programme
> Imperial College London
------------------------------
------------------------------
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
End of Scikit-learn-general Digest, Vol 63, Issue 28
****************************************************