Hi Alex.
Thanks for that :) It would be great if you could publish your version on GitHub.
We probably can't use PyPROPACK in scikit-learn.
The GSoC application period has just ended, so you'd have to wait until next year for that.

Cheers,
Andy

On 04/15/2015 12:53 PM, Alex wrote:
Hi Andreas,
I have an implementation of the ALM method for Robust PCA from Candès, using Jake Vanderplas' PyPROPACK. It's in a private Bitbucket repo, but I'll move it to GitHub and send you the link if you like. I actually really wanted to contribute RPCA to sklearn.

I don't know about a PR, but a while back I found someone wishing to add RPCA, possibly as a GSoC project.

Also, there's a slight variant of Robust PCA that is apparently more scalable. The paper is here:
http://www.icml-2011.org/papers/41_icmlpaper.pdf

I intend to explore some of the different methods for the low rank plus sparse problem.

Alex (no longer a lurker, it seems)
------------------------------------------------------------------------
From: [email protected] <mailto:[email protected]>
Sent: ‎4/‎15/‎2015 8:24 AM
To: [email protected] <mailto:[email protected]>
Subject: Scikit-learn-general Digest, Vol 63, Issue 28

Send Scikit-learn-general mailing list submissions to
[email protected]

To subscribe or unsubscribe via the World Wide Web, visit
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
or, via email, send a message with subject or body 'help' to
[email protected]

You can reach the person managing the list at
[email protected]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Scikit-learn-general digest..."


Today's Topics:

   1. Re: pydata (Andreas Mueller)
   2. Robust PCA (Andreas Mueller)
   3. Re: Robust PCA (Kyle Kastner)
   4. Performance of LSHForest (Miroslav Batchkarov)
   5. Re: Contributing to scikit-learn with a re-implementation of
      a Random Forest based iterative feature selection method
      (Andreas Mueller)


----------------------------------------------------------------------

Message: 1
Date: Wed, 15 Apr 2015 10:15:56 -0400
From: Andreas Mueller <[email protected]>
Subject: Re: [Scikit-learn-general] pydata
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset="windows-1252"

PyData London is soon, not sure the date is official. It's end of June,
I think.

In NYC, I think I'm talking at a Python meetup on April 23rd.


On 04/14/2015 06:05 PM, Pagliari, Roberto wrote:
> Is there a pydata or sklearn workshop coming up in NYC or London?
>
> Thank you,
>
>
> ------------------------------------------------------------------------------
> BPM Camp - Free Virtual Workshop May 6th at 10am PDT/1PM EDT
> Develop your own process in accordance with the BPMN 2 standard
> Learn Process modeling best practices with Bonita BPM through live exercises
> http://www.bonitasoft.com/be-part-of-it/events/bpm-camp-virtual-event?utm_source=Sourceforge_BPM_Camp_5_6_15&utm_medium=email&utm_campaign=VA_SF
>
>
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

-------------- next part --------------
An HTML attachment was scrubbed...

------------------------------

Message: 2
Date: Wed, 15 Apr 2015 10:33:59 -0400
From: Andreas Mueller <[email protected]>
Subject: [Scikit-learn-general] Robust PCA
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=utf-8; format=flowed

Hey all.
Was there some plan to add Robust PCA at some point? I vaguely remember
a PR, but maybe I'm making things up.
It sounds like a pretty cool model and is widely used:
http://statweb.stanford.edu/~candes/papers/RobustPCA.pdf

[and I was just promised a good implementation]

Andy



------------------------------

Message: 3
Date: Wed, 15 Apr 2015 11:04:21 -0400
From: Kyle Kastner <[email protected]>
Subject: Re: [Scikit-learn-general] Robust PCA
To: [email protected]
Message-ID:
<CAGNZ19C-_70uNq49_T+Rmey6=0dsh1sbrqvej2eypcepp4d...@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

Robust PCA is awesome - I would definitely like to see a good and fast
version. I had a version once upon a time, but it was neither good
*nor* fast :)

On Wed, Apr 15, 2015 at 10:33 AM, Andreas Mueller <[email protected]> wrote:
> Hey all.
> Was there some plan to add Robust PCA at some point? I vaguely remember
> a PR, but maybe I'm making things up.
> It sounds like a pretty cool model and is widely used:
> http://statweb.stanford.edu/~candes/papers/RobustPCA.pdf
>
> [and I was just promised a good implementation]
>
> Andy
>



------------------------------

Message: 4
Date: Wed, 15 Apr 2015 16:12:26 +0100
From: Miroslav Batchkarov <[email protected]>
Subject: [Scikit-learn-general] Performance of LSHForest
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset="utf-8"

Hi everyone,

I was really impressed by the speedups provided by LSHForest compared to brute-force search. Out of curiosity, I compared LSHForest to the existing ball tree implementation. The approximate algorithm is consistently slower (see below). Is this normal, and should it be mentioned in the documentation? Does approximate search offer any benefits in terms of memory usage?


I ran the same example <http://scikit-learn.org/stable/auto_examples/neighbors/plot_approximate_nearest_neighbors_scalability.html#example-neighbors-plot-approximate-nearest-neighbors-scalability-py> with algorithm='ball_tree'. I also had to set metric='euclidean' (this may affect results). The output is:

Index size: 1000,   exact: 0.000s, LSHF: 0.007s, speedup: 0.0, accuracy: 1.00 +/-0.00
Index size: 2511,   exact: 0.001s, LSHF: 0.007s, speedup: 0.1, accuracy: 0.94 +/-0.05
Index size: 6309,   exact: 0.001s, LSHF: 0.008s, speedup: 0.2, accuracy: 0.92 +/-0.07
Index size: 15848,  exact: 0.002s, LSHF: 0.008s, speedup: 0.3, accuracy: 0.92 +/-0.07
Index size: 39810,  exact: 0.005s, LSHF: 0.010s, speedup: 0.5, accuracy: 0.84 +/-0.10
Index size: 100000, exact: 0.008s, LSHF: 0.016s, speedup: 0.5, accuracy: 0.80 +/-0.06

With n_candidates=100, the output is

Index size: 1000,   exact: 0.000s, LSHF: 0.006s, speedup: 0.0, accuracy: 1.00 +/-0.00
Index size: 2511,   exact: 0.001s, LSHF: 0.006s, speedup: 0.1, accuracy: 0.94 +/-0.05
Index size: 6309,   exact: 0.001s, LSHF: 0.005s, speedup: 0.2, accuracy: 0.92 +/-0.07
Index size: 15848,  exact: 0.002s, LSHF: 0.007s, speedup: 0.4, accuracy: 0.90 +/-0.11
Index size: 39810,  exact: 0.005s, LSHF: 0.008s, speedup: 0.7, accuracy: 0.82 +/-0.13
Index size: 100000, exact: 0.007s, LSHF: 0.013s, speedup: 0.6, accuracy: 0.78 +/-0.04



---
Miroslav Batchkarov
PhD Student,
Text Analysis Group,
Department of Informatics,
University of Sussex



-------------- next part --------------
An HTML attachment was scrubbed...

------------------------------

Message: 5
Date: Wed, 15 Apr 2015 11:23:32 -0400
From: Andreas Mueller <[email protected]>
Subject: Re: [Scikit-learn-general] Contributing to scikit-learn with
a re-implementation of a Random Forest based iterative feature
selection method
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset="windows-1252"

Hi Daniel.
That sounds potentially interesting.
Is there a widely cited paper for this?
I didn't read the paper, but it looks very similar to
RFE(RandomForestClassifier()).
Is it qualitatively different from that? Does it use a different feature
importance?
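For concreteness, the combination I have in mind is roughly this (a sketch on synthetic data, nothing more):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Synthetic data: with shuffle=False the 5 informative features
# come first among the 20 columns.
X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           n_redundant=0, shuffle=False, random_state=0)

# RFE repeatedly refits the forest and drops the feature with the
# smallest impurity-based feature_importances_ each round.
selector = RFE(RandomForestClassifier(n_estimators=100, random_state=0),
               n_features_to_select=5, step=1).fit(X, y)
print(np.flatnonzero(selector.support_))
```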

btw: your mail was flagged as spam because your link is broken and points to
an Imperial College internal page.

Cheers,
Andy

On 04/15/2015 05:03 AM, Daniel Homola wrote:
> Hi all,
>
> I needed a multivariate feature selection method for my work. As I'm
> working with biological/medical data, where n < p or even n << p, I
> started to read up on Random Forest based methods, as in my limited
> understanding RF copes pretty well with this suboptimal situation.
>
> I came across an R package called Boruta: https://m2.icm.edu.pl/boruta/
>
> After reading the paper and checking some of the pretty impressive
> citations I thought I'd try it, but it was really slow. So I thought
> I'll reimplement it in Python, because I hoped (based on this:
> http://www.slideshare.net/glouppe/accelerating-random-forests-in-scikitlearn)
> that it would be faster. And it is :) I mean a LOT faster.
>
> I was wondering if this would be something that you would consider
> incorporating into the feature selection module of scikit-learn?
>
> If yes, do you have a tutorial or some sort of guidance about how
> should I prepare the code, what conventions should I follow, etc?
>
> Cheers,
>
> Daniel Homola
>
> STRATiGRAD PhD Programme
> Imperial College London
>
>

-------------- next part --------------
An HTML attachment was scrubbed...

------------------------------


------------------------------



End of Scikit-learn-general Digest, Vol 63, Issue 28
****************************************************

