Hi Andreas,
I have an implementation of the ALM method for Robust PCA from Candès
et al., using Jake Vanderplas' PyPROPACK. It's in a private Bitbucket
repo, but I will move it to GitHub and send you the link if you like. I
actually really wanted to contribute RPCA to sklearn.
I don't know about a PR, but a while back I came across someone wishing
to add RPCA, possibly as a GSoC project.
Also, there's a slight variant of Robust PCA that is apparently more
scalable. The paper is here:
http://www.icml-2011.org/papers/41_icmlpaper.pdf
I intend to explore some of the different methods for the low rank
plus sparse problem.
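For concreteness, the basic ALM iteration for the low rank plus sparse problem (minimize ||L||_* + lambda*||S||_1 subject to L + S = M) can be sketched in plain NumPy. This is only a minimal sketch, not the PyPROPACK-backed implementation mentioned above; the defaults (lambda = 1/sqrt(max(m, n)), the mu schedule) follow common choices from the inexact-ALM literature and a dense SVD stands in for PROPACK's partial SVD:

```python
import numpy as np

def robust_pca(M, lam=None, tol=1e-7, max_iter=500):
    """Principal component pursuit via inexact ALM: M ~ L (low rank) + S (sparse)."""
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))          # default from Candes et al.
    norm_M = np.linalg.norm(M)
    spec = np.linalg.norm(M, 2)                 # spectral norm
    Y = M / max(spec, np.abs(M).max() / lam)    # dual variable init
    mu = 1.25 / spec                            # penalty, increased each step
    mu_bar, rho = mu * 1e7, 1.5
    S = np.zeros_like(M)
    for _ in range(max_iter):
        # L-update: singular value thresholding of M - S + Y/mu
        U, s, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(s - 1.0 / mu, 0.0)) @ Vt
        # S-update: entrywise soft-thresholding of M - L + Y/mu
        T = M - L + Y / mu
        S = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0.0)
        # dual ascent and penalty increase
        Y += mu * (M - L - S)
        mu = min(mu * rho, mu_bar)
        if np.linalg.norm(M - L - S) < tol * norm_M:
            break
    return L, S
```

Swapping the dense SVD for a truncated one (e.g. via PyPROPACK) is what makes this scale, since only the singular values above 1/mu matter.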
Alex (no longer a lurker, it seems)
------------------------------------------------------------------------
From: scikit-learn-general-requ...@lists.sourceforge.net
Sent: 4/15/2015 8:24 AM
To: scikit-learn-general@lists.sourceforge.net
Subject: Scikit-learn-general Digest, Vol 63, Issue 28
Send Scikit-learn-general mailing list submissions to
scikit-learn-general@lists.sourceforge.net
To subscribe or unsubscribe via the World Wide Web, visit
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
or, via email, send a message with subject or body 'help' to
scikit-learn-general-requ...@lists.sourceforge.net
You can reach the person managing the list at
scikit-learn-general-ow...@lists.sourceforge.net
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Scikit-learn-general digest..."
Today's Topics:
1. Re: pydata (Andreas Mueller)
2. Robust PCA (Andreas Mueller)
3. Re: Robust PCA (Kyle Kastner)
4. Performance of LSHForest (Miroslav Batchkarov)
5. Re: Contributing to scikit-learn with a re-implementation of
a Random Forest based iterative feature selection method
(Andreas Mueller)
----------------------------------------------------------------------
Message: 1
Date: Wed, 15 Apr 2015 10:15:56 -0400
From: Andreas Mueller <t3k...@gmail.com>
Subject: Re: [Scikit-learn-general] pydata
To: scikit-learn-general@lists.sourceforge.net
Message-ID: <552e729c.1080...@gmail.com>
Content-Type: text/plain; charset="windows-1252"
PyData London is coming up soon; I'm not sure the date is official, but
I think it's at the end of June.
In NYC, I think I'm talking at a Python meetup on April 23rd.
On 04/14/2015 06:05 PM, Pagliari, Roberto wrote:
> Is there a pydata or sklearn workshop coming up in NYC or London?
>
> Thank you,
>
>
>
------------------------------------------------------------------------------
> BPM Camp - Free Virtual Workshop May 6th at 10am PDT/1PM EDT
> Develop your own process in accordance with the BPMN 2 standard
> Learn Process modeling best practices with Bonita BPM through live
exercises
> http://www.bonitasoft.com/be-part-of-it/events/bpm-camp-virtual-
event?utm_
> source=Sourceforge_BPM_Camp_5_6_15&utm_medium=email&utm_campaign=VA_SF
>
>
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
------------------------------
Message: 2
Date: Wed, 15 Apr 2015 10:33:59 -0400
From: Andreas Mueller <t3k...@gmail.com>
Subject: [Scikit-learn-general] Robust PCA
To: scikit-learn-general@lists.sourceforge.net
Message-ID: <552e76d7.3000...@gmail.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Hey all.
Was there some plan to add Robust PCA at some point? I vaguely remember
a PR, but maybe I'm making things up.
It sounds like a pretty cool model and is widely used:
http://statweb.stanford.edu/~candes/papers/RobustPCA.pdf
[and I was just promised a good implementation]
Andy
------------------------------
Message: 3
Date: Wed, 15 Apr 2015 11:04:21 -0400
From: Kyle Kastner <kastnerk...@gmail.com>
Subject: Re: [Scikit-learn-general] Robust PCA
To: scikit-learn-general@lists.sourceforge.net
Message-ID:
<CAGNZ19C-_70uNq49_T+Rmey6=0dsh1sbrqvej2eypcepp4d...@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
Robust PCA is awesome - I would definitely like to see a good and fast
version. I had a version once upon a time, but it was neither good
*nor* fast :)
On Wed, Apr 15, 2015 at 10:33 AM, Andreas Mueller <t3k...@gmail.com>
wrote:
> Hey all.
> Was there some plan to add Robust PCA at some point? I vaguely remember
> a PR, but maybe I'm making things up.
> It sounds like a pretty cool model and is widely used:
> Sparse
> http://statweb.stanford.edu/~candes/papers/RobustPCA.pdf
>
> [and I was just promised a good implementation]
>
> Andy
------------------------------
Message: 4
Date: Wed, 15 Apr 2015 16:12:26 +0100
From: Miroslav Batchkarov <mbatchka...@gmail.com>
Subject: [Scikit-learn-general] Performance of LSHForest
To: scikit-learn-general@lists.sourceforge.net
Message-ID: <640c3bb8-ae05-402e-9d44-f96fd2488...@gmail.com>
Content-Type: text/plain; charset="utf-8"
Hi everyone,
I was really impressed by the speedups provided by LSHForest compared
to brute-force search. Out of curiosity, I compared LSHForest to the
existing ball tree implementation. The approximate algorithm is
consistently slower (see below). Is this normal, and should it be
mentioned in the documentation? Does approximate search offer any
benefits in terms of memory usage?
I ran the same example
<http://scikit-learn.org/stable/auto_examples/neighbors/plot_approximate_nearest_neighbors_scalability.html#example-neighbors-plot-approximate-nearest-neighbors-scalability-py>
with algorithm='ball_tree'. I also had to set metric='euclidean' (this
may affect the results). The output is:
Index size: 1000, exact: 0.000s, LSHF: 0.007s, speedup: 0.0, accuracy: 1.00 +/-0.00
Index size: 2511, exact: 0.001s, LSHF: 0.007s, speedup: 0.1, accuracy: 0.94 +/-0.05
Index size: 6309, exact: 0.001s, LSHF: 0.008s, speedup: 0.2, accuracy: 0.92 +/-0.07
Index size: 15848, exact: 0.002s, LSHF: 0.008s, speedup: 0.3, accuracy: 0.92 +/-0.07
Index size: 39810, exact: 0.005s, LSHF: 0.010s, speedup: 0.5, accuracy: 0.84 +/-0.10
Index size: 100000, exact: 0.008s, LSHF: 0.016s, speedup: 0.5, accuracy: 0.80 +/-0.06
With n_candidates=100, the output is
Index size: 1000, exact: 0.000s, LSHF: 0.006s, speedup: 0.0, accuracy: 1.00 +/-0.00
Index size: 2511, exact: 0.001s, LSHF: 0.006s, speedup: 0.1, accuracy: 0.94 +/-0.05
Index size: 6309, exact: 0.001s, LSHF: 0.005s, speedup: 0.2, accuracy: 0.92 +/-0.07
Index size: 15848, exact: 0.002s, LSHF: 0.007s, speedup: 0.4, accuracy: 0.90 +/-0.11
Index size: 39810, exact: 0.005s, LSHF: 0.008s, speedup: 0.7, accuracy: 0.82 +/-0.13
Index size: 100000, exact: 0.007s, LSHF: 0.013s, speedup: 0.6, accuracy: 0.78 +/-0.04
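For reference, the exact ball-tree baseline in the numbers above can be timed in isolation with something like the following. This is a minimal sketch: the data sizes, n_neighbors, and the random data are placeholders, not the settings of the scalability example:

```python
import time
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.RandomState(42)
X_index = rng.randn(10000, 64)   # indexed points
X_query = rng.randn(10, 64)      # query points

# Exact search with a ball tree (the baseline above); an approximate
# index such as LSHForest would be swapped in here for comparison.
nn = NearestNeighbors(n_neighbors=10, algorithm='ball_tree',
                      metric='euclidean')
nn.fit(X_index)

t0 = time.time()
dist, ind = nn.kneighbors(X_query)
print('exact: %.3fs' % (time.time() - t0))
```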
---
Miroslav Batchkarov
PhD Student,
Text Analysis Group,
Department of Informatics,
University of Sussex
------------------------------
Message: 5
Date: Wed, 15 Apr 2015 11:23:32 -0400
From: Andreas Mueller <t3k...@gmail.com>
Subject: Re: [Scikit-learn-general] Contributing to scikit-learn with
a re-implementation of a Random Forest based iterative feature
selection method
To: scikit-learn-general@lists.sourceforge.net
Message-ID: <552e8274.8080...@gmail.com>
Content-Type: text/plain; charset="windows-1252"
Hi Daniel.
That sounds potentially interesting.
Is there a widely cited paper for this?
I didn't read the paper, but it looks very similar to
RFE(RandomForestClassifier()).
Is it qualitatively different from that? Does it use a different feature
importance?
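For comparison, the RFE(RandomForestClassifier()) approach mentioned above looks roughly like this. A minimal sketch with made-up data and arbitrary parameter values, not a claim about what the Boruta reimplementation does:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# An n < p setting: 100 samples, 200 features, few informative ones.
X, y = make_classification(n_samples=100, n_features=200,
                           n_informative=5, random_state=0)

# RFE repeatedly drops the features ranked least important by the
# forest's impurity-based feature_importances_.
selector = RFE(RandomForestClassifier(n_estimators=50, random_state=0),
               n_features_to_select=10, step=0.1)
selector.fit(X, y)
print(np.flatnonzero(selector.support_))  # indices of the kept features
```

Boruta differs in that it compares real features against shuffled "shadow" copies rather than simply ranking by importance, which is presumably where the qualitative difference would lie.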
BTW: your mail was flagged as spam, as your link is broken and points
to some Imperial College internal page.
Cheers,
Andy
On 04/15/2015 05:03 AM, Daniel Homola wrote:
> Hi all,
>
> I needed a multivariate feature selection method for my work. As I'm
> working with biological/medical data, where n < p or even n << p, I
> started to read up on Random Forest based methods, as in my limited
> understanding RF copes pretty well with this suboptimal situation.
>
> I came across an R package called Boruta: https://m2.icm.edu.pl/boruta/
>
> After reading the paper and checking some of the pretty impressive
> citations I thought I'd try it, but it was really slow. So I thought
> I'd reimplement it in Python, because I hoped (based on this:
> http://www.slideshare.net/glouppe/accelerating-random-forests-in-scikitlearn)
> that it would be faster. And it is :) I mean a LOT faster.
>
> I was wondering if this would be something that you would consider
> incorporating into the feature selection module of scikit-learn?
>
> If so, do you have a tutorial or some sort of guidance on how I
> should prepare the code, which conventions I should follow, etc.?
>
> Cheers,
>
> Daniel Homola
>
> STRATiGRAD PhD Programme
> Imperial College London
------------------------------
------------------------------
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
End of Scikit-learn-general Digest, Vol 63, Issue 28
****************************************************