Did you look at GoDec at all? At least when I checked it was more
scalable. My bad implementations translated from MATLAB are here:
http://kastnerkyle.github.io/blog/2014/03/05/matrix-decomposition/
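
For anyone who has not seen GoDec, the basic idea can be sketched in a few lines of NumPy. This is only a rough illustration and not the code from the blog post above: it uses a full SVD where the paper uses randomized projections for speed, and the function name `godec_naive` and its parameters `rank`, `card` and `n_iter` are made up for this example.

```python
import numpy as np


def godec_naive(X, rank, card, n_iter=50):
    """Split X into a low-rank part L and a sparse part S.

    Naive GoDec-style alternation (hypothetical helper, full SVD instead
    of the randomized projections used in the paper):
      L <- best rank-`rank` approximation of X - S
      S <- the `card` largest-magnitude entries of X - L
    """
    X = np.asarray(X, dtype=float)
    S = np.zeros_like(X)
    for _ in range(n_iter):
        # Low-rank step: truncate the SVD of the current residual.
        U, s, Vt = np.linalg.svd(X - S, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Sparse step: hard-threshold, keeping only `card` entries.
        R = X - L
        S = np.zeros_like(X)
        if card > 0:
            idx = np.argpartition(np.abs(R).ravel(), -card)[-card:]
            S.flat[idx] = R.flat[idx]
    return L, S
```

Something like `L, S = godec_naive(X, rank=2, card=50)` would then give the two parts; whatever is left in `X - L - S` is treated as dense noise.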

As far as PROPACK goes - what are the minimal methods we would need to
port? I don't know that we would be able to add that as a dependency.
Maybe one key method (or a few) could be written in Cython that
would let us get most of the functionality.
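
To make the question concrete, here is a rough sketch of the one operation that ALM-style Robust PCA solvers typically need a partial SVD for: singular value thresholding. It is an assumption that this is the PROPACK-dependent piece of the implementation discussed below. The sketch uses `scipy.sparse.linalg.svds` (ARPACK-based) as a stand-in, and `svt_partial` with its parameters `tau` and `k` is a hypothetical helper, not an existing API.

```python
import numpy as np
from scipy.sparse.linalg import svds


def svt_partial(M, tau, k):
    """Singular value thresholding from the k leading singular triplets.

    Hypothetical helper. The result equals the full-SVD thresholding
    only when every singular value beyond the k-th is <= tau.
    svds requires 1 <= k < min(M.shape).
    """
    U, s, Vt = svds(np.asarray(M, dtype=float), k=k)
    # Soft-threshold the singular values: shrink by tau, clip at zero.
    s_shrunk = np.maximum(s - tau, 0.0)
    return (U * s_shrunk) @ Vt
```

If a routine along these lines is fast enough, the rest of an ALM loop is plain elementwise NumPy, so the partial SVD would be the only part worth porting.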

On Wed, Apr 15, 2015 at 12:57 PM, Andreas Mueller <t3k...@gmail.com> wrote:
> Hi Alex.
> Thanks for that :) It would be great if you could publish your version to
> github.
> We probably can't use PyPROPACK in scikit-learn.
> The GSoC application period is just over, so you'd have to wait till next
> year to do that.
>
> Cheers,
> Andy
>
>
> On 04/15/2015 12:53 PM, Alex wrote:
>
> Hi Andreas,
> I have an implementation of the ALM method for Robust PCA from Candes using
> Jake Vanderplas' PyPROPACK.  It's in a private bitbucket repo but I will
> move it to github and send the link if you like.  I actually really wanted
> to contribute RPCA to sklearn.
>
> I don't know about a PR but I found a while back someone wishing to add RPCA
> and maybe doing GSoC for it.
>
> Also, there's a slight variant on Robust PCA that apparently is more
> scalable.  The paper is here:
> http://www.icml-2011.org/papers/41_icmlpaper.pdf
>
> I intend to explore some of the different methods for the low rank plus
> sparse problem.
>
> Alex (no longer a lurker, it seems)
> ________________________________
> From: scikit-learn-general-requ...@lists.sourceforge.net
> Sent: 4/15/2015 8:24 AM
> To: scikit-learn-general@lists.sourceforge.net
> Subject: Scikit-learn-general Digest, Vol 63, Issue 28
>
> Send Scikit-learn-general mailing list submissions to
> scikit-learn-general@lists.sourceforge.net
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
> or, via email, send a message with subject or body 'help' to
> scikit-learn-general-requ...@lists.sourceforge.net
>
> You can reach the person managing the list at
> scikit-learn-general-ow...@lists.sourceforge.net
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Scikit-learn-general digest..."
>
>
> Today's Topics:
>
>    1. Re: pydata (Andreas Mueller)
>    2. Robust PCA (Andreas Mueller)
>    3. Re: Robust PCA (Kyle Kastner)
>    4. Performance of LSHForest (Miroslav Batchkarov)
>    5. Re: Contributing to scikit-learn with a re-implementation of
>       a Random Forest based iterative feature selection method
>       (Andreas Mueller)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 15 Apr 2015 10:15:56 -0400
> From: Andreas Mueller <t3k...@gmail.com>
> Subject: Re: [Scikit-learn-general] pydata
> To: scikit-learn-general@lists.sourceforge.net
> Message-ID: <552e729c.1080...@gmail.com>
> Content-Type: text/plain; charset="windows-1252"
>
> PyData London is soon, not sure the date is official. It's end of June,
> I think.
>
> In NYC I think I'm talking at a Python meetup on April 23rd.
>
>
> On 04/14/2015 06:05 PM, Pagliari, Roberto wrote:
>> Is there a pydata or sklearn workshop coming up in NYC or London?
>>
>> Thank you,
>>
>>
>>
>> ------------------------------------------------------------------------------
>> BPM Camp - Free Virtual Workshop May 6th at 10am PDT/1PM EDT
>> Develop your own process in accordance with the BPMN 2 standard
>> Learn Process modeling best practices with Bonita BPM through live exercises
>> http://www.bonitasoft.com/be-part-of-it/events/bpm-camp-virtual-event?utm_source=Sourceforge_BPM_Camp_5_6_15&utm_medium=email&utm_campaign=VA_SF
>>
>>
>> _______________________________________________
>> Scikit-learn-general mailing list
>> Scikit-learn-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>
> ------------------------------
>
> Message: 2
> Date: Wed, 15 Apr 2015 10:33:59 -0400
> From: Andreas Mueller <t3k...@gmail.com>
> Subject: [Scikit-learn-general] Robust PCA
> To: scikit-learn-general@lists.sourceforge.net
> Message-ID: <552e76d7.3000...@gmail.com>
> Content-Type: text/plain; charset=utf-8; format=flowed
>
> Hey all.
> Was there some plan to add Robust PCA at some point? I vaguely remember
> a PR, but maybe I'm making things up.
> It sounds like a pretty cool model and is widely used:
> Sparse
> http://statweb.stanford.edu/~candes/papers/RobustPCA.pdf
>
> [and I was just promised a good implementation]
>
> Andy
>
>
>
> ------------------------------
>
> Message: 3
> Date: Wed, 15 Apr 2015 11:04:21 -0400
> From: Kyle Kastner <kastnerk...@gmail.com>
> Subject: Re: [Scikit-learn-general] Robust PCA
> To: scikit-learn-general@lists.sourceforge.net
> Message-ID:
> <CAGNZ19C-_70uNq49_T+Rmey6=0dsh1sbrqvej2eypcepp4d...@mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> Robust PCA is awesome - I would definitely like to see a good and fast
> version. I had a version once upon a time, but it was neither good
> *nor* fast :)
>
> On Wed, Apr 15, 2015 at 10:33 AM, Andreas Mueller <t3k...@gmail.com> wrote:
>> Hey all.
>> Was there some plan to add Robust PCA at some point? I vaguely remember
>> a PR, but maybe I'm making things up.
>> It sounds like a pretty cool model and is widely used:
>> Sparse
>> http://statweb.stanford.edu/~candes/papers/RobustPCA.pdf
>>
>> [and I was just promised a good implementation]
>>
>> Andy
>
>
>
> ------------------------------
>
> Message: 4
> Date: Wed, 15 Apr 2015 16:12:26 +0100
> From: Miroslav Batchkarov <mbatchka...@gmail.com>
> Subject: [Scikit-learn-general] Performance of LSHForest
> To: scikit-learn-general@lists.sourceforge.net
> Message-ID: <640c3bb8-ae05-402e-9d44-f96fd2488...@gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi everyone,
>
> was really impressed by the speedups provided by LSHForest compared to
> brute-force search. Out of curiosity, I compared LSHForest to the existing
> ball tree implementation. The approximate algorithm is consistently slower
> (see below). Is this normal and should it be mentioned in the documentation?
> Does approximate search offer any benefits in terms of memory usage?
>
>
> I ran the same example
> <http://scikit-learn.org/stable/auto_examples/neighbors/plot_approximate_nearest_neighbors_scalability.html#example-neighbors-plot-approximate-nearest-neighbors-scalability-py>
> with algorithm='ball_tree'. I also had to set metric='euclidean' (this may
> affect results). The output is:
>
> Index size: 1000, exact: 0.000s, LSHF: 0.007s, speedup: 0.0, accuracy: 1.00 +/-0.00
> Index size: 2511, exact: 0.001s, LSHF: 0.007s, speedup: 0.1, accuracy: 0.94 +/-0.05
> Index size: 6309, exact: 0.001s, LSHF: 0.008s, speedup: 0.2, accuracy: 0.92 +/-0.07
> Index size: 15848, exact: 0.002s, LSHF: 0.008s, speedup: 0.3, accuracy: 0.92 +/-0.07
> Index size: 39810, exact: 0.005s, LSHF: 0.010s, speedup: 0.5, accuracy: 0.84 +/-0.10
> Index size: 100000, exact: 0.008s, LSHF: 0.016s, speedup: 0.5, accuracy: 0.80 +/-0.06
>
> With n_candidates=100, the output is
>
> Index size: 1000, exact: 0.000s, LSHF: 0.006s, speedup: 0.0, accuracy: 1.00 +/-0.00
> Index size: 2511, exact: 0.001s, LSHF: 0.006s, speedup: 0.1, accuracy: 0.94 +/-0.05
> Index size: 6309, exact: 0.001s, LSHF: 0.005s, speedup: 0.2, accuracy: 0.92 +/-0.07
> Index size: 15848, exact: 0.002s, LSHF: 0.007s, speedup: 0.4, accuracy: 0.90 +/-0.11
> Index size: 39810, exact: 0.005s, LSHF: 0.008s, speedup: 0.7, accuracy: 0.82 +/-0.13
> Index size: 100000, exact: 0.007s, LSHF: 0.013s, speedup: 0.6, accuracy: 0.78 +/-0.04
>
>
>
> ---
> Miroslav Batchkarov
> PhD Student,
> Text Analysis Group,
> Department of Informatics,
> University of Sussex
>
>
>
>
> ------------------------------
>
> Message: 5
> Date: Wed, 15 Apr 2015 11:23:32 -0400
> From: Andreas Mueller <t3k...@gmail.com>
> Subject: Re: [Scikit-learn-general] Contributing to scikit-learn with
> a re-implementation of a Random Forest based iterative feature
> selection method
> To: scikit-learn-general@lists.sourceforge.net
> Message-ID: <552e8274.8080...@gmail.com>
> Content-Type: text/plain; charset="windows-1252"
>
> Hi Daniel.
> That sounds potentially interesting.
> Is there a widely cited paper for this?
> I didn't read the paper, but it looks very similar to
> RFE(RandomForestClassifier()).
> Is it qualitatively different from that? Does it use a different feature
> importance?
>
> btw: your mail is flagged as spam as your link is broken and links to
> some imperial college internal page.
>
> Cheers,
> Andy
>
> On 04/15/2015 05:03 AM, Daniel Homola wrote:
>> Hi all,
>>
>> I needed a multivariate feature selection method for my work. As I'm
>> working with biological/medical data, where n < p or even n << p I
>> started to read up on Random Forest based methods, as in my limited
>> understanding RF copes pretty well with this suboptimal situation.
>>
>> I came across an R package called Boruta: https://m2.icm.edu.pl/boruta/
>>
>> <https://exchange.imperial.ac.uk/owa/redir.aspx?C=Yp1dHGp6hkyiZQZzx17DHznOv7PxStIIK3PgwAs_McazihitoU3Fm6_EBXvwfIJB2CJSzkCKKjo.&URL=https%3a%2f%2fm2.icm.edu.pl%2fboruta%2f>
>>
>> After reading the paper and checking some of the pretty impressive
>> citations I thought I'd try it, but it was really slow. So I thought
>> I'll reimplement it in Python, because I hoped (based on
>>
>> this http://www.slideshare.net/glouppe/accelerating-random-forests-in-scikitlearn
>>
>> <https://exchange.imperial.ac.uk/owa/redir.aspx?C=Yp1dHGp6hkyiZQZzx17DHznOv7PxStIIK3PgwAs_McazihitoU3Fm6_EBXvwfIJB2CJSzkCKKjo.&URL=http%3a%2f%2fwww.slideshare.net%2fglouppe%2faccelerating-random-forests-in-scikitlearn>)
>> that it will be faster. And it is :) I mean a LOT faster..
>>
>> I was wondering if this would be something that you would consider
>> incorporating into the feature selection module of scikit-learn?
>>
>> If yes, do you have a tutorial or some sort of guidance about how
>> should I prepare the code, what conventions should I follow, etc?
>>
>> Cheers,
>>
>> Daniel Homola
>>
>> STRATiGRAD PhD Programme
>> Imperial College London
>
>
>
> End of Scikit-learn-general Digest, Vol 63, Issue 28
> ****************************************************
