Did you look at GoDec at all? At least when I checked, it was more scalable. My (bad) implementations, translated from MATLAB, are here: http://kastnerkyle.github.io/blog/2014/03/05/matrix-decomposition/
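For readers unfamiliar with GoDec, here is a rough NumPy sketch of its alternating low-rank / sparse scheme. It is not the code from the linked post. The function name `godec` and its parameters (`rank`, `card`, `n_iter`, `tol`) are hypothetical. It also uses an exact truncated SVD where the GoDec paper uses bilateral random projections for speed, so it shows the structure of the method rather than its scalability.

```python
import numpy as np


def godec(X, rank, card, n_iter=50, tol=1e-7):
    """Simplified GoDec-style split X ~ L + S (hypothetical helper).

    rank(L) <= rank and S has at most `card` nonzeros. Uses an exact
    truncated SVD instead of the paper's bilateral random projections.
    """
    L = np.zeros_like(X)
    S = np.zeros_like(X)
    norm_X = np.linalg.norm(X)
    for _ in range(n_iter):
        # Low-rank step: best rank-`rank` approximation of X - S (Eckart-Young).
        U, s, Vt = np.linalg.svd(X - S, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Sparse step: keep only the `card` largest-magnitude entries of X - L.
        R = X - L
        S = np.zeros_like(X)
        idx = np.argpartition(np.abs(R), -card, axis=None)[-card:]
        S.flat[idx] = R.flat[idx]
        # Stop once the remaining residual is negligible relative to X.
        if np.linalg.norm(X - L - S) <= tol * norm_X:
            break
    return L, S


# Toy data: a rank-2 matrix plus 10 large spikes.
rng = np.random.RandomState(0)
A = rng.randn(30, 2) @ rng.randn(2, 20)
S0 = np.zeros((30, 20))
S0.flat[rng.choice(30 * 20, size=10, replace=False)] = 10.0
X = A + S0
L, S = godec(X, rank=2, card=10)
print(np.linalg.norm(X - L - S) / np.linalg.norm(X))
```

Each step exactly minimizes the residual norm over one block with the other held fixed, so the residual never increases from one iteration to the next.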
As far as PROPACK goes, what are the minimal methods we would need to port? I doubt we would be able to add it as a dependency. Maybe one or a few key methods could be written in Cython, which would let us get most of the functionality.

On Wed, Apr 15, 2015 at 12:57 PM, Andreas Mueller <t3k...@gmail.com> wrote:
> Hi Alex.
> Thanks for that :) It would be great if you could publish your version to
> GitHub.
> We probably can't use PyPROPACK in scikit-learn.
> The GSoC application period is just over, so you'd have to wait until next
> year to do that.
>
> Cheers,
> Andy
>
>
> On 04/15/2015 12:53 PM, Alex wrote:
>
> Hi Andreas,
> I have an implementation of the ALM method for Robust PCA from Candes using
> Jake Vanderplas' PyPROPACK. It's in a private Bitbucket repo, but I will
> move it to GitHub and send the link if you like. I actually really wanted
> to contribute RPCA to sklearn.
>
> I don't know about a PR, but a while back I found someone who wanted to add
> RPCA, maybe as a GSoC project.
>
> Also, there's a slight variant on Robust PCA that apparently is more
> scalable. The paper is here:
> http://www.icml-2011.org/papers/41_icmlpaper.pdf
>
> I intend to explore some of the different methods for the low-rank plus
> sparse problem.
>
> Alex (no longer a lurker, it seems)
> ________________________________
> From: scikit-learn-general-requ...@lists.sourceforge.net
> Sent: 4/15/2015 8:24 AM
> To: scikit-learn-general@lists.sourceforge.net
> Subject: Scikit-learn-general Digest, Vol 63, Issue 28
>
> Send Scikit-learn-general mailing list submissions to
> scikit-learn-general@lists.sourceforge.net
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
> or, via email, send a message with subject or body 'help' to
> scikit-learn-general-requ...@lists.sourceforge.net
>
> You can reach the person managing the list at
> scikit-learn-general-ow...@lists.sourceforge.net
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Scikit-learn-general digest..."
>
>
> Today's Topics:
>
> 1. Re: pydata (Andreas Mueller)
> 2. Robust PCA (Andreas Mueller)
> 3. Re: Robust PCA (Kyle Kastner)
> 4. Performance of LSHForest (Miroslav Batchkarov)
> 5. Re: Contributing to scikit-learn with a re-implementation of
>    a Random Forest based iterative feature selection method
>    (Andreas Mueller)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 15 Apr 2015 10:15:56 -0400
> From: Andreas Mueller <t3k...@gmail.com>
> Subject: Re: [Scikit-learn-general] pydata
> To: scikit-learn-general@lists.sourceforge.net
> Message-ID: <552e729c.1080...@gmail.com>
> Content-Type: text/plain; charset="windows-1252"
>
> PyData London is soon, not sure the date is official. It's end of June,
> I think.
>
> In NYC, I think I'm talking at a Python meetup on April 23rd.
>
>
> On 04/14/2015 06:05 PM, Pagliari, Roberto wrote:
>> Is there a pydata or sklearn workshop coming up in NYC or London?
>>
>> Thank you,
>>
>>
>> ------------------------------------------------------------------------------
>> BPM Camp - Free Virtual Workshop May 6th at 10am PDT/1PM EDT
>> Develop your own process in accordance with the BPMN 2 standard
>> Learn Process modeling best practices with Bonita BPM through live
>> exercises
>> http://www.bonitasoft.com/be-part-of-it/events/bpm-camp-virtual-event?utm_source=Sourceforge_BPM_Camp_5_6_15&utm_medium=email&utm_campaign=VA_SF
>> _______________________________________________
>> Scikit-learn-general mailing list
>> Scikit-learn-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
> ------------------------------
>
> Message: 2
> Date: Wed, 15 Apr 2015 10:33:59 -0400
> From: Andreas Mueller <t3k...@gmail.com>
> Subject: [Scikit-learn-general] Robust PCA
> To: scikit-learn-general@lists.sourceforge.net
> Message-ID: <552e76d7.3000...@gmail.com>
> Content-Type: text/plain; charset=utf-8; format=flowed
>
> Hey all.
> Was there some plan to add Robust PCA at some point? I vaguely remember
> a PR, but maybe I'm making things up.
> It sounds like a pretty cool model and is widely used:
> Sparse
> http://statweb.stanford.edu/~candes/papers/RobustPCA.pdf
>
> [and I was just promised a good implementation]
>
> Andy
>
> ------------------------------
>
> Message: 3
> Date: Wed, 15 Apr 2015 11:04:21 -0400
> From: Kyle Kastner <kastnerk...@gmail.com>
> Subject: Re: [Scikit-learn-general] Robust PCA
> To: scikit-learn-general@lists.sourceforge.net
> Message-ID:
> <CAGNZ19C-_70uNq49_T+Rmey6=0dsh1sbrqvej2eypcepp4d...@mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> Robust PCA is awesome - I would definitely like to see a good and fast
> version.
> I had a version once upon a time, but it was neither good
> *nor* fast :)
>
> ------------------------------
>
> Message: 4
> Date: Wed, 15 Apr 2015 16:12:26 +0100
> From: Miroslav Batchkarov <mbatchka...@gmail.com>
> Subject: [Scikit-learn-general] Performance of LSHForest
> To: scikit-learn-general@lists.sourceforge.net
> Message-ID: <640c3bb8-ae05-402e-9d44-f96fd2488...@gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi everyone,
>
> I was really impressed by the speedups LSHForest provides compared to
> brute-force search. Out of curiosity, I compared LSHForest to the existing
> ball tree implementation. The approximate algorithm is consistently slower
> (see below). Is this normal, and should it be mentioned in the documentation?
> Does approximate search offer any benefits in terms of memory usage?
>
> I ran the same example
> <http://scikit-learn.org/stable/auto_examples/neighbors/plot_approximate_nearest_neighbors_scalability.html#example-neighbors-plot-approximate-nearest-neighbors-scalability-py>
> with algorithm='ball_tree'. I also had to set metric='euclidean' (this may
> affect the results). The output is:
>
> Index size: 1000, exact: 0.000s, LSHF: 0.007s, speedup: 0.0, accuracy: 1.00 +/-0.00
> Index size: 2511, exact: 0.001s, LSHF: 0.007s, speedup: 0.1, accuracy: 0.94 +/-0.05
> Index size: 6309, exact: 0.001s, LSHF: 0.008s, speedup: 0.2, accuracy: 0.92 +/-0.07
> Index size: 15848, exact: 0.002s, LSHF: 0.008s, speedup: 0.3, accuracy: 0.92 +/-0.07
> Index size: 39810, exact: 0.005s, LSHF: 0.010s, speedup: 0.5, accuracy: 0.84 +/-0.10
> Index size: 100000, exact: 0.008s, LSHF: 0.016s, speedup: 0.5, accuracy: 0.80 +/-0.06
>
> With n_candidates=100, the output is:
>
> Index size: 1000, exact: 0.000s, LSHF: 0.006s, speedup: 0.0, accuracy: 1.00 +/-0.00
> Index size: 2511, exact: 0.001s, LSHF: 0.006s, speedup: 0.1, accuracy: 0.94 +/-0.05
> Index size: 6309, exact: 0.001s, LSHF: 0.005s, speedup: 0.2, accuracy: 0.92 +/-0.07
> Index size: 15848, exact: 0.002s, LSHF: 0.007s, speedup: 0.4, accuracy: 0.90 +/-0.11
> Index size: 39810, exact: 0.005s, LSHF: 0.008s, speedup: 0.7, accuracy: 0.82 +/-0.13
> Index size: 100000, exact: 0.007s, LSHF: 0.013s, speedup: 0.6, accuracy: 0.78 +/-0.04
>
> ---
> Miroslav Batchkarov
> PhD Student,
> Text Analysis Group,
> Department of Informatics,
> University of Sussex
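To make the speed/accuracy trade-off in these numbers concrete, here is a toy NumPy sketch of approximate k-NN by hashing plus re-ranking, with "accuracy" measured as the fraction of the exact k nearest neighbours that the approximate search recovers. This is an assumption about how such an accuracy figure is defined, not a copy of the scikit-learn example. It is also not LSHForest (which uses several prefix trees over hash codes, and was later removed from scikit-learn); it is a single random-hyperplane hash table, and every function name here is hypothetical. The `n_candidates` argument plays the same role as the parameter Miroslav varied: more candidates means more exact distance computations and higher recall.

```python
import numpy as np


def exact_knn(X, queries, k):
    # Brute-force Euclidean k-NN: indices of the k closest rows of X per query.
    d2 = ((queries[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.argsort(d2, axis=1)[:, :k]


def hyperplane_lsh_knn(X, queries, k, n_candidates, n_bits=16, seed=0):
    # Toy random-hyperplane LSH (one table, NOT sklearn's LSHForest):
    # each point is hashed to the sign pattern of n_bits random projections.
    rng = np.random.RandomState(seed)
    planes = rng.randn(X.shape[1], n_bits)
    X_codes = X @ planes > 0
    q_codes = queries @ planes > 0
    out = []
    for q, code in zip(queries, q_codes):
        # Candidates: points whose codes are closest to the query's in Hamming distance.
        hamming = (X_codes != code).sum(axis=1)
        cand = np.argsort(hamming, kind="stable")[:n_candidates]
        # Re-rank only the candidates by exact distance and keep the best k.
        d2 = ((X[cand] - q) ** 2).sum(axis=1)
        out.append(cand[np.argsort(d2)[:k]])
    return np.array(out)


def recall_at_k(approx, exact):
    # Fraction of the exact k nearest neighbours returned by the approximate search.
    return float(np.mean([np.isin(a, e).mean() for a, e in zip(approx, exact)]))


rng = np.random.RandomState(1)
X = rng.randn(2000, 20)
queries = rng.randn(10, 20)
exact = exact_knn(X, queries, k=10)
for n_candidates in (50, 200, 2000):
    approx = hyperplane_lsh_knn(X, queries, k=10, n_candidates=n_candidates)
    print(n_candidates, recall_at_k(approx, exact))
```

With `n_candidates` equal to the dataset size, the re-ranking step sees every point, so recall reaches 1.0 but no work is saved. This mirrors why a fast exact index such as a ball tree can beat an approximate method on low-dimensional or moderately sized data.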
>
> ------------------------------
>
> Message: 5
> Date: Wed, 15 Apr 2015 11:23:32 -0400
> From: Andreas Mueller <t3k...@gmail.com>
> Subject: Re: [Scikit-learn-general] Contributing to scikit-learn with
> a re-implementation of a Random Forest based iterative feature
> selection method
> To: scikit-learn-general@lists.sourceforge.net
> Message-ID: <552e8274.8080...@gmail.com>
> Content-Type: text/plain; charset="windows-1252"
>
> Hi Daniel.
> That sounds potentially interesting.
> Is there a widely cited paper for this?
> I didn't read the paper, but it looks very similar to
> RFE(RandomForestClassifier()).
> Is it qualitatively different from that? Does it use a different feature
> importance?
>
> btw: your mail is being flagged as spam because your link is broken and
> points to some Imperial College internal page.
>
> Cheers,
> Andy
>
> On 04/15/2015 05:03 AM, Daniel Homola wrote:
>> Hi all,
>>
>> I needed a multivariate feature selection method for my work. As I'm
>> working with biological/medical data, where n < p or even n << p, I
>> started to read up on Random Forest based methods, as in my limited
>> understanding RF copes pretty well with this suboptimal situation.
>>
>> I came across an R package called Boruta: https://m2.icm.edu.pl/boruta/
>>
>> After reading the paper and checking some of the pretty impressive
>> citations, I thought I'd try it, but it was really slow. So I thought
>> I'd reimplement it in Python, because I hoped (based on this:
>> http://www.slideshare.net/glouppe/accelerating-random-forests-in-scikitlearn)
>> that it would be faster.
>> And it is :) I mean a LOT faster.
>>
>> I was wondering whether this is something you would consider
>> incorporating into the feature selection module of scikit-learn.
>>
>> If so, do you have a tutorial or some guidance on how I should
>> prepare the code and what conventions I should follow?
>>
>> Cheers,
>>
>> Daniel Homola
>>
>> STRATiGRAD PhD Programme
>> Imperial College London
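Andreas's question (how this differs from `RFE(RandomForestClassifier())`) can be made a bit more concrete. RFE repeatedly refits and drops the least important features. Boruta's distinguishing idea is instead to compare each real feature's importance against "shadow" features, which are copies of the inputs with their values shuffled. Below is a minimal sketch of that shadow-feature comparison using scikit-learn's `RandomForestClassifier`. It is not Daniel's implementation or the R package. The function name, the number of rounds and the simple majority-vote threshold are all assumptions; Boruta itself applies a statistical test across iterations and removes rejected features as it goes.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def shadow_feature_hits(X, y, n_iter=10, random_state=0):
    """Hypothetical Boruta-style helper: fraction of rounds in which each real
    feature's importance beats the best shadow (column-shuffled) feature."""
    rng = np.random.RandomState(random_state)
    n_features = X.shape[1]
    hits = np.zeros(n_features)
    for _ in range(n_iter):
        # Shuffle each column independently: keeps the marginal distribution
        # of every feature but destroys any relationship with y.
        shadow = np.column_stack([rng.permutation(col) for col in X.T])
        forest = RandomForestClassifier(
            n_estimators=100, random_state=rng.randint(2**31 - 1))
        forest.fit(np.hstack([X, shadow]), y)
        importances = forest.feature_importances_
        best_shadow = importances[n_features:].max()
        hits += importances[:n_features] > best_shadow
    return hits / n_iter


# Toy data: only feature 0 determines the label.
rng = np.random.RandomState(42)
X = rng.randn(200, 5)
y = (X[:, 0] > 0).astype(int)
hit_rate = shadow_feature_hits(X, y)
selected = np.flatnonzero(hit_rate > 0.5)  # simple majority rule, not Boruta's test
print(hit_rate, selected)
```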
>
> ------------------------------
>
> End of Scikit-learn-general Digest, Vol 63, Issue 28
> ****************************************************
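As a reference point for the PROPACK question and the ALM implementation Alex mentions, here is a minimal NumPy sketch of the alternating-directions ALM iteration for Principal Component Pursuit, as described in the Candès et al. Robust PCA paper linked in Message 2. It is not Alex's code, and the function names are hypothetical. The defaults `lam = 1/sqrt(max(m, n))` and `mu = m*n / (4 * ||M||_1)` are the choices commonly taken from that paper. The sketch calls the dense `np.linalg.svd` at every step. In this formulation, singular value thresholding (`svt`) is the only step that needs an SVD, so that is where a partial SVD such as PROPACK's would be plugged in; whether that is the full set of methods worth porting is an assumption.

```python
import numpy as np


def shrink(A, tau):
    """Elementwise soft-thresholding (proximal operator of tau * L1 norm)."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)


def svt(A, tau):
    """Singular value thresholding (proximal operator of tau * nuclear norm).

    Dense SVD here; a partial SVD (e.g. PROPACK-style) could replace it.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt


def rpca_alm(M, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Split M into low-rank L plus sparse S (hypothetical helper)."""
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))
    if mu is None:
        mu = m * n / (4.0 * np.abs(M).sum())
    S = np.zeros_like(M)
    Y = np.zeros_like(M)  # Lagrange multiplier for the constraint M = L + S
    norm_M = np.linalg.norm(M)
    for _ in range(max_iter):
        L = svt(M - S + Y / mu, 1.0 / mu)
        S = shrink(M - L + Y / mu, lam / mu)
        residual = M - L - S
        Y += mu * residual
        if np.linalg.norm(residual) <= tol * norm_M:
            break
    return L, S


# Toy problem: rank-2 matrix corrupted by roughly 5% large sparse errors.
rng = np.random.RandomState(0)
L0 = rng.randn(40, 2) @ rng.randn(2, 40)
S0 = np.zeros((40, 40))
mask = rng.rand(40, 40) < 0.05
S0[mask] = 10.0 * rng.choice([-1.0, 1.0], size=mask.sum())
M = L0 + S0
L, S = rpca_alm(M)
print(np.linalg.norm(L - L0) / np.linalg.norm(L0))
```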