GoDec might not have the citations (yet) to be added to scikit-learn, but I think a basic ALM-based RPCA would be a great addition, along with a cool demo. Smart background subtraction would be my vote, but it might be too heavyweight. I could see a cool example with something like colored bouncing balls overlaid on the china picture that is built in to sklearn.
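To make the "basic ALM-based RPCA" idea concrete, here is a rough sketch, assuming the inexact augmented Lagrangian scheme for principal component pursuit and a plain full numpy SVD (no arpack or propack). The function names, the default parameters, and the synthetic low-rank-plus-sparse data are all illustrative assumptions; this is not the code from any repo mentioned in this thread and not an sklearn API.

```python
import numpy as np


def soft_threshold(X, tau):
    """Elementwise shrinkage: proximal operator of tau * ||X||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def svd_threshold(X, tau):
    """Singular value shrinkage: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def rpca_alm(M, lam=None, rho=1.5, tol=1e-7, max_iter=500):
    """Split M into low-rank L plus sparse S (hypothetical helper).

    Defaults follow commonly used choices for inexact ALM; treat them
    as assumptions rather than tuned values.
    """
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))
    norm_M = np.linalg.norm(M, "fro")
    spec = np.linalg.norm(M, 2)          # largest singular value
    mu = 1.25 / spec
    mu_max = mu * 1e7
    Y = M / max(spec, np.abs(M).max() / lam)  # dual variable init
    S = np.zeros_like(M)
    for _ in range(max_iter):
        L = svd_threshold(M - S + Y / mu, 1.0 / mu)
        S = soft_threshold(M - L + Y / mu, lam / mu)
        R = M - L - S                    # constraint residual
        Y = Y + mu * R
        mu = min(mu * rho, mu_max)
        if np.linalg.norm(R, "fro") <= tol * norm_M:
            break
    return L, S


def make_low_rank_plus_sparse(n=80, rank=2, frac=0.05, seed=0):
    """Synthetic test matrix (illustrative only): rank-`rank` + sparse spikes."""
    rng = np.random.default_rng(seed)
    L0 = rng.standard_normal((n, rank)) @ rng.standard_normal((rank, n))
    S0 = np.zeros((n, n))
    mask = rng.random((n, n)) < frac
    S0[mask] = rng.uniform(-10, 10, size=mask.sum())
    return L0 + S0, L0, S0


if __name__ == "__main__":
    M, L0, S0 = make_low_rank_plus_sparse()
    L_hat, S_hat = rpca_alm(M)
    print("relative error in L:",
          np.linalg.norm(L_hat - L0) / np.linalg.norm(L0))
```

The full SVD in `svd_threshold` is the expensive step; that is exactly where a top-k solver (arpack, propack, or a randomized SVD) would be swapped in.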
On Thu, Apr 16, 2015 at 1:18 PM, Alex Papanicolaou <alex.papa...@gmail.com> wrote:
> How about something like this:
> 1. Basic implementation of ALM uses arpack (not ideal but it means sklearn
> can have RPCA available)
> 2. Option to use randomized SVD if desired
> 3. Option to use propack if desired and it's available (or if/when scipy
> begins to use it)
> 4. GoDec implementation for low rank + sparse + noise
>
> On Wed, Apr 15, 2015 at 4:06 PM,
> <scikit-learn-general-requ...@lists.sourceforge.net> wrote:
>>
>> Send Scikit-learn-general mailing list submissions to
>> scikit-learn-general@lists.sourceforge.net
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>> or, via email, send a message with subject or body 'help' to
>> scikit-learn-general-requ...@lists.sourceforge.net
>>
>> You can reach the person managing the list at
>> scikit-learn-general-ow...@lists.sourceforge.net
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of Scikit-learn-general digest..."
>>
>> Today's Topics:
>>
>> 1. Re: Scikit-learn-general Digest, Vol 63, Issue 34
>> (Alex Papanicolaou)
>> 2. Re: Robust PCA (Olivier Grisel)
>> 3. Re: Robust PCA (Kyle Kastner)
>> 4. Re: Robust PCA (Yogesh Karpate)
>> 5. Re: Performance of LSHForest (Joel Nothman)
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Wed, 15 Apr 2015 11:22:17 -0700
>> From: Alex Papanicolaou <alex.papa...@gmail.com>
>> Subject: Re: [Scikit-learn-general] Scikit-learn-general Digest, Vol
>> 63, Issue 34
>> To: scikit-learn-general@lists.sourceforge.net
>> Message-ID:
>> <CAGNPn4qTmTXOgpLX=ziqapuv5b29iecvfrfwpo96rnectww...@mail.gmail.com>
>> Content-Type: text/plain; charset="utf-8"
>>
>> Kyle & Andreas,
>>
>> Here is my github repo:
>> https://github.com/apapanico/RPCA
>>
>> Responses:
>> 1. I didn't make the GSoC suggestion a few years ago (also not a student
>> anymore :-(, just using RPCA for work); I just came across it in a Google
>> search when trying to find Python implementations.
>> 2. As for GoDec, I have not poked around with it but I would like to. I
>> had intended to use this as a starting point:
>> https://sites.google.com/site/godecomposition/home
>> But yea, it sounds like it can go much bigger. But if I'm not mistaken,
>> it's technically a different problem (low rank + sparse + noise).
>> 3. Regarding PROPACK, the main routine needed is lansvd, which implements
>> Lanczos bidiagonalization with partial reorthogonalization. I do not know
>> what else that depends on. I also do not know if there's an implementation
>> in C, which would be preferred, obviously. A routine for computing only
>> the top-k singular triplets is pretty key for making Candes' ALM method as
>> efficient as possible. Along these lines, I started out using the
>> randomized SVD from sklearn, but I was failing my tests generated with the
>> original Matlab code, so I switched to numpy svd and then finally svdp in
>> pypropack.
>>
>> Cheers,
>> Alex
>> -------------- next part --------------
>> An HTML attachment was scrubbed...
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Wed, 15 Apr 2015 15:40:33 -0400
>> From: Olivier Grisel <olivier.gri...@ensta.org>
>> Subject: Re: [Scikit-learn-general] Robust PCA
>> To: scikit-learn-general <scikit-learn-general@lists.sourceforge.net>
>> Message-ID:
>> <CAFvE7K60pn7-YP7rfreFU932on8omr7=q8-Vxsf0a+=v_nt...@mail.gmail.com>
>> Content-Type: text/plain; charset=UTF-8
>>
>> We could use PyPROPACK if it was contributed upstream in scipy ;)
>>
>> I know that some scipy maintainers don't appreciate arpack much and
>> would like to see it replaced (or at least completed with propack).
>>
>> --
>> Olivier
>>
>> ------------------------------
>>
>> Message: 3
>> Date: Wed, 15 Apr 2015 15:51:01 -0400
>> From: Kyle Kastner <kastnerk...@gmail.com>
>> Subject: Re: [Scikit-learn-general] Robust PCA
>> To: scikit-learn-general@lists.sourceforge.net
>> Message-ID:
>> <CAGNZ19AqUxUV3So_pQ2vn=hDQzMkD4Wgodm6uwTUWAZbomx=_...@mail.gmail.com>
>> Content-Type: text/plain; charset=UTF-8
>>
>> If it was in scipy, would it be backported to the older versions? How
>> would we handle that?
>>
>> On Wed, Apr 15, 2015 at 3:40 PM, Olivier Grisel
>> <olivier.gri...@ensta.org> wrote:
>> > We could use PyPROPACK if it was contributed upstream in scipy ;)
>> >
>> > I know that some scipy maintainers don't appreciate arpack much and
>> > would like to see it replaced (or at least completed with propack).
>> >
>> > --
>> > Olivier
>> >
>> > ------------------------------------------------------------------------------
>> > BPM Camp - Free Virtual Workshop May 6th at 10am PDT/1PM EDT
>> > Develop your own process in accordance with the BPMN 2 standard
>> > Learn Process modeling best practices with Bonita BPM through live
>> > exercises
>> > http://www.bonitasoft.com/be-part-of-it/events/bpm-camp-virtual-event?utm_source=Sourceforge_BPM_Camp_5_6_15&utm_medium=email&utm_campaign=VA_SF
>> > _______________________________________________
>> > Scikit-learn-general mailing list
>> > Scikit-learn-general@lists.sourceforge.net
>> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>> ------------------------------
>>
>> Message: 4
>> Date: Wed, 15 Apr 2015 22:02:40 +0200
>> From: Yogesh Karpate <yogeshkarp...@gmail.com>
>> Subject: Re: [Scikit-learn-general] Robust PCA
>> To: scikit-learn-general@lists.sourceforge.net
>> Message-ID:
>> <CAG7mFDvXJF9gKF3LBuAk=unzibj5sxpyksiz+iueusdrkg0...@mail.gmail.com>
>> Content-Type: text/plain; charset="utf-8"
>>
>> A couple of months back, I tried to use the following:
>> https://github.com/shriphani/robust_pcp/blob/master/robust_pcp.py
>> But I could not install pypropack, developed by Jake Vanderplas,
>> so I used randomized_svd from scikit-learn instead of svdp in the code
>> mentioned above.
>> It worked "OK" for me.
>>
>> On Wed, Apr 15, 2015 at 9:51 PM, Kyle Kastner <kastnerk...@gmail.com>
>> wrote:
>>
>> > If it was in scipy, would it be backported to the older versions? How
>> > would we handle that?
>> >
>> > On Wed, Apr 15, 2015 at 3:40 PM, Olivier Grisel
>> > <olivier.gri...@ensta.org> wrote:
>> > > We could use PyPROPACK if it was contributed upstream in scipy ;)
>> > >
>> > > I know that some scipy maintainers don't appreciate arpack much and
>> > > would like to see it replaced (or at least completed with propack).
>> > >
>> > > --
>> > > Olivier
>>
>> --
>> Warm Regards
>> Yogesh Karpate
>>
>> ------------------------------
>>
>> Message: 5
>> Date: Thu, 16 Apr 2015 09:06:51 +1000
>> From: Joel Nothman <joel.noth...@gmail.com>
>> Subject: Re: [Scikit-learn-general] Performance of LSHForest
>> To: scikit-learn-general <scikit-learn-general@lists.sourceforge.net>
>> Message-ID:
>> <caakaflvyw6ol2ebm0dsh6f3o-mdb80kbnmeurnt+5seftz7...@mail.gmail.com>
>> Content-Type: text/plain; charset="utf-8"
>>
>> I agree this is disappointing, and we need to work on making LSHForest
>> faster. Portions should probably be coded in Cython, for instance, as the
>> current implementation is a bit circuitous in order to work in numpy. PRs
>> are welcome.
>>
>> LSHForest could use parallelism to be faster, but so can (and will) the
>> exact neighbors methods. In theory in LSHForest, each "tree" could be
>> stored on entirely different machines, providing memory benefits, but
>> scikit-learn can't really take advantage of this.
>>
>> Having said that, I would also try with higher n_features and n_queries.
>> We have to limit the scale of our examples in order to limit the overall
>> document compilation time.
>>
>> On 16 April 2015 at 01:12, Miroslav Batchkarov <mbatchka...@gmail.com>
>> wrote:
>>
>> > Hi everyone,
>> >
>> > was really impressed by the speedups provided by LSHForest compared to
>> > brute-force search. Out of curiosity, I compared LSHForest to the
>> > existing ball tree implementation. The approximate algorithm is
>> > consistently slower (see below). Is this normal and should it be
>> > mentioned in the documentation?
>> > Does approximate search offer any benefits in terms of
>> > memory usage?
>> >
>> > I ran the same example
>> > <http://scikit-learn.org/stable/auto_examples/neighbors/plot_approximate_nearest_neighbors_scalability.html#example-neighbors-plot-approximate-nearest-neighbors-scalability-py>
>> > with algorithm=ball_tree. I also had to set metric='euclidean' (this may
>> > affect results). The output is:
>> >
>> > Index size: 1000, exact: 0.000s, LSHF: 0.007s, speedup: 0.0, accuracy: 1.00 +/-0.00
>> > Index size: 2511, exact: 0.001s, LSHF: 0.007s, speedup: 0.1, accuracy: 0.94 +/-0.05
>> > Index size: 6309, exact: 0.001s, LSHF: 0.008s, speedup: 0.2, accuracy: 0.92 +/-0.07
>> > Index size: 15848, exact: 0.002s, LSHF: 0.008s, speedup: 0.3, accuracy: 0.92 +/-0.07
>> > Index size: 39810, exact: 0.005s, LSHF: 0.010s, speedup: 0.5, accuracy: 0.84 +/-0.10
>> > Index size: 100000, exact: 0.008s, LSHF: 0.016s, speedup: 0.5, accuracy: 0.80 +/-0.06
>> >
>> > With n_candidates=100, the output is:
>> >
>> > Index size: 1000, exact: 0.000s, LSHF: 0.006s, speedup: 0.0, accuracy: 1.00 +/-0.00
>> > Index size: 2511, exact: 0.001s, LSHF: 0.006s, speedup: 0.1, accuracy: 0.94 +/-0.05
>> > Index size: 6309, exact: 0.001s, LSHF: 0.005s, speedup: 0.2, accuracy: 0.92 +/-0.07
>> > Index size: 15848, exact: 0.002s, LSHF: 0.007s, speedup: 0.4, accuracy: 0.90 +/-0.11
>> > Index size: 39810, exact: 0.005s, LSHF: 0.008s, speedup: 0.7, accuracy: 0.82 +/-0.13
>> > Index size: 100000, exact: 0.007s, LSHF: 0.013s, speedup: 0.6, accuracy: 0.78 +/-0.04
>> >
>> > ---
>> > Miroslav Batchkarov
>> > PhD Student,
>> > Text Analysis Group,
>> > Department of Informatics,
>> > University of Sussex
>>
>> ------------------------------
>>
>> End of Scikit-learn-general Digest, Vol 63, Issue 35
>> ****************************************************
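
For reference, the exact versus randomized SVD trade-off raised in the quoted thread can be sketched in plain numpy. This is only an illustration of the generic randomized range-finder idea with power iterations; it is not sklearn's `randomized_svd`, not PROPACK's `lansvd`, and the function name and defaults are assumptions. It also does not claim to explain why any particular test suite failed, only that the randomized result is an approximation whose quality depends on how quickly the singular values decay.

```python
import numpy as np


def randomized_topk_svd(A, k, n_oversamples=10, n_iter=4, seed=0):
    """Approximate top-k singular triplets of A (hypothetical helper).

    Projects A onto a random subspace of dimension k + n_oversamples,
    refines that subspace with a few power iterations, and then takes an
    exact SVD of the small projected matrix.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    Omega = rng.standard_normal((n, k + n_oversamples))
    Q, _ = np.linalg.qr(A @ Omega)       # orthonormal basis for range(A @ Omega)
    for _ in range(n_iter):              # power iterations, re-orthonormalized
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    B = Q.T @ A                          # small (k + n_oversamples) x n matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :k], s[:k], Vt[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Rank-5 signal plus small dense noise, so there is a clear spectral gap.
    A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))
    A += 1e-3 * rng.standard_normal(A.shape)

    s_exact = np.linalg.svd(A, compute_uv=False)[:5]
    _, s_rand, _ = randomized_topk_svd(A, k=5)
    print("max relative gap in top-5 singular values:",
          np.max(np.abs(s_rand - s_exact) / s_exact))
```

With a clear gap after the k-th singular value the two agree closely; with a slowly decaying spectrum the approximation is looser unless `n_iter` or `n_oversamples` is increased.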