How about something like this:
1.  Basic implementation of ALM using ARPACK (not ideal, but it means sklearn
can have RPCA available)
2.  Option to use randomized SVD if desired
3.  Option to use PROPACK if desired and available (or if/when scipy
begins to use it)
4.  GoDec implementation for low rank + sparse + noise
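
To make items 1-3 concrete, here is a rough sketch of what I have in mind: an
inexact ALM loop for RPCA where the SVD backend is a plain callable. None of
these names exist in sklearn; rpca_ialm, svd_fn, full_svd and soft_threshold
are all made up for illustration, and the mu/rho settings are just the
commonly used inexact-ALM defaults, taken here as an assumption. The default
backend is a dense NumPy SVD; a wrapper around scipy.sparse.linalg.svds
(ARPACK) or sklearn.utils.extmath.randomized_svd could be passed instead,
provided the chosen rank k is large enough to capture every singular value
above the current threshold 1/mu.

```python
import numpy as np


def soft_threshold(X, tau):
    """Elementwise shrinkage: sign(x) * max(|x| - tau, 0)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def full_svd(X):
    """Default backend (hypothetical name): dense SVD via NumPy/LAPACK."""
    return np.linalg.svd(X, full_matrices=False)


def rpca_ialm(M, lam=None, tol=1e-7, max_iter=500, svd_fn=full_svd):
    """Sketch of inexact ALM for M ~ L + S (L low rank, S sparse).

    svd_fn must return (U, s, Vt); any partial-SVD routine with that
    return shape can be swapped in.
    """
    M = np.asarray(M, dtype=float)
    if lam is None:
        lam = 1.0 / np.sqrt(max(M.shape))  # usual RPCA weight on ||S||_1

    norm_fro = np.linalg.norm(M, "fro")
    norm_two = np.linalg.norm(M, 2)

    # Assumed defaults for the dual variable and penalty schedule.
    Y = M / max(norm_two, np.abs(M).max() / lam)
    mu = 1.25 / norm_two
    mu_max = mu * 1e7
    rho = 1.5

    L = np.zeros_like(M)
    S = np.zeros_like(M)
    for _ in range(max_iter):
        # Low-rank update: singular value thresholding at level 1/mu.
        U, s, Vt = svd_fn(M - S + Y / mu)
        L = (U * soft_threshold(s, 1.0 / mu)) @ Vt
        # Sparse update: elementwise shrinkage at level lam/mu.
        S = soft_threshold(M - L + Y / mu, lam / mu)
        # Dual ascent step and penalty increase.
        residual = M - L - S
        Y = Y + mu * residual
        mu = min(mu * rho, mu_max)
        if np.linalg.norm(residual, "fro") <= tol * norm_fro:
            break
    return L, S
```

Keeping the SVD behind svd_fn means the ARPACK / randomized / PROPACK choice
stays a one-line change, and the Matlab-generated tests can be run against
each backend separately.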




On Wed, Apr 15, 2015 at 4:06 PM, <
scikit-learn-general-requ...@lists.sourceforge.net> wrote:

> Send Scikit-learn-general mailing list submissions to
>         scikit-learn-general@lists.sourceforge.net
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
> or, via email, send a message with subject or body 'help' to
>         scikit-learn-general-requ...@lists.sourceforge.net
>
> You can reach the person managing the list at
>         scikit-learn-general-ow...@lists.sourceforge.net
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Scikit-learn-general digest..."
>
>
> Today's Topics:
>
>    1. Re: Scikit-learn-general Digest, Vol 63,  Issue 34
>       (Alex Papanicolaou)
>    2. Re: Robust PCA (Olivier Grisel)
>    3. Re: Robust PCA (Kyle Kastner)
>    4. Re: Robust PCA (Yogesh Karpate)
>    5. Re: Performance of LSHForest (Joel Nothman)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 15 Apr 2015 11:22:17 -0700
> From: Alex Papanicolaou <alex.papa...@gmail.com>
> Subject: Re: [Scikit-learn-general] Scikit-learn-general Digest, Vol
>         63,     Issue 34
> To: scikit-learn-general@lists.sourceforge.net
> Message-ID:
>         <CAGNPn4qTmTXOgpLX=
> ziqapuv5b29iecvfrfwpo96rnectww...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Kyle & Andreas,
>
> Here is my github repo:
> https://github.com/apapanico/RPCA
>
> Responses:
> 1. I didn't make the GSoC suggestion a few years ago (I'm also not a student
> anymore :-(, just using RPCA for work); I just came across it in a Google
> search when trying to find Python implementations.
> 2. As for GoDec, I have not poked around with it but I would like to.  I
> had intended to use this as a starting point:
> https://sites.google.com/site/godecomposition/home
> But yea, it sounds like it can go much bigger.   But if I'm not mistaken,
> it's technically a different problem (low rank + sparse + noise).
> 3. Regarding PROPACK, the main routine needed is lansvd which implements
> Lanczos bidiagonalization with partial reorthogonalization.  I do not know
> what else that depends on.  I also do not know if there's an implementation
> in C which would be preferred, obviously.  A routine for computing only
> top-k singular triplets is pretty key for making Candes' ALM method as
> efficient as possible.  Along these lines, I started out using the
> randomized SVD from sklearn but I was failing my tests generated with the
> original Matlab code so I switched to numpy svd and then finally svdp in
> pypropack.
>
> Cheers,
> Alex
>
> ------------------------------
>
> Message: 2
> Date: Wed, 15 Apr 2015 15:40:33 -0400
> From: Olivier Grisel <olivier.gri...@ensta.org>
> Subject: Re: [Scikit-learn-general] Robust PCA
> To: scikit-learn-general <scikit-learn-general@lists.sourceforge.net>
> Message-ID:
>         <CAFvE7K60pn7-YP7rfreFU932on8omr7=q8-Vxsf0a+=
> v_nt...@mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> We could use PyPROPACK if it was contributed upstream in scipy ;)
>
> I know that some scipy maintainers don't appreciate arpack much and
> would like to see it replaced (or at least completed with propack).
>
> --
> Olivier
>
>
>
> ------------------------------
>
> Message: 3
> Date: Wed, 15 Apr 2015 15:51:01 -0400
> From: Kyle Kastner <kastnerk...@gmail.com>
> Subject: Re: [Scikit-learn-general] Robust PCA
> To: scikit-learn-general@lists.sourceforge.net
> Message-ID:
>         <CAGNZ19AqUxUV3So_pQ2vn=hDQzMkD4Wgodm6uwTUWAZbomx=_
> g...@mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> If it were in scipy, would it be backported to the older versions? How
> would we handle that?
>
> On Wed, Apr 15, 2015 at 3:40 PM, Olivier Grisel
> <olivier.gri...@ensta.org> wrote:
> > We could use PyPROPACK if it was contributed upstream in scipy ;)
> >
> > I know that some scipy maintainers don't appreciate arpack much and
> > would like to see it replaced (or at least completed with propack).
> >
> > --
> > Olivier
> >
> >
> ------------------------------------------------------------------------------
> > BPM Camp - Free Virtual Workshop May 6th at 10am PDT/1PM EDT
> > Develop your own process in accordance with the BPMN 2 standard
> > Learn Process modeling best practices with Bonita BPM through live
> exercises
> > http://www.bonitasoft.com/be-part-of-it/events/bpm-camp-virtual-
> event?utm_
> > source=Sourceforge_BPM_Camp_5_6_15&utm_medium=email&utm_campaign=VA_SF
> > _______________________________________________
> > Scikit-learn-general mailing list
> > Scikit-learn-general@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>
>
> ------------------------------
>
> Message: 4
> Date: Wed, 15 Apr 2015 22:02:40 +0200
> From: Yogesh Karpate <yogeshkarp...@gmail.com>
> Subject: Re: [Scikit-learn-general] Robust PCA
> To: scikit-learn-general@lists.sourceforge.net
> Message-ID:
>         <CAG7mFDvXJF9gKF3LBuAk=
> unzibj5sxpyksiz+iueusdrkg0...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> A couple of months back, I tried to use the following:
> https://github.com/shriphani/robust_pcp/blob/master/robust_pcp.py
> But I could not install pypropack, developed by Jake Vanderplas,
> so I used randomized_svd from scikit-learn instead of svdp in the code
> mentioned above.
> It worked "OK" for me.
>
>
>
>
>
> --
>     Warm Regards
>     Yogesh Karpate
>
> ------------------------------
>
> Message: 5
> Date: Thu, 16 Apr 2015 09:06:51 +1000
> From: Joel Nothman <joel.noth...@gmail.com>
> Subject: Re: [Scikit-learn-general] Performance of LSHForest
> To: scikit-learn-general <scikit-learn-general@lists.sourceforge.net>
> Message-ID:
>         <
> caakaflvyw6ol2ebm0dsh6f3o-mdb80kbnmeurnt+5seftz7...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> I agree this is disappointing, and we need to work on making LSHForest
> faster. Portions should probably be coded in Cython, for instance, as the
> current implementation is a bit circuitous in order to work in numpy. PRs
> are welcome.
>
> LSHForest could use parallelism to be faster, but so can (and will) the
> exact neighbors methods. In theory in LSHForest, each "tree" could be
> stored on entirely different machines, providing memory benefits, but
> scikit-learn can't really take advantage of this.
>
> Having said that, I would also try with higher n_features and n_queries. We
> have to limit the scale of our examples in order to limit the overall
> document compilation time.
>
> On 16 April 2015 at 01:12, Miroslav Batchkarov <mbatchka...@gmail.com>
> wrote:
>
> > Hi everyone,
> >
> > I was really impressed by the speedups provided by LSHForest compared to
> > brute-force search. Out of curiosity, I compared LSHForest to the existing
> > ball tree implementation. The approximate algorithm is consistently slower
> > (see below). Is this normal and should it be mentioned in the
> > documentation? Does approximate search offer any benefits in terms of
> > memory usage?
> >
> >
> > I ran the same example
> > <http://scikit-learn.org/stable/auto_examples/neighbors/plot_approximate_nearest_neighbors_scalability.html#example-neighbors-plot-approximate-nearest-neighbors-scalability-py>
> > with algorithm=ball_tree. I also had to set metric='euclidean' (this may
> > affect results). The output is:
> >
> > Index size: 1000, exact: 0.000s, LSHF: 0.007s, speedup: 0.0, accuracy: 1.00 +/-0.00
> > Index size: 2511, exact: 0.001s, LSHF: 0.007s, speedup: 0.1, accuracy: 0.94 +/-0.05
> > Index size: 6309, exact: 0.001s, LSHF: 0.008s, speedup: 0.2, accuracy: 0.92 +/-0.07
> > Index size: 15848, exact: 0.002s, LSHF: 0.008s, speedup: 0.3, accuracy: 0.92 +/-0.07
> > Index size: 39810, exact: 0.005s, LSHF: 0.010s, speedup: 0.5, accuracy: 0.84 +/-0.10
> > Index size: 100000, exact: 0.008s, LSHF: 0.016s, speedup: 0.5, accuracy: 0.80 +/-0.06
> >
> > With n_candidates=100, the output is
> >
> > Index size: 1000, exact: 0.000s, LSHF: 0.006s, speedup: 0.0, accuracy: 1.00 +/-0.00
> > Index size: 2511, exact: 0.001s, LSHF: 0.006s, speedup: 0.1, accuracy: 0.94 +/-0.05
> > Index size: 6309, exact: 0.001s, LSHF: 0.005s, speedup: 0.2, accuracy: 0.92 +/-0.07
> > Index size: 15848, exact: 0.002s, LSHF: 0.007s, speedup: 0.4, accuracy: 0.90 +/-0.11
> > Index size: 39810, exact: 0.005s, LSHF: 0.008s, speedup: 0.7, accuracy: 0.82 +/-0.13
> > Index size: 100000, exact: 0.007s, LSHF: 0.013s, speedup: 0.6, accuracy: 0.78 +/-0.04
> >
> >
> >
> > ---
> > Miroslav Batchkarov
> > PhD Student,
> > Text Analysis Group,
> > Department of Informatics,
> > University of Sussex
> >
> >
> >
> >
> >
> >
> >
> >
>
> ------------------------------
>
>
>
>
> End of Scikit-learn-general Digest, Vol 63, Issue 35
> ****************************************************
>