Re: [Scikit-learn-general] Is there a pdf documentation for the latest stable scikit-learn?

2015-04-16 Thread Joel Nothman
Although I note that I've got LaTeX compilation errors, so I'm not sure how
Andy compiles this.

On 16 April 2015 at 20:25, Joel Nothman joel.noth...@gmail.com wrote:

 I've proposed a better chapter ordering at
 https://github.com/scikit-learn/scikit-learn/pull/4602...

 On 16 April 2015 at 03:48, Andreas Mueller t3k...@gmail.com wrote:

 Hi.
 Yes, run make latexpdf in the doc folder.

 Best,
 Andy


 On 04/15/2015 01:11 PM, Tim wrote:
  Thanks, Andy!
 
  How do you generate the pdf file? Can I also do that?
 
  
  On Wed, 4/15/15, Andreas Mueller t3k...@gmail.com wrote:
 
   Subject: Re: [Scikit-learn-general] Is there a pdf documentation for the latest stable scikit-learn?
   To: scikit-learn-general@lists.sourceforge.net
   Date: Wednesday, April 15, 2015, 12:55 PM

   Hi Tim.
   There are pdfs for 0.16.0 and 0.16.1 up now at
   http://sourceforge.net/projects/scikit-learn/files/documentation/

   Let us know if there are issues with it.

   Cheers,
   Andy


   On 04/15/2015 12:08 PM, Tim wrote:

   Hello,

   I am looking for a pdf file for the documentation for the latest stable scikit-learn, i.e. 0.16.1.

   I followed http://scikit-learn.org/stable/support.html#documentation-resources,
   which leads me to http://sourceforge.net/projects/scikit-learn/files/documentation/,
   but the pdf files there are for versions <= 0.12, none later than 2012.

   Can the official team make the pdf files available?

   Thanks!


Re: [Scikit-learn-general] Is there a pdf documentation for the latest stable scikit-learn?

2015-04-16 Thread Joel Nothman
I've proposed a better chapter ordering at
https://github.com/scikit-learn/scikit-learn/pull/4602...

On 16 April 2015 at 03:48, Andreas Mueller t3k...@gmail.com wrote:

 Hi.
 Yes, run make latexpdf in the doc folder.

 Best,
 Andy




Re: [Scikit-learn-general] Is there a pdf documentation for the latest stable scikit-learn?

2015-04-16 Thread Tim
Thanks again!

Can your scripts also create pdf bookmarks of third or lower levels? 
E.g. 
...
4.1.1 Ordinary Least Squares
4.1.2 Ridge Regression
Ridge Complexity
Setting the regularization parameter: generalized Cross-Validation
4.1.3 Lasso
Setting regularization parameter
Using cross-validation
Information-criteria based model selection
4.1.4 Elastic Net
...

Can we also show the numerical numberings in pdf bookmarks? 
E.g.
4.1 Generalized Linear Models
versus
Generalized Linear Models


On Wed, 4/15/15, Andreas Mueller t3k...@gmail.com wrote:

 Subject: Re: [Scikit-learn-general] Is there a pdf documentation for the latest stable scikit-learn?
 To: scikit-learn-general@lists.sourceforge.net
 Date: Wednesday, April 15, 2015, 1:48 PM

 Hi.
 Yes, run make latexpdf in the doc folder.

 Best,
 Andy
 
 

Re: [Scikit-learn-general] Is there a pdf documentation for the latest stable scikit-learn?

2015-04-16 Thread Andreas Mueller
This is the sphinx latex build, not a script of ours.
I'm not sure; you can consult the sphinx documentation:
http://sphinx-doc.org/
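
Something along these lines in doc/conf.py might do it (an untested sketch: latex_elements is Sphinx's hook into the LaTeX build, and the counter/hyperref settings below are my guess at the right knobs, not something our conf.py sets today):

    # Hypothetical additions to doc/conf.py for deeper, numbered PDF bookmarks.
    latex_elements = {
        'preamble': r"""
    \setcounter{secnumdepth}{3}   % number subsubsections too
    \setcounter{tocdepth}{3}      % and list them in the table of contents
    \hypersetup{bookmarksnumbered=true, bookmarksdepth=3}
    """,
    }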


On 04/16/2015 07:48 AM, Tim wrote:
 Thanks again!

 Can your scripts also create pdf bookmarks of third or lower levels?
 E.g.
 ...
 4.1.1 Ordinary Least Squares
 4.1.2 Ridge Regression
 Ridge Complexity
 Setting the regularization parameter: generalized Cross-Validation
 4.1.3 Lasso
 Setting regularization parameter
 Using cross-validation
 Information-criteria based model selection
 4.1.4 Elastic Net
 ...

 Can we also show the numerical numberings in pdf bookmarks?
 E.g.
 4.1 Generalized Linear Models
 versus
 Generalized Linear Models

 

Re: [Scikit-learn-general] Is there a pdf documentation for the latest stable scikit-learn?

2015-04-16 Thread Andreas Mueller

Interestingly, this time I didn't get any errors (I got them before).
But you get a pdf even with the errors.


On 04/16/2015 06:26 AM, Joel Nothman wrote:
Although I note that I've got LaTeX compilation errors, so I'm not 
sure how Andy compiles this.



Re: [Scikit-learn-general] Robust PCA

2015-04-16 Thread Alex Papanicolaou
How about something like this:
1.  Basic implementation of ALM uses arpack (not ideal but it means sklearn
can have RPCA available)
2.  Option to use randomized SVD if desired
3.  Option to use propack if desired and it's available (or if/when scipy
begins to use it)
4.  GoDec implementation for low rank + sparse + noise
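
For concreteness, here is a rough NumPy sketch of point 1, an inexact-ALM style RPCA (minimize ||L||_* + lambda*||S||_1 subject to L + S = M). It uses a full SVD where arpack or a randomized SVD would be swapped in, and the function names are hypothetical rather than an existing sklearn API:

    import numpy as np

    def soft_threshold(X, tau):
        # Elementwise shrinkage; used for the sparse term and the singular values.
        return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

    def svd_threshold(X, tau):
        # Singular value thresholding; this full SVD is where a truncated
        # solver (arpack, randomized SVD, propack) would be plugged in.
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        return np.dot(U, soft_threshold(s, tau)[:, None] * Vt)

    def rpca_inexact_alm(M, lam=None, tol=1e-7, max_iter=500):
        # Decompose M into low-rank L plus sparse S by inexact ALM iterations.
        m, n = M.shape
        if lam is None:
            lam = 1.0 / np.sqrt(max(m, n))
        norm_fro = np.linalg.norm(M, 'fro')
        spectral = np.linalg.norm(M, 2)
        Y = M / max(spectral, np.abs(M).max() / lam)  # dual variable initialization
        mu, rho = 1.25 / spectral, 1.5
        S = np.zeros_like(M)
        for _ in range(max_iter):
            L = svd_threshold(M - S + Y / mu, 1.0 / mu)
            S = soft_threshold(M - L + Y / mu, lam / mu)
            R = M - L - S
            Y = Y + mu * R
            mu *= rho
            if np.linalg.norm(R, 'fro') / norm_fro < tol:
                break
        return L, S

Swapping out svd_threshold for a truncated solver is essentially what points 2 and 3 amount to.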




On Wed, Apr 15, 2015 at 4:06 PM, 
scikit-learn-general-requ...@lists.sourceforge.net wrote:


 Message: 1
 Date: Wed, 15 Apr 2015 11:22:17 -0700
 From: Alex Papanicolaou alex.papa...@gmail.com
 Subject: Re: [Scikit-learn-general] Scikit-learn-general Digest, Vol
 63, Issue 34
 To: scikit-learn-general@lists.sourceforge.net

 Kyle & Andreas,

 Here is my github repo:
 https://github.com/apapanico/RPCA

 Responses:
 1. I didn't make the GSoC suggestion a few years ago (I'm also not a student
 anymore :-(, just using RPCA for work); I just came across it in a google
 search when trying to find python implementations.
 2. As for GoDec, I have not poked around with it but I would like to.  I
 had intended to use this as a starting point:
 https://sites.google.com/site/godecomposition/home
 But yea, it sounds like it can go much bigger.   But if I'm not mistaken,
 it's technically a different problem (low rank + sparse + noise).
 3. Regarding PROPACK, the main routine needed is lansvd which implements
 Lanczos bidiagonalization with partial reorthogonalization.  I do not know
 what else that depends on.  I also do not know if there's an implementation
 in C which would be preferred, obviously.  A routine for computing only
 top-k singular triplets is pretty key for making Candes' ALM method as
 efficient as possible.  Along these lines, I started out using the
 randomized SVD from sklearn but I was failing my tests generated with the
 original Matlab code so I switched to numpy svd and then finally svdp in
 pypropack.

 Cheers,
 Alex
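
 A small sketch of the two truncated-SVD routes mentioned above (the random test matrix is purely for illustration; PROPACK's lansvd would be the third option):

    import numpy as np
    from scipy.sparse.linalg import svds              # ARPACK-backed truncated SVD
    from sklearn.utils.extmath import randomized_svd  # randomized truncated SVD

    rng = np.random.RandomState(0)
    M = rng.randn(500, 200)
    k = 10

    U1, s1, V1t = svds(M, k=k)   # ARPACK returns singular values in ascending order
    U2, s2, V2t = randomized_svd(M, n_components=k, random_state=0)

    print(np.sort(s1)[::-1][:3])  # top-3 singular values; both routes should agree
    print(s2[:3])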

 --

 Message: 2
 Date: Wed, 15 Apr 2015 15:40:33 -0400
 From: Olivier Grisel olivier.gri...@ensta.org
 Subject: Re: [Scikit-learn-general] Robust PCA
 To: scikit-learn-general scikit-learn-general@lists.sourceforge.net

 We could use PyPROPACK if it was contributed upstream in scipy ;)

 I know that some scipy maintainers don't appreciate arpack much and
 would like to see it replaced (or at least completed with propack).

 --
 Olivier



 --

 Message: 3
 Date: Wed, 15 Apr 2015 15:51:01 -0400
 From: Kyle Kastner kastnerk...@gmail.com
 Subject: Re: [Scikit-learn-general] Robust PCA
 To: scikit-learn-general@lists.sourceforge.net

 If it was in scipy, would it be backported to the older versions? How
 would we handle that?

 On Wed, Apr 15, 2015 at 3:40 PM, Olivier Grisel
 olivier.gri...@ensta.org wrote:
  We could use PyPROPACK if it was contributed upstream in scipy ;)
 
  I know that some scipy maintainers don't appreciate arpack much and
  would like to see it replaced (or at least completed with propack).
 
  --
  Olivier
 
 

Re: [Scikit-learn-general] Performance of LSHForest

2015-04-16 Thread Daniel Vainsencher
Hi Joel,

To extend your analysis:
- when n_samples*n_indices is large enough, the bottleneck is the use of 
the index, as you say.
- when n_dimensions*n_candidates is large enough, the bottleneck is 
computation of true distances between DB points and the query.

To serve both kinds of use cases well is perfectly possible, but it
requires using the index in a way that is both:
A) fast, and
B) effective at reducing the number of candidates for which we compare
distances.

Here is a variant of your proposal (better keep track of context) that 
also requires a little Cython but improves both aspects A and B and 
reduces code complexity.

Observation I:
Only a single binary search per index is necessary, the first. After we 
find the correct location for the query binary code, we can restrict 
ourselves to the n_candidates (or even fewer) before and after that 
location.

So no further binary searches are necessary at all, and the restriction 
to a small linear part of the array should be much more cache friendly. 
This makes full use of our array implementation of an ordered collection,
instead of acting as if we were still on a binary tree implementation as
in the original LSH-Forest paper.

There is a price to pay for this simplification: we are now looking at 
(computing full distance from query for) 2*n_candidates*n_indices 
points, which can be expensive (we improved A at a cost to B).

But here is where some Cython can be really useful. Observation II:
The best information we can extract from the binary representation is 
not the distances in the tree structure, but hamming distances to the query.

So after the restriction of I, compute the *hamming distances* of the 
2*n_candidate*n_indices points each from the binary representation of 
the query (corresponding to the appropriate index). Then compute full 
metric only for the n_candidates with the lowest hamming distances.

This should achieve a pretty good sweet spot of performance, with just a 
bit of Cython.
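
A rough NumPy sketch of I and II together (names and the packed-uint8 layout are assumptions for illustration, not the actual LSHForest internals; pos is the insertion point from the single searchsorted call, hashes the codes of one index in sorted order):

    import numpy as np

    def closest_by_hamming(hashes, query_hash, pos, n_candidates):
        # Observation I: restrict to a window of 2*n_candidates codes around pos.
        lo = max(pos - n_candidates, 0)
        hi = min(pos + n_candidates, hashes.shape[0])
        window = hashes[lo:hi]
        # Observation II: rank the window by Hamming distance to the query
        # (XOR, then count set bits; unpackbits is simple but allocates extra memory).
        dists = np.unpackbits(window ^ query_hash, axis=1).sum(axis=1)
        keep = np.argsort(dists)[:n_candidates]
        return lo + keep, dists[keep]

Full distances would then be computed only for the returned candidates, pooled across the n_indices trees.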

Daniel

On 04/16/2015 12:18 AM, Joel Nothman wrote:
 Once we're dealing with large enough index and n_candidates, most time
 is spent in searchsorted in the synchronous ascending phase, while any
 overhead around it is marginal. Currently we are searching over the
 whole array in each searchsorted, while it could be rewritten to keep
 better track of context to cut down the overall array when searching.
 While possible, I suspect this will look confusing in Python/Numpy, and
 Cython will be a clearer and faster way to present this logic.

 On the other hand, time spent in _compute_distances is substantial, and
 yet most of its consumption is /outside/ of pairwise_distances. This
 commit
 https://github.com/scikit-learn/scikit-learn/commit/c1f335f70aa0f766a930f8ac54eeaa601245725a
 cuts a basic benchmark from 85 to 70 seconds. Vote here for merge
 https://github.com/scikit-learn/scikit-learn/pull/4603!

 On 16 April 2015 at 12:32, Maheshakya Wijewardena pmaheshak...@gmail.com wrote:

 Moreover, this drawback occurs because LSHForest does not vectorize
 multiple queries as in 'ball_tree' or any other method. This slows
 the exact neighbor distance calculation down significantly after
 approximation. This will not be a problem if queries are for
 individual points. Unfortunately, former is the more useful usage of
 LSHForest.
 Are you trying individual queries or multiple queries (for n_samples)?

 On Thu, Apr 16, 2015 at 6:14 AM, Daniel Vainsencher
 daniel.vainsenc...@gmail.com wrote:

 LSHForest is not intended for dimensions at which exact methods
 work well, nor for tiny datasets. Try d > 500, n_points > 10; I
 don't remember the switchover point.

 The documentation should make this clear, but unfortunately I
 don't see that it does.

 On Apr 15, 2015 7:08 PM, Joel Nothman joel.noth...@gmail.com wrote:

 I agree this is disappointing, and we need to work on making
 LSHForest faster. Portions should probably be coded in
 Cython, for instance, as the current implementation is a bit
 circuitous in order to work in numpy. PRs are welcome.

 LSHForest could use parallelism to be faster, but so can
 (and will) the exact neighbors methods. In theory in
 LSHForest, each tree could be stored on entirely different
 machines, providing memory benefits, but scikit-learn can't
 really take advantage of this.

 Having said that, I would also try with higher n_features
 and n_queries. We have to limit the scale of our examples in
 order to limit the overall document compilation time.

 On 16 April 2015 at 01:12, Miroslav Batchkarov mbatchka...@gmail.com wrote:

   

Re: [Scikit-learn-general] gradient boost classifier - feature_importances_

2015-04-16 Thread Pagliari, Roberto
Never mind my question. I forgot the grid search was the actual object.

Thanks,


From: Pagliari, Roberto [rpagli...@appcomsci.com]
Sent: Thursday, April 16, 2015 12:50 PM
To: scikit-learn-general@lists.sourceforge.net
Subject: [Scikit-learn-general] gradient boost classifier - feature_importances_

Is feature_importances_ available from gradient boosting?

It is mentioned in the documentation, but it doesn't exist when I try to access
it (after fitting via grid search).

I printed 'dir' of the object and can't see it.

Thanks,
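
For the archives, a minimal sketch of the resolution (assuming the default refit=True; GridSearchCV lives in sklearn.grid_search in 0.16 and sklearn.model_selection in later releases):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.grid_search import GridSearchCV  # sklearn.model_selection later on

    X, y = make_classification(n_samples=200, n_features=10, random_state=0)
    search = GridSearchCV(GradientBoostingClassifier(random_state=0),
                          {'n_estimators': [50, 100]})
    search.fit(X, y)

    # feature_importances_ lives on the refitted estimator, not the search object.
    print(search.best_estimator_.feature_importances_)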


Re: [Scikit-learn-general] Performance of LSHForest

2015-04-16 Thread Joel Nothman
I more or less agree. Certainly we only need to do one searchsorted per
query per tree, and then do linear scans. There is a question of how close
we stay to the original LSHForest algorithm, which relies on matching
prefixes rather than hamming distance. Hamming distance is easier to
calculate in NumPy and is probably faster to calculate in C too (with or
without using POPCNT). Perhaps the only advantage of using Cython in your
solution is to avoid the memory overhead of unpackbits.

However, n_candidates before and after is arguably not sufficient if one
side has more than n_candidates with a high prefix overlap. But until we
look at the suffixes we can't know if it is closer or farther in hamming
distance.

I also think the use of n_candidates in the current code is somewhat
broken, as suggested by my XXX comment in _get_candidates, which we
discussed but did not resolve clearly. I think it will be hard to make
improvements of this sort without breaking the current results and
parameter sensitivities of the implementation.
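
On the POPCNT point, for what it's worth, a tiny unpackbits-free sketch using a 256-entry popcount table (an illustration only, not what the current code does):

    import numpy as np

    # Number of set bits for every possible byte value.
    POPCOUNT = np.array([bin(i).count('1') for i in range(256)], dtype=np.uint8)

    def hamming_to_query(packed_codes, query_code):
        # packed_codes: (n, n_bytes) uint8; query_code: (n_bytes,) uint8.
        return POPCOUNT[packed_codes ^ query_code].sum(axis=1)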

Re: [Scikit-learn-general] Performance of LSHForest

2015-04-16 Thread Daniel Vainsencher
On 04/16/2015 05:49 PM, Joel Nothman wrote:
 I more or less agree. Certainly we only need to do one searchsorted per
 query per tree, and then do linear scans. There is a question of how
 close we stay to the original LSHForest algorithm, which relies on
 matching prefixes rather than hamming distance. Hamming distance is
 easier to calculate in NumPy and is probably faster to calculate in C
 too (with or without using POPCNT). Perhaps the only advantage of using
 Cython in your solution is to avoid the memory overhead of unpackbits.
You obviously know more than I do about Cython vs numpy options.

 However, n_candidates before and after is arguably not sufficient if one
 side has more than n_candidates with a high prefix overlap.
I disagree. Being able to look at 2*n_candidates that must contain
n_candidates of the closest ones, rather than as many as happen to
agree on x number of bits, is a feature, not a bug. Especially if we
then filter them by hamming distance.


 But until we
 look at the suffixes we can't know if it is closer or farther in hamming
 distance.

 I also think the use of n_candidates in the current code is somewhat
 broken, as suggested by my XXX comment in _get_candidates, which we
 discussed but did not resolve clearly.
I think n_candidates embodies a reasonable desire to be able to set the 
investment per query, but there are many ways to do that. Since the 
research does not provide very good advice (I think the guarantees 
require looking at sqrt(db size) candidates for every query), it leaves 
the field wide open for different schemes. Should the number of 
distances calculated be per number of indices, or fixed? etc.

I am proposing a particular definition:
- Read 2*n_candidates*n_indices hashes
- Calculate n_candidates full distances.
The motivation for the first is that it allows us to get all 
n_candidates from the same side of the same index if that is where the 
good stuff (hamming distance-wise) is, and seems not-too-expensive in 
calculating hamming distances. But if someone argues we should calculate 
10x as many hamming distances to decide what distances to compute, I 
don't know a very good argument one way or the other.

 I think it will be hard to make
 improvements of this sort without breaking the current results and
 parameter sensitivities of the implementation.
You seem to be assuming it is tuned; I am not even sure there exists a 
precise sense in which it is tunable, except for a particular dataset 
(and that is not very useful) :)

Daniel



Re: [Scikit-learn-general] Robust PCA

2015-04-16 Thread Kyle Kastner
GoDec might not have the citations (yet) to be added to scikit-learn.
But I think a basic ALM-based RPCA would be a great addition, along
with a cool demo. Smart background subtraction would be my vote but
might be too heavyweight - I could see a cool example of something like
colored bouncing balls overlaid on the china picture that is built in
for sklearn.


On Thu, Apr 16, 2015 at 1:18 PM, Alex Papanicolaou
alex.papa...@gmail.com wrote:
 How about something like this:
 1.  Basic implementation of ALM uses arpack (not ideal but it means sklearn
 can have RPCA available)
 2.  Option to use randomized SVD if desired
 3.  Option to use propack if desired and it's available (or if/when scipy
 begins to use it)
 4.  GoDec implementation for low rank + sparse + noise



