Re: [Scikit-learn-general] test time performance and sparse - dense dot products

2012-01-08 Thread Peter Prettenhofer
2012/1/9 Peter Prettenhofer:
> 2012/1/8 Mathieu Blondel:
>> If I'm not mistaken (I just read the source code on github), the copy
>> that Peter is experiencing is due to ravel() in this method:
>> https://github.com/scipy/scipy/blob/master/scipy/sparse/compressed.py#L264
>>
>> This method in turn

Re: [Scikit-learn-general] Publishing the HTML doc on github

2012-01-08 Thread Andreas Mueller
On 01/09/2012 01:59 AM, Olivier Grisel wrote:
> Hi all,
>
> As discussed earlier, here is a new tool to publish the doc on github
> rather than sourceforge.
>
> The result is available here:
>
> https://github.com/scikit-learn/scikit-learn.org (the repo for the
> tool, basically a README.rst and

Re: [Scikit-learn-general] test time performance and sparse - dense dot products

2012-01-08 Thread Peter Prettenhofer
2012/1/8 Mathieu Blondel:
> If I'm not mistaken (I just read the source code on github), the copy
> that Peter is experiencing is due to ravel() in this method:
> https://github.com/scipy/scipy/blob/master/scipy/sparse/compressed.py#L264
>
> This method in turn invokes csr_matvecs which is impleme

[Scikit-learn-general] Publishing the HTML doc on github

2012-01-08 Thread Olivier Grisel
Hi all, As discussed earlier, here is a new tool to publish the doc on github rather than sourceforge. The result is available here: https://github.com/scikit-learn/scikit-learn.org (the repo for the tool, basically a README.rst and a Makefile) http://scikit-learn.github.com/scikit-learn.org

Re: [Scikit-learn-general] Putting SVC and NuSVC into the same class

2012-01-08 Thread Andreas
On 01/08/2012 11:29 PM, Olivier Grisel wrote:
> 2012/1/8 Andreas:
>
>> Hey everybody.
>> @larsmans (my personal hero for the day) started refactoring the SVM
>> class structure here:
>> https://github.com/larsmans/scikit-learn/commits/refactor-svm
>> after some discussion here:
>> https://githu

Re: [Scikit-learn-general] Putting SVC and NuSVC into the same class

2012-01-08 Thread Olivier Grisel
2012/1/8 Vlad Niculae:
> A bit off topic but since we're talking about work on the SVM module, I
> noticed something wrong with the docs.
>
> http://scikit-learn.org/dev/modules/svm.html#tips-on-practical-use
>
> The scaling part makes reference to some "Cookbook" (I don't know what this
> is, i

Re: [Scikit-learn-general] Putting SVC and NuSVC into the same class

2012-01-08 Thread Vlad Niculae
A bit off topic but since we're talking about work on the SVM module, I noticed something wrong with the docs. http://scikit-learn.org/dev/modules/svm.html#tips-on-practical-use The scaling part makes reference to some "Cookbook" (I don't know what this is, it probably died before I joined you

Re: [Scikit-learn-general] Putting SVC and NuSVC into the same class

2012-01-08 Thread Olivier Grisel
2012/1/8 Andreas:
> Hey everybody.
> @larsmans (my personal hero for the day) started refactoring the SVM
> class structure here:
> https://github.com/larsmans/scikit-learn/commits/refactor-svm
> after some discussion here:
> https://github.com/scikit-learn/scikit-learn/issues/253
> and somewhat r

Re: [Scikit-learn-general] test time performance and sparse - dense dot products

2012-01-08 Thread Olivier Grisel
2012/1/8 Vlad Niculae:
>>
>>> Olivier's solution sounds good.
>>
>> And it's easy to implement too :) @pprett can you confirm it solves
>> your perf issue on your data?
>
> I'm talking without actually looking at the code but as long as after fit,
> the array will only be needed in F-order, this

[Scikit-learn-general] Putting SVC and NuSVC into the same class

2012-01-08 Thread Andreas
Hey everybody. @larsmans (my personal hero for the day) started refactoring the SVM class structure here: https://github.com/larsmans/scikit-learn/commits/refactor-svm after some discussion here: https://github.com/scikit-learn/scikit-learn/issues/253 and somewhat related here: https://github.com

Re: [Scikit-learn-general] test time performance and sparse - dense dot products

2012-01-08 Thread Vlad Niculae
>
>> Olivier's solution sounds good.
>
> And it's easy to implement too :) @pprett can you confirm it solves
> your perf issue on your data?
I'm talking without actually looking at the code but as long as after fit, the array will only be needed in F-order, this feels right. However afaik SGDC

Re: [Scikit-learn-general] test time performance and sparse - dense dot products

2012-01-08 Thread Olivier Grisel
2012/1/8 Mathieu Blondel:
> If I'm not mistaken (I just read the source code on github), the copy
> that Peter is experiencing is due to ravel() in this method:
> https://github.com/scipy/scipy/blob/master/scipy/sparse/compressed.py#L264
>
> This method in turn invokes csr_matvecs which is impleme

Re: [Scikit-learn-general] test time performance and sparse - dense dot products

2012-01-08 Thread Mathieu Blondel
If I'm not mistaken (I just read the source code on github), the copy that Peter is experiencing is due to ravel() in this method: https://github.com/scipy/scipy/blob/master/scipy/sparse/compressed.py#L264 This method in turn invokes csr_matvecs which is implemented here: https://github.com/scipy/
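Assuming, as Mathieu describes, that the sparse product flattens the transposed coefficient matrix with ravel() before handing it to the compiled csr_matvecs routine, the copy can be reproduced with NumPy alone. The sketch below uses a made-up 3 x 5 `coef_` and a hypothetical helper `ravel_copies`; it only inspects memory layout and does not call scipy.sparse.

```python
import numpy as np


def ravel_copies(coef):
    """Return True if flattening coef.T in row-major (C) order needs a copy.

    coef has the scikit-learn shape (n_classes, n_features); the product
    X * coef.T works on the transpose, shape (n_features, n_classes).
    """
    flat = np.ravel(coef.T)  # row-major flattening; a view only if coef.T is C-contiguous
    return not np.shares_memory(flat, coef)


# Made-up coefficients with 3 classes and 5 features.
coef_c = np.arange(15, dtype=np.float64).reshape(3, 5)  # default C (row-major) order
coef_f = np.asfortranarray(coef_c)  # same values, column-major order

for name, coef in [("C order", coef_c), ("Fortran order", coef_f)]:
    print(name, "| coef.T C-contiguous:", coef.T.flags["C_CONTIGUOUS"],
          "| ravel copies:", ravel_copies(coef))
```

With the C-ordered array, flattening the transpose allocates a new buffer; with the Fortran-ordered array, it returns a view of the existing memory. The flattened values are identical in both cases.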

Re: [Scikit-learn-general] Léon Bottou SGD version 2.0 is out: Averaged SGD

2012-01-08 Thread Olivier Grisel
In order not to forget about this thread I have opened an issue: https://github.com/scikit-learn/scikit-learn/issues/543 -- Olivier

Re: [Scikit-learn-general] test time performance and sparse - dense dot products

2012-01-08 Thread Olivier Grisel
If the only change would be to do: self.coef_ = np.asfortranarray(coef_) at the end of the fit method of SGDClassifier and SGDRegressor, then I am all for it. We should just check that this indeed solves the memory copy issue you suspect. -- Olivier
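A minimal sketch of Olivier's one-line change, using a hypothetical `ToyLinearModel` class rather than scikit-learn's actual SGDClassifier code. It assumes `fit` already has the learned weights and shows that the public shape of `coef_` stays (n_classes, n_features) while its transpose becomes C-contiguous. For simplicity, X is a dense NumPy array here; in Peter's case it is a scipy.sparse CSR matrix.

```python
import numpy as np


class ToyLinearModel:
    """Hypothetical stand-in for a fitted linear model (not scikit-learn's code).

    Only the end of fit() and the prediction path are modelled here.
    """

    def fit(self, coef, intercept):
        # A real fit() would learn coef by SGD; here it is passed in directly.
        coef = np.asarray(coef, dtype=np.float64)
        # Proposed change: same public shape (n_classes, n_features),
        # but column-major storage so that coef_.T is C-contiguous.
        self.coef_ = np.asfortranarray(coef)
        self.intercept_ = np.asarray(intercept, dtype=np.float64)
        return self

    def decision_function(self, X):
        # X: (n_samples, n_features) -> scores: (n_samples, n_classes)
        return X @ self.coef_.T + self.intercept_


rng = np.random.default_rng(0)
coef = rng.standard_normal((4, 10))  # 4 classes, 10 features (made-up values)
intercept = rng.standard_normal(4)
X = rng.standard_normal((2, 10))     # dense here; a CSR matrix in the thread's setting

model = ToyLinearModel().fit(coef, intercept)
print(model.coef_.shape, model.coef_.T.flags["C_CONTIGUOUS"])
```

Only the storage order changes, so code that indexes `coef_` as (n_classes, n_features) keeps working. Whether this actually removes the copy in the sparse product is what Olivier asks Peter to verify.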

Re: [Scikit-learn-general] test time performance and sparse - dense dot products

2012-01-08 Thread Gael Varoquaux
On Sun, Jan 08, 2012 at 08:23:40PM +0100, Peter Prettenhofer wrote:
> thus, we have to change the memory layout of
> `coef_` from c to fortran style?
This is an interesting solution. It's a double-edged sword: depending on what is done with coef_, C or Fortran order will be preferred. If we actuall
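Gael's point is that the better order depends on how `coef_` is used. The sketch below illustrates this by checking which slices of a made-up (n_classes, n_features) array are contiguous under each order. The helper `slice_contiguity` is hypothetical. Code that walks one class's weight vector favours C order. Code that reads one feature's weights for all classes at once favours Fortran order; assuming an implementation like the csr_matvecs routine Mathieu points to, a CSR-times-dense product does this for each nonzero entry.

```python
import numpy as np


def slice_contiguity(coef):
    """Report which 1-D slices of coef are contiguous in memory.

    coef[k] is the weight vector of class k; coef[:, j] holds the weights
    of feature j across all classes.
    """
    return {
        "class_row_contiguous": coef[0].flags["C_CONTIGUOUS"],
        "feature_column_contiguous": coef[:, 0].flags["C_CONTIGUOUS"],
    }


coef = np.arange(12, dtype=np.float64).reshape(3, 4)  # (n_classes, n_features), made up
print("C order:      ", slice_contiguity(np.ascontiguousarray(coef)))
print("Fortran order:", slice_contiguity(np.asfortranarray(coef)))
```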

Re: [Scikit-learn-general] test time performance and sparse - dense dot products

2012-01-08 Thread Peter Prettenhofer
Thanks Gael for pointing this out. The question is how to best deal with it - we can/should not change the layout of our `coef_` attributes from (n_classes, n_features) to (n_features, n_classes), thus, we have to change the memory layout of `coef_` from c to fortran style? 2012/1/8 Gael Varoqua

Re: [Scikit-learn-general] test time performance and sparse - dense dot products

2012-01-08 Thread Gael Varoquaux
Yes, it is a common situation that Fortran vs C ordering makes a huge difference in various computations. I optimized the online dictionary learning algorithm quite heavily based on those principles. I am sure that there are other low-hanging fruits. Maybe we need a list of different optimization t

[Scikit-learn-general] test time performance and sparse - dense dot products

2012-01-08 Thread Peter Prettenhofer
Hi, I recently used `SGDClassifier`** in a setting (text classification) where test time performance is critical and classification decisions are made for a single data point at a time. After doing some profiling I was quite surprised to find out that the performance bottleneck was _not_ feature p
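To make concrete what one prediction involves in Peter's setting, here is a sketch of the dot product between a single sparse data point and `coef_`. It is written directly from the CSR representation: the `indices` of the nonzero features and their `data` values. It is a plain NumPy illustration of the arithmetic with hypothetical names, not the code path SGDClassifier or scipy.sparse actually use, and it makes no claim about speed.

```python
import numpy as np


def decision_single_sparse(coef, intercept, indices, data):
    """Scores for one sparse sample stored CSR-style as (indices, data).

    coef: (n_classes, n_features). Only the columns of coef for the
    sample's nonzero features contribute, so the arithmetic is
    O(n_classes * nnz) rather than O(n_classes * n_features).
    """
    # coef[:, indices] gathers (copies) the nnz relevant columns.
    return coef[:, indices] @ data + intercept


coef = np.arange(12, dtype=np.float64).reshape(3, 4)  # 3 classes, 4 features (made up)
intercept = np.array([0.5, -1.0, 0.0])
indices = np.array([1, 3])       # nonzero feature ids of the single document
data = np.array([2.0, 0.5])      # their values (e.g. tf-idf weights)
scores = decision_single_sparse(coef, intercept, indices, data)
print(scores)
```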