Re: [Scikit-learn-general] Implementing n_jobs for OneVsRestClassifier

2012-11-28 Thread Mathieu Blondel
On Thu, Nov 29, 2012 at 10:39 AM, Afik Cohen wrote: > It's easy to see how with some slight modifications (wrapping that in a joblib Parallel() call) we could enable n_jobs for OneVsRestClassifier. This almost seems too simple, so there must be a good reason why this isn't done; could
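A minimal sketch of the idea being discussed, using joblib's Parallel/delayed; the helper names (fit_ovr, _fit_binary) are illustrative and not the actual scikit-learn implementation:

import numpy as np
from joblib import Parallel, delayed
from sklearn.base import clone
from sklearn.linear_model import SGDClassifier
from sklearn.datasets import make_classification

def _fit_binary(estimator, X, y, cls):
    # Train a fresh clone on the "cls vs. rest" binary problem.
    return clone(estimator).fit(X, np.where(y == cls, 1, 0))

def fit_ovr(estimator, X, y, n_jobs=1):
    # One independent binary problem per class -> trivially parallel.
    classes = np.unique(y)
    estimators = Parallel(n_jobs=n_jobs)(
        delayed(_fit_binary)(estimator, X, y, cls) for cls in classes)
    return classes, estimators

X, y = make_classification(n_samples=1000, n_classes=3, n_informative=4)
classes, estimators = fit_ovr(SGDClassifier(), X, y, n_jobs=2)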

[Scikit-learn-general] Implementing n_jobs for OneVsRestClassifier

2012-11-28 Thread Afik Cohen
Hi all, We've been looking at ways to parallelize our classifier training, and we looked at the n_jobs parameter as a possible way to do that. The classifier we're currently using, SGDClassifier, supports that parameter, but since we're using a OneVsRest (Ovr) strategy, our call is wrapped in a O

Re: [Scikit-learn-general] Block sparse Bayesian learning algorithm in python

2012-11-28 Thread by liu
Dear sklearn community, 1. The source code of the Block SBL algorithm is now available on Bitbucket: https://bitbucket.org/liubenyuan/pybsbl Any suggestions, optimizations, and tests of the code are welcome, as well as your success stories applying our methods. 2. Block-OMP is an extension to

Re: [Scikit-learn-general] Block sparse Bayesian learning algorithm in python

2012-11-28 Thread Gael Varoquaux
Hi Liu, This work is really nice and very fancy, but it is also very recent and needs a bit more insight and benchmarking before it can enter scikit-learn: we have a rule not to integrate any new approach that is less than 2 years old. The reason is that if the approach is to be a massive success,

Re: [Scikit-learn-general] OneVsRestClassifier Online learning

2012-11-28 Thread Ak
> Remember that one-vs-rest trains each base classifier *independently* (a new sample may play the role of a positive or negative example, depending on the class). Therefore, you do need to update all base classifiers. HTH, Mathieu Ah, so every sample is trained on every classifier? It
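A sketch of Mathieu's point, assuming SGDClassifier-style partial_fit on each base estimator; the loop and names are illustrative:

import numpy as np
from sklearn.linear_model import SGDClassifier

def partial_fit_ovr(estimators, classes, X_new, y_new):
    # Every sample updates every base classifier: it is a positive
    # example for its own class and a negative one for all others.
    for cls, est in zip(classes, estimators):
        est.partial_fit(X_new, (y_new == cls).astype(int),
                        classes=np.array([0, 1]))
    return estimators

classes = np.array([0, 1, 2])
estimators = [SGDClassifier() for _ in classes]
X_new = np.random.rand(5, 4)
y_new = np.array([0, 2, 1, 0, 2])
estimators = partial_fit_ovr(estimators, classes, X_new, y_new)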

Re: [Scikit-learn-general] Random forest benchmarks: wise.io vs. sklearn

2012-11-28 Thread Paolo Losi
On Wed, Nov 28, 2012 at 5:02 PM, Peter Prettenhofer <[email protected]> wrote: > if the input data is float64 you need to take conversion to float32 into account; furthermore sklearn will convert to fortran layout -> this will give a huge penalty in memory consumption. In my e
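To make the conversion cost concrete (array sizes are made up): doing the dtype/layout conversion once up front avoids the hidden copy inside fit.

import numpy as np

X = np.random.rand(100000, 50)                # float64, C-contiguous
X32 = np.asfortranarray(X, dtype=np.float32)  # one explicit copy up front
print("%.0f MB -> %.0f MB" % (X.nbytes / 1e6, X32.nbytes / 1e6))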

Re: [Scikit-learn-general] Random forest benchmarks: wise.io vs. sklearn

2012-11-28 Thread Olivier Grisel
> scikit-learn is really bad when n_jobs=10. To avoid the memory copy you can try my branch of joblib: https://github.com/joblib/joblib/pull/44 You need to hack the X_argsorted generation code to generate a memmap array instead of a numpy array (I am planning to add a helper in joblib to make th
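A sketch of the memmap idea, with made-up shapes and path; workers that map the file read-only share its pages instead of each holding a private copy:

import numpy as np

X_argsorted = np.argsort(np.random.rand(100000, 50), axis=0).astype(np.int32)

# Dump once to a file-backed array...
mm = np.memmap("/tmp/X_argsorted.mmap", dtype=X_argsorted.dtype,
               mode="w+", shape=X_argsorted.shape)
mm[:] = X_argsorted
mm.flush()

# ...and let each worker map it read-only instead of copying it.
shared = np.memmap("/tmp/X_argsorted.mmap", dtype=np.int32,
                   mode="r", shape=X_argsorted.shape)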

Re: [Scikit-learn-general] Random forest benchmarks: wise.io vs. sklearn

2012-11-28 Thread Peter Prettenhofer
2012/11/28 Mathieu Blondel: > scikit-learn's RF is entirely written in Python (forest.py) so there may still be some slow code paths. Moreover, their parallel implementation is probably written with pthreads or OpenMP so they bypass the problems that we have with Python's multiprocessing mod

Re: [Scikit-learn-general] Random forest benchmarks: wise.io vs. sklearn

2012-11-28 Thread Mathieu Blondel
On Thu, Nov 29, 2012 at 12:50 AM, Andreas Mueller wrote: > Why should C++ be any faster than Cython? Templating the number of bins in leaves? scikit-learn's RF is entirely written in Python (forest.py) so there may still be some slow code paths. Moreover, their parallel implementation is probably

Re: [Scikit-learn-general] Random forest benchmarks: wise.io vs. sklearn

2012-11-28 Thread Peter Prettenhofer
2012/11/28 Andreas Mueller: > On 28.11.2012 16:46, Mathieu Blondel wrote: > On Thu, Nov 29, 2012 at 12:33 AM, Andreas Mueller wrote: >> Do you see where the "sometimes 100x" comes from? Not from what he demonstrates, right? > scikit-learn is really bad when n_jobs=10. I would b

Re: [Scikit-learn-general] Random forest benchmarks: wise.io vs. sklearn

2012-11-28 Thread Andreas Mueller
On 28.11.2012 16:46, Mathieu Blondel wrote: On Thu, Nov 29, 2012 at 12:33 AM, Andreas Mueller <[email protected]> wrote: Do you see where the "sometimes 100x" comes from? Not from what he demonstrates, right? scikit-learn is really bad when n_jobs=10. I would be inte

Re: [Scikit-learn-general] Random forest benchmarks: wise.io vs. sklearn

2012-11-28 Thread Mathieu Blondel
On Thu, Nov 29, 2012 at 12:33 AM, Andreas Mueller wrote: > Do you see where the "sometimes 100x" comes from? Not from what he demonstrates, right? scikit-learn is really bad when n_jobs=10. I would be interested in knowing if the performance gains are mostly coming from the fact that wiseRF
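A small script one could use to check the n_jobs scaling being discussed (dataset size and parameters are arbitrary):

import time
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=20000, n_features=50, random_state=0)
for n_jobs in (1, 2, 4, 10):
    clf = RandomForestClassifier(n_estimators=100, n_jobs=n_jobs,
                                 random_state=0)
    t0 = time.time()
    clf.fit(X, y)
    print("n_jobs=%d: %.1fs" % (n_jobs, time.time() - t0))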

Re: [Scikit-learn-general] Random forest benchmarks: wise.io vs. sklearn

2012-11-28 Thread Gilles Louppe
Nope, they don't... On 28 November 2012 16:39, Andreas Mueller wrote: > On 28.11.2012 16:33, Gilles Louppe wrote: >> Do they use the same value for the min_samples_split parameter? I see they use a default value (hidden in their constructor, I guess), but theirs might not be the same as our

Re: [Scikit-learn-general] Random forest benchmarks: wise.io vs. sklearn

2012-11-28 Thread Andreas Mueller
On 28.11.2012 16:33, Gilles Louppe wrote: > Do they use the same value for the min_samples_split parameter? I see they use a default value (hidden in their constructor, I guess), but theirs might not be the same as ours. They don't even give the depth, do they?

Re: [Scikit-learn-general] Random forest benchmarks: wise.io vs. sklearn

2012-11-28 Thread Gilles Louppe
Do they use the same value for the min_samples_split parameter? I see they use a default value (hidden in their constructor, I guess), but theirs might not be the same as ours. Gilles On 28 November 2012 16:29, Andreas Mueller wrote: > On 28.11.2012 16:19, Peter Prettenhofer wrote: >> Some more
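For a fair benchmark one would pin these tree-growth parameters explicitly on the scikit-learn side rather than rely on either library's defaults; the values below are examples, not wiseRF's actual settings:

from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(n_estimators=100,
                             min_samples_split=2,   # match the other library
                             max_depth=None,        # grow fully, or cap it
                             max_features="sqrt")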

Re: [Scikit-learn-general] Random forest benchmarks: wise.io vs. sklearn

2012-11-28 Thread Andreas Mueller
On 28.11.2012 16:19, Peter Prettenhofer wrote: > Some more benchmarks from wise.io: http://continuum.io/blog/wiserf-use-cases-and-benchmarks > quite impressive indeed; unfortunately I cannot post any comments on the blog. I wonder if they use some sort of binned split evaluation [1] i

[Scikit-learn-general] Random forest benchmarks: wise.io vs. sklearn

2012-11-28 Thread Peter Prettenhofer
Some more benchmarks from wise.io: http://continuum.io/blog/wiserf-use-cases-and-benchmarks Quite impressive indeed; unfortunately I cannot post any comments on the blog. I wonder whether they use some sort of binned split evaluation [1] instead of exact split evaluation (wiseRF has slightly lower a
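To make the distinction concrete: exact split evaluation considers every distinct feature value as a candidate threshold, while binned evaluation only considers a fixed number of quantile edges. A toy sketch (n_bins is arbitrary; this is not wiseRF's actual scheme):

import numpy as np

def candidate_thresholds(x, n_bins=32):
    exact = np.unique(x)                # exact: every distinct value
    edges = np.percentile(x, np.linspace(0, 100, n_bins + 1)[1:-1])
    return exact, np.unique(edges)      # binned: quantile edges only

x = np.random.rand(10000)
exact, binned = candidate_thresholds(x)
print(len(exact), "exact candidates vs", len(binned), "binned")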

Re: [Scikit-learn-general] Block sparse Bayesian learning algorithm in python

2012-11-28 Thread Andreas Mueller
On 28.11.2012 12:00, Olivier Grisel wrote: > Also, the pending patent on BSBL-BO applied to EEG decoding makes me a lot less interested in working on maintaining an open-source version of such a method, knowing that it could not be used without licensing the patent in the U.S. > http://techtr

Re: [Scikit-learn-general] Block sparse Bayesian learning algorithm in python

2012-11-28 Thread Olivier Grisel
Also, the pending patent on BSBL-BO applied to EEG decoding makes me a lot less interested in working on maintaining an open-source version of such a method, knowing that it could not be used without licensing the patent in the U.S. http://techtransfer.universityofcalifornia.edu/NCD/22688.html

Re: [Scikit-learn-general] Block sparse Bayesian learning algorithm in python

2012-11-28 Thread Olivier Grisel
2012/11/28: > Dear scikit-learn community: > Block Sparse Bayesian Learning is a powerful CS algorithm for recovering block-sparse signals with structure, and shows the additional benefit of reconstructing non-sparse signals; see Dr. Zhilin Zhang's website: http://dsp.ucsd.edu/~zhilin/BSB

Re: [Scikit-learn-general] Fwd: Precision / Recall curve - surprising output

2012-11-28 Thread Olivier Grisel
It looks like a bug. Can you please open a github issue (including your code snippet + the links to the predictions)? It's weird that this issue cannot be seen here: http://scikit-learn.org/dev/auto_examples/plot_roc_crossval.html#example-plot-roc-crossval-py -- Olivier
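A minimal skeleton for the kind of reproduction script the issue should include (the labels and scores below are placeholders for the real predictions):

import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])
precision, recall, thresholds = precision_recall_curve(y_true, y_score)
print(precision, recall, thresholds)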

Re: [Scikit-learn-general] Block sparse Bayesian learning algorithm in python

2012-11-28 Thread Joly Arnaud
Hi, There is the orthogonal matching pursuit algorithm (another CS algorithm) in scikit-learn, which is classified as a regression model. The notation is a bit different from the CS community's. Arnaud Joly On 28/11/2012 08:38, [email protected] wrote: Hi Meng: It is a compressive se
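For reference, a minimal use of the estimator Arnaud mentions; it lives among scikit-learn's linear regression models:

import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.RandomState(0)
X = rng.randn(100, 50)
w = np.zeros(50)
w[[3, 17, 42]] = rng.randn(3)       # sparse ground truth
y = np.dot(X, w)

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=3).fit(X, y)
print(np.nonzero(omp.coef_)[0])     # should recover [3 17 42]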

Re: [Scikit-learn-general] Block sparse Bayesian learning algorithm in python

2012-11-28 Thread Andreas Mueller
Dear Liu Benyuan, Thank you for offering to contribute to scikit-learn. I am no expert in sparse signal recovery and/or matrix factorization, so I cannot really comment on the method. I just wanted to mention that we mostly include widely-used or classical algorithms. I am not sure how far this