Re: [Scikit-learn-general] Bugfix release 0.13.1

2013-02-22 Thread Lars Buitinck
2013/2/23 Andreas Mueller : > I forgot, you also wanted to include a new joblib, right? > That is in master, isn't it? Not sure if Lars already picked it. Yeah, I think I did. Maybe cherry-pick pprett's e6c85a36a6dd8a45cb8d7fcd6b8015e65af91e3d (sparse coef_ support for linear classifiers) as a hidd

Re: [Scikit-learn-general] Bugfix release 0.13.1

2013-02-22 Thread Yaroslav Halchenko
and please include the fixes for roc_curve (there were two commits). For now I have picked them up for the (Neuro)Debian package. Cheers. On Fri, 22 Feb 2013, Andreas Mueller wrote: > Hey everybody. > So I plan to do a bugfix release based on Lars' 0.13 branch tomorrow. > I also want to include Yaro

Re: [Scikit-learn-general] Bugfix release 0.13.1

2013-02-22 Thread Andreas Mueller
On 02/23/2013 12:12 AM, Andreas Mueller wrote: > On 02/23/2013 12:05 AM, Gael Varoquaux wrote: >> On Fri, Feb 22, 2013 at 11:07:13PM +0100, Andreas Mueller wrote: >>> So I plan to do a bugfix release based on Lars' 0.13 branch tomorrow. >>> I also want to include Yaroslav's train_test_split fix. >>

Re: [Scikit-learn-general] Bugfix release 0.13.1

2013-02-22 Thread Andreas Mueller
On 02/23/2013 12:05 AM, Gael Varoquaux wrote: > On Fri, Feb 22, 2013 at 11:07:13PM +0100, Andreas Mueller wrote: >> So I plan to do a bugfix release based on Lars' 0.13 branch tomorrow. >> I also want to include Yaroslav's train_test_split fix. > What's your schedule during the day? What's the rema

Re: [Scikit-learn-general] Bugfix release 0.13.1

2013-02-22 Thread Gael Varoquaux
On Fri, Feb 22, 2013 at 11:07:13PM +0100, Andreas Mueller wrote: > So I plan to do a bugfix release based on Lars' 0.13 branch tomorrow. > I also want to include Yaroslav's train_test_split fix. What's your schedule during the day? What's the remaining work to do? I'll try to pitch in. G

[Scikit-learn-general] Bugfix release 0.13.1

2013-02-22 Thread Andreas Mueller
Hey everybody. So I plan to do a bugfix release based on Lars' 0.13 branch tomorrow. I also want to include Yaroslav's train_test_split fix. Anything else? Cheers, Andy

Re: [Scikit-learn-general] Packaging large objects

2013-02-22 Thread Lars Buitinck
2013/2/22 Peter Prettenhofer : > http://xkcd.com/394/ Also http://xkcd.com/1000/ -- Lars Buitinck Scientific programmer, ILPS University of Amsterdam

Re: [Scikit-learn-general] feature selection & scoring

2013-02-22 Thread Andreas Mueller
On 02/22/2013 12:03 PM, Christian wrote: > Hi, > > when I train a classification model with feature-selected data, I'll > need the selector object and the model object for future scoring. > So I must persist both (i.e. with pickle), right? Yes. But the selector is just a mask of siz

[Scikit-learn-general] feature selection & scoring

2013-02-22 Thread Christian
Hi, when I train a classification model with feature-selected data, I'll need the selector object and the model object for future scoring. So I must persist both (i.e. with pickle), right? Many thanks, Christian
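The two-object workflow Christian describes can be sketched as follows (a minimal illustration; SelectKBest, LogisticRegression, and the synthetic data are just example choices, not anything from the thread):

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, n_features=20, random_state=0)

# Fit the selector, then fit the model on the selected features.
selector = SelectKBest(f_classif, k=5).fit(X, y)
clf = LogisticRegression().fit(selector.transform(X), y)

# Persist both objects; both are needed to score new data later.
blob = pickle.dumps((selector, clf))

# Later: restore both and score new samples.
selector2, clf2 = pickle.loads(blob)
pred = clf2.predict(selector2.transform(X[:3]))
```

Alternatively, chaining both steps in a sklearn.pipeline.Pipeline leaves a single object to pickle.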

Re: [Scikit-learn-general] Packaging large objects

2013-02-22 Thread Andreas Mueller
On 02/22/2013 11:39 AM, Andreas Mueller wrote: > I was just wondering: does the current l1 penalty implementation actually > lead to sparse coef_? > I thought additional tricks were required for that. > If that is the case, maybe an example would be nice? > > Oh, ok, the implementation indeed yields s

Re: [Scikit-learn-general] Packaging large objects

2013-02-22 Thread Peter Prettenhofer
http://xkcd.com/394/ 2013/2/22 Olivier Grisel : > 2013/2/22 Peter Prettenhofer : >> @ark: for 500K features and 3K classes your coef_ matrix will be: >> 500000 * 3000 * 8 / 1024. / 1024. / 1024. ~= 11GB > > Nitpicking, that will be: > > 500000 * 3000 * 8 / 1024. / 1024. / 1024. ~= 11GiB > > or: > > 500000 * 3000

Re: [Scikit-learn-general] Packaging large objects

2013-02-22 Thread Olivier Grisel
2013/2/22 Peter Prettenhofer : > @ark: for 500K features and 3K classes your coef_ matrix will be: > 500000 * 3000 * 8 / 1024. / 1024. / 1024. ~= 11GB Nitpicking, that will be: 500000 * 3000 * 8 / 1024. / 1024. / 1024. ~= 11GiB or: 500000 * 3000 * 8 / 1e9 ~= 12GB But nearly everybody is making the mistake...
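The arithmetic behind the nitpick, sketched in Python (float64 coefficients assumed, i.e. 8 bytes each):

```python
# 500K features x 3K classes, 8 bytes per float64 coefficient.
n_features, n_classes, itemsize = 500_000, 3_000, 8
n_bytes = n_features * n_classes * itemsize

gib = n_bytes / 1024 ** 3  # binary units (GiB)
gb = n_bytes / 1e9         # decimal units (GB)

print(f"{gib:.1f} GiB ~ {gb:.1f} GB")  # 11.2 GiB ~ 12.0 GB
```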

Re: [Scikit-learn-general] Packaging large objects

2013-02-22 Thread Andreas Mueller
I was just wondering: does the current l1 penalty implementation actually lead to sparse coef_? I thought additional tricks were required for that. If that is the case, maybe an example would be nice? On 02/22/2013 11:15 AM, Peter Prettenhofer wrote: > I just opened a PR for this issue: > https://gi
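A quick way to check the question above (a sketch using today's scikit-learn API, where penalty="l1" requires a compatible solver such as liblinear; the data and parameter values are illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=50,
                           n_informative=5, random_state=0)

# A strong L1 penalty (small C) drives many coefficients exactly to zero.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)

# coef_ itself remains a dense numpy array, even though most entries are 0.
sparsity = np.mean(clf.coef_ == 0)
print(f"{sparsity:.0%} of coefficients are exactly zero")
```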

Re: [Scikit-learn-general] OPTICS implementation for scikit-learn

2013-02-22 Thread Gael Varoquaux
Hi Fredrik, Given that OPTICS is a fairly standard clustering algorithm that can be made efficient on large datasets, I do believe it would be interesting to have an implementation. Of course, the usual caveats apply: we need high-quality, efficient, tested and well-documented code. It will ta

Re: [Scikit-learn-general] Packaging large objects

2013-02-22 Thread Peter Prettenhofer
I just opened a PR for this issue: https://github.com/scikit-learn/scikit-learn/pull/1702 2013/2/22 Peter Prettenhofer : > @ark: for 500K features and 3K classes your coef_ matrix will be: > 500000 * 3000 * 8 / 1024. / 1024. / 1024. ~= 11GB > > Coef_ is stored as a dense matrix - you might get a considera

[Scikit-learn-general] OPTICS implementation for scikit-learn

2013-02-22 Thread Fredrik Appelros
Is there any interest in an implementation of OPTICS [1] for sklearn.cluster? As part of our thesis work we've extended the cluster package to include the OPTICS algorithm, which returns an ordering and reachability distances for the input samples. We're also planning on extracting actual clusters
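OPTICS did eventually land in scikit-learn as sklearn.cluster.OPTICS (version 0.21+). A minimal sketch of the ordering/reachability interface described above, on toy data (the sample points and min_samples value are illustrative):

```python
import numpy as np
from sklearn.cluster import OPTICS

# Two well-separated blobs plus one far-away outlier.
X = np.array([[1.0, 2.0], [2.0, 2.0], [2.0, 3.0],
              [8.0, 7.0], [8.0, 8.0], [7.0, 8.0],
              [25.0, 80.0]])

opt = OPTICS(min_samples=2).fit(X)

# Exactly the two outputs the proposal describes: an ordering of the
# samples and a reachability distance per sample (inf where undefined),
# plus extracted cluster labels (-1 = noise).
print(opt.ordering_)
print(opt.reachability_)
print(opt.labels_)
```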

Re: [Scikit-learn-general] Packaging large objects

2013-02-22 Thread Peter Prettenhofer
@ark: for 500K features and 3K classes your coef_ matrix will be: 500000 * 3000 * 8 / 1024. / 1024. / 1024. ~= 11GB Coef_ is stored as a dense matrix - you might get a considerably smaller matrix if you use sparse regularization (keeps most coefficients zero) and convert the coef_ array to a scipy sparse
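Peter's shrinking trick can be sketched like this (scaled down from (3000, 500000) to (3000, 500) so the demo fits in memory; the random nonzero mask is a stand-in for coefficients an L1 penalty left nonzero):

```python
import numpy as np
import scipy.sparse as sp

# Stand-in for a mostly-zero coef_ produced by sparse (L1) regularization.
rng = np.random.RandomState(0)
coef = np.zeros((3000, 500))
coef[rng.randint(0, 3000, 500), rng.randint(0, 500, 500)] = 1.0

# Convert the dense array to CSR: only the nonzero entries are stored.
sparse_coef = sp.csr_matrix(coef)

dense_mb = coef.nbytes / 1e6
sparse_mb = (sparse_coef.data.nbytes + sparse_coef.indices.nbytes
             + sparse_coef.indptr.nbytes) / 1e6
print(f"dense: {dense_mb:.1f} MB, sparse: {sparse_mb:.3f} MB")
```

At the full 3000 x 500000 shape the dense array is the ~11 GiB discussed above, while the CSR copy scales with the number of nonzeros.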