Re: [Scikit-learn-general] Still trying to understand ElasticNet

2012-09-26 Thread Gael Varoquaux
> > In practice, it is not recommended to use coordinate descent with a very > > small regularization. > Isn't gradient boosting a form of coordinate descent? OK, I should state that above, when I mentioned coordinate descent, I was thinking of the vanilla coordinate descent as done in GLMnet. T

Re: [Scikit-learn-general] Still trying to understand ElasticNet

2012-09-26 Thread Joseph Turian
>> (Also, I believe that GB in sklearn is unregularized in its current >> implementation?) >> > > It doesn't have a regularization term but the learning rate parameter can be > used to avoid taking overly big steps: > http://scikit-learn.org/stable/auto_examples/ensemble/plot_gradient_boosting_regu

Re: [Scikit-learn-general] Still trying to understand ElasticNet

2012-09-26 Thread Mathieu Blondel
On Thu, Sep 27, 2012 at 2:57 PM, Joseph Turian wrote: > Isn't gradient boosting a form of coordinate descent? > It's coordinate descent with greedy selection of the coordinates and early stopping when n_estimators is reached. > > (Also, I believe that GB in sklearn is unregularized in its curre
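
The description above (greedy choice of coordinate, a learning rate that shrinks each step, a hard stop after n_estimators steps) can be read as the following minimal sketch. It is not scikit-learn's gradient boosting, which fits trees. It uses plain feature columns as the "weak learners", and the name greedy_stagewise and the toy data are assumptions made for illustration only.

```python
def greedy_stagewise(X, y, n_estimators=100, learning_rate=0.1):
    """Least-squares boosting over raw feature columns (illustrative only).

    Each round picks the single column that best reduces the squared
    residual (greedy coordinate selection), moves its coefficient by
    learning_rate times the optimal step, and stops after n_estimators
    rounds. Assumes every column of X has at least one nonzero entry.
    """
    n_samples, n_features = len(X), len(X[0])
    coef = [0.0] * n_features
    residual = list(y)  # residual = y - X @ coef, with coef = 0 at start
    losses = []
    for _ in range(n_estimators):
        best_j, best_step, best_gain = 0, 0.0, -1.0
        for j in range(n_features):
            num = sum(X[i][j] * residual[i] for i in range(n_samples))
            den = sum(X[i][j] ** 2 for i in range(n_samples))
            gain = num * num / den  # drop in squared error for a full step
            if gain > best_gain:
                best_j, best_step, best_gain = j, num / den, gain
        delta = learning_rate * best_step  # shrunken step
        coef[best_j] += delta
        for i in range(n_samples):
            residual[i] -= X[i][best_j] * delta
        losses.append(sum(r * r for r in residual) / n_samples)
    return coef, losses


# Toy data with two orthogonal columns (hypothetical).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 3.0, 2.0, 3.0]
full_coef, full_losses = greedy_stagewise(X, y, n_estimators=2, learning_rate=1.0)
shrunk_coef, shrunk_losses = greedy_stagewise(X, y, n_estimators=2, learning_rate=0.1)
```

With learning_rate=1.0 two rounds fit this toy problem exactly. With learning_rate=0.1 the same two rounds leave most of the residual unexplained, which is the sense in which a small learning rate plus a fixed n_estimators acts like regularization.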

Re: [Scikit-learn-general] Still trying to understand ElasticNet

2012-09-26 Thread Joseph Turian
> In practice, it is not recommended to use coordinate descent with a very > small regularization. Isn't gradient boosting a form of coordinate descent? (Also, I believe that GB in sklearn is unregularized in its current implementation?) Best, Joseph --

Re: [Scikit-learn-general] libsvm PR

2012-09-26 Thread Gael Varoquaux
Indeed, thanks! Gael On Wed, Sep 26, 2012 at 09:57:33PM -0700, Joseph Turian wrote: > Well stated. > On Wed, Sep 26, 2012 at 2:47 PM, James Bergstra > wrote: > > Hi Chih-Jen Lin (as well as the scikit-learn mailing list) > > I've pushed a small change to libsvm today to sklearn > > (https://g

Re: [Scikit-learn-general] Still trying to understand ElasticNet

2012-09-26 Thread Gael Varoquaux
On Wed, Sep 26, 2012 at 09:53:36PM -0700, Ariel Rokem wrote: > I haven't tried this yet - I'll try it tomorrow. In a way it sounds like it's > inadvertently implementing an early stopping criterion, Yes: maxiter and tol > which is also a form of regularization. That's confusing, considering > tha

Re: [Scikit-learn-general] libsvm PR

2012-09-26 Thread Joseph Turian
Well stated. On Wed, Sep 26, 2012 at 2:47 PM, James Bergstra wrote: > Hi Chih-Jen Lin (as well as the scikit-learn mailing list) > > I've pushed a small change to libsvm today to sklearn > (https://github.com/scikit-learn/scikit-learn/pull/1184) where a copy > of the libsvm source is mirrored in

Re: [Scikit-learn-general] Still trying to understand ElasticNet

2012-09-26 Thread Ariel Rokem
Hey Gael and Alex, Thanks for getting back to me: On Wed, Sep 26, 2012 at 12:42 AM, Alexandre Gramfort < [email protected]> wrote: > hi ariel, > > indeed coordinate descent (and all iterative solvers I know) will > converge slowly for low regularization. So just increase max_iter and >

Re: [Scikit-learn-general] Classifying where some labels are not in dataset

2012-09-26 Thread Gilles Louppe
> I'm basically looking for something that takes pre-trained classifiers and allows you > to combine the predicted probabilities in custom ways, like favoring > some classifiers over others, etc. > > Not that RandomForests™ are not useful--they could be the building > block classifiers in such a system. > > @Oliver

Re: [Scikit-learn-general] libsvm PR

2012-09-26 Thread Andreas Mueller
Much appreciated James :)

Re: [Scikit-learn-general] libsvm PR

2012-09-26 Thread James Bergstra
Hi Chih-Jen Lin (as well as the scikit-learn mailing list) I've pushed a small change to libsvm today to sklearn (https://github.com/scikit-learn/scikit-learn/pull/1184) where a copy of the libsvm source is mirrored in sklearn's git project. We were wondering how to proceed. We do not want to di

Re: [Scikit-learn-general] libsvm PR

2012-09-26 Thread Doug Coleman
I put up a copy of the libsvm-3.12 release on my github. For some reason, ``make lib`` in the main directory or ``make`` in python/ doesn't work out of the box, so I made a patch that works on my system. https://github.com/erg/libsvm This is not a hostile fork, just a way to get some version cont

Re: [Scikit-learn-general] libsvm PR

2012-09-26 Thread Gael Varoquaux
Hey Joseph, Fair enough with regard to your points about a fork being considered as aggressive. Thanks a lot for raising this point. I guess that I was more thinking of a fork in terms of version control rather than in terms of creating a parallel project. I have grown used to forks being useful things

Re: [Scikit-learn-general] libsvm PR

2012-09-26 Thread Joseph Turian
>> If sklearn will be maintaining a patch set against libsvm, this patch set >> should be available to non sklearn users too. > > I reckon you are volunteering to maintain a fork of libsvm? That's very > good news, the community definitely needs this badly. I was considering the idea of a fork, b

Re: [Scikit-learn-general] libsvm PR

2012-09-26 Thread Gael Varoquaux
On Wed, Sep 26, 2012 at 03:53:17PM -0400, Frédéric Bastien wrote: > I would still suggest trying to get it upstream in case it works this time :) +1. I guess the policy should be to try to get it upstream, and if it fails, merge it in sklearn. Thanks a lot, James! Gaël --

Re: [Scikit-learn-general] libsvm PR

2012-09-26 Thread Gael Varoquaux
> If sklearn will be maintaining a patch set against libsvm, this patch set > should be available to non sklearn users too. I reckon you are volunteering to maintain a fork of libsvm? That's very good news, the community definitely needs this badly. Gael ;o PS: this little pique was only t

Re: [Scikit-learn-general] libsvm PR

2012-09-26 Thread Frédéric Bastien
On Wed, Sep 26, 2012 at 3:49 PM, Andreas Mueller wrote: > Hi James. > Thanks for the PR. > I think so far we avoided changing LibSVM and tried to get patches > in upstream. AFAIK, this hasn't succeeded so far. > The cases I am thinking of are me trying to get the chi2 kernel in and > Lars cleaning

Re: [Scikit-learn-general] libsvm PR

2012-09-26 Thread Joseph Turian
If sklearn will be maintaining a patch set against libsvm, this patch set should be available to non sklearn users too. Sent from my iPhone On Sep 26, 2012, at 12:49 PM, Andreas Mueller wrote: > Hi James. > Thanks for the PR. > I think so far we avoided changing LibSVM and tried to get

Re: [Scikit-learn-general] libsvm PR

2012-09-26 Thread Andreas Mueller
Hi James. Thanks for the PR. I think so far we avoided changing LibSVM and tried to get patches in upstream. AFAIK, this hasn't succeeded so far. The cases I am thinking of are me trying to get the chi2 kernel in and Lars cleaning up some of the code. As LibSVM seems to be very conservative wrt. f

[Scikit-learn-general] libsvm PR

2012-09-26 Thread James Bergstra
Hi list, I submitted a libsvm-related PR on github to add a new parameter. It addresses an infinite loop in libsvm's solver, but in doing so, it required a non-trivial patch of the libsvm source code, in addition to the cython bindings and the classes in the svm submodule. Are changes to libsvm we

Re: [Scikit-learn-general] Classifying where some labels are not in dataset

2012-09-26 Thread Andreas Mueller
On 09/25/2012 11:19 PM, Olivier Grisel wrote: > I think we could have a `classes=None` constructor parameter in > SGDClassifier and possibly many other classifiers. When provided we > would not use the traditional `self.classes_ = np.unique(y)` idiom > already implemented in some classifiers of the pr
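
To make the proposal quoted above concrete, here is a rough sketch of what a `classes=None` constructor parameter could look like on a toy estimator. PriorClassifier is a hypothetical class written for this illustration, not part of scikit-learn, and `sorted(set(y))` stands in for the `np.unique(y)` idiom so the example needs no dependencies.

```python
class PriorClassifier:
    """Toy classifier that predicts the label frequencies seen in fit().

    Hypothetical illustration of a `classes` constructor parameter; it
    ignores X entirely and is not a scikit-learn estimator.
    """

    def __init__(self, classes=None):
        self.classes = classes

    def fit(self, X, y):
        if self.classes is None:
            # Traditional idiom: infer the label set from the training data.
            self.classes_ = sorted(set(y))
        else:
            # Proposed behaviour: trust the user-supplied label set, even
            # when some labels never occur in y.
            self.classes_ = list(self.classes)
        counts = {label: 0 for label in self.classes_}
        for label in y:
            counts[label] += 1  # raises KeyError for an undeclared label
        self.prior_ = [counts[label] / len(y) for label in self.classes_]
        return self

    def predict_proba(self, X):
        # One probability column per entry of classes_, in that order.
        return [list(self.prior_) for _ in X]


X_train = [[0], [1], [2], [3]]
y_train = ["a", "a", "c", "c"]  # label "b" is missing from this dataset
inferred = PriorClassifier().fit(X_train, y_train)
declared = PriorClassifier(classes=["a", "b", "c"]).fit(X_train, y_train)
```

The declared variant keeps a column for "b" with probability 0.0, so its output lines up with classifiers trained on data where "b" does occur.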

Re: [Scikit-learn-general] Classifying where some labels are not in dataset

2012-09-26 Thread Doug Coleman
@Gilles, Thanks for the link. Those classes basically implement a paper that has a specific idea of RandomForests™ (no kidding, it's trademarked), with bootstrapping, oob estimation, and n trees trained on the same data. I'm basically looking for something that takes pre-trained classifiers and allows you to comb
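
As a sketch of the kind of combination described above, the function below merges the probability vectors of several already-trained classifiers whose label sets differ, with per-classifier weights to favor some over others. The name combine_probas and its signature are assumptions for illustration; nothing here is an existing scikit-learn API.

```python
def combine_probas(probas, classes_per_clf, all_classes, weights=None):
    """Weighted average of per-classifier probabilities for one sample.

    probas[k] holds classifier k's probabilities, ordered like
    classes_per_clf[k]. Labels a classifier never saw contribute 0 for
    that classifier. Hypothetical helper, not a library function.
    """
    if weights is None:
        weights = [1.0] * len(probas)
    column = {label: k for k, label in enumerate(all_classes)}
    combined = [0.0] * len(all_classes)
    for proba, classes, weight in zip(probas, classes_per_clf, weights):
        for p, label in zip(proba, classes):
            combined[column[label]] += weight * p
    total = sum(weights)
    return [value / total for value in combined]


# Classifier A only knows labels 0 and 1; classifier B only knows 1 and 2.
merged = combine_probas(
    probas=[[0.8, 0.2], [0.5, 0.5]],
    classes_per_clf=[[0, 1], [1, 2]],
    all_classes=[0, 1, 2],
    weights=[3.0, 1.0],  # favor classifier A
)
```

Dividing by the sum of the weights keeps the result a probability vector as long as each input vector sums to one.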

Re: [Scikit-learn-general] Classifying where some labels are not in dataset

2012-09-26 Thread Gilles Louppe
@Doug: Sorry I was typing my previous response from my phone. The snippet of code that I was talking about can be found at: https://github.com/glouppe/scikit-learn/blob/master/sklearn/ensemble/forest.py#L93 Cheers, Gilles On Wednesday, 26 September 2012, Gilles Louppe wrote: > Hi, > > The ense

Re: [Scikit-learn-general] Classifying where some labels are not in dataset

2012-09-26 Thread Gilles Louppe
Hi, The ensemble classes handle the problem you describe already. Have a look at the implementation of predict_proba of BaseForestClassifier in ensemble.py if you want to do that yourself by hand. Hope this helps. Gilles On Wednesday, 26 September 2012, Mathieu Blondel wrote: > > > On Wed, Sep

Re: [Scikit-learn-general] Still trying to understand ElasticNet

2012-09-26 Thread Alexandre Gramfort
hi ariel, indeed coordinate descent (and all iterative solvers I know) will converge slowly for low regularization. So just increase max_iter and set tol to 1e-15 Best, Alex On Wed, Sep 26, 2012 at 7:36 AM, Gael Varoquaux wrote: > Hi Ariel, > > On Tue, Sep 25, 2012 at 05:44:21PM -0700, Ariel Ro
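
The advice above (raise max_iter, tighten tol) can be illustrated with a small pure-Python coordinate descent for the elastic net objective. This is a sketch under stated assumptions, not scikit-learn's solver: the function elastic_net_cd, the toy data and the stopping rule (largest coefficient change in a sweep below tol) are all chosen for illustration, and scikit-learn's own convergence test is different.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net_cd(X, y, alpha, l1_ratio=0.5, max_iter=1000, tol=1e-4):
    """Cyclic coordinate descent, no intercept (illustrative only).

    Minimizes 1/(2n) * ||y - Xw||^2 + alpha * l1_ratio * ||w||_1
              + 0.5 * alpha * (1 - l1_ratio) * ||w||^2.
    Returns (w, number of sweeps run, whether the change fell below tol).
    """
    n_samples, n_features = len(X), len(X[0])
    l1 = alpha * l1_ratio
    l2 = alpha * (1.0 - l1_ratio)
    w = [0.0] * n_features
    residual = list(y)  # y - Xw, with w = 0 at start
    col_sq = [sum(X[i][j] ** 2 for i in range(n_samples)) / n_samples
              for j in range(n_features)]
    for n_iter in range(1, max_iter + 1):
        max_change = 0.0
        for j in range(n_features):
            old = w[j]
            # Correlation of column j with the residual that ignores w[j].
            rho = sum(X[i][j] * (residual[i] + X[i][j] * old)
                      for i in range(n_samples)) / n_samples
            new = soft_threshold(rho, l1) / (col_sq[j] + l2)
            if new != old:
                for i in range(n_samples):
                    residual[i] -= X[i][j] * (new - old)
                w[j] = new
            max_change = max(max_change, abs(new - old))
        if max_change < tol:
            return w, n_iter, True
    return w, max_iter, False


# Two strongly correlated columns; y = column 0 + column 1 (toy data).
X = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 5.0]]
y = [2.0, 4.0, 6.0, 9.0]
w_heavy, sweeps_heavy, ok_heavy = elastic_net_cd(X, y, alpha=100.0)
w_short, sweeps_short, ok_short = elastic_net_cd(X, y, alpha=1e-6, max_iter=5, tol=1e-12)
w_loose, sweeps_loose, ok_loose = elastic_net_cd(X, y, alpha=1e-6, max_iter=100000, tol=1e-4)
w_tight, sweeps_tight, ok_tight = elastic_net_cd(X, y, alpha=1e-6, max_iter=100000, tol=1e-12)
```

With heavy regularization the solver stops after one sweep with all coefficients at zero. With almost no regularization and correlated columns, a small max_iter runs out before convergence, and a loose tol stops while the coefficients are still visibly away from the values a tight tol reaches.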

Re: [Scikit-learn-general] threading error when training a RFC on a big dataset

2012-09-26 Thread Joseph Turian
My mistake, I meant Jimmy Lin: MapReduce is Good Enough? If All You Have is a Hammer, Throw Away Everything That's Not a Nail! http://arxiv.org/abs/1209.2191 On Tue, Sep 25, 2012 at 2:28 AM, Olivier Grisel wrote: > 2012/9/24 Joseph Turian : >> Chris Lin iirc has advocated partitioning the examp