[Scikit-learn-general] Generalized Cross-Validation API

2012-12-25 Thread Andreas Mueller
Hi everybody and merry Christmas. I wanted to ask what people think about the future of the generalized cross-validation API. Currently, estimators that make some form of generalized cross-validation possible provide an EstimatorCV class (RidgeCV, RFECV, LassoLarsCV). I think we should decide whether

Re: [Scikit-learn-general] Generalized Cross-Validation API

2012-12-25 Thread Gilles Louppe
Hi Andreas! ... and Merry Christmas to all! Quick and naive question: what is the point of cross-validating the number of trees in RandomForest (or in Extra-Trees)? The rule is simple: the more, the better. Gilles On 25 December 2012 13:07, Andreas Mueller amuel...@ais.uni-bonn.de

Re: [Scikit-learn-general] Generalized Cross-Validation API

2012-12-25 Thread Andreas Mueller
On 12/25/2012 01:24 PM, Gilles Louppe wrote: Hi Andreas! ... and Merry Christmas to all! Quick and naive question: what is the point of cross-validating the number of trees in RandomForest (or in Extra-Trees)? The rule is simple: the more, the better. Ok, maybe RandomForest was a bad

Re: [Scikit-learn-general] Generalized Cross-Validation API

2012-12-25 Thread Gilles Louppe
Second, what exactly do you mean by generalized CV? I am not sure I have the same idea in mind. Do you mean finding the best parameter value without brute force, in a smart way specific to the estimator? In that case, one could do that on min_samples_split, using a post-pruning procedure.

Re: [Scikit-learn-general] Generalized Cross-Validation API

2012-12-25 Thread Andreas Mueller
On 12/25/2012 01:40 PM, Gilles Louppe wrote: Second, what exactly do you mean by generalized CV? I am not sure I have the same idea in mind. Do you mean finding the best parameter value without brute force, in a smart way specific to the estimator? Basically yes. Something that fits an

Re: [Scikit-learn-general] Generalized Cross-Validation API

2012-12-25 Thread Alexandre Gramfort
Hi, the CV models in coordinate_descent have the same use case. We use warm restarts to fit efficiently for many values of alpha. The way it is done is via a path function that returns a list of models fitted sequentially. Then there is a CV loop that runs the path for every fold and picks the best
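
For readers following the thread, the path-plus-CV-loop pattern described above can be sketched in plain Python. This is an illustrative sketch only, not the scikit-learn implementation: the helper names (`lasso_cd`, `lasso_path_warm`, `cv_select_alpha`), the toy data, the alpha grid, and the fold scheme are all assumptions made up for this example. It uses a basic Lasso coordinate descent, fits a sequence of alphas with each fit warm-started from the previous solution, and then runs that path once per fold to pick the alpha with the lowest mean held-out error.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used by Lasso coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, alpha, w_init, n_iter=100):
    """Minimize (1/(2n))*||y - Xw||^2 + alpha*||w||_1, starting from w_init."""
    n, p = len(X), len(X[0])
    w = list(w_init)
    resid = [y[i] - sum(X[i][j] * w[j] for j in range(p)) for i in range(n)]
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes w[j].
            rho = sum(X[i][j] * (resid[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                w[j] = new_wj
    return w


def lasso_path_warm(X, y, alphas):
    """Fit one model per alpha; each fit is warm-started from the previous one."""
    w = [0.0] * len(X[0])
    path = []
    for alpha in alphas:
        w = lasso_cd(X, y, alpha, w)
        path.append(list(w))
    return path


def mse(X, y, w):
    """Mean squared prediction error of coefficients w on (X, y)."""
    n = len(X)
    return sum((y[i] - sum(a * b for a, b in zip(X[i], w))) ** 2 for i in range(n)) / n


def cv_select_alpha(X, y, alphas, n_folds=3):
    """Run the warm-started path on every fold and average held-out errors."""
    n = len(X)
    errors = [0.0] * len(alphas)
    for k in range(n_folds):
        test_idx = set(range(k, n, n_folds))  # simple interleaved folds
        X_tr = [X[i] for i in range(n) if i not in test_idx]
        y_tr = [y[i] for i in range(n) if i not in test_idx]
        X_te = [X[i] for i in sorted(test_idx)]
        y_te = [y[i] for i in sorted(test_idx)]
        path = lasso_path_warm(X_tr, y_tr, alphas)
        for a, w in enumerate(path):
            errors[a] += mse(X_te, y_te, w) / n_folds
    best = min(range(len(alphas)), key=errors.__getitem__)
    return alphas[best], errors


# Toy data (hypothetical): y depends on the first two features only.
rng = random.Random(0)
X = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(60)]
y = [2.0 * row[0] - 1.0 * row[1] + rng.gauss(0, 0.1) for row in X]

alphas = [1.0, 0.3, 0.1, 0.03, 0.01]  # decreasing, so warm starts help
best_alpha, cv_errors = cv_select_alpha(X, y, alphas)
print("selected alpha:", best_alpha)
```

Ordering the alphas from large to small means each solution is a good starting point for the next, which is where the efficiency of the path approach comes from compared with refitting from scratch for every grid point.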