2011/10/28 Mathieu Blondel <[email protected]>:
> On Fri, Oct 28, 2011 at 11:27 PM, Olivier Grisel
> <[email protected]> wrote:
>
>> This is a lot of complex boilerplate for the newcomer.
>
> Plus, that would be a waste of memory and cpu time as the grid search
> would re-split the data just after.
>
> Lately I've been working on large-scale algorithms where it would be
> very useful if I had a validation set directly in fit:
>
> fit(X, y, X_val=None, y_val=None)
>
> fit(X, y, percent_val=0)

percent_val would be a constructor param in that case, as it's not data
dependent.

> For example, SGDClassifier could use it for early stopping (don't
> choose the last weight vector but the best one against the validation
> set) or for efficient tuning of the regularization hyperparameter.

I am +1 for X_val=None, y_val=None in fit, for the GridSearchCV class
at least. However, I am not sure I would make it a general API
recommendation for the rest of the estimators of the scikit, such as
the SGD estimators: they can build their own validation set
internally, and I don't see the point of exposing that implementation
detail (early stopping) to the user.
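
To make "build their own validation set internally" concrete, here is a
rough sketch in plain Python. It is not the scikit's SGD code: the class
name EarlyStoppingSGD and the alpha, eta, n_iter and seed parameters are
made up for illustration, and only percent_val comes from the discussion
above. It holds out a fraction of the data inside fit and keeps the
weight vector that scored best on it rather than the last one.

```python
import random


class EarlyStoppingSGD:
    """Toy hinge-loss SGD classifier (hypothetical, for illustration only)."""

    def __init__(self, percent_val=0.2, alpha=0.01, eta=0.1, n_iter=20, seed=0):
        # percent_val is a constructor param because it is not data dependent.
        self.percent_val = percent_val
        self.alpha = alpha      # L2 regularization strength
        self.eta = eta          # constant learning rate
        self.n_iter = n_iter    # number of passes over the training part
        self.seed = seed

    @staticmethod
    def _decision(w, b, x):
        return sum(wj * xj for wj, xj in zip(w, x)) + b

    def _score(self, w, b, X, y):
        # Accuracy for labels in {-1, +1}.
        hits = sum(1 for xi, yi in zip(X, y)
                   if (self._decision(w, b, xi) >= 0) == (yi > 0))
        return hits / len(y)

    def fit(self, X, y):
        rng = random.Random(self.seed)
        idx = list(range(len(X)))
        rng.shuffle(idx)
        # The validation split is made here, hidden from the caller.
        n_val = int(round(self.percent_val * len(X)))
        val_idx, train_idx = idx[:n_val], idx[n_val:]
        X_val = [X[i] for i in val_idx]
        y_val = [y[i] for i in val_idx]

        w, b = [0.0] * len(X[0]), 0.0
        self.coef_, self.intercept_ = list(w), b
        self.best_val_score_ = -1.0
        for _ in range(self.n_iter):
            rng.shuffle(train_idx)
            for i in train_idx:
                xi, yi = X[i], y[i]
                margin = yi * self._decision(w, b, xi)
                # Shrink the weights (L2 penalty), then take a hinge step.
                w = [wj * (1.0 - self.eta * self.alpha) for wj in w]
                if margin < 1:
                    w = [wj + self.eta * yi * xj for wj, xj in zip(w, xi)]
                    b += self.eta * yi
            if n_val:
                # Keep the best weight vector seen so far, not the last one.
                score = self._score(w, b, X_val, y_val)
                if score > self.best_val_score_:
                    self.best_val_score_ = score
                    self.coef_, self.intercept_ = list(w), b
            else:
                self.coef_, self.intercept_ = list(w), b
        return self

    def predict(self, X):
        return [1 if self._decision(self.coef_, self.intercept_, xi) >= 0 else -1
                for xi in X]
```

With this layout the caller only ever writes
EarlyStoppingSGD(percent_val=0.2).fit(X, y), and the fit signature stays
the same as for every other estimator.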

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
