2011/10/28 Mathieu Blondel <[email protected]>:
> On Fri, Oct 28, 2011 at 11:27 PM, Olivier Grisel
> <[email protected]> wrote:
>
>> This is a lot of complex boilerplate for the newcomer.
>
> Plus, that would be a waste of memory and cpu time as the grid search
> would re-split the data just after.
>
> Lately I've been working on large-scale algorithms where it would be
> very useful if I had a validation set directly in fit:
>
> fit(X, y, X_val=None, y_val=None)
>
> fit(X, y, percent_val=0)
percent_val would be a constructor param in that case, as it's not data
dependent.

> For example, SGDClassifier could use it for early stopping (don't
> choose the last weight vector but the best one against the validation
> set) or for efficient tuning of the regularization hyperparameter.

I am +1 for X_val=None, y_val=None in fit for the GridSearchCV class at
least. However, I am not sure I would make it a general API
recommendation for the rest of the scikit's estimators, such as the SGD
estimators: they can build their own validation set internally, and I
don't see the point of exposing that implementation detail (early
stopping) to the user.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
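
To make the two options discussed above concrete, here is a minimal
sketch in plain Python (no NumPy, only to keep it dependency-free). It is
not scikit-learn's SGDClassifier: the class name EarlyStoppingSGD, the
best_val_score_ attribute, the bias-in-last-position layout of coef_ and
the reading of percent_val as a fraction between 0 and 1 are all
assumptions made for illustration. fit accepts an optional X_val, y_val;
if none is given and percent_val > 0, the estimator holds out its own
validation split. Either way it keeps the weight vector that scored best
on the validation set rather than the last one.

```python
import math
import random


class EarlyStoppingSGD:
    """Toy logistic-regression SGD (hypothetical, not a scikit-learn class)."""

    def __init__(self, percent_val=0.0, n_epochs=20, lr=0.1, seed=0):
        # percent_val is not data dependent, so it lives in the constructor.
        # Assumption: it is a fraction in [0, 1) of the training set.
        self.percent_val = percent_val
        self.n_epochs = n_epochs
        self.lr = lr
        self.seed = seed

    @staticmethod
    def _decision(w, x):
        # w has one more entry than x: the last entry is the bias.
        return w[-1] + sum(wi * xi for wi, xi in zip(w, x))

    def _accuracy(self, w, X, y):
        hits = sum((self._decision(w, x) > 0) == (t == 1) for x, t in zip(X, y))
        return hits / len(y)

    def fit(self, X, y, X_val=None, y_val=None):
        rng = random.Random(self.seed)
        X, y = list(X), list(y)
        if X_val is None and self.percent_val > 0:
            # Build the validation set internally from the training data.
            idx = list(range(len(X)))
            rng.shuffle(idx)
            n_val = max(1, int(len(X) * self.percent_val))
            X_val = [X[i] for i in idx[:n_val]]
            y_val = [y[i] for i in idx[:n_val]]
            X = [X[i] for i in idx[n_val:]]
            y = [y[i] for i in idx[n_val:]]

        w = [0.0] * (len(X[0]) + 1)
        best_w, best_score = list(w), -1.0
        order = list(range(len(X)))
        for _ in range(self.n_epochs):
            rng.shuffle(order)
            for i in order:
                # Clip the decision value so math.exp cannot overflow.
                z = max(-30.0, min(30.0, self._decision(w, X[i])))
                grad = 1.0 / (1.0 + math.exp(-z)) - y[i]
                for j, xj in enumerate(X[i]):
                    w[j] -= self.lr * grad * xj
                w[-1] -= self.lr * grad
            if X_val is not None:
                # Early stopping: remember the best weights, not the last.
                score = self._accuracy(w, X_val, y_val)
                if score > best_score:
                    best_w, best_score = list(w), score

        self.coef_ = best_w if X_val is not None else w
        self.best_val_score_ = best_score if X_val is not None else None
        return self

    def predict(self, X):
        return [1 if self._decision(self.coef_, x) > 0 else 0 for x in X]
```

With this shape, a caller that already has a split (a grid search, say)
passes X_val and y_val and nothing is re-split, while a newcomer can just
set percent_val in the constructor and call fit(X, y). Labels are assumed
to be 0 or 1.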
