On Fri, Oct 28, 2011 at 11:45:42PM +0900, Mathieu Blondel wrote:
> Plus, that would be a waste of memory and cpu time as the grid search
> would re-split the data just after.
I agree about the memory, but the CPU cost should be really negligible.

> Lately I've been working on large-scale algorithms where it would be
> very useful if I had a validation set directly in fit:
> fit(X, y, X_val=None, y_val=None)
> or
> fit(X, y, percent_val=0)
> For example, SGDClassifier could use it for early stopping (don't
> choose the last weight vector but the best one against the validation
> set) or for efficient tuning of the regularization hyperparameter.

This seems like a very specific API that we would support in only a small
fraction of the models. I believe that it would lead to more confusion and
more feature requests.

My point of view on such use cases is that I'd like the scikit's code to be
easy to adapt (e.g. via subclassing), but I would rather not cater for all
the possibilities. Yes, it makes for boilerplate code, but putting too much
of this code in the scikit makes it harder to maintain.

Gaël

_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
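
To illustrate the subclassing route discussed in this message, here is a minimal sketch of how a user could add validation-based weight selection on their own side. Everything in it is an assumption made for illustration: `TinySGDClassifier` is a hypothetical pure-Python stand-in for an estimator such as SGDClassifier, not scikit-learn code, and `ValidatedSGDClassifier` together with the `X_val`/`y_val` arguments only mirrors the signature proposed in the quoted text. It keeps the weight vector that scores best on the validation set rather than the last one.

```python
import random


class TinySGDClassifier:
    """Toy binary linear classifier trained with perceptron-style SGD.

    Hypothetical stand-in for an estimator; labels must be -1 or +1.
    """

    def __init__(self, n_epochs=20, learning_rate=0.1, seed=0):
        self.n_epochs = n_epochs
        self.learning_rate = learning_rate
        self.seed = seed

    def _init_weights(self, n_features):
        self.coef_ = [0.0] * n_features
        self.intercept_ = 0.0

    def _run_epoch(self, X, y, rng):
        # One shuffled pass over the data; update only on mistakes.
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            if y[i] * self.decision_function(X[i]) <= 0:
                for j, x_ij in enumerate(X[i]):
                    self.coef_[j] += self.learning_rate * y[i] * x_ij
                self.intercept_ += self.learning_rate * y[i]

    def fit(self, X, y):
        rng = random.Random(self.seed)
        self._init_weights(len(X[0]))
        for _ in range(self.n_epochs):
            self._run_epoch(X, y, rng)
        return self

    def decision_function(self, x):
        return sum(w * v for w, v in zip(self.coef_, x)) + self.intercept_

    def predict(self, X):
        return [1 if self.decision_function(x) > 0 else -1 for x in X]

    def score(self, X, y):
        # Fraction of correctly classified samples.
        return sum(p == t for p, t in zip(self.predict(X), y)) / len(y)


class ValidatedSGDClassifier(TinySGDClassifier):
    """User-side subclass: keep the weights that score best on validation."""

    def fit(self, X, y, X_val=None, y_val=None):
        if X_val is None or y_val is None:
            # No validation data: fall back to the parent's behaviour.
            return super().fit(X, y)
        rng = random.Random(self.seed)
        self._init_weights(len(X[0]))
        best_score = -1.0
        best_weights = (list(self.coef_), self.intercept_)
        for _ in range(self.n_epochs):
            self._run_epoch(X, y, rng)
            val_score = self.score(X_val, y_val)
            if val_score > best_score:
                # Snapshot the weights; later epochs may do worse.
                best_score = val_score
                best_weights = (list(self.coef_), self.intercept_)
        self.coef_, self.intercept_ = best_weights
        self.best_val_score_ = best_score
        return self


# Usage on a small, linearly separable toy problem.
X_train = [[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]]
y_train = [1, 1, -1, -1]
X_val = [[1.0, 1.0], [-1.0, -1.0]]
y_val = [1, -1]

clf = ValidatedSGDClassifier(n_epochs=5).fit(X_train, y_train, X_val, y_val)
print(clf.predict(X_val), clf.best_val_score_)
```

The base class's `fit(X, y)` stays untouched in this sketch; the extra arguments live only in the subclass, which is the kind of boilerplate the message above suggests leaving to user code.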
