On Fri, Oct 28, 2011 at 11:45:42PM +0900, Mathieu Blondel wrote:
> Plus, that would be a waste of memory and cpu time as the grid search
> would re-split the data just after.

I agree about the memory, but the CPU cost should be negligible.

> Lately I've been working on large-scale algorithms where it would be
> very useful if I had a validation set directly in fit:

> fit(X, y, X_val=None, y_val=None)

> or

> fit(X, y, percent_val=0)

> For example, SGDClassifier could use it for early stopping (don't
> choose the last weight vector but the best one against the validation
> set) or for efficient tuning of the regularization hyperparameter.

This seems like a very specific API that we would support in only a small
fraction of the models. I believe that it would lead to more confusion and
more feature requests. My point of view on such use cases is that I'd like
the scikit's code to be easy to adapt (e.g. via subclassing), but I would
rather not cater to every possibility. Yes, this means users write some
boilerplate code, but putting too much of that code in the scikit makes it
harder to maintain.
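
As a rough illustration of the subclassing route, here is a minimal sketch in
plain Python. It is not scikit-learn code: TinySGDClassifier is a hypothetical
toy perceptron standing in for an estimator, and ValidatedSGDClassifier, the
_run_epoch hook, and the best_val_score_ attribute are all assumed names. The
subclass overrides fit to accept the X_val / y_val arguments proposed above
and keeps the weight vector that scores best on the validation set.

```python
import random


class TinySGDClassifier:
    """Toy perceptron-style linear classifier (hypothetical stand-in)."""

    def __init__(self, n_epochs=20, learning_rate=0.1, seed=0):
        self.n_epochs = n_epochs
        self.learning_rate = learning_rate
        self.seed = seed

    def _init_weights(self, n_features):
        self.w_ = [0.0] * n_features
        self.b_ = 0.0

    def _run_epoch(self, X, y, rng):
        # One shuffled pass of perceptron updates; labels must be -1 or +1.
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            if y[i] * self.decision_function(X[i]) <= 0:
                for j, xij in enumerate(X[i]):
                    self.w_[j] += self.learning_rate * y[i] * xij
                self.b_ += self.learning_rate * y[i]

    def fit(self, X, y):
        rng = random.Random(self.seed)
        self._init_weights(len(X[0]))
        for _ in range(self.n_epochs):
            self._run_epoch(X, y, rng)
        return self

    def decision_function(self, x):
        return sum(wj * xj for wj, xj in zip(self.w_, x)) + self.b_

    def predict(self, X):
        return [1 if self.decision_function(x) > 0 else -1 for x in X]

    def score(self, X, y):
        # Fraction of correctly classified samples.
        return sum(p == t for p, t in zip(self.predict(X), y)) / len(y)


class ValidatedSGDClassifier(TinySGDClassifier):
    """Subclass that keeps the weights scoring best on a validation set."""

    def fit(self, X, y, X_val=None, y_val=None):
        if X_val is None or y_val is None:
            # No validation data: fall back to the parent behaviour.
            return super().fit(X, y)
        rng = random.Random(self.seed)
        self._init_weights(len(X[0]))
        best_score, best_w, best_b = -1.0, list(self.w_), self.b_
        for _ in range(self.n_epochs):
            self._run_epoch(X, y, rng)
            score = self.score(X_val, y_val)
            if score > best_score:
                # Remember the best weights seen so far, not the last ones.
                best_score, best_w, best_b = score, list(self.w_), self.b_
        self.w_, self.b_ = best_w, best_b
        self.best_val_score_ = best_score
        return self


# Usage on a tiny, made-up dataset.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [2.0, 2.0], [2.0, 3.0], [3.0, 2.0]]
y = [-1, -1, -1, 1, 1, 1]
X_val = [[0.5, 0.5], [2.5, 2.5]]
y_val = [-1, 1]
clf = ValidatedSGDClassifier().fit(X, y, X_val=X_val, y_val=y_val)
```

The validation logic lives entirely in the subclass, so the base class keeps
the plain fit(X, y) signature and callers who pass no validation data get the
unchanged parent behaviour.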

Gaël
