2011/10/28 Andreas Mueller <[email protected]>:
> Hi everybody.
> This is about the grid_search and cross_validation modules.
> Often, in particular when the dataset is large or the algorithm slow,
> it is not feasible to do n-fold cross validation and people use
> a single training/validation split to find hyperparameters.
>
> As far as I can see, this is not supported in sklearn.
> Do you think it should be included as an option to
> do grid searches? It is not really "cross" validation
> but I think the cross_validation module would be
> the right place for that.
>
> What do you think?
>
> Cheers,
> Andy

Have a look at the bootstrap CV generator. It gives fine control over
the size of both the train and test (validation) sub-samples and over
the number of "folds", so you can run your grid_search faster if you
are not too worried about the extra variance in your estimates.

  
http://scikit-learn.org/dev/modules/cross_validation.html#bootstrapping-cross-validation

We could also extend the shuffle split to give similar flexibility:

  
http://scikit-learn.org/dev/modules/cross_validation.html#random-permutations-cross-validation-a-k-a-shuffle-split
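
To make the idea concrete, here is a rough sketch of a shuffle split with
an explicit test size and number of iterations. It uses only the Python
standard library; `shuffle_split_indices` and its parameters are
hypothetical names for illustration and are not part of scikit-learn.
With `n_iterations=1` it reduces to the single training/validation split
Andy asked about.

```python
import random


def shuffle_split_indices(n_samples, test_fraction=0.25, n_iterations=1, seed=0):
    """Yield (train_indices, test_indices) pairs from random permutations.

    Hypothetical helper for illustration only, not a scikit-learn API.
    """
    rng = random.Random(seed)
    # Size of the validation part, at least one sample.
    n_test = max(1, int(round(n_samples * test_fraction)))
    for _ in range(n_iterations):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        # The first n_test shuffled indices form the validation set,
        # the remaining ones form the training set.
        yield indices[n_test:], indices[:n_test]


# A single train/validation split over 20 samples.
splits = list(shuffle_split_indices(n_samples=20, test_fraction=0.25))
train_idx, test_idx = splits[0]
```

If your version of the grid search accepts an iterable of (train, test)
index pairs for its cv argument, pairs like these could be passed in
directly; please check the documentation of the release you are using.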

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
