Re: [Scikit-learn-general] Plotting training to evaluate the bias / variance regime

Olivier Grisel Fri, 30 Dec 2011 08:38:19 -0800

2011/12/30 Gilles Louppe <[email protected]>:
>> It seems to be an interesting tool to me. We need to find a
>> non-trivial overfitting example that would run in an acceptable time
>> with the datasets available in the scikit.
>
> Actually, those curves can be plotted with respect to any parameter, not
> only the training set size.
>
> What comes to mind is to use a decision tree and to plot the training
> and test curves with respect to max_depth or min_split (this is
> actually what I make my students do ;)). With min_split=1, for
> instance, you get a fully developed tree with a perfect score on
> the training set (because of overfitting) but rather poor accuracy on
> the test set. As you increase min_split, the error on the test
> set decreases (because the tree no longer fits the noise,
> i.e., its variance drops), reaches an optimum, and then
> increases again (because the tree becomes too simple, i.e., too
> biased).
>
> You can do the same with any model (SVM wrt C, linear model wrt
> the regularization factor, etc.).
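A minimal sketch of the curve Gilles describes, assuming a recent scikit-learn release where `sklearn.model_selection.validation_curve` computes cross-validated training and test scores over a grid of values for one parameter. It uses the bundled digits dataset (no download, fast to fit) and sweeps `max_depth` rather than `min_split`: the latter appears to correspond to what current releases call `min_samples_split`, whose integer form must be at least 2, so `min_split=1` cannot be reproduced literally. The depth range 1..20 and `cv=5` are arbitrary illustrative choices, and the script prints the curves instead of plotting them to avoid a matplotlib dependency.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

# Bundled dataset: no download needed, and a tree fits it quickly.
X, y = load_digits(return_X_y=True)

# Shallow trees underfit (bias); deep, nearly fully grown trees overfit (variance).
depths = np.arange(1, 21)
train_scores, test_scores = validation_curve(
    DecisionTreeClassifier(random_state=0),
    X,
    y,
    param_name="max_depth",
    param_range=depths,
    cv=5,
)

# Each row holds the 5 per-fold accuracies for one max_depth value.
train_mean = train_scores.mean(axis=1)
test_mean = test_scores.mean(axis=1)
for depth, tr, te in zip(depths, train_mean, test_mean):
    print(f"max_depth={depth:2d}  train={tr:.3f}  test={te:.3f}")

# The optimum of the test curve is the model selection choice.
best_depth = depths[np.argmax(test_mean)]
print("best max_depth by mean CV accuracy:", best_depth)
```

The same call gives the SVM-versus-C curve by swapping in `sklearn.svm.SVC()` with `param_name="C"` and a log-spaced `param_range` such as `np.logspace(-3, 3, 7)`; passing `train_mean` and `test_mean` to a plotting library draws the two curves.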


Yes, this is the traditional model selection curve, e.g. for
regularized linear regression:

  
http://scikit-learn.org/dev/auto_examples/linear_model/plot_lasso_model_selection.html
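For the regularized linear regression case, a minimal sketch assuming a recent scikit-learn release: `LassoCV` fits the Lasso along a decreasing grid of `alpha` values and stores the per-fold validation error in `mse_path_`, which is exactly the model selection curve. The diabetes dataset and `cv=5` are illustrative choices, not necessarily what the linked example uses.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LassoCV

X, y = load_diabetes(return_X_y=True)

# Fit the Lasso along a path of alpha values, scoring each with 5-fold CV.
model = LassoCV(cv=5).fit(X, y)

# mse_path_ has shape (n_alphas, n_folds): average over the folds.
mean_mse = model.mse_path_.mean(axis=1)

# Large alpha shrinks coefficients toward zero (more bias);
# small alpha approaches ordinary least squares.
best = np.argmin(mean_mse)
print(f"{len(model.alphas_)} alphas tried, from {model.alphas_[0]:.3g} "
      f"down to {model.alphas_[-1]:.3g}")
print(f"selected alpha={model.alpha_:.3g}, mean CV MSE={mean_mse[best]:.1f}")
```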

What I find interesting about the training-set-size curves is that they
give a hint as to whether adding more labeled data will help or not.
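A minimal sketch of the training-set-size curve, assuming a recent scikit-learn release where `sklearn.model_selection.learning_curve` refits an estimator on growing subsets of each training fold. The `SVC(gamma=0.001)` model, the digits dataset and the five subset sizes are illustrative choices. A common reading: if the test score is still rising and clearly below the training score at the largest size, the model is in the variance regime and more labeled data is likely to help; if both curves have converged to a similar but unsatisfying value, the model is bias-limited and more data alone is unlikely to help much.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import learning_curve
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# Refit on 10%, 32.5%, 55%, 77.5% and 100% of each training fold (5-fold CV).
train_sizes, train_scores, test_scores = learning_curve(
    SVC(gamma=0.001),
    X,
    y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=5,
    shuffle=True,  # shuffle before subsetting so small subsets mix all classes
    random_state=0,
)

train_mean = train_scores.mean(axis=1)
test_mean = test_scores.mean(axis=1)
for n, tr, te in zip(train_sizes, train_mean, test_mean):
    # A large train/test gap that keeps shrinking as n grows suggests
    # more labeled data would still help (variance regime).
    print(f"n_train={n:4d}  train={tr:.3f}  test={te:.3f}  gap={tr - te:.3f}")
```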

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
