> It seems to be an interesting tool to me. We need to find a
> non-trivial overfitting example that would run in an acceptable time
> with the datasets available in the scikit.

Actually, those curves can be plotted with respect to any parameter,
not only the training set size.

What comes to mind is to use a decision tree and to plot the training
and test curves with respect to max_depth or min_split (this is
actually what I make my students do ;)). With min_split=1, for
instance, you will get a fully developed tree with a perfect score on
the training set (because of overfitting) but quite poor accuracy on
the test set. As you increase min_split, the error on the test
set will decrease (because the tree will no longer fit the noise,
i.e., its variance will drop), reach an optimum, and then
increase again (because the tree will become too simple, i.e., too
biased).
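
Here is a minimal sketch of that experiment. The assumptions are mine:
I use a recent scikit-learn release, where I believe min_split is
spelled min_samples_split and its smallest allowed value is 2, and a
synthetic dataset from make_classification with 20% of the labels
flipped so that there is noise to overfit. The sizes and the grid of
values are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: a noisy synthetic problem (20% of labels flipped at random).
X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

# Assumption: min_samples_split plays the role of min_split in the text.
# 2 gives a fully developed tree; 1000 allows only the root to be split.
split_values = [2, 5, 10, 20, 50, 100, 200, 500, 1000]

train_scores, test_scores = [], []
for min_split in split_values:
    tree = DecisionTreeClassifier(min_samples_split=min_split, random_state=0)
    tree.fit(X_train, y_train)
    train_scores.append(tree.score(X_train, y_train))
    test_scores.append(tree.score(X_test, y_test))

# Print the two curves as a table; they could just as well be plotted.
for min_split, tr, te in zip(split_values, train_scores, test_scores):
    print(f"min_samples_split={min_split:5d}  train={tr:.3f}  test={te:.3f}")
```

Where the optimum falls on the test curve depends on the data and on
the amount of noise, so the numbers are only illustrative.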

You can do the same with any model (SVM wrt C, linear model wrt the
regularization factor, etc.).
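
For the SVM case, a sketch along the same lines could use the
validation_curve helper from sklearn.model_selection in recent
releases, which cross-validates over a range of values of one
parameter. The dataset, the RBF kernel, the range of C and the number
of folds are all assumptions of mine, chosen only to keep it quick.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import validation_curve
from sklearn.svm import SVC

# Assumption: a small noisy synthetic problem so the run stays short.
X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=0)

# Candidate values for C, from strong to weak regularization.
C_range = np.logspace(-3, 3, 7)

# Each returned array has one row per value of C and one column per fold.
svm_train_scores, svm_test_scores = validation_curve(
    SVC(kernel="rbf", gamma="scale"), X, y,
    param_name="C", param_range=C_range, cv=3)

# Average over the folds and print the two curves as a table.
for C, tr, te in zip(C_range,
                     svm_train_scores.mean(axis=1),
                     svm_test_scores.mean(axis=1)):
    print(f"C={C:8.3f}  train={tr:.3f}  test={te:.3f}")
```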

Gilles
