Re: [Scikit-learn-general] High training error for small datasets

Olivier Grisel Fri, 22 Jun 2012 03:16:28 -0700

2012/6/22 Kai Kuehne <[email protected]>:
> Hi,
> I posted this question a few days ago on IRC shortly before my
> internet connection broke down,
> so sorry if you read this already.
>
> I'm currently building a simple classification system and am trying
> to use learning curves to check whether
> my model suffers from high bias or high variance.
> I (think I) followed the instructions on this page:
> http://jakevdp.github.com/tutorial/astronomy/practical.html
> So, if I understood this correctly, the training error should be small
> for small training sets.
> But, in my implementation and for my corpus, the training error starts
> high: http://i.imgur.com/j4MNx.png
> I calculate the error for every m like this: http://dpaste.com/761794/


Maybe the learning algorithm stops before it actually converges? What
kind of data are you using, and how many samples and features does it
have? What type of model and which parameters are you using?

Here is an alternative implementation of the learning curves:

  https://gist.github.com/1540431

The learning curves behave as expected in this case.
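
To illustrate the convergence question, here is a minimal pure-Python
sketch. It is neither the code from the gist nor your dpaste snippet, and
it does not use scikit-learn: the toy data, the bias-free perceptron and
all function names are assumptions made up for this example. It fits the
model on the first m training examples, measures the training error on
those same m examples, and compares a run that is allowed to converge
with one that is cut off after a single epoch.

```python
import random

def make_data(n, rng, margin=0.2):
    """Draw n linearly separable 2-D points; the label is the sign of x0 + x1."""
    data = []
    while len(data) < n:
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        s = x[0] + x[1]
        if abs(s) >= margin:  # keep a gap around the boundary
            data.append((x, 1 if s > 0 else -1))
    return data

def predict(w, x):
    return 1 if w[0] * x[0] + w[1] * x[1] > 0 else -1

def fit_perceptron(train, max_epochs):
    """Bias-free perceptron; stops after a mistake-free pass or max_epochs."""
    w = [0.0, 0.0]
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in train:
            if predict(w, x) != y:
                w[0] += y * x[0]
                w[1] += y * x[1]
                mistakes += 1
        if mistakes == 0:
            break
    return w

def error(w, data):
    """Fraction of misclassified examples."""
    return sum(predict(w, x) != y for x, y in data) / len(data)

def learning_curve(train, test, sizes, max_epochs):
    """Return (m, training error on the m fitted examples, test error)."""
    rows = []
    for m in sizes:
        subset = train[:m]
        w = fit_perceptron(subset, max_epochs)
        rows.append((m, error(w, subset), error(w, test)))
    return rows

rng = random.Random(0)
train, test = make_data(200, rng), make_data(200, rng)
sizes = [5, 10, 20, 50, 100, 200]

converged = learning_curve(train, test, sizes, max_epochs=200)
stopped_early = learning_curve(train, test, sizes, max_epochs=1)

for (m, tr, te), (_, tr1, te1) in zip(converged, stopped_early):
    print(m, "converged:", tr, te, "| one epoch:", tr1, te1)
```

With this margin the perceptron makes at most 100 updates, so 200 epochs
are enough for a mistake-free pass and the converged training error is
zero at every m. The one-epoch run has no such guarantee. Whether this
explains your plot depends on your model and data.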

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general