On Thu, Aug 9, 2012 at 2:08 PM, Zach Bastick <[email protected]> wrote:
> But as you can see, the predictions are absolutely terrible, no matter
> what I do. The training set predictions are quite accurate though. From
> my reading, this could be due to overfitting. However, I don't see how a
> simple linear model (OLS) could overfit anything...

Quite easily. Overfitting is not really a property of the model family so
much as of how robustly you can estimate the parameters given the amount of
training data. Remember that the coefficients you are fitting specify a
hyperplane in a high-dimensional space (where your geometric intuitions will
often fail you), and unless your problem is truly linear and also noiseless,
the OLS coefficients estimated on a finite training set are only an
approximation of the "best" hyperplane for your problem (in terms of
minimizing generalization error).

With 8 features and only 35 cases, you're estimating 9 parameters from very
little data, and you will most likely need heavy regularization to keep the
model from overspecializing to the quirks of the training set. I would try
out ridge regression as a first pass, and maybe the Lasso.

Cheers,
David

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
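To make the point concrete, here is a minimal sketch (synthetic data; the
sizes mirror the 35-case, 8-feature setup from the question, everything else
is illustrative) comparing OLS against ridge and the Lasso on held-out data.
With this little data, the train score tends to be optimistic relative to
the test score, and the regularized models shrink the coefficients.

```python
# Sketch: OLS vs. regularized regression on a small (35 x 8) training set.
# The data here are synthetic; only the sample/feature counts come from the
# original question.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.RandomState(0)
n_train, n_test, n_features = 35, 200, 8

# A truly linear but noisy relationship
true_coef = rng.randn(n_features)
X_train = rng.randn(n_train, n_features)
y_train = X_train.dot(true_coef) + 2.0 * rng.randn(n_train)
X_test = rng.randn(n_test, n_features)
y_test = X_test.dot(true_coef) + 2.0 * rng.randn(n_test)

# alpha values are illustrative; in practice tune them, e.g. with RidgeCV
for model in (LinearRegression(), Ridge(alpha=10.0), Lasso(alpha=0.5)):
    model.fit(X_train, y_train)
    print("%-16s train R^2: %.2f   test R^2: %.2f"
          % (type(model).__name__,
             model.score(X_train, y_train),
             model.score(X_test, y_test)))
```

Note that ridge is mathematically guaranteed to produce coefficients with a
smaller norm than OLS on the same data; whether that improves the test score
depends on the noise level, so the alphas above are starting points, not
recommendations.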
