On Thu, Aug 9, 2012 at 2:08 PM, Zach Bastick <[email protected]> wrote:
> But as you can see, the predictions are absolutely terrible, no matter
> what I do. The training set predictions are quite accurate though. From
> my reading, this could be due to overfitting. However, I don't see how a
> simple linear model (OLS) could overfit anything...

Quite easily. Overfitting is not really a property of the model family so
much as of how robustly you can estimate the parameters given the amount of
training data. Remember that the coefficients you are fitting specify a
hyperplane in a high-dimensional space (where your geometric intuitions will
often fail you), and unless your problem is truly linear and also noiseless,
the OLS coefficients estimated on a finite training set are only an
approximation of the "best" hyperplane for your problem (in terms of
minimizing generalization error).

With 8 features and only 35 cases, you're estimating 9 parameters from very
little data, and you will most likely need heavy regularization to keep the
model from overspecializing to the quirks of the training set. I would try
out ridge regression as a first pass, and maybe the Lasso.

Cheers,
David

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
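To make the point concrete, here is a minimal sketch (synthetic data; the
sizes mirror the 35-case, 8-feature setup from the question, everything else
is illustrative) comparing OLS against ridge and the Lasso on held-out data.
With this little data, the train score tends to be optimistic relative to
the test score, and the regularized models shrink the coefficients.

```python
# Sketch: OLS vs. regularized regression on a small (35 x 8) training set.
# The data here are synthetic; only the sample/feature counts come from the
# original question.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.RandomState(0)
n_train, n_test, n_features = 35, 200, 8

# A truly linear but noisy relationship
true_coef = rng.randn(n_features)
X_train = rng.randn(n_train, n_features)
y_train = X_train.dot(true_coef) + 2.0 * rng.randn(n_train)
X_test = rng.randn(n_test, n_features)
y_test = X_test.dot(true_coef) + 2.0 * rng.randn(n_test)

# alpha values are illustrative; in practice tune them, e.g. with RidgeCV
for model in (LinearRegression(), Ridge(alpha=10.0), Lasso(alpha=0.5)):
    model.fit(X_train, y_train)
    print("%-16s train R^2: %.2f   test R^2: %.2f"
          % (type(model).__name__,
             model.score(X_train, y_train),
             model.score(X_test, y_test)))
```

Note that ridge is mathematically guaranteed to produce coefficients with a
smaller norm than OLS on the same data; whether that improves the test score
depends on the noise level, so the alphas above are starting points, not
recommendations.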
