Thanks. I've created a gist here with the two datasets (one for training the model and one for testing it) and a loader file to get the data into Python variables:
https://gist.github.com/3309139

I can't seem to get anything out of the data, I'd really appreciate the help figuring out a good regression model.

Thanks
Zach

On 09/08/2012 11:17, Peter Prettenhofer wrote:
> Hi Zach,
>
> if you provide a gist with your evaluation setup (similar to this one
> [1]) I can look into it.
>
> best,
> Peter
>
> [1] https://gist.github.com/3266657
>
> 2012/8/9 Zach Bastick <[email protected]>:
>> I’m having some conceptual trouble with this supervised machine learning
>> project (regression) that hopefully someone can help me with.
>>
>> I am trying to do sentiment analysis on texts (scoring them from -10 to
>> +10) based on a human-scored training set.
>>
>> Training set:
>>     Cases = 35
>>     Score Mean = 0.77
>>     Score STD = 8.07
>>
>> Testing set:
>>     Cases = 12
>>     Score Mean = -2.08
>>     Score STD = 7.43
>>
>> Features:
>>     Number: 8
>>     They are:
>>         Two scores based on word frequency.
>>             These correlate highly with the real scores.
>>         The rest are features of the text such as ‘punctuation density’
>>
>> Evaluation:
>>     I calculate the prediction accuracy by finding the mean error
>>     between the prediction (machine score) and target (real
>>     human score).
>>
>> Methods, In order of success:
>>     Linear Regression (OLS):
>>         Code: linear_model.LinearRegression()
>>         Result:
>>             Training Set Mean Error: 7.51
>>             Training Set STDV of Error: 6.58
>>             Testing Set Mean Error: 90.29
>>             Testing Set STDV of Error: 11.26
>>     Support Vector Regression (SVR), Linear:
>>         Code: SVR(kernel="linear")
>>         Result:
>>             Training Set Mean Error: 8.17
>>             Training Set STDV of Error: 8.55
>>             Testing Set Mean Error: 89.93
>>             Testing Set STDV of Error: 11.12
>>     Ridge Regression:
>>         Code: linear_model.Ridge()
>>         Result:
>>             Training Set Mean Error: 8.39
>>             Training Set STDV of Error: 7.46
>>             Testing Set Mean Error: 90.65
>>             Testing Set STDV of Error: 11.13
>>     Support Vector Regression (SVR), 2nd degree polynomial:
>>         Code: SVR(kernel="poly", degree=2)
>>         Result:
>>             Training Set Mean Error: 9.16
>>             Training Set STDV of Error: 7.35
>>             Testing Set Mean Error: 107.31
>>             Testing Set STDV of Error: 35.19
>>
>> But as you can see, the predictions are absolutely terrible, no matter
>> what I do.
>> The training set predictions are quite accurate though. From my reading,
>> this could be due to over fitting. However, I don’t see how simple
>> linear model (OLS) could over fit anything… On top of that, the features
>> I’m working with lends itself to prediction (the features based on word
>> frequencies in particular correlate highly with the real human scores –
>> I’ve even tried Neural Networks in SPSS using the default settings and
>> the training set prediction works well. But I can’t get anything to work
>> well here in SciKit-Learn.
>>
>> So what’s the problem?
>>
>> Thanks so much,
>>
>> Zach
>>
>> ------------------------------------------------------------------------------
>> Live Security Virtual Conference
>> Exclusive live event will cover all the ways today's security and
>> threat landscape has changed and how IT managers can respond. Discussions
>> will include endpoint security, mobile security and the latest in malware
>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>> _______________________________________________
>> Scikit-learn-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
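For reference, the evaluation described in the quoted message (mean error between prediction and human score, reported for both sets) could be sketched roughly as below. This is only an illustration under stated assumptions: "mean error" is read as mean absolute error, the data is synthetic (the real datasets live in the gist and are not reproduced here), and `make_synthetic` and `evaluate` are hypothetical helper names. A `DummyRegressor` that always predicts the training mean is added as a sanity baseline, which is not part of the original setup.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.svm import SVR


def make_synthetic(n_cases, rng):
    """Hypothetical stand-in for the real data: 8 features, scores in [-10, 10]."""
    X = rng.normal(size=(n_cases, 8))
    # Assumption: the first two features carry most of the signal,
    # loosely mimicking the two word-frequency scores in the message.
    y = 4.0 * X[:, 0] + 3.0 * X[:, 1] + rng.normal(scale=2.0, size=n_cases)
    return X, np.clip(y, -10, 10)


def evaluate(models, X_train, y_train, X_test, y_test):
    """Fit each model on the training set; return (train MAE, test MAE) per model."""
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        results[name] = (
            mean_absolute_error(y_train, model.predict(X_train)),
            mean_absolute_error(y_test, model.predict(X_test)),
        )
    return results


rng = np.random.default_rng(0)
X_train, y_train = make_synthetic(35, rng)  # same set sizes as in the message
X_test, y_test = make_synthetic(12, rng)

models = {
    "baseline (training mean)": DummyRegressor(strategy="mean"),
    "OLS": LinearRegression(),
    "Ridge": Ridge(),
    "SVR linear": SVR(kernel="linear"),
    "SVR poly-2": SVR(kernel="poly", degree=2),
}
results = evaluate(models, X_train, y_train, X_test, y_test)
for name, (train_mae, test_mae) in results.items():
    print(f"{name:26s} train MAE {train_mae:6.2f}   test MAE {test_mae:6.2f}")
```

One thing the baseline makes visible: with targets confined to [-10, 10], any model whose predictions also stay in that range cannot have a mean absolute error above 20. A test error near 90 would therefore mean the test predictions fall far outside the score scale, so it may be worth checking that the test features are loaded and scaled the same way as the training features.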
