I have tried various machine learning algorithms from scikit-learn but 
can't find a good prediction model.
The features I'm using are the tf-idf values of a set of text documents, 
paired with the human ratings assigned to each document, which I'm trying 
to predict. I'm thinking that I must be doing something wrong, as the 
scores can't be that bad (not to mention negative?).

If someone could have a look at it, I'd really appreciate it. I didn't 
upload to a GitHub gist because gists won't let me upload the dataset 
directory. So I've uploaded my really short code (regression.py) and the 
original data set (/texts) here (625K):
https://dl.dropbox.com/u/74279156/regression.zip

This is my output:
C:\python code\program>python regression.py
loading texts...
n_samples: 53, n_features: 6284

LinearRegresson
[ 0.34662496  0.23446674  0.30332109  0.3163838   0.01607913]
Accuracy: 0.24 (+/- 0.06)

SVR linear
[-0.05521329 -1.61280714 -0.67428098 -0.8805647  -2.20730703]
Accuracy: -1.09 (+/- 0.37)

SVR poly 4 degrees
[-0.18814233 -1.78480475 -0.88158686 -1.05944432 -2.40284073]
Accuracy: -1.26 (+/- 0.38)

SVR sigmoid
[-0.18814233 -1.78480475 -0.88158686 -1.05944432 -2.40284073]
Accuracy: -1.26 (+/- 0.38)
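
A note on the negative values, assuming regression.py gets them from 
cross_val_score with the default scorer (an assumption, since the script 
itself is only in the zip): for regressors that default is the R^2 score, 
not an accuracy. R^2 is 1 for perfect predictions, 0 for a model that 
always predicts the mean rating, and negative for anything worse than 
that. A minimal sketch in plain Python, with made-up ratings rather than 
the real dataset, and a hand-written r2_score helper for illustration:

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    # Squared error of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of always predicting the mean.
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up ratings, for illustration only (not from the /texts dataset).
ratings = [1.0, 2.0, 3.0, 4.0, 5.0]
mean_rating = sum(ratings) / len(ratings)

# Predicting every rating exactly.
perfect = r2_score(ratings, ratings)
# Always predicting the mean rating.
baseline = r2_score(ratings, [mean_rating] * len(ratings))
# Predicting the ratings in reverse order, which is worse than the mean.
worse = r2_score(ratings, [5.0, 4.0, 3.0, 2.0, 1.0])

print(perfect, baseline, worse)
```

Read this way, a score around -1 would mean the model does worse on the 
held-out fold than simply predicting the average rating.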


Please tell me what's wrong. I'm dying to know how to get scikit-learn 
to predict based on this dataset.

Thanks

Zach

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
