Hey all,

First of all, I want to thank you for this awesome project.

I am working on a project where I want to fit a linear regression to make some 
predictions. The dataset was split into training/test sets (70/30). I then ran 
10-fold CV on the training set and made predictions on the test set. It is not 
a particularly complex problem, so I would expect the RMSE and R2 estimated from 
10-fold CV and from the test set to be reasonably close to each other. 

It turns out that the RMSE estimates are quite close: "CV 0.7435" versus "test 
set 0.7429". However, the two R2 scores are as follows: "CV -3.0168" versus 
"test set 0.8718". I can live with a negative R2, but I am confused by this 
inconsistency. I wonder if anyone can help. Thank you in advance.

=================Here is my script=================

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cross_validation import cross_val_score

lm = LinearRegression()
# this scorer returns negated MSE values, hence the sign flip below
train_scores_mse = cross_val_score(lm, trainX_trans_filtered, trainY, cv=10,
                                   scoring='mean_squared_error')
train_scores_rmse = np.sqrt(-1.0 * train_scores_mse)
train_scores_r2 = cross_val_score(lm, trainX_trans_filtered, trainY, cv=10,
                                  scoring='r2')
print("CV estimated RMSE: {0} \nCV estimated R2: {1}".format(
    np.mean(train_scores_rmse), np.mean(train_scores_r2)))
CV estimated RMSE: 0.743556872074 
CV estimated R2: -3.01685516116

# apply to the test set
lm.fit(trainX_trans_filtered, trainY)
testY_pred = lm.predict(testX_trans_filtered)

from sklearn.metrics import r2_score, mean_squared_error
test_score_r2 = r2_score(testY, testY_pred)
test_score_rmse = np.sqrt(mean_squared_error(testY, testY_pred))
print("Test set RMSE: {0} \nTest set R2: {1}".format(test_score_rmse,
                                                     test_score_r2))
Test set RMSE: 0.742917835704 
Test set R2: 0.871834926473
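
One thing that may be worth checking, as an assumption rather than something verified on this dataset: R2 is 1 - SS_res / SS_tot, where SS_tot is measured within whatever set of samples is being scored. If the training rows happen to be ordered so that each CV fold covers only a narrow range of the target (cv=10 here splits the rows into consecutive blocks without shuffling), the per-fold R2 can be strongly negative even though the per-fold RMSE matches the test RMSE. The sketch below illustrates this with hypothetical toy data and hand-written rmse/r2 helpers (neither is from the script above); it uses only the standard library.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    sq_err = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sq_err / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data: 100 targets sorted in increasing order (0.0 .. 9.9).
y = [i / 10.0 for i in range(100)]
# Hypothetical predictions: every one is off by 0.74, alternating in sign.
pred = [t + (0.74 if i % 2 == 0 else -0.74) for i, t in enumerate(y)]

# Scores over all samples at once (analogous to scoring one large test set).
pooled_rmse = rmse(y, pred)
pooled_r2 = r2(y, pred)

# Scores per consecutive block of 10 samples (analogous to unshuffled folds).
fold_size = 10
fold_rmse, fold_r2 = [], []
for start in range(0, len(y), fold_size):
    y_fold = y[start:start + fold_size]
    pred_fold = pred[start:start + fold_size]
    fold_rmse.append(rmse(y_fold, pred_fold))
    fold_r2.append(r2(y_fold, pred_fold))

mean_fold_rmse = sum(fold_rmse) / len(fold_rmse)
mean_fold_r2 = sum(fold_r2) / len(fold_r2)

print("pooled RMSE: {0}, pooled R2: {1}".format(pooled_rmse, pooled_r2))
print("mean fold RMSE: {0}, mean fold R2: {1}".format(mean_fold_rmse,
                                                      mean_fold_r2))
```

In this toy setup the RMSE is the same either way, while the pooled R2 is high and the mean per-fold R2 is negative, because each block has very little target variance. If something similar is happening here, printing the individual values in train_scores_r2 and shuffling the training rows before cross-validation would be two ways to find out.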

Cheers,
Lei

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
