Hey all,
First of all, I want to thank you for this awesome project.
I am working on a project where I want to fit a linear regression to make some
predictions. The dataset was split into training/test sets (70/30). I then ran
10-fold CV on the training set and made predictions on the test set. It is not
a particularly complex problem, so I would expect the RMSE and R2 estimated
from 10-fold CV and from the test set to be reasonably close to each other.
It turns out that the estimated RMSEs are quite close: "CV 0.7435" versus "test
set 0.7429". However, the two R2 scores are as follows: "CV -3.0168" versus
"test set 0.8718". I can live with the negative R2, but I am confused by
this inconsistency. I wonder if anyone can help. Thank you in advance.
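
For reference, R2 and MSE are tied together by R2 = 1 - MSE / Var(y), where Var(y) is the variance of the targets being scored. The sketch below plugs the figures quoted above into that identity to show what target variance each score implies. It is only illustrative: the CV R2 is a mean over ten folds, so treating it as a single-fold value is a simplifying assumption, and the variable names are made up for this example.

```python
# Figures quoted in the post.
test_rmse = 0.742917835704
test_r2 = 0.871834926473
cv_rmse = 0.743556872074
cv_r2 = -3.01685516116

# R2 = 1 - MSE / Var(y)  =>  Var(y) = MSE / (1 - R2)
implied_test_var = test_rmse ** 2 / (1.0 - test_r2)

# Assumption: treat the mean CV R2 as if it came from one typical fold.
implied_fold_var = cv_rmse ** 2 / (1.0 - cv_r2)

print("Target variance implied by the test scores: {0:.3f}".format(implied_test_var))
print("Target variance implied by the CV scores:   {0:.3f}".format(implied_fold_var))
```

With nearly identical RMSE, the two R2 values can only both hold if the targets inside the scored CV folds vary far less than the targets in the test set.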
=================Here is my script=================
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cross_validation import cross_val_score
from sklearn.metrics import r2_score, mean_squared_error

lm = LinearRegression()

# 10-fold CV on the training set; this scorer returns negated MSE values
train_scores_mse = cross_val_score(lm, trainX_trans_filtered, trainY, cv=10,
                                   scoring='mean_squared_error')
train_scores_rmse = np.sqrt(-1.0 * train_scores_mse)
train_scores_r2 = cross_val_score(lm, trainX_trans_filtered, trainY, cv=10,
                                  scoring='r2')
print("CV estimated RMSE: {0} \nCV estimated R2: {1}".format(
    np.mean(train_scores_rmse), np.mean(train_scores_r2)))

CV estimated RMSE: 0.743556872074
CV estimated R2: -3.01685516116

# apply to the test set
lm.fit(trainX_trans_filtered, trainY)
testY_pred = lm.predict(testX_trans_filtered)
test_score_r2 = r2_score(testY, testY_pred)
test_score_rmse = np.sqrt(mean_squared_error(testY, testY_pred))
print("Test set RMSE: {0} \nTest set R2: {1}".format(test_score_rmse,
                                                     test_score_r2))

Test set RMSE: 0.742917835704
Test set R2: 0.871834926473
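
One possible explanation, offered as an assumption rather than anything the data above confirms: if the training rows happen to be ordered by the target and cv=10 yields contiguous, unshuffled folds (worth checking for the library version in use), each fold covers only a narrow slice of the target range. The sketch below uses purely synthetic data and standard-library code, with noisy copies of the targets standing in for model predictions, to show that this setup gives a per-fold RMSE close to the overall RMSE while the mean per-fold R2 turns strongly negative. All names and numbers here are hypothetical.

```python
import math
import random

def rmse(y_true, y_pred):
    # Root mean squared error.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

random.seed(0)
n, k = 1000, 10

# Synthetic targets, sorted as if the rows were ordered by the response.
y = sorted(random.gauss(0.0, 2.0) for _ in range(n))
# Stand-in "predictions": the true value plus noise with standard deviation 0.75.
pred = [v + random.gauss(0.0, 0.75) for v in y]

# Score ten contiguous folds, mimicking an unshuffled K-fold split.
fold_size = n // k
fold_rmse, fold_r2 = [], []
for i in range(k):
    s = slice(i * fold_size, (i + 1) * fold_size)
    fold_rmse.append(rmse(y[s], pred[s]))
    fold_r2.append(r2(y[s], pred[s]))

mean_cv_rmse = sum(fold_rmse) / k
mean_cv_r2 = sum(fold_r2) / k
overall_rmse = rmse(y, pred)
overall_r2 = r2(y, pred)

print("Mean per-fold RMSE: {0:.4f}  overall RMSE: {1:.4f}".format(mean_cv_rmse, overall_rmse))
print("Mean per-fold R2:   {0:.4f}  overall R2:   {1:.4f}".format(mean_cv_r2, overall_r2))
```

If this is what is happening, shuffling the rows before cross-validation, or passing a shuffled K-fold splitter instead of cv=10, should bring the CV R2 in line with the test-set value.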
Cheers,
Lei
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general