Dear Jason, Edzer and Caspar,

Thank you very much for your suggestions. I feel much better now that I know (at least) that I didn't make a mistake with the variables or the implementation of the methods. I will try adjusting for degrees of freedom and using the mean of the training dataset when calculating the errors and r-squares. As Edzer said, when the test dataset contains an outlier (precipitation much higher than the mean, e.g. 2200 mm), I obtain a negative r-square!

Best regards and good luck with your studies,
Pinar




Quoting Jason Gasper <jason.gas...@noaa.gov>

I may be a little off base here, but wouldn't an ex-sample R^2 calculation be required, since you are using your test data as a "prediction"? The ex-sample R^2 would then be 1 - SSE(test data) / sum((Y_test - mean(Y_train))^2). R^2 in this context is motivated as a comparison between two competing models, so a negative R^2 value indicates that your ex-sample (test data) forecasts are worse than a mean value.
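A minimal R sketch of that ex-sample R^2 (y.train, y.test and y.pred are hypothetical placeholders for the observed training values, the observed test values, and the model's predictions on the test set):

# out-of-sample R^2: compare the test-set forecasts against the
# baseline of always predicting the training mean
ex.sample.r2 <- function(y.train, y.test, y.pred) {
  sse <- sum((y.test - y.pred)^2)          # squared prediction errors
  sst <- sum((y.test - mean(y.train))^2)   # errors of the training-mean baseline
  1 - sse / sst
}

A value below zero then means the model's forecasts lose to simply predicting the training mean.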

Caspar Hallmann wrote:
This raises the question, though, of whether one should use the mean of
the training data or the mean of the test data when calculating the
total sum of squares. I believe the first is fairer with respect to
answering whether a given model is any better than the null model at
predicting the response. When basing SST on the mean of the test data,
you are essentially comparing your model to a null model that has been
fitted on different data (which I think isn't fair), and that is
probably the reason why SSE > SST, and hence R2 < 0.

Caspar



On Thu, Sep 9, 2010 at 9:15 PM, Edzer Pebesma
<edzer.pebe...@uni-muenster.de> wrote:

Pinar, Jason,

From the script below it seems no adjustment for degrees of freedom
is being made.

In this case R2 can become negative because you use different test
and training sets. Suppose the test set contains one single extreme
value that is not present in the training set. In that case, the mean
of the test values is, in terms of sum of squares, a better predictor
than your regression model, which didn't know about this outlier.
Don't forget that the mean of the test set does contain this outlier.
Hence, R2 can easily become negative when evaluated over a different
data set than the one the regression model was derived from.
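A small R demonstration of this effect, with invented toy numbers (one extreme value, in the spirit of the 2200 mm example, that occurs only in the test set):

# invented toy data: the extreme value appears only in the test set
y.train <- c(800, 900, 1000, 1100)   # training responses (no outlier)
y.test  <- c(850, 950, 2200)         # test responses, one extreme value
y.pred  <- c(840, 960, 1050)         # hypothetical model predictions

sse <- sum((y.test - y.pred)^2)

# baseline 1: mean of the test set (it contains the outlier, as in the
# script below); R2 comes out negative for these numbers
1 - sse / sum((y.test - mean(y.test))^2)

# baseline 2: mean of the training set (Caspar's fairer null model);
# R2 is positive for these numbers
1 - sse / sum((y.test - mean(y.train))^2)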

On 09/09/2010 06:25 PM, Jason Gasper wrote:

Hello Pinar,

I don't know for sure what your calculation is, but R2 values can range
from -Inf to 1 if an adjusted R2 is being used. In other words, one
possibility is that you are adjusting for degrees of freedom using some
variation of adjusted R2 = 1 - ((n-1)/(n-k))*(1-R2), which is
equivalent to the simple-regression R2 when k = 1. So when the
estimated R2 is less than or equal to 0, the model forecast is inferior
to the mean (a really poor fit). Another way of looking at a negative
R2 is that the fit is worse than a horizontal line: the sum-of-squares
from the model is larger than the sum-of-squares from a horizontal
line. Again, a poor fit.
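A hedged sketch of that adjustment in R, using the convention above in which k = 1 recovers the plain R2 (n is the number of observations):

# adjusted R^2 penalizes the plain R^2 for model complexity;
# with this form, k = 1 gives back the unadjusted R^2
adj.r2 <- function(r2, n, k) {
  1 - (1 - r2) * (n - 1) / (n - k)
}

adj.r2(0.05, n = 20, k = 5)   # a weak fit turns negative once adjusted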

Cheers-Jason


Pinar Aslantas Bostan wrote:

Hi all,

I am working on a comparison of kriging and regression methods. I
have one dependent variable (PREC) and seven independent variables. I
created 10 different test and training datasets. I use the training
datasets to build the models and the test datasets to calculate the
errors (RMSE) and r-squares. Once I have obtained prediction values
for the grid, I use overlay() to get predictions for the test
dataset. For example:

# regression kriging
# dem is the grid (I want to get predictions for each pixel of dem)
# and dem$rk.pred1 contains the regression kriging predictions

test1$rk.predicted = dem$rk.pred1[overlay(dem, test1)]

# calculating r-square values based on the test values

sst1 <- sum((test1$PREC - mean(test1$PREC))^2)        # total sum of squares
sse.rk <- sum((test1$PREC - test1$rk.predicted)^2)    # sum of squared errors
rk1.r.square <- 1 - (sse.rk / sst1)

My problem is that, for some datasets, the methods result in negative
r-squares. Here I gave an example for regression kriging, but the same
problem may also occur with linear regression. I checked the dependent
and independent variables and there is no problem with them. Does
anyone know another function that could be used instead of overlay()
for the same purpose? (I thought that maybe the problem was caused by
the overlay function.) Or do you have any idea about the reason for
the negative r-square values?
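For reference, sp's over() performs the same point-in-grid extraction as the overlay() call above; a minimal sketch, assuming dem is a gridded Spatial*DataFrame and test1 a SpatialPointsDataFrame:

library(sp)

# over() returns dem's attribute columns at the test1 locations,
# so no index arithmetic is needed
test1$rk.predicted <- over(test1, dem)$rk.pred1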

Best regards,
Pinar


********************************************************************************

Pinar Aslantas Bostan
Research Assistant
Department of Geodetic and
Geographic Information Technologies (GGIT)
Middle East Technical University
06531 Ankara/TURKEY
aslan...@metu.edu.tr


--
Edzer Pebesma
Institute for Geoinformatics (ifgi), University of Münster
Weseler Straße 253, 48151 Münster, Germany. Phone: +49 251
8333081, Fax: +49 251 8339763  http://ifgi.uni-muenster.de
http://www.52north.org/geostatistics      e.pebe...@wwu.de



_______________________________________________
R-sig-Geo mailing list
R-sig-Geo@stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
