[R-sig-Geo] Large Prediction Variances with gstat

2013-02-28 Thread Jesse Berman
Hi All,

First time post, so please excuse any omissions/confusion.  I am performing
a series of prediction models using gstat and discovered that prediction
variance of spatially dependent data with OLS models was larger than those
of kriging models.  This is counter-intuitive to the assumption that
treating spatially dependent data as IID will result in artificially
shrunken prediction variances.  Anyhow, to better understand how gstat
treats OLS predictions, I reproduced an OLS prediction with the base package
('predict' and 'predict.lm') and found that while I got identical beta's, I
got substantially higher variances with gstat.

Can anyone shed some light as to why gstat might be giving these larger
prediction variances when performing an OLS model?  (see reproducible
example below)

Regards and thanks for the help,
Jesse   

library(gstat)
data(meuse)
coordinates(meuse) = ~x+y
meuse.ns-as.data.frame(meuse) #non-spatial Meuse data

#Data sets for modeling and prediction; spatial and non-spatial
dat.mod-meuse[1:100,]
dat.pred-meuse[101:155,]
dat.ns.mod-meuse.ns[1:100,]
dat.ns.pred-meuse.ns[101:155,]

#Linear Model Prediction with base package
lm.zn-lm(log(zinc)~x+y+elev, data=dat.mod)
lm.zn.pred-predict(lm.zn, dat.pred, se.fit=TRUE)
pred.variance-(lm.zn.pred$se.fit)^2

#Linear Model Prediction with gstat
gstat.lm.zn.pred-krige(log(zinc)~x+y+elev, dat.mod, newdata=dat.pred)

#Compare Results
summary(gstat.lm.zn.pred@data) #gstat results
summary(lm.zn.pred$fit)
summary(pred.variance) #base model prediction variance




--
View this message in context: 
http://r-sig-geo.2731867.n2.nabble.com/Large-Prediction-Variances-with-gstat-tp7582882.html
Sent from the R-sig-geo mailing list archive at Nabble.com.

___
R-sig-Geo mailing list
R-sig-Geo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo


Re: [R-sig-Geo] Large Prediction Variances with gstat

2013-02-28 Thread Edzer Pebesma
predict.lm gives prediction errors for the mean, where gstat gives 
prediction errors for single observations, identical to


 summary(pred.variance) + lm.zn.pred$residual.scale^2
Min.  1st Qu.   Median Mean  3rd Qu. Max.
0.224297 0.231328 0.233698 0.236858 0.240198 0.301258

If you ask gstat to make predictions for blocks, using linear 
regression, it predicts the mean instead of individual observations, 
which is equivalent to kriging with a pure nugget effect:


 gstat.lm.zn.pred-krige(log(zinc)~x+y+elev, dat.mod, 
newdata=dat.pred, block=1)

[ordinary or weighted least squares prediction]
 summary(gstat.lm.zn.pred@data) #gstat results
   var1.predvar1.var
 Min.   :4.061   Min.   :0.006409
 1st Qu.:4.974   1st Qu.:0.013438
 Median :5.335   Median :0.015809
 Mean   :5.303   Mean   :0.018970
 3rd Qu.:5.618   3rd Qu.:0.022309
 Max.   :6.262   Max.   :0.083373


On 02/28/2013 09:20 PM, Jesse Berman wrote:

Hi All,

First time post, so please excuse any omissions/confusion.  I am performing
a series of prediction models using gstat and discovered that prediction
variance of spatially dependent data with OLS models was larger than those
of kriging models.  This is counter-intuitive to the assumption that
treating spatially dependent data as IID will result in artificially
shrunken prediction variances.  Anyhow, to better understand how gstat
treats OLS predictions, I reproduced an OLS prediction with the base package
('predict' and 'predict.lm') and found that while I got identical beta's, I
got substantially higher variances with gstat.

Can anyone shed some light as to why gstat might be giving these larger
prediction variances when performing an OLS model?  (see reproducible
example below)

Regards and thanks for the help,
Jesse

library(gstat)
data(meuse)
coordinates(meuse) = ~x+y
meuse.ns-as.data.frame(meuse) #non-spatial Meuse data

#Data sets for modeling and prediction; spatial and non-spatial
dat.mod-meuse[1:100,]
dat.pred-meuse[101:155,]
dat.ns.mod-meuse.ns[1:100,]
dat.ns.pred-meuse.ns[101:155,]

#Linear Model Prediction with base package
lm.zn-lm(log(zinc)~x+y+elev, data=dat.mod)
lm.zn.pred-predict(lm.zn, dat.pred, se.fit=TRUE)
pred.variance-(lm.zn.pred$se.fit)^2

#Linear Model Prediction with gstat
gstat.lm.zn.pred-krige(log(zinc)~x+y+elev, dat.mod, newdata=dat.pred)

#Compare Results
summary(gstat.lm.zn.pred@data) #gstat results
summary(lm.zn.pred$fit)
summary(pred.variance) #base model prediction variance




--
View this message in context: 
http://r-sig-geo.2731867.n2.nabble.com/Large-Prediction-Variances-with-gstat-tp7582882.html
Sent from the R-sig-geo mailing list archive at Nabble.com.

___
R-sig-Geo mailing list
R-sig-Geo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo



--
Edzer Pebesma
Institute for Geoinformatics (ifgi), University of Münster
Weseler Straße 253, 48151 Münster, Germany. Phone: +49 251
8333081, Fax: +49 251 8339763  http://ifgi.uni-muenster.de
http://www.52north.org/geostatistics  e.pebe...@wwu.de

___
R-sig-Geo mailing list
R-sig-Geo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo