Dear Thierry, Many thanks for your help.
> Have a look at the normalised residuals when creating a variogram. Those > take the correlation structure into account. Using the normalized residuals for the variogram produces more sensible looking results (see bottom image here: http://sites.google.com/site/ldemisc/variogram): # OLS and GLS gls.2<-gls(log(CTOP)~PCB1+PCB2+PCB3+PCB4+PCB5+PCB6+PCB8,data=oc_2) gls.2.sph<-update(gls.2,correlation=corSpher(form=~X+Y,nugget=TRUE)) # OLS variogram v.gls<-variogram(residuals(gls.2)~1,data=oc_2) v.gls.pl<-plot(v.gls) # GLS variogram v.gls.sph<-variogram(residuals(gls.2.sph,type="normalized")~1,data=oc_2) v.gls.sph.pl<-plot(v.gls.sph) # Side by side variogram plot v.gls.pl$x.limits<-v.gls.sph.pl$x.limits v.gls.pl$y.limits<-v.gls.sph.pl$y.limits print(v.gls.pl,split=c(1,1,2,1),more=TRUE) print(v.gls.sph.pl,split=c(2,1,2,1),more=FALSE) This looks like much more plausible GLS output, no? > And have a look at the > parameters of the correlation structure. Are they from a similar > magnitude as you would expect from the OLS variogram? If not, rerun the > GLS with starting values for range and sill based on the OLS variogram. > I had some strange results in the past with range parameters being > smaller than the smallest distance between two points. Supplying > reasonable starting values yielded better results. Looking at the variogram parameter found from the OLS residuals, I see a fair bit of difference between those found by GLS for the correlation structure: vgm.ols.norm<-variogram(residuals(gls.2,type="normalized")~1,data=oc_2) ols.vgm.parms1<-vgm(nugget=0.27,model="Sph",psill=0.6,range=200000) ols.fit.sph.var<-fit.variogram(vgm.ols2,model=ols.vgm.parms1) > ols.fit.sph.var model psill range 1 Nug 0.26197480 0.0 2 Sph 0.09186248 220528.5 #Partial output from summary of GLS fit >summary(gls.2.sph) Correlation Structure: Spherical spatial correlation Formula: ~X + Y Parameter estimate(s): range nugget 1509259 0.1357911 So, I will retry the GLS using the fitted variogram values overnight (the GLS with spherical correlation model takes ~ 6 hours to run). I had one more issue that came up related to this process, which is rather alarming--using R on two different installations, with the same code and same data, I am getting different results for the GLS! I was hoping someone could help me figure out what's wrong. Installation 1: MacBook Pro with OS X 10.5.8, with the following R : R version 2.11.1 (2010-05-31) [R.app GUI 1.34 (5589) x86_64-apple-darwin9.8.0] Installation 2: 8 node Linux cluster, with custom version of RHEL5. I built R 2.11.1 from source here and installed locally. As I said, with my dataset, I am getting different results. Everything is the same with the initial portions of the code: Mac R > oc<-read.dbf('~/Desktop/ARC_profs_OC+PCA+PREDs.dbf') oc_2<-as.data.frame(oc[,c(2,5:6,21:27,207:225)]) gls.2<-gls(log(CTOP)~PCB1+PCB2+PCB3+PCB4+PCB5+PCB6+PCB8,data=oc_2) summary(gls.2) Generalized least squares fit by REML Model: log(CTOP) ~ PCB1 + PCB2 + PCB3 + PCB4 + PCB5 + PCB6 + PCB8 Data: oc_2 AIC BIC logLik 6320.516 6375.617 -3151.258 Coefficients: Value Std.Error t-value p-value (Intercept) 1.6608899 0.4280775 3.87988 1e-04 PCB1 -0.0002037 0.0000156 -13.03177 0e+00 PCB2 0.0004070 0.0000230 17.68030 0e+00 PCB3 0.0026349 0.0000460 57.28741 0e+00 PCB4 0.0133557 0.0011803 11.31585 0e+00 PCB5 -0.0085951 0.0012186 -7.05294 0e+00 PCB6 0.0315716 0.0012912 24.45119 0e+00 PCB8 -0.0990330 0.0064676 -15.31206 0e+00 [snipped for space] Residual standard error: 0.6061033 Degrees of freedom: 3377 total; 3369 residual Linux R > Generalized least squares fit by REML Model: log(CTOP) ~ PCB1 + PCB2 + PCB3 + PCB4 + PCB5 + PCB6 + PCB8 Data: oc_2 AIC BIC logLik 6320.516 6375.617 -3151.258 Coefficients: Value Std.Error t-value p-value (Intercept) 1.6608899 0.4280775 3.87988 1e-04 PCB1 -0.0002037 0.0000156 -13.03177 0e+00 PCB2 0.0004070 0.0000230 17.68030 0e+00 PCB3 0.0026349 0.0000460 57.28741 0e+00 PCB4 0.0133557 0.0011803 11.31585 0e+00 PCB5 -0.0085951 0.0012186 -7.05294 0e+00 PCB6 0.0315716 0.0012912 24.45119 0e+00 PCB8 -0.0990330 0.0064676 -15.31206 0e+00 [snipped for space] Residual standard error: 0.6061033 Degrees of freedom: 3377 total; 3369 residual So, this suggests that the input datasets are not somehow different from one another. The problem comes here: Mac R > gls.2.sph<-update(gls.2,correlation=corSpher(form=~X+Y,nugget=TRUE)) summary(gls.2.sph) Generalized least squares fit by REML Model: log(CTOP) ~ PCB1 + PCB2 + PCB3 + PCB4 + PCB5 + PCB6 + PCB8 Data: oc_2 AIC BIC logLik 5742.647 5809.993 -2860.323 Correlation Structure: Spherical spatial correlation Formula: ~X + Y Parameter estimate(s): range nugget 1.509259e+06 1.357911e-01 Coefficients: Value Std.Error t-value p-value (Intercept) -1.6911740 0.9610770 -1.759665 0.0786 PCB1 -0.0001095 0.0000166 -6.612440 0.0000 PCB2 0.0012919 0.0000677 19.081299 0.0000 PCB3 0.0014827 0.0001903 7.791159 0.0000 PCB4 0.0026119 0.0014134 1.847974 0.0647 PCB5 -0.0119822 0.0012409 -9.655755 0.0000 PCB6 0.0196123 0.0017884 10.966588 0.0000 PCB8 0.0290101 0.0193420 1.499847 0.1337 Correlation: (Intr) PCB1 PCB2 PCB3 PCB4 PCB5 PCB6 PCB1 -0.496 PCB2 -0.202 0.229 PCB3 0.260 -0.427 -0.161 PCB4 0.254 -0.439 -0.403 0.280 PCB5 -0.007 -0.256 -0.123 0.023 0.455 PCB6 0.326 -0.405 -0.246 0.428 0.487 0.161 PCB8 -0.618 0.473 0.339 -0.583 -0.535 -0.091 -0.622 Standardized residuals: Min Q1 Med Q3 Max -2.8618482 -1.1854998 -0.7948018 -0.3657106 1.3957297 Residual standard error: 1.368062 Degrees of freedom: 3377 total; 3369 residual Linux R > gls.2.sph<-update(gls.2,correlation=corSpher(form=~X+Y,nugget=TRUE)) summary(gls.2.sph) Generalized least squares fit by REML Model: log(CTOP) ~ PCB1 + PCB2 + PCB3 + PCB4 + PCB5 + PCB6 + PCB8 Data: oc_2 AIC BIC logLik 5742.867 5810.213 -2860.433 Correlation Structure: Spherical spatial correlation Formula: ~X + Y Parameter estimate(s): range nugget 1.625556e+06 1.272891e-01 Coefficients: Value Std.Error t-value p-value (Intercept) -1.5625757 0.9952075 -1.570100 0.1165 PCB1 -0.0001095 0.0000166 -6.609627 0.0000 PCB2 0.0012916 0.0000677 19.078453 0.0000 PCB3 0.0014815 0.0001904 7.779445 0.0000 PCB4 0.0026110 0.0014141 1.846390 0.0649 PCB5 -0.0119806 0.0012410 -9.654330 0.0000 PCB6 0.0196109 0.0017893 10.959835 0.0000 PCB8 0.0291060 0.0193667 1.502889 0.1330 Correlation: (Intr) PCB1 PCB2 PCB3 PCB4 PCB5 PCB6 PCB1 -0.477 PCB2 -0.189 0.229 PCB3 0.247 -0.428 -0.162 PCB4 0.242 -0.439 -0.404 0.281 PCB5 -0.008 -0.256 -0.123 0.023 0.455 PCB6 0.312 -0.405 -0.246 0.429 0.487 0.161 PCB8 -0.592 0.474 0.340 -0.583 -0.535 -0.091 -0.623 Standardized residuals: Min Q1 Med Q3 Max -2.8627169 -1.2401476 -0.8617219 -0.4464233 1.2590098 Residual standard error: 1.413038 Degrees of freedom: 3377 total; 3369 residual Just about everything is different in this, even if just slightly, and I am not sure why. To check to see if there was a problem with another dataset, I used meuse: data(meuse) meu.ols<-gls(log(zinc)~dist.m+ffreq,meuse) meu.gls<-update(meu.ols,correlation=corExp(form=~x+y,nugget=TRUE)) summary(meu.gls) But the results are the same from both, e.g. Mac > Parameter estimate(s): range nugget 236.94597158 0.02719958 Coefficients: Value Std.Error t-value p-value (Intercept) 6.759524 0.12988675 52.04168 0 dist.m -0.001910 0.00028717 -6.64969 0 ffreq2 -0.562034 0.06642560 -8.46110 0 ffreq3 -0.557639 0.10166715 -5.48495 0 Linux > Parameter estimate(s): range nugget 236.94597172 0.02719958 Coefficients: Value Std.Error t-value p-value (Intercept) 6.759524 0.12988675 52.04168 0 dist.m -0.001910 0.00028717 -6.64969 0 ffreq2 -0.562034 0.06642560 -8.46110 0 ffreq3 -0.557639 0.10166715 -5.48495 0 So, I am really not sure what is going on here. As I said, my datasets seem fine, and to further confirm this I exported the values from oc_2 from linux, imported into my Mac R installation, and did a few tests: Linux R > write.csv(oc_2[,c(1:3,9,14:21)],oc2out.txt) Mac R > oc_2_ad<-read.csv("oc2out.txt") str(oc_2_ad) str(oc_2) #Compare coordinates xdiffs<-round(oc_2$X-oc_2_ad$X,5) ydiffs<-round(oc_2$Y-oc_2_ad$Y,5) ctopdiffs<-oc_2$CTOP-oc_2_ad$CTOP pc1diffs<-oc_2$PCB1-oc_2_ad$PCB1 pc2diffs<-oc_2$PCB2-oc_2_ad$PCB2 pc3diffs<-oc_2$PCB3-oc_2_ad$PCB3 pc4diffs<-oc_2$PCB4-oc_2_ad$PCB4 pc5diffs<-oc_2$PCB5-oc_2_ad$PCB5 pc6diffs<-oc_2$PCB6-oc_2_ad$PCB6 pc8diffs<-oc_2$PCB8-oc_2_ad$PCB8 min(xdiffs);max(xdiffs);mean(xdiffs) min(ydiffs);max(ydiffs);mean(ydiffs) min(ctopdiffs);max(ctopdiffs);mean(ctopdiffs) min(pc1diffs);max(pc1diffs);mean(pc1diffs) min(pc2diffs);max(pc2diffs);mean(pc2diffs) min(pc3diffs);max(pc3diffs);mean(pc3diffs) min(pc4diffs);max(pc4diffs);mean(pc4diffs) min(pc5diffs);max(pc5diffs);mean(pc5diffs) min(pc6diffs);max(pc6diffs);mean(pc6diffs) min(pc8diffs);max(pc8diffs);mean(pc8diffs) The results were all zeros, so the problem doesn't seem to be different values in the dataset, or misaligned rows. So, I am at a loss here. Could this have something to do with a faulty build on the linux cluster? Thanks in advance for any help and advice! Cheers, Lyndon _______________________________________________ R-sig-Geo mailing list R-sig-Geo@stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/r-sig-geo