On Thu, 21 Sep 2006, Sam Field wrote:

> Roger,
>
> Thanks for the code. Just now had time to get back to the problem.
> Sorry it took so long.
>
> You wrote:
>
> > to just take the null models in both cases, where lambda == rho by
> > definition, but the intercept term differs (I'm not sure how to
> > insert autocorrelation into the intercept in the lag model, which
> > is probably a source of error in the simulation).
>
> I am a little confused by this statement. I thought that the only
> difference between the lag and error model is that in the error
> model, spatial "spill overs" are restricted to the error term. I was
> thinking that lambda would always equal rho, not just in the null
> model. I was also thinking that the spatial lag model would not be
> properly identified without at least one covariate, since absent any
> variation in the systematic component of the model (Xb), what would
> the spatial lag model mean? If you premultiply a vector of constants
> by (I-pW)^-1, you just get back another vector of constants.

For row-standardised weights, yes, but not necessarily under other
styles of weighting, so it is a bit picky. The code in the functions
tries to accommodate that, following reports from users not using row
standardisation.
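To see the contrast concretely, here is a minimal sketch (the 7 x 7
rook grid and the value of rho are illustrative choices only): with
style="W" the row sums of W are all 1, so (I - rho*W)^{-1} applied to
a constant vector returns the constant 1/(1 - rho), while with binary
style="B" the row sums vary with the number of neighbours and the
result is no longer constant.

library(spdep)
nb <- cell2nb(7, 7)          # rook grid; cells have 2 to 4 neighbours
rho <- 0.15                  # small enough to be feasible for both styles
ones <- rep(1, length(nb))
y_W <- invIrW(nb2listw(nb, style="W"), rho=rho) %*% ones
y_B <- invIrW(nb2listw(nb, style="B"), rho=rho) %*% ones
range(y_W)                   # constant: every entry is 1/(1 - rho)
range(y_B)                   # varies across cells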
> A couple of other observations. When the sample size in your example
> is increased, the lambda and rho estimates are quite a bit closer to
> their simulated values.
>
> I added a single covariate in your example, and the recovery of
> lambda and rho performed about as well as in the null models.
>
> Although your example didn't recover the simulated values exactly,
> they were close enough to make me think that there is something
> funny about my example. So this is where I now turn.

OK, that seems sensible. Please keep the list informed of where this
takes you.

Best wishes,

Roger

> thanks for your input!
>
> Sam
>
> Quoting Roger Bivand <[EMAIL PROTECTED]>:
>
> > Sam,
> >
> > Indeed perplexing. I've tried to slim things down to:
> >
> > library(spdep)
> > nsamp <- 100
> > rho <- 0.4
> > nb7rt <- cell2nb(7, 7, torus=TRUE)
> > set.seed(20060918)
> > e <- matrix(rnorm(nsamp*length(nb7rt)), nrow=length(nb7rt))
> > listw <- nb2listw(nb7rt)
> > invmat <- invIrW(listw, rho=rho)
> > y <- invmat %*% e
> > lag_res0 <- lapply(1:nsamp, function(i)
> >     lagsarlm(y[,i] ~ 1, listw=listw))
> > err_res0 <- lapply(1:nsamp, function(i)
> >     errorsarlm(y[,i] ~ 1, listw=listw))
> > summary(do.call("rbind", lapply(lag_res0, coefficients)))
> > summary(do.call("rbind", lapply(err_res0, coefficients)))
> >
> > to just take the null models in both cases, where lambda == rho by
> > definition, but the intercept term differs (I'm not sure how to
> > insert autocorrelation into the intercept in the lag model, which
> > is probably a source of error in the simulation).
> >
> > I also had a look at Haining 1990 and the use of the Cholesky
> > lower triangle of the SAR (simultaneous autoregressive) covariance
> > matrix, and tried:
> >
> > W <- listw2mat(listw)
> > Id <- diag(length(nb7rt))
> > sarcov <- solve(t(Id - rho*W) %*% (Id - rho*W))
> > sarcovL <- chol((sarcov + t(sarcov))/2)
> > y <- t(sarcovL) %*% e  # Kaluzny et al. 1996, p. 160
> > lag_res1 <- lapply(1:nsamp, function(i)
> >     lagsarlm(y[,i] ~ 1, listw=listw))
> > err_res1 <- lapply(1:nsamp, function(i)
> >     errorsarlm(y[,i] ~ 1, listw=listw))
> > summary(do.call("rbind", lapply(lag_res1, coefficients)))
> > summary(do.call("rbind", lapply(err_res1, coefficients)))
> >
> > which is similar to, but not the same as, just premultiplying by
> > (I - rho W)^{-1}, and where I'm very unsure how to simulate for
> > the lag model correctly. The error model simulation is, however,
> > perhaps best done this way, isn't it?
> >
> > Running this shows that none of them retrieve the input
> > coefficient value exactly. The weights here are about as
> > well-behaved as possible - all observations have four neighbours
> > (grid on a torus) - and things perhaps get worse with less regular
> > weights.
> >
> > Simulating for the error model is discussed in the literature, but
> > the lag model is something of an oddity, certainly worth sorting
> > out.
> >
> > Best wishes,
> >
> > Roger
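On the question quoted just above of how to simulate for the lag
model, it may help to write the two data-generating processes side by
side. In the lag model the spatial multiplier applies to the whole
right-hand side, y = (I - rho*W)^{-1}(X beta + e); in the error model
it applies to the disturbance only, y = X beta + (I - rho*W)^{-1} e.
A minimal sketch with a single covariate (the beta values and the
seed are illustrative assumptions, and a single draw of n = 49 will
only recover the inputs roughly):

library(spdep)
set.seed(20060918)
nb7rt <- cell2nb(7, 7, torus=TRUE)
listw <- nb2listw(nb7rt)
n <- length(nb7rt)
rho <- 0.4
invmat <- invIrW(listw, rho=rho)
x <- rnorm(n)
beta <- c(2, 3)                     # illustrative intercept and slope
e <- rnorm(n)
mu <- beta[1] + beta[2] * x
y_lag <- drop(invmat %*% (mu + e))  # spillovers hit Xb and e alike
y_err <- drop(mu + invmat %*% e)    # spillovers confined to the error
coefficients(lagsarlm(y_lag ~ x, listw=listw))
coefficients(errorsarlm(y_err ~ x, listw=listw))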
> > On Sat, 16 Sep 2006, Sam Field wrote:
> >
> > > Roger and Terry,
> > >
> > > Thanks to both of you for your reply. Yes, sigma^2 enters into
> > > the generation of e. The systematic component of the model,
> > > X_mat %*% parms, has a variance of around 10^2; e typically has
> > > a variance of around 4^2, or:
> > >
> > > e <- rnorm(n, mean=0, sd=4)
> > >
> > > I went ahead and copied the code below (it isn't too long) in
> > > case others are interested in this exercise. The car package is
> > > only necessary for the n.bins function in the last lines of the
> > > code, so it is optional. I also tried the invIrM function
> > > suggested by Terry. It seems to produce the same results that I
> > > get. It also appears in the code below. Any help/suggestions
> > > would be greatly appreciated. I am stuck.
> > >
> > > # creating data
> > > library(spdep)
> > > library(car)
> > >
> > > # sample size
> > > n <- 100
> > >
> > > # coordinates
> > > x_coord <- runif(n, 0, 10)
> > > y_coord <- runif(n, 0, 10)
> > >
> > > ## w matrix, row normalized
> > > w_raw <- matrix(1, nrow=length(x_coord), ncol=length(x_coord))
> > > for(i in 1:length(x_coord)) {for(j in 1:length(x_coord))
> > >   {w_raw[i,j] <- sqrt((x_coord[i]-x_coord[j])^2 +
> > >     (y_coord[i]-y_coord[j])^2)}}
> > >
> > > w_dummy <- matrix(1, nrow=length(x_coord), ncol=length(x_coord))
> > >
> > > # defines neighborhood distance
> > > neighbor_dist <- 3
> > > for(i in 1:length(x_coord)) {for(j in 1:length(x_coord))
> > >   {if(w_raw[i,j] > neighbor_dist) w_dummy[i,j] <- 0}}
> > > for(i in 1:length(x_coord)) {for(j in 1:length(x_coord))
> > >   {if(i == j) w_dummy[i,j] <- 0}}
> > >
> > > row_sum <- rep(1, length(x_coord))
> > > for(i in 1:length(x_coord)) {row_sum[i] <- sum(w_dummy[i,])}
> > >
> > > w <- matrix(1, nrow=length(x_coord), ncol=length(x_coord))
> > >
> > > # divide by row sums
> > > for(i in 1:length(x_coord)) {for(j in 1:length(x_coord))
> > >   {w[i,j] <- w_dummy[i,j]/row_sum[i]}}
> > >
> > > neighbors <- dnearneigh(cbind(x_coord, y_coord), 0, neighbor_dist)
> > > nb_list <- nb2listw(neighbors)
> > >
> > > # create some independent variables
> > > X1 <- rnorm(n, 4, 2)
> > > X2 <- rnorm(n, 2, 2)
> > > X3 <- rnorm(n, 0, 1)
> > > X4 <- rnorm(n, 13, 3)
> > > X5 <- rnorm(n, 21, .5)
> > > X_mat <- cbind(rep(1, n), X1, X2, X3, X4, X5)
> > >
> > > # population parameters
> > > parms <- c(12.4, 2, -4, 5, 1, -.5)
> > > rho <- .3
> > >
> > > # begin iterations
> > > n_iter <- 100  # draw n_iter samples of size n
> > >
> > > estimates <- matrix(nrow=n_iter, ncol=7)
> > > estimates2 <- matrix(nrow=n_iter, ncol=7)
> > > estimates3 <- matrix(nrow=n_iter, ncol=7)
> > >
> > > for(iter in 1:n_iter)
> > > {
> > >   e <- rnorm(n, mean=0, sd=4)
> > >
> > >   # drawing samples
> > >   y_lag <- solve(diag(n) - rho*w) %*% X_mat %*% parms +
> > >     solve(diag(n) - rho*w) %*% e
> > >   y_error <- X_mat %*% parms + solve(diag(n) - rho*w) %*% e
> > >   y_error2 <- X_mat %*% parms + invIrM(neighbors, .3) %*% e
> > >
> > >   lag_model <- lagsarlm(y_lag ~ X1 + X2 + X3 + X4 + X5,
> > >     listw=nb_list, type='lag')
> > >   lag_model2 <- errorsarlm(y_error ~ X1 + X2 + X3 + X4 + X5,
> > >     listw=nb_list)
> > >   lag_model3 <- errorsarlm(y_error2 ~ X1 + X2 + X3 + X4 + X5,
> > >     listw=nb_list)
> > >
> > >   # retrieve parameter estimates
> > >   estimates[iter,] <- coefficients(lag_model)
> > >   estimates2[iter,] <- coefficients(lag_model2)
> > >   estimates3[iter,] <- coefficients(lag_model3)
> > > }
> > >
> > > mean(estimates[,7])
> > > hist(estimates[,7], nclass=n.bins(estimates[,7]), probability=TRUE)
> > > lines(density(estimates[,7]), lwd=2)
> > >
> > > mean(estimates2[,7])
> > > hist(estimates2[,7], nclass=n.bins(estimates2[,7]), probability=TRUE)
> > > lines(density(estimates2[,7]), lwd=2)
> > >
> > > mean(estimates3[,7])
> > > hist(estimates3[,7], nclass=n.bins(estimates3[,7]), probability=TRUE)
> > > lines(density(estimates3[,7]), lwd=2)
> > >
> > > #data <- data.frame(cbind(y_lag, y_error,
> > > #  X1, X2, X3, X4, X5, x_coord, y_coord))
> > > #names(data) <- c("y_lag", "y_error", names(data)[3:9])
> > > #write.table(data, file="C:/test.txt")
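An aside on the weights construction in the quoted code: the double
loops build the same row-normalized matrix that spdep's helpers
produce, so one way to rule the hand-rolled W out as a culprit is to
compare the two directly. A sketch, assuming the objects from the
quoted code are in the workspace and that no observation is isolated
(pairs lying exactly at distance neighbor_dist could in principle
differ by boundary convention):

# 'w', 'x_coord', 'y_coord' and 'neighbor_dist' come from the quoted code
library(spdep)
neighbors <- dnearneigh(cbind(x_coord, y_coord), 0, neighbor_dist)
w_spdep <- listw2mat(nb2listw(neighbors, style="W"))
all.equal(w, w_spdep, check.attributes=FALSE)  # should be TRUE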
> > > Quoting Roger Bivand <[EMAIL PROTECTED]>:
> > >
> > > > On Fri, 15 Sep 2006, Sam Field wrote:
> > > >
> > > > > List,
> > > > >
> > > > > I am having trouble writing a script that samples from an
> > > > > SAR error process. I've done it successfully for a spatial
> > > > > lag model, but not for the spatial error model. For the
> > > > > spatial lag model, I have the following:
> > > > >
> > > > > y_lag <- solve(diag(100) - p*w) %*% X_mat %*% parms +
> > > > >   solve(diag(100) - p*w) %*% e
> > > > >
> > > > > where parms is a parameter vector,
> > > > > X_mat is an n by p matrix of independent variables (+ constant),
> > > > > e is a vector of independent normal deviates (mean = 0),
> > > > > p is the autoregressive parameter,
> > > > > and w is a square, n by n contiguity matrix (row normalized).
> > > > >
> > > > > This works beautifully. lagsarlm recovers parms and p without
> > > > > a problem. Over repeated sampling, the estimated values are
> > > > > centered on the value for p in the simulation.
> > > > >
> > > > > Is there something wrong with the following for the spatial
> > > > > error model?
> > > > >
> > > > > y_error <- X_mat %*% parms + solve(diag(100) - p*w) %*% e
> > > >
> > > > Interesting question. Where is the sigma^2 going in your case -
> > > > is it in the generation of e? Could you try to use the columbus
> > > > dataset and set.seed to generate a reproducible example - it
> > > > may be that what you have written does not communicate exactly
> > > > what you have done?
> > > >
> > > > Roger
> > > >
> > > > > The distribution of values for p obtained from errorsarlm
> > > > > over repeated sampling is not centered around the value used
> > > > > in the simulation, but is typically much lower and all over
> > > > > the place. I have only looked at values for p ranging from
> > > > > .3 to .7.
> > > > >
> > > > > Any help would be greatly appreciated!
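A closing note on the Cholesky aside quoted above: premultiplying e
by (I - rho*W)^{-1} and premultiplying by the transposed Cholesky
factor give different realisations for the same e, but the implied
Gaussian laws coincide, since both have covariance
[(I - rho*W)'(I - rho*W)]^{-1}. A quick numerical check (a sketch
that simply rebuilds the quoted objects):

library(spdep)
rho <- 0.4
nb7rt <- cell2nb(7, 7, torus=TRUE)
listw <- nb2listw(nb7rt)
W <- listw2mat(listw)
Id <- diag(length(nb7rt))
invmat <- invIrW(listw, rho=rho)
sarcov <- solve(t(Id - rho*W) %*% (Id - rho*W))
# covariance of invmat %*% e, with e ~ N(0, I), is invmat %*% t(invmat):
all.equal(invmat %*% t(invmat), sarcov, check.attributes=FALSE)
# covariance of t(sarcovL) %*% e is t(sarcovL) %*% sarcovL:
sarcovL <- chol((sarcov + t(sarcov))/2)
all.equal(t(sarcovL) %*% sarcovL, sarcov, check.attributes=FALSE)

If both comparisons come back TRUE, the two routes target the same
error-model distribution, and the difference is one of realisation
rather than of law.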
--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School
of Economics and Business Administration, Helleveien 30, N-5045
Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: [EMAIL PROTECTED]

_______________________________________________
R-sig-Geo mailing list
R-sig-Geo@stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/r-sig-geo