> How do you go about deciding how many matches you will use? With my
> data, my standard errors generally get smaller if I use more
> matches.
Generally, select the maximum number of matches that still yields good (or at least acceptable) covariate balance; this bounds the bias due to the observed confounders. Using more matches per treated unit will usually shrink the standard errors, as you have seen, but the additional matches are typically worse matches, so balance, and with it bias, can deteriorate. See the MatchBalance() function for several measures of balance, and GenMatch() for automatically maximizing (observed) covariate balance.

How to measure good balance is an open research question. I will note that the degree of covariate balance usually thought acceptable in the applied literature is not enough to get reliable estimates in practice. We can evaluate this by comparing an observational estimate (with matching adjustment) against a known experimental benchmark. See:
http://sekhon.berkeley.edu/papers/GenMatch.pdf

> Speaking of standard errors, when correcting for heteroscedasticity,
> how many matches do you use (this is the Var.cal option). It seems to
> me that it might make sense to use the same number of matches as
> above, but that's just a guess...

These are related but separate issues. The number of matches is all about covariate balance (bias reduction), while the Var.calc option is related to the heterogeneity of the causal effect. It could be that the data require 1-to-1 matching to get good covariate balance while the causal effect is homogeneous, in which case Var.calc can be left at 0 (the default, which assumes homoscedasticity). Other combinations are possible as well.

> One more question about Match()...
> I am calculating a number of SATT's that all have the same covariates
> (X's) and treatment variables (Tr's). I would like to take advantage
> of the matching that I do the first time to then quickly calculate the
> SATT for various different Y's? How can I do that? It would save
> serious computational time.

The following code expands on yours and estimates the mean causal effect and the naive standard error without a second call to Match(). Doing this for the Abadie-Imbens SEs instead of the naive SEs is left as an exercise (take the code from the Matching.R file of the package).
In a future version of the package, I'll make a separate function that makes all of this transparent by using the predict() setup.

###################
library(Matching)
set.seed(30)

# make up some data
X <- matrix(rnorm(1000*5), ncol=5)
Tr <- c(rep(1,500), rep(0,500))
Y1 <- as.vector(rnorm(1000))
Y2 <- as.vector(rnorm(1000))

# match once, estimating the SATT for Y1
satt.Y1 <- Match(Y=Y1, X=X, Tr=Tr, M=1)
summary(satt.Y1, full=TRUE)

cat("****** Estimate Y2 BY Calling Match() \n")
satt.Y2 <- Match(Y=Y2, X=X, Tr=Tr, M=1)
summary(satt.Y2, full=TRUE)

cat("****** Estimate Without Calling Match() \n")
# reuse the matched pairs and weights found by the first call
index.treated <- satt.Y1$index.treated
index.control <- satt.Y1$index.control
weights <- satt.Y1$weights
Y <- Y2

# weighted mean of the matched-pair differences (the SATT estimate)
mest <- sum((Y[index.treated]-Y[index.control])*weights)/sum(weights)
cat("estimate for Y2:", mest, "\n")

# naive standard error of that weighted mean
v1 <- Y[index.treated] - Y[index.control]
varest <- sum( ((v1-mest)^2)*weights)/(sum(weights)*sum(weights))
se.naive <- sqrt(varest)
cat("naive SE Y2:", se.naive, "\n")
###############

Cheers,
JS.

=======================================
Jasjeet S. Sekhon
Associate Professor
Survey Research Center
UC Berkeley

http://sekhon.berkeley.edu/
V: 510-642-9974 F: 617-507-5524
=======================================

Brian Quinif writes:
> To anyone who uses the Match() function in the Matching library...
>
> How do you go about deciding how many matches you will use? With my
> data, my standard errors generally get smaller if I use more matches.
>
> Speaking of standard errors, when correcting for heteroscedasticity,
> how many matches do you use (this is the Var.cal option). It seems to
> me that it might make sense to use the same number of matches as
> above, but that's just a guess...
>
> One more question about Match()...
> I am calculating a number of SATT's that all have the same covariates
> (X's) and treatment variables (Tr's). I would like to take advantage
> of the matching that I do the first time to then quickly calculate the
> SATT for various different Y's? How can I do that?
> It would save serious computational time.
>
> In case I'm not explaining myself well, in the example below, I would
> like to calculate satt.Y2 without having to perform the matching all
> over again, since with more data, the process can be very slow.
>
> #make up some data
> X <- matrix(rnorm(1000*5), ncol=5)
> Tr <- c(rep(1,500),rep(0,500))
> Y1 <- as.vector(rnorm(1000))
> Y2 <- as.vector(rnorm(1000))
>
> satt.Y1 <- Match(Y=Y1, X=X1, Tr=Tr, M=1)
> satt.Y2 <- Match(Y=Y2, X=X1, Tr=Tr, M=1)
>
> Thanks,
>
> BQ

______________________________________________
R-help mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
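To make the balance criterion from the first answer concrete, here is a minimal base-R sketch of one common balance measure: the standardized mean difference of each covariate in the matched sample. It assumes only the index.treated, index.control and weights components that Match() returns (the same ones the reply's code reads from satt.Y1). The function name std.mean.diff is hypothetical, and scaling by the SD of the matched treated values is one convention among several, not necessarily the exact statistic MatchBalance() reports. The list m is a hand-built stand-in for a Match() result so the block runs without the Matching package.

```r
# Hypothetical helper, not part of the Matching package.
# Standardized mean difference (in percent of an SD) for every column of X,
# computed on the matched sample described by a Match()-style result.
std.mean.diff <- function(X, index.treated, index.control, weights) {
  X <- as.matrix(X)
  apply(X, 2, function(x) {
    # weighted means of the matched treated and matched control values
    mean.tr <- sum(x[index.treated] * weights) / sum(weights)
    mean.co <- sum(x[index.control] * weights) / sum(weights)
    # scale by the SD of the matched treated values (one common convention)
    100 * (mean.tr - mean.co) / sd(x[index.treated])
  })
}

# Toy stand-in for a Match() result: treated units 1:10 matched to 11:20
set.seed(1)
X <- matrix(rnorm(20 * 2), ncol = 2, dimnames = list(NULL, c("x1", "x2")))
m <- list(index.treated = 1:10, index.control = 11:20, weights = rep(1, 10))
print(round(std.mean.diff(X, m$index.treated, m$index.control, m$weights), 1))
```

In practice one could recompute these values for M = 1, 2, 3, ... and keep the largest M whose differences stay small; as noted in the reply, what counts as small enough is itself an open question.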

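For the third question, the reply's reuse code can be wrapped in a function and applied to any number of outcomes with sapply(). This is a minimal sketch: the name att.from.matches is hypothetical, the formulas are copied from the reply's code (weighted mean difference and naive SE, not the Abadie-Imbens SE), and the list m is a hand-built stand-in for the index.treated, index.control and weights components of a real Match() result, so the block runs without the Matching package. Later releases of Matching also added a match.out argument to Match() for reusing the matches from an earlier call; check the documentation of your installed version.

```r
# Hypothetical helper, not part of the Matching package.
# Given the matched indices and weights from a single Match() call, return
# the weighted mean of the matched-pair differences in Y and its naive SE,
# using the same formulas as the reply's code.
att.from.matches <- function(Y, index.treated, index.control, weights) {
  v1 <- Y[index.treated] - Y[index.control]   # matched-pair differences
  est <- sum(v1 * weights) / sum(weights)     # SATT estimate
  varest <- sum(((v1 - est)^2) * weights) / (sum(weights) * sum(weights))
  c(estimate = est, se.naive = sqrt(varest))
}

# Stand-in for the relevant fields of a Match() result
m <- list(index.treated = 1:10, index.control = 11:20, weights = rep(1, 10))

# Several outcomes sharing the same X and Tr (and hence the same matches)
set.seed(30)
Ys <- list(Y1 = rnorm(20), Y2 = rnorm(20), Y3 = rnorm(20))

# One column per outcome, rows "estimate" and "se.naive"
res <- sapply(Ys, att.from.matches,
              index.treated = m$index.treated,
              index.control = m$index.control,
              weights = m$weights)
print(res)
```

With a real result, pass satt.Y1$index.treated, satt.Y1$index.control and satt.Y1$weights in place of the fields of m; the expensive matching step then runs only once.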