hi,
thanks for the reply to my query about exclusion rules for propensity score matching.
Exclusion can be based on the non-overlap regions from the propensity. It should not be done in the individual covariate space.
i want a rule inspired by non-overlap in propensity score space, but that binds in the space of the Xs. because i don't really know how to interpret the fact that i've excluded, say, people with scores > .87, but i DO know what it means to say that i've excluded people from country XYZ over age Q because i can't find good matches for them. if i make my rule based on Xs, i know who i can and cannot make inference for, and i can explain to other people who are the units that i can and cannot make inference for.
after posting to the list last night, i thought of using the RGENOUD package (genetic algorithm) to search over the space of exclusion rules (e.g., var 1 = 1, var 2 = 0, var 3 = 1 or 0, var 4 = 0); the loss function associated with a rule should be decreasing in # of tr units w/out support excluded and increasing in # of tr units w/ support excluded.
it might be tricky to get the right loss function, and i know this idea is kind of nutty, but it's the only automated search method i could think of.
any comments?
alexis
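A rough sketch of what a loss for one candidate rule could look like, assuming the goal stated in this thread: exclude the treated units that lack support while discarding as few matchable treated units as possible. The function name `rule_loss`, the weight `w`, and the toy data are all hypothetical; a search routine such as a genetic algorithm would call something like this once per candidate rule.

```r
# Hypothetical loss for one candidate exclusion rule, evaluated on treated units.
# excluded:    logical, TRUE if the rule excludes the unit
# unsupported: logical, TRUE if the unit's score lies outside the control support
# w:           assumed penalty weight for discarding a matchable unit
rule_loss <- function(excluded, unsupported, w = 1) {
  missed <- sum(unsupported & !excluded)  # unmatchable units the rule leaves in
  wasted <- sum(!unsupported & excluded)  # matchable units the rule throws away
  missed + w * wasted
}

# Toy data: six treated units, two binary covariates
x <- data.frame(var1 = c(1, 1, 0, 0, 1, 0),
                var2 = c(0, 0, 0, 1, 1, 0))
unsupported <- c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE)

# Candidate rule: exclude if var1 == 1 and var2 == 0
excluded <- x$var1 == 1 & x$var2 == 0
loss <- rule_loss(excluded, unsupported)
```

Raising `w` makes the search more reluctant to discard matchable units; the right trade-off depends on the application.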
Using the X space directly will not result in optimum exclusions unless you use a distance function, but that will make assumptions. My advice is to use rpart to make a classification rule that approximates the exclusion criteria to some desired degree of accuracy, i.e., use rpart to predict propensity < lower cutoff and separately to predict propensity > upper cutoff. This just assists in interpretation.
Frank
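A minimal sketch of the rpart suggestion above, on simulated data. The covariates `x1`, `x2`, `age`, the cutoff of 0.5, and the `cp` value are assumptions made for illustration and are not taken from the thread; only the upper-cutoff tree is shown, and the lower-cutoff tree would be fit the same way.

```r
library(rpart)

set.seed(1)
n <- 2000
# Simulated stand-in data (all variables hypothetical)
d <- data.frame(x1  = rbinom(n, 1, 0.5),
                x2  = rbinom(n, 1, 0.3),
                age = round(runif(n, 20, 80)))
d$treat <- rbinom(n, 1, plogis(-3 + 1.5 * d$x1 + 0.04 * d$age))

# Propensity scores from a logistic regression
ps <- fitted(glm(treat ~ x1 + x2 + age, family = binomial, data = d))

upper <- 0.5  # assumed upper cutoff, for illustration only
d$above <- factor(ps > upper, levels = c(FALSE, TRUE))

# Classification tree predicting "propensity > upper cutoff" from the Xs
fit <- rpart(above ~ x1 + x2 + age, data = d, method = "class",
             control = rpart.control(cp = 0.01))
print(fit)  # the splits read as an exclusion rule in terms of the Xs

# Share of units where the tree agrees with the propensity-based exclusion
pred <- predict(fit, type = "class")
agreement <- mean(pred == d$above)
```

Lowering `cp` grows a larger tree that tracks the cutoff more closely at the cost of a longer, less readable rule.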
I tend to look at the 10th smallest and largest values of propensity for each of the two treatment groups for making the decision. You will need to exclude non-overlap regions whether you use matching or covariate adjustment of propensity but covariate adjustment (using e.g. regression splines in the logit of propensity) is often a better approach once you've been careful about non-overlap.
Frank Harrell
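One possible reading of the advice above, as a sketch on simulated data: take the 10th smallest and 10th largest propensity in each treatment group, keep the units that fall inside both ranges, then adjust for a spline in the logit of propensity. Turning those order statistics into hard cutoffs is an assumption, as are the helper `kth`, the outcome `y`, and the use of `splines::ns` with 4 degrees of freedom.

```r
library(splines)

set.seed(2)
n <- 3000
x <- rnorm(n)
treat <- rbinom(n, 1, plogis(-1.4 + 1.2 * x))
y <- 1 + 0.5 * treat + x + rnorm(n)  # hypothetical outcome
ps <- fitted(glm(treat ~ x, family = binomial))

# k-th smallest and k-th largest propensity within one group
kth <- function(p, k = 10) {
  s <- sort(p)
  c(low = s[k], high = s[length(s) - k + 1])
}
r_trt <- kth(ps[treat == 1])
r_ctl <- kth(ps[treat == 0])

# Overlap region: above both lower values and below both upper values
lower <- max(r_trt["low"], r_ctl["low"])
upper <- min(r_trt["high"], r_ctl["high"])
keep <- ps >= lower & ps <= upper

# Covariate adjustment on the retained units: spline in logit(propensity)
d <- data.frame(y = y, treat = treat, lp = qlogis(ps))[keep, ]
fit <- lm(y ~ treat + ns(lp, df = 4), data = d)
est <- unname(coef(fit)["treat"])
```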
On Tue, 5 Apr 2005, Frank E Harrell Jr wrote:
[EMAIL PROTECTED] wrote:
Dear R-list,
i have 6 different sets of samples. Each sample has about 5000 observations, with each observation comprised of 150 baseline covariates (X), 125 of which are dichotomous. Roughly 20% of the observations in each sample are "treatment" and the rest are "control" units.
i am doing propensity score matching, i have already estimated propensity scores (predicted probabilities) using logistic regression, and in each sample i am going to have to exclude approximately 100 treated observations for which I cannot find matching control observations (because the scores for these treated units are outside the support of the scores for control units).
in each sample, i must identify an exclusion rule that is interpretable on the scale of the X's that excludes these unmatchable treated observations and excludes as FEW of the remaining treated observations as possible. (the reason is that i want to be able to explain, in terms of the Xs, who the individuals are that I am making causal inference about.)
i've tried some simple stuff over the past few days and nothing's worked. is there an R-package or algorithm, or even estimation strategy that anyone could recommend? (i am really hoping so!)
thank you,
alexis diamond
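For concreteness, the setup described above might look roughly like this in R. The data are simulated with three hypothetical covariates rather than 150, and "outside the support" is taken here to mean outside the range of the control units' scores, which is an assumption about the definition being used.

```r
set.seed(3)
n <- 5000
# Simulated covariates (hypothetical stand-ins for the real Xs)
X <- data.frame(x1 = rbinom(n, 1, 0.4),
                x2 = rbinom(n, 1, 0.6),
                x3 = rnorm(n))
treat <- rbinom(n, 1, plogis(-2 + 1.0 * X$x1 + 0.8 * X$x3))

# Propensity scores: predicted probabilities from a logistic regression
dat <- cbind(treat = treat, X)
ps <- fitted(glm(treat ~ ., family = binomial, data = dat))

# Treated units whose score falls outside the range of control scores
ctl_range <- range(ps[treat == 0])
off_support <- treat == 1 & (ps < ctl_range[1] | ps > ctl_range[2])
n_off <- sum(off_support)
```

The logical vector `off_support` marks the treated units that any X-based exclusion rule would need to capture.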
-- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University