Re: [R] Logistic regression problem: propensity score matching

2003-06-06 Thread Paul
Thank you all.

I made a pretty basic error in using multinom rather than glm with 
family=binomial, which needed rapid correction.

I have now rewritten the relevant part using glm.

After importing, I convert all categorical variables into factors:

londonpsm <- sqlFetch(channel, "London_NW_london_pilots_elig", 
rownames=TRUE)
# factor() returns a new object, so the result has to be assigned back
londonpsm$InSample <- factor(londonpsm$InSample)
londonpsm$GENDER <- factor(londonpsm$GENDER)
londonpsm$DISABLED <- factor(londonpsm$DISABLED)
londonpsm$ETHCODE <- factor(londonpsm$ETHCODE)
londonpsm$LOPTYPE <- factor(londonpsm$LOPTYPE)
# data= replaces attach(), so glm uses the converted columns
LonOutput <- glm(InSample ~ AGE + DISABLED + GENDER + ETHCODE + NDYPTOT 
+ NDLTUTOT + LOPTYPE, family = binomial, data = londonpsm)
lonoutput <- data.frame(fitted.values(LonOutput))
sqlSave(channel, lonoutput, tablename="lonoutput", safer=FALSE)

From the comments, this looks better, but it may be there is some 
further switch I should use.

Apologies for the variables in capitals - my data comes in SPSS format 
but to manipulate it I use Access, and the only way I can see to get 
data from SPSS to Access is to export it in a format such as dbase, 
which capitalises all variables.

While sqlFetch, sqlQuery and sqlSave seem to work amazingly well, and 
fast, I am still having a problem with my rownames. I would like the 
imported data to have the database unique ID as the rownames, and 
protect these through the analysis, so that the two columns in 
fitted.values are unique ID and the fitted value. So far this does not 
work.
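
A minimal sketch of one way to carry the ID through, run on simulated data rather than the ODBC table. The column names follow the code in this thread; the data, the ID format and the output column name `pscore` are made up for illustration. It relies on the fact that fitted values from glm are named by the row names of the data frame passed via data=.

```r
# Simulated stand-in for the imported table; the values are invented.
set.seed(42)
n <- 200
dat <- data.frame(
  ORCID    = sprintf("ID%04d", seq_len(n)),
  AGE      = round(runif(n, 18, 60)),
  GENDER   = factor(sample(c("F", "M"), n, replace = TRUE)),
  InSample = rbinom(n, 1, 0.3)
)

# Use the unique ID as row names, then drop the column so it cannot
# be picked up as a predictor.
rownames(dat) <- dat$ORCID
dat$ORCID <- NULL

# Passing data= (rather than attach()) keeps the row names with the model.
fit <- glm(InSample ~ AGE + GENDER, family = binomial, data = dat)

# fitted() returns a vector named by the row names of the rows used in
# the fit, so the ID can be turned back into an ordinary column.
p   <- fitted(fit)
out <- data.frame(ORCID = names(p), pscore = unname(p))
print(head(out))
```

With the ID held as an ordinary column, the table written back to the database no longer depends on how row names are handled on export.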

Then, once the result has been sqlSaved, the inclusion of the unique ID 
enables matching of the resulting action and control sample with 
personal details for fieldwork, after the closest control match to 
action sample has been identified.


Re: [R] Logistic regression problem: propensity score matching

2003-06-06 Thread John Fox
Dear Paul,

At 08:41 PM 6/4/2003 +0100, Paul wrote:
> Thanks for your reply.
>
> I am using logistic regression because my response variable is categorical 
> - and this seems to be recommended in the literature (by Heckman, Smith 
> and others).

I think that Prof. Ripley's point here is that although one can use multinom 
in the nnet package to fit a binary (or binomial) logistic regression, it 
is more common to do so using the glm (generalized linear model) function. 
One normally would use multinomial logistic regression only for a 
polytomous (several-category) response variable. Applied to a dichotomous 
response, it will give the same results as a binary logistic regression.
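
A minimal sketch of this equivalence on simulated data. The variable names are hypothetical, and the multinom part only runs if the nnet package is installed. The two fits use different optimisers, so the coefficients should agree closely rather than exactly.

```r
# Simulated data; all names here are made up for illustration.
set.seed(1)
n <- 500
d <- data.frame(age = round(runif(n, 18, 60)))
d$in_sample <- rbinom(n, 1, plogis(-2 + 0.04 * d$age))

# Binary logistic regression the usual way.
fit_glm <- glm(in_sample ~ age, family = binomial, data = d)
print(coef(fit_glm))

# The same model through multinom(), if nnet is available.
if (requireNamespace("nnet", quietly = TRUE)) {
  fit_mn <- nnet::multinom(factor(in_sample) ~ age, data = d, trace = FALSE)
  # For a two-level response, coef() is a plain named vector.
  print(coef(fit_mn))
  max_diff <- max(abs(coef(fit_glm) - coef(fit_mn)))
  print(max_diff)
}
```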

. . .

> I have MASS but was unable to locate logistic regression, which I was 
> advised was the standard method for my problem.

In MASS (4th edition), logit models are discussed in chapter 7 on 
generalized linear models (see, in particular, section 7.2). In my R and 
S-PLUS Companion, to which you referred in your original message, these 
models are discussed in chapter 5 on generalized linear models (see, in 
particular, section 5.2.1).

I hope that this helps,
 John
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
-
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
email: [EMAIL PROTECTED]
phone: 905-525-9140x23604
web: www.socsci.mcmaster.ca/jfox


Re: [R] Logistic regression problem: propensity score matching

2003-06-05 Thread Paul
Thanks for your reply.

I am using logistic regression because my response variable is 
categorical - and this seems to be recommended in the literature (by 
Heckman, Smith and others).

The response variable is named sample because that is what it is - I am 
new to R so haven't quite got into habits of naming using Title Case.

Having randomly selected a sample from the action area, the aim is to 
find people to survey who, if they had been in the action area, would 
have had odds of being in the sample as close as possible to those 
actually selected.

Therefore I select a sample using sample(), write that out back to the 
access database as a new table, then (from R) run a query which creates 
a dataset comprising all those in the action area who could have been 
selected, and the control area group, and read this back in to R (using 
as many characteristics as possible except area) before undertaking the 
logistic regression. sample can take the values 0 (not in sample) or 1 
(in sample).

The aim is to find the odds of being in the sample (by characteristics) 
which is the Propensity Score, and then match action to control using 
Propensity Score Matching.
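
A note on scales, with a minimal sketch on simulated data (variable names are hypothetical). The propensity score is conventionally the fitted probability of being in the sample; the odds and the log-odds are transformations of it, and all three can be obtained from a binomial glm.

```r
# Simulated data; names and coefficients are invented for illustration.
set.seed(7)
n <- 300
d <- data.frame(age = round(runif(n, 18, 60)))
d$in_sample <- rbinom(n, 1, plogis(-3 + 0.05 * d$age))

fit <- glm(in_sample ~ age, family = binomial, data = d)

p     <- predict(fit, type = "response")  # probability (propensity score)
logit <- predict(fit, type = "link")      # log-odds (linear predictor)
odds  <- p / (1 - p)                      # odds

# The three scales are monotone transformations of one another.
print(all.equal(unname(log(odds)), unname(logit)))
print(all.equal(unname(plogis(logit)), unname(p)))
```

Ranking cases gives the same order on any of the three scales, but distances between cases differ, so the closest match on one scale is not guaranteed to be the closest on another.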

I have MASS but was unable to locate logistic regression, which I was 
advised was the standard method for my problem.

Thanks again.



Re: [R] Logistic regression problem: propensity score matching

2003-06-04 Thread Prof Brian Ripley
1) Why are you using multinom when this is not a multinomial logistic 
regression?  You could just use a binomial glm.

2) The second argument to predict() is `newdata'.  `sample' is an R 
function, so what did you mean to have there?  I think the predictions 
should be a named vector if `sample' is a data frame.

3) There are many more examples of such things (and more explanation) in 
Venables & Ripley's MASS (the book).
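
A small sketch of point 2 on simulated data, using a binomial glm (the data, the row names and `new_cases` are hypothetical). It shows that `sample` on its own is the base R function, and that predictions for a data frame passed as newdata come back as a vector named by that data frame's row names.

```r
# Simulated data; names are made up for illustration.
set.seed(3)
n <- 100
d <- data.frame(age       = round(runif(n, 18, 60)),
                in_sample = rbinom(n, 1, 0.4),
                row.names = paste0("case", seq_len(n)))
fit <- glm(in_sample ~ age, family = binomial, data = d)

# With no variable of that name defined, `sample` is the base function.
print(is.function(sample))

# newdata must be a data frame containing the predictors.
new_cases <- data.frame(age = c(25, 40, 55),
                        row.names = c("A17", "B02", "C33"))
pr <- predict(fit, newdata = new_cases, type = "response")
print(pr)         # named by the row names of new_cases
print(names(pr))
```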

On Wed, 4 Jun 2003, Paul Bivand wrote:

> I am doing one part of an evaluation of a mandatory welfare-to-work 
> programme in the UK.
> As with all evaluations, the problem is to determine what would have 
> happened if the initiative had not taken place.
> In our case, we have a number of pilot areas and no possibility of 
> random assignment.
> Therefore we have been given control areas.
> My problem is to select for survey individuals in the control areas who 
> match as closely as possible the randomly selected sample of action area 
> participants.
> As I understand the methodology, the procedure is to run a logistic 
> regression to determine the odds of a case being in the sample, across 
> both action and control areas, and then choose for control sample the 
> control area individual whose odds of being in the sample are closest to 
> an actual sample member.
> 
> So far, I have been following the multinomial logistic regression example in 
> Fox's Companion to Applied Regression.
> Firstly, I would like to know if the predict() is producing odds ratios 
> (or probabilities) for being in the sample, which is what I am aiming 
> for. 

You asked for `probs', so you got probabilities.

> Secondly, how do I get rownames (my unique identifier) into the 
> output from predict() - my input may be faulty somehow and the wrong 
> rownames being picked up - as I need to export back to database to sort 
> and match in names, addresses and phone numbers for my selected samples.
> 
> My code is as follows:
> londonpsm <- sqlFetch(channel, "London_NW_london_pilots_elig", 
> rownames=ORCID)
> attach(londonpsm)
> mod.multinom <- multinom(sample ~ AGE + DISABLED + GENDER + ETHCODE + 
> NDYPTOT + NDLTUTOT + LOPTYPE)
> lonoutput <- predict(mod.multinom, sample, type='probs')
> london2 <- data.frame(lonoutput)
> 
> The Logistic regression seems to work, although summary() says that it is 
> not a matrix.

what is `it'?

> The output looks like odds ratios, but I would like to know whether this 
> is so.

No.

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



[R] Logistic regression problem: propensity score matching

2003-06-04 Thread Paul
Hello all.

I am doing one part of an evaluation of a mandatory welfare-to-work 
programme in the UK.
As with all evaluations, the problem is to determine what would have 
happened if the initiative had not taken place.
In our case, we have a number of pilot areas and no possibility of 
random assignment.
Therefore we have been given control areas.
My problem is to select for survey individuals in the control areas who 
match as closely as possible the randomly selected sample of action area 
participants.
As I understand the methodology, the procedure is to run a logistic 
regression to determine the odds of a case being in the sample, across 
both action and control areas, and then choose for control sample the 
control area individual whose odds of being in the sample are closest to 
an actual sample member.
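
The procedure as described above could be sketched roughly as follows, on simulated data. All names, the sample size of 50 and the single predictor are assumptions for illustration, and the matching is a plain nearest-neighbour search with replacement, not a full matching method.

```r
# Simulated population: half in action areas, half in control areas.
set.seed(11)
n <- 400
d <- data.frame(
  id   = sprintf("P%04d", seq_len(n)),
  area = rep(c("action", "control"), each = n / 2),
  age  = round(runif(n, 18, 60))
)

# Hypothetical: 50 action-area people were randomly selected for survey.
d$in_sample <- 0
d$in_sample[sample(which(d$area == "action"), 50)] <- 1

# Fitted probability of being in the sample, across both areas.
fit <- glm(in_sample ~ age, family = binomial, data = d)
d$pscore <- unname(fitted(fit))

treated  <- d[d$in_sample == 1, ]
controls <- d[d$area == "control", ]

# For each sampled person, pick the control with the closest score
# (with replacement: a control may be chosen more than once).
nearest <- vapply(treated$pscore,
                  function(p) which.min(abs(controls$pscore - p)),
                  integer(1))

matches <- data.frame(
  sample_id  = treated$id,
  control_id = controls$id[nearest],
  distance   = abs(treated$pscore - controls$pscore[nearest])
)
print(head(matches))
```

Keeping `id` as an ordinary column means the matched pairs can be written back to the database and joined to names and addresses there.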

So far, I have been following the multinomial logistic regression example in 
Fox's Companion to Applied Regression.
Firstly, I would like to know if the predict() is producing odds ratios 
(or probabilities) for being in the sample, which is what I am aiming 
for. Secondly, how do I get rownames (my unique identifier) into the 
output from predict() - my input may be faulty somehow and the wrong 
rownames being picked up - as I need to export back to database to sort 
and match in names, addresses and phone numbers for my selected samples.

My code is as follows:
londonpsm <- sqlFetch(channel, "London_NW_london_pilots_elig", 
rownames=ORCID)
attach(londonpsm)
mod.multinom <- multinom(sample ~ AGE + DISABLED + GENDER + ETHCODE + 
NDYPTOT + NDLTUTOT + LOPTYPE)
lonoutput <- predict(mod.multinom, sample, type='probs')
london2 <- data.frame(lonoutput)

The Logistic regression seems to work, although summary() says that it is 
not a matrix.
The output looks like odds ratios, but I would like to know whether this 
is so.

Thank you
Paul Bivand