[R] multinomial conditional logit models

2003-02-12 Thread John Hendrickx
A little while ago, I asked for some help in estimating multinomial
conditional logit models. If you restructure your data, you can
estimate a multinomial logit as a conditional logit model. This gives
flexibility in imposing constraints on your dependent variable. One
application is to estimate a loglinear model for square tables such
as quasi-independence or quasi-symmetry with covariates.

Anyhow, I think I've sorted out most of the problems and I've posted
a sample program at
http://www.xs4all.nl/~jhckx/mcl/R/

There's also a sample program there on how to estimate models for
square tables, aka mobility models. Comments and suggestions on how
to do things more efficiently or elegantly in R are most welcome.

One problem that remains is that clogit in R doesn't produce the same
estimates as other programs like Stata. I've estimated this sample
model in several packages and they all have the same estimates to 3
decimal places accuracy. I've tried different convergence settings in
clogit but to no effect. I'd appreciate it if anyone could clarify
this.

Best regards,
John Hendrickx

__
[EMAIL PROTECTED] mailing list
http://www.stat.math.ethz.ch/mailman/listinfo/r-help



Re: [R] multinomial conditional logit models

2003-01-30 Thread Friedrich . Leisch
 On Wed, 29 Jan 2003 11:57:20 -0800 (PST),
 John Hendrickx (JH) wrote:

   A multinomial logit model can be specified as a conditional logit
   model after restructuring the data. Doing so gives flexibility in
   imposing restrictions on the dependent variable. One application is
   to specify a loglinear model for square tables, e.g. quasi-symmetry
   or quasi-independence, as a multinomial logit model with covariates.
   Further details on this technique and examples with several packages
   are on my homepage at http://www.xs4all.nl/~jhckx/mcl/

   I've been trying to implement an MCL model in R and have been mostly
   succesful. However I'm still stuck on a few points and I'm hoping
   someone can point out how to do fix them.

   To estimate an MCL model, the data must be restructured into a
   person-choice file. 

   * Each case must be duplicated ncat times (ncat is the number of
   categories of the dependent variable)
   * Each case must be identified by a strata variable (id)
   * Each duplicate must be identified by a variable indexing the
   choices (newy)
   * A dichotomous variable must indicate which duplicate corresponds
   with the respondent's choice (didep)

I'm not completely sure if I understand your target format completely,
but it sounds like reshape() could do all you want:


R df3
  id age dose1 dose2 dose4
1  1  40 1 2 3
2  2  50 2 1 3
3  3  60 1 2 3
4  4  50 2 1 3



R reshape(df3, direction = long, idvar=1,
   varying=list(dose=c(dose1, dose2, dose4)))

id age time dose1
1.1  1  401 1
2.1  2  501 2
3.1  3  601 1
4.1  4  501 2
1.2  1  402 2
2.2  2  502 1
3.2  3  602 2
4.2  4  502 1
1.3  1  403 3
2.3  2  503 3
3.3  3  603 3
4.3  4  503 3


which is a (modified) example from help(reshape) ... there you find
much more examples.

Hope this helps,
Fritz

-- 
---
Friedrich Leisch 
Institut für Statistik  DSSTel: (+43 1) 4277 38613
Universität WienFax: (+43 1) 4277 38639
Universitätsstraße 5  [EMAIL PROTECTED]
A-1010 Wien, Austria http://www.ci.tuwien.ac.at/~leisch
---

__
[EMAIL PROTECTED] mailing list
http://www.stat.math.ethz.ch/mailman/listinfo/r-help



Re: [R] multinomial conditional logit models

2003-01-30 Thread John Hendrickx
Thanks, that did the trick. My mclgen function can be written as

mclgen - function (datamat,catvar) {
ncat - nlevels(catvar)
perschoice-as.data.frame(rep(datamat,ncat))
perschoice-reshape(perschoice,direction=long,
varying=lapply(names(datamat),rep,ncat),
timevar=newy)
perschoice-perschoice[sort.list(perschoice$id),]
dep-parse(text=paste(perschoice$,substitute(catvar),sep=))
perschoice$depvar-as.numeric(eval(dep)==perschoice$newy)
perschoice
}

I'd still appreciate the help of R-listers on how to use the value of
catvar on the left-hand side of an expression within a function
(perschoice$valueof(catvar)-perschoice$newy), and how to get R to
drop the reference category of a factor in an interaction effect
without the main effect (in
clogit(depvar~occ+occ:educ+occ:black+strata(id),data=pc)).

But many thanks for the help so far!

John Hendrickx

--- Thomas Lumley [EMAIL PROTECTED] wrote:
 On Thu, 30 Jan 2003, John Hendrickx wrote:
 
  The only problem though is the last step, specifying a list of
 varying
  variables. It should be possible to generate this from
 names(dset)
  but I can't get it right. as.list(rep(names(dset),2)) produces
 a
  list with four elements rather than a list with 2 components each
  containing two elements. dim() gives a matrix as a result.
 Could
  someone show me how to create
 list(x1=c(X1,X1),x2=c(X2,X2))
  from names(dset)?
 
lapply(names(dset),rep,2)
 gives
  [[1]]
 [1] X1 X1
 
 [[2]]
 [1] X2 X2
 
   -thomas
 
 __
 [EMAIL PROTECTED] mailing list
 http://www.stat.math.ethz.ch/mailman/listinfo/r-help

__
[EMAIL PROTECTED] mailing list
http://www.stat.math.ethz.ch/mailman/listinfo/r-help



Re: [R] multinomial conditional logit models

2003-01-30 Thread ripley
On Thu, 30 Jan 2003, John Hendrickx wrote:

 Thanks, that did the trick. My mclgen function can be written as
 
 mclgen - function (datamat,catvar) {
   ncat - nlevels(catvar)
   perschoice-as.data.frame(rep(datamat,ncat))
   perschoice-reshape(perschoice,direction=long,
   varying=lapply(names(datamat),rep,ncat),
   timevar=newy)
   perschoice-perschoice[sort.list(perschoice$id),]
   dep-parse(text=paste(perschoice$,substitute(catvar),sep=))
   perschoice$depvar-as.numeric(eval(dep)==perschoice$newy)
   perschoice
 }
 
 I'd still appreciate the help of R-listers on how to use the value of
 catvar on the left-hand side of an expression within a function
 (perschoice$valueof(catvar)-perschoice$newy), and how to get R to

Use perschoice[[catvar]], on either side.

 drop the reference category of a factor in an interaction effect
 without the main effect (in
 clogit(depvar~occ+occ:educ+occ:black+strata(id),data=pc)).

Let me guess a lot what you mean. You are using treatment contrasts (hence 
the `reference category') and you want the model matrix which is
that given by depvar ~ occ + edu + occ:educ + black + occ:black less the 
columns for edu and black?  That's a model that depends on the details of 
the coding, and does not make sense in the Wilkinson-Rogers framework.
Could you spell this out please?

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
[EMAIL PROTECTED] mailing list
http://www.stat.math.ethz.ch/mailman/listinfo/r-help



[R] multinomial conditional logit models

2003-01-29 Thread John Hendrickx
A multinomial logit model can be specified as a conditional logit
model after restructuring the data. Doing so gives flexibility in
imposing restrictions on the dependent variable. One application is
to specify a loglinear model for square tables, e.g. quasi-symmetry
or quasi-independence, as a multinomial logit model with covariates.
Further details on this technique and examples with several packages
are on my homepage at http://www.xs4all.nl/~jhckx/mcl/

I've been trying to implement an MCL model in R and have been mostly
succesful. However I'm still stuck on a few points and I'm hoping
someone can point out how to do fix them.

To estimate an MCL model, the data must be restructured into a
person-choice file. 

* Each case must be duplicated ncat times (ncat is the number of
categories of the dependent variable)
* Each case must be identified by a strata variable (id)
* Each duplicate must be identified by a variable indexing the
choices (newy)
* A dichotomous variable must indicate which duplicate corresponds
with the respondent's choice (didep)

I've done this as follows:

mclgen - function (datamat,catvar) {
ncat - nlevels(catvar)
id-1:length(catvar)
datamat-cbind(id,datamat)
perschoice -NULL
for (i in 1:ncat) {
perschoice-rbind(perschoice,datamat)
}
perschoice-perschoice[sort.list(perschoice$id),]
perschoice$newy-gl(ncat,1,length=nrow(perschoice))
dep-parse(text=paste(perschoice$,substitute(catvar),sep=))
perschoice$depvar-as.numeric(eval(dep)==perschoice$newy)
perschoice
}

This works but I wonder if it's the most efficient method. I tried
using rep rather than rbind in a loop but that replicated the
dataset horizontally, not vertically. Is this the best solution?

I also finally figure out how to include the argument catvar in a
transformation involving another dataset but this solution seems very
complicated (eval+parse+substitute). Is there a simpler solution?

What I haven't figured out is how to use catvar once again but on
the righthand side of the transformation. In the second to last line,
I'd like to do perschoice$catvar-perschoice$newy. I've tried
several possibities but I can't figure out how to let R substitute
the value of catvar in that expression. eval(dep) doesn't work as
it does for the lefthand side but how should it be done? 

My solution for now is to do it afterward. My program continues as
follows:

pc-mclgen(logan,occ)
pc$occ-factor(pc$newy)
library(survival)
cl.lr-clogit(depvar~occ+occ:educ+occ:black+strata(id),data=pc)
summary(cl.lr)

The dataset logan is available from the website. occ is the
dependent variable, educ and black are independents. Once
mclgen has transformed the data into a person-choice file, a
multinomial logit model can be estimated using the dichotomous
dependent variable, the main effects of occ for the intercept term
and interactions of educ and black with occ for the effects of
these variables. 

This works more or less. What's unexpected though is that the
reference category of occ isn't dropped in the interactions with
educ and black. The highest category is dropped due to
collineairity but that's not quite what I want. Could someone explain
why this happens and how to let R drop the first category?

A final problem is that the results of clogit are only approximate (2
digits) whereas in other packages the cl model produces exactly the
same estimates as an mnl model. Is this because terms are dropped due
to collinearity or are there options I should examine. clogit issues
the following warnings which seem to be related to the collinearity
but perhaps point to something else:

Warning messages: 
1: Loglik converged before variable  10 ; beta may be infinite.  in:
fitter(X, Y, strats, offset, init, control, weights = weights,  
2: X matrix deemed to be singular; variable 9 14 in: coxph(formula =
Surv(rep(1, 4190), depvar) ~ occ + occ:educ +  

Apologies for the length of this post. Hopefully someone will have
time to read it through and help me out. As always, advance thanks
for any assistance.

John Hendrickx

__
[EMAIL PROTECTED] mailing list
http://www.stat.math.ethz.ch/mailman/listinfo/r-help