[R] multinomial conditional logit models
A little while ago, I asked for some help in estimating multinomial conditional logit models. If you restructure your data, you can estimate a multinomial logit as a conditional logit model. This gives flexibility in imposing constraints on your dependent variable. One application is to estimate a loglinear model for square tables such as quasi-independence or quasi-symmetry with covariates. Anyhow, I think I've sorted out most of the problems and I've posted a sample program at http://www.xs4all.nl/~jhckx/mcl/R/ There's also a sample program there on how to estimate models for square tables, aka mobility models. Comments and suggestions on how to do things more efficiently or elegantly in R are most welcome. One problem that remains is that clogit in R doesn't produce the same estimates as other programs like Stata. I've estimated this sample model in several packages and they all have the same estimates to 3 decimal places accuracy. I've tried different convergence settings in clogit but to no effect. I'd appreciate it if anyone could clarify this. Best regards, John Hendrickx __ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] multinomial conditional logit models
On Wed, 29 Jan 2003 11:57:20 -0800 (PST), John Hendrickx (JH) wrote: A multinomial logit model can be specified as a conditional logit model after restructuring the data. Doing so gives flexibility in imposing restrictions on the dependent variable. One application is to specify a loglinear model for square tables, e.g. quasi-symmetry or quasi-independence, as a multinomial logit model with covariates. Further details on this technique and examples with several packages are on my homepage at http://www.xs4all.nl/~jhckx/mcl/ I've been trying to implement an MCL model in R and have been mostly succesful. However I'm still stuck on a few points and I'm hoping someone can point out how to do fix them. To estimate an MCL model, the data must be restructured into a person-choice file. * Each case must be duplicated ncat times (ncat is the number of categories of the dependent variable) * Each case must be identified by a strata variable (id) * Each duplicate must be identified by a variable indexing the choices (newy) * A dichotomous variable must indicate which duplicate corresponds with the respondent's choice (didep) I'm not completely sure if I understand your target format completely, but it sounds like reshape() could do all you want: R df3 id age dose1 dose2 dose4 1 1 40 1 2 3 2 2 50 2 1 3 3 3 60 1 2 3 4 4 50 2 1 3 R reshape(df3, direction = long, idvar=1, varying=list(dose=c(dose1, dose2, dose4))) id age time dose1 1.1 1 401 1 2.1 2 501 2 3.1 3 601 1 4.1 4 501 2 1.2 1 402 2 2.2 2 502 1 3.2 3 602 2 4.2 4 502 1 1.3 1 403 3 2.3 2 503 3 3.3 3 603 3 4.3 4 503 3 which is a (modified) example from help(reshape) ... there you find much more examples. Hope this helps, Fritz -- --- Friedrich Leisch Institut für Statistik DSSTel: (+43 1) 4277 38613 Universität WienFax: (+43 1) 4277 38639 Universitätsstraße 5 [EMAIL PROTECTED] A-1010 Wien, Austria http://www.ci.tuwien.ac.at/~leisch --- __ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] multinomial conditional logit models
Thanks, that did the trick. My mclgen function can be written as mclgen - function (datamat,catvar) { ncat - nlevels(catvar) perschoice-as.data.frame(rep(datamat,ncat)) perschoice-reshape(perschoice,direction=long, varying=lapply(names(datamat),rep,ncat), timevar=newy) perschoice-perschoice[sort.list(perschoice$id),] dep-parse(text=paste(perschoice$,substitute(catvar),sep=)) perschoice$depvar-as.numeric(eval(dep)==perschoice$newy) perschoice } I'd still appreciate the help of R-listers on how to use the value of catvar on the left-hand side of an expression within a function (perschoice$valueof(catvar)-perschoice$newy), and how to get R to drop the reference category of a factor in an interaction effect without the main effect (in clogit(depvar~occ+occ:educ+occ:black+strata(id),data=pc)). But many thanks for the help so far! John Hendrickx --- Thomas Lumley [EMAIL PROTECTED] wrote: On Thu, 30 Jan 2003, John Hendrickx wrote: The only problem though is the last step, specifying a list of varying variables. It should be possible to generate this from names(dset) but I can't get it right. as.list(rep(names(dset),2)) produces a list with four elements rather than a list with 2 components each containing two elements. dim() gives a matrix as a result. Could someone show me how to create list(x1=c(X1,X1),x2=c(X2,X2)) from names(dset)? lapply(names(dset),rep,2) gives [[1]] [1] X1 X1 [[2]] [1] X2 X2 -thomas __ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help __ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] multinomial conditional logit models
On Thu, 30 Jan 2003, John Hendrickx wrote: Thanks, that did the trick. My mclgen function can be written as mclgen - function (datamat,catvar) { ncat - nlevels(catvar) perschoice-as.data.frame(rep(datamat,ncat)) perschoice-reshape(perschoice,direction=long, varying=lapply(names(datamat),rep,ncat), timevar=newy) perschoice-perschoice[sort.list(perschoice$id),] dep-parse(text=paste(perschoice$,substitute(catvar),sep=)) perschoice$depvar-as.numeric(eval(dep)==perschoice$newy) perschoice } I'd still appreciate the help of R-listers on how to use the value of catvar on the left-hand side of an expression within a function (perschoice$valueof(catvar)-perschoice$newy), and how to get R to Use perschoice[[catvar]], on either side. drop the reference category of a factor in an interaction effect without the main effect (in clogit(depvar~occ+occ:educ+occ:black+strata(id),data=pc)). Let me guess a lot what you mean. You are using treatment contrasts (hence the `reference category') and you want the model matrix which is that given by depvar ~ occ + edu + occ:educ + black + occ:black less the columns for edu and black? That's a model that depends on the details of the coding, and does not make sense in the Wilkinson-Rogers framework. Could you spell this out please? -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help
[R] multinomial conditional logit models
A multinomial logit model can be specified as a conditional logit model after restructuring the data. Doing so gives flexibility in imposing restrictions on the dependent variable. One application is to specify a loglinear model for square tables, e.g. quasi-symmetry or quasi-independence, as a multinomial logit model with covariates. Further details on this technique and examples with several packages are on my homepage at http://www.xs4all.nl/~jhckx/mcl/ I've been trying to implement an MCL model in R and have been mostly succesful. However I'm still stuck on a few points and I'm hoping someone can point out how to do fix them. To estimate an MCL model, the data must be restructured into a person-choice file. * Each case must be duplicated ncat times (ncat is the number of categories of the dependent variable) * Each case must be identified by a strata variable (id) * Each duplicate must be identified by a variable indexing the choices (newy) * A dichotomous variable must indicate which duplicate corresponds with the respondent's choice (didep) I've done this as follows: mclgen - function (datamat,catvar) { ncat - nlevels(catvar) id-1:length(catvar) datamat-cbind(id,datamat) perschoice -NULL for (i in 1:ncat) { perschoice-rbind(perschoice,datamat) } perschoice-perschoice[sort.list(perschoice$id),] perschoice$newy-gl(ncat,1,length=nrow(perschoice)) dep-parse(text=paste(perschoice$,substitute(catvar),sep=)) perschoice$depvar-as.numeric(eval(dep)==perschoice$newy) perschoice } This works but I wonder if it's the most efficient method. I tried using rep rather than rbind in a loop but that replicated the dataset horizontally, not vertically. Is this the best solution? I also finally figure out how to include the argument catvar in a transformation involving another dataset but this solution seems very complicated (eval+parse+substitute). Is there a simpler solution? What I haven't figured out is how to use catvar once again but on the righthand side of the transformation. In the second to last line, I'd like to do perschoice$catvar-perschoice$newy. I've tried several possibities but I can't figure out how to let R substitute the value of catvar in that expression. eval(dep) doesn't work as it does for the lefthand side but how should it be done? My solution for now is to do it afterward. My program continues as follows: pc-mclgen(logan,occ) pc$occ-factor(pc$newy) library(survival) cl.lr-clogit(depvar~occ+occ:educ+occ:black+strata(id),data=pc) summary(cl.lr) The dataset logan is available from the website. occ is the dependent variable, educ and black are independents. Once mclgen has transformed the data into a person-choice file, a multinomial logit model can be estimated using the dichotomous dependent variable, the main effects of occ for the intercept term and interactions of educ and black with occ for the effects of these variables. This works more or less. What's unexpected though is that the reference category of occ isn't dropped in the interactions with educ and black. The highest category is dropped due to collineairity but that's not quite what I want. Could someone explain why this happens and how to let R drop the first category? A final problem is that the results of clogit are only approximate (2 digits) whereas in other packages the cl model produces exactly the same estimates as an mnl model. Is this because terms are dropped due to collinearity or are there options I should examine. clogit issues the following warnings which seem to be related to the collinearity but perhaps point to something else: Warning messages: 1: Loglik converged before variable 10 ; beta may be infinite. in: fitter(X, Y, strats, offset, init, control, weights = weights, 2: X matrix deemed to be singular; variable 9 14 in: coxph(formula = Surv(rep(1, 4190), depvar) ~ occ + occ:educ + Apologies for the length of this post. Hopefully someone will have time to read it through and help me out. As always, advance thanks for any assistance. John Hendrickx __ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help