[R] [R-pkgs] New version of catspec package

2005-04-13 Thread John Hendrickx
I've uploaded a new version of catspec to CRAN. Catspec is for
estimating certain special categorical models. It also contains
ctab, a function for creating one-way, two-way, and multi-way
percentage tables (nothing special there really). Ctab can now print
more than one percentage type, as well as table marginals.

The first special model in catspec is mclgen. Mclgen restructures a
data frame so a multinomial logistic model can be estimated using a
condition logit program. Doing so provides much more flexibility for
imposing restrictions on the response variable.

The second special (set of) models is sqtab, for estimating
loglinear models for square tables (aka mobility models) such as
symmetry, quasi-independence. One application can be to specify such
models as multinomial logistic models with covariates, using mclgen.

John Hendrickx

___
R-packages mailing list
[EMAIL PROTECTED]
https://stat.ethz.ch/mailman/listinfo/r-packages

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] [R-pkgs] perturb package for evaluating collinearity

2005-04-01 Thread John Hendrickx
I've uploaded the R package perturb to CRAN. Perturb contains two
programs for evaluating collinearity. Colldiag calculates condition
indexes and variance decomposition proportions to detect and track
down collinear sets of variables. 

Perturb takes a different approach. It re-estimates the model a
specified number of times, adding random noise (perturbations) to
selected variables at each iteration. Categorical variables are
randomly reclassified. A summary then displays the impact of these
perturbations on the stability of the coefficients. Perturb is not
limited to linear regression and can handle interaction effects and
transformations of independent variables.

Version 1 of perturb was available from my website. This new version
makes it much easier to reclassify categorical variables. In
addition, colldiag now has a fuzz option to suppress printing of
small variance decomposition values.

Comments and criticisms invited,
John Hendrickx

___
R-packages mailing list
[EMAIL PROTECTED]
https://stat.ethz.ch/mailman/listinfo/r-packages

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Ever see a stata import problem like this?

2004-09-22 Thread John Hendrickx
I've had a similar problem once. What may have caused the problem
then was a variate for which value lables had been defined for the
highest and lowest values. What complicates things is that the file
had been originally converted from SPSS to Stata. A workaround was to
set convert.factor=FALSE and that seems to work here too (using R
1.91 and the latest update for foreign): 

 m2-read.dta(morgen.dta,convert.factors=FALSE)
 summary(m2)
 CASEID  yearidhrs1 
 Min.   :   19721   Min.   :1972   Min.   :   1   Min.   :0.00  
 1st Qu.: 1983475   1st Qu.:1978   1st Qu.: 445   1st Qu.:   37.00  
 Median : 1996808   Median :1987   Median : 905   Median :   40.00  
 Mean   : 9963040   Mean   :1986   Mean   : 990   Mean   :   41.05  
 3rd Qu.:19872187   3rd Qu.:1994   3rd Qu.:1358   3rd Qu.:   48.00  
 Max.   :20002817   Max.   :2000   Max.   :3247   Max.   :   89.00  
  NA's   :17654.00  
  hrs2 prestigeagewed  age   

 Min.   :0.00   Min.   :   12.00   Min.   :   12.00   Min.   :
18.00  
 1st Qu.:   38.00   1st Qu.:   30.00   1st Qu.:   19.00   1st Qu.:
30.00  
 Median :   40.00   Median :   39.00   Median :   21.00   Median :
42.00  
 Mean   :   39.79   Mean   :   39.36   Mean   :   22.10   Mean   :
45.15  
 3rd Qu.:   45.00   3rd Qu.:   48.00   3rd Qu.:   24.00   3rd Qu.:
58.00  
 Max.   :   89.00   Max.   :   82.00   Max.   :   73.00   Max.   :
89.00  
 NA's   :40159.00   NA's   :1.00   NA's   :15551.00   NA's  
:143.00  
  educpaeduc maeducspeduc
   
 Min.   :  0.00   Min.   :0.00   Min.   :   0.00   Min.   :   
0.00  
 1st Qu.: 11.00   1st Qu.:8.00   1st Qu.:   8.00   1st Qu.:  
12.00  
 Median : 12.00   Median :   11.00   Median :  12.00   Median :  
12.00  
 Mean   : 12.48   Mean   :   10.21   Mean   :  10.41   Mean   :  
12.53  
 3rd Qu.: 14.00   3rd Qu.:   12.00   3rd Qu.:  12.00   3rd Qu.:  
14.00  
 Max.   : 20.00   Max.   :   20.00   Max.   :  20.00   Max.   :  
20.00  
 NA's   :127.00   NA's   :11586.00   NA's   :6782.00   NA's  
:18153.00  
 income
 Min.   :   1.000  
 1st Qu.:   9.000  
 Median :  11.000  
 Mean   :   9.756  
 3rd Qu.:  12.000  
 Max.   :  13.000  
 NA's   :3453.000  
 


--- Paul Johnson [EMAIL PROTECTED] wrote:

 Greetings Everybody:
 
 I generated a 1.2MB dta file based on the general social survey
 with 
 Stata8 for linux. The file can be re-opened with Stata, but when I
 bring 
 it into R, it says all the values are missing for most of the
 variables.
 
 This dataset is called morgen.dta and I dropped a copy online in
 case 
 you are interested
 
 http://www.ku.edu/~pauljohn/R/morgen.dta
 
[snip]

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] mutlicollinearity and MM-regression

2004-08-16 Thread John Hendrickx
--- [EMAIL PROTECTED] wrote:

 Dear R users,
 
 Usually the variance-inflation factor, which is based on R^2, is
 used as a
 measure for multicollinearity. But, in contrast to OLS regression
 there is
 no robust R^2 available for MM-regressions in R. Do you know if an
 equivalent or an alternative nmeasure of multicollinearity is
 available for
 MM-regression in R?
 
I'm not sure what MM-regression is. But I've just put a general
purpose tool for evaluating collinearity on my website. See
http://www.xs4all.nl/~jhckx/R/perturb/

The perturb programs works by adding small random changes
(perturbations) to selected variables. Categorical variables are
randomly misclassified. This process is repeated a specified number
of times, after which the impact of the perturbations on parameter
stability can be evaluated. It should work with any R-procedure that
has a formula.

The package also contains colldiag, for calculating condition indexes
and variance decomposition proportions. Since this only works on the
independent variables, it should work for your problem as well.

Feedback welcomed. I plan to submit the package to CRAN in a few
days, after I get the help files updated

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] converting character strings to eval

2004-07-20 Thread John Hendrickx
--- Deepayan Sarkar [EMAIL PROTECTED] wrote:
 On Monday 19 July 2004 09:37, Wayne Jones wrote:
  Hi there fellow R-users,
 
  I'm stuck on this seemingly trivial problem.
 
  All I want to coerce a character string into a command.
 
  For example:
 
  x-rnorm(20)
  y-rnorm(20)
  str-lm(y~x)
 
  I want to evaluate the str command.
 
  I have tried
 
  eval(as.expression(str))
 
 eval(parse(text = str))
 
 seems to work.
 
Couldn't eval be modified to automatically parse arguments if they're
not expressions? Something like:
eval2-function(arg) {
if (!is.expression(arg)) arg-parse(text=arg)
eval(arg)
}
Would a construction like eval2 have a downside or cause problems
down the line?

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] indexing a parameter in nls

2004-07-15 Thread John Hendrickx
I'm trying to estimate a diagonal reference model in nls. In its
basic form, the model consists of two factors with equal categories
and a dependent variable. The model fits the main effects of the two
factors such that they are proportional to each other. So the model I
want to fit is:

Y=p*m[f1]+(1-p)*m[f2]

The m-parameters indicate the average main effect of factors f1 and
f2 (with no intercept). The p-parameter indicates the relative impact
of the two factors on Y. If p=.5, f1 and f2 have equal impact, for
p.5, f1 has greater impact, for p.5 f2 has greater impact. 

I can estimate the model as follows for f1=row, f2=col,
Y=nkids:

rd-model.matrix(~as.factor(row)-1)
cd-model.matrix(~as.factor(col)-1)

strt-list(m1=2,m2=2,m3=2,m4=2,m5=3,p=.5)

nls(nkids~p*(rd[,1]*m1+rd[,2]*m2+rd[,3]*m3+rd[,4]*m4+rd[,5]*m5) +
  (1-p)*(cd[,1]*m1+cd[,2]*m2+cd[,3]*m3+cd[,4]*m4+cd[,5]*m5),
  start=strt)
---

What I want however, is a more compact specification, where row and
col are used to index which m-parameter should be used, something
like:

nls(nkids~p*strt[row]+(1-p)*strt[col],start=strt)

This specification produces an error: Error in nlsModel(formula, mf,
start) : singular gradient matrix at initial parameter estimates. I
suspect the problem is that strt[row] evaluates to a list of
starting values rather than the object names m1, m2, etc. I've tried

nls(nkids~p*get(names(strt[row]))+(1-p)*get(names(strt[col])),start=strt)

But this produces an error as well: Error in qr.qty(QR, resid) : qr
and y must have the same number of rows

Can anyone point me in the right direction? How can I use row and
col to index the appropriate m parameter in this model?

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] best editor for .R files

2003-11-21 Thread John Hendrickx
An extensive review of text editors for different platforms is at
http://fmwww.bc.edu/repec/bocode/t/textEditors.html
It's geared to Stata users but contains relevant information for R
users as well.

Personally, I use TextPad for Windows. It has syntax highlighting,
regular expression search and replace, selection of rectangular
blocks  of text. You can run programs from within TextPad and if the
output file has been opened it's automatically updated if you submit
the syntax again, a very useful feature I found.

Good luck,
John Hendrickx

--- [EMAIL PROTECTED] wrote:
 On the Windows platform, I use SourceEdit which also has R syntax 
 highlighting (and much more) (http://www.brixoft.com/). However, as
 an 
 organization, we do have plans to look at the IDE Eclipse 
 (http://www.eclipse.org/ )
 
 Partha
 
 
 
 
 
 Angel [EMAIL PROTECTED]
 Sent by: [EMAIL PROTECTED]
 11/20/2003 04:27 PM
 
  
 To: [EMAIL PROTECTED]
 cc: 
 Subject:[R] best editor for .R files
 
 
 
 Which is the best editor for .R files?
 
 I currently use kate on my linux as it has R highlighting and
 allows me to
 split the window into two: in one I edit the .R file and in the
 other I 
 have
 a shell so I run R and can easily  copy and paste the code. There
 are some
 features that I don't like and I am having a look on some
 alternatives.
 I've heard wonders of emacs with ess but I am a little bit
 frightened of 
 the
 steep learning curve.
 
 What do the R experts use or would recommend using?
 Both linux and/or windows alternatives are welcomed.
 I guess it would much depend on the particular needs/preferences of
 each
 user but I would like to know which are the most commonly used
 editors.
 Thanks,
 Angel
 
 __
 [EMAIL PROTECTED] mailing list
 https://www.stat.math.ethz.ch/mailman/listinfo/r-help
 
 --
 This message has been scanned for viruses and
 dangerous content by MailScanner, and is
 believed to be clean.
 
 __
 [EMAIL PROTECTED] mailing list
 https://www.stat.math.ethz.ch/mailman/listinfo/r-help


__

Free Pop-Up Blocker - Get it now

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


[R] (multiway) percentage tables

2003-02-28 Thread John Hendrickx
R has amazing capabilities, but percentage tables are a weak spot
IMHO. There's prop.table but that's rather unwieldly, especially for
multiway tables. CrossTable by Marc Schwartz in the gregmisc library
makes percentage tables a breeze but is limited to two-way tables. So
I decided to try my own hand at writing an R-function that would make
it easy to produce nicely formatted percentage tables for one-way,
two-way, or multi-way tables.

The first argument for ctab can be either a table object or one or
more factors. The first variable is assumed to be the row variable,
the second the column variable, subsequent variables are grouping
variables. The type option can be used to specify percentage type
(n, row, column, or total), digits to specify the number of
decimal points, percentage=FALSE can be used to print proportions
rather than percentages. row.vars and col.vars are passed on to
ftables for formatting multiway tables.

I'd like to see something like ctab in R-base at some point in the
future. Perhaps it could be integrated in ftable? Perhaps I'll try
that myself as a next project. I'm still learning R so comments on
ctab are most welcome.

Best,
John Hendrickx

examples of usage 
source(ctab.R)
data(Titanic)
ctab(Titanic)
ctab(Titanic,type=r)
ctab(Titanic,row.vars=1:3,type=r)
ctab(Titanic,col.vars=c(2,4),type=r)
ctab(Titanic,row.vars=c(1,3),type=c)
ctab(Titanic,col.vars=c(Sex,Survived),type=r)
ctab(Titanic,col.vars=c(Sex,Survived),type=t)
-- ctab --
# ctab: oneway, twoway, multiway percentage tables
# first argument must consist of one or more factors
# or a table object (class table, xtabs, or ftable)
# digits: number of digits after the decimal (default 2)
# type: n for counts, row, column or total
# for percentages (default n)
# row.vars:
# col.vars: same usage as ftable, ignored for one- and
# two-way tables
# percentages: FALSE== proportions are presented rather
# than percentages (default TRUE)

# comments to John Hendrickx [EMAIL PROTECTED]

ctab-function(...,digits=2,
type=c(n, row, column, total),
row.vars=NULL, col.vars=NULL,
percentages=TRUE) {
if (attributes(...)$class==factor) {
# create a table if the arguments are factors
tbl-table(...)
}
else if (table %in% class(...) || class(...)==ftable) {
# the argument is a table object (table, xtabs, ftable)
tbl-eval(...)
}
else {
stop(first argument must be either factors or a table object)
}

type-match.arg(type)

# one dimensional table,restrict choices to n and total
if (length(dim(tbl))==1) {
type-ifelse(type==n,n,total)
}

# if the object is an ftable, use the row.vars and col.vars
# use numeric indices to avoid finding the omitted
# the object must be converted to a table to get the dimensions
right
if (class(tbl)==ftable) {
nrowvar-length(names(attr(tbl,row.vars)))
row.vars-1:nrowvar
col.vars-(1:length(names(attr(tbl,col.vars+nrowvar
tbl-as.table(tbl)
}

# marginals to exclude assuming first factor is the row vaariable,
# second factor is the column variable
# is overridden by row.vars or col.vars
mrg2drop-0
if (type==column) {mrg2drop-1}
if (type==row) {mrg2drop-2}
if (type==total  length(dim(tbl))  1) {mrg2drop-c(1,2)}


# use row.vars and col.vars to determine the
# marginals to use when calculating percentages
# start by translating names to variable positions
nms-names(dimnames(tbl))
if (!is.null(row.vars)  !is.numeric(row.vars)) {
row.vars-order(match(nms,row.vars),na.last=NA)
}
if (!is.null(col.vars)  !is.numeric(col.vars)) {
col.vars-order(match(nms,col.vars),na.last=NA)
}
# calculate the other if only one is given
if (!is.null(row.vars)  is.null(col.vars)) {
col.vars-(1:length(dim(tbl)))[-row.vars]
}
if (!is.null(col.vars)  is.null(row.vars)) {
row.vars-(1:length(dim(tbl)))[-col.vars]
}
# now determine the margin as the last element
if (type==row  !is.null(col.vars)) {
mrg2drop-col.vars[length(col.vars)]
}
if (type==column  !is.null(row.vars)) {
mrg2drop-row.vars[length(row.vars)]
}
# if row.vars is given, col.vars has been determined
if (type==total  !is.null(row.vars)) {
mrg2drop-c(col.vars[length(col.vars)],row.vars[length(row.vars)])
}

marg-(1:length(dim(tbl)))[(-mrg2drop)]

# create percentages
if (type==n) {
digits-0
}
else

Re: [R] models for square tables

2003-02-13 Thread John Hendrickx
Labels can be helpful but can also cause messy output. In some of the
other functions, I added an option to suppress printing labels,
especially for symmetrical interactions.

A problem (in my view) with your approach is that it sorts the
categories according to the level names. I'd prefer the following,
which would make labels optional but otherwise maintain the original
order.

mob.qi - function
(rowvar,colvar,constrained=FALSE,print.labels=FALSE) {
check.square(rowvar,colvar)
if (constrained) {
qi - ifelse(rowvar==colvar, 1, 0)
nms-c(diagonal)
}
else {
qi - ifelse(rowvar==colvar, rowvar, 0)
nms-levels(rowvar)
}

qi-factor(qi)
qi-C(qi,contr.treatment,base=1)
if (print.labels) {
levels(qi)-c(offdiag,nms)
}
qi
}

Thanks for your suggestions and for pointing out relevel to me!

John Hendrickx

--- David Firth [EMAIL PROTECTED] wrote:
 Nice.
 
 I guess I normally do things a little bit differently, to get 
 coefficient names that look a bit more meaningful (eg avoiding
 numeric 
 codes for factor levels).
 
 For example one possible adjustment to your code for the qi models 
 would be
 
 mob.qi - function(rowvar, colvar, constrained=FALSE) {
   check.square(rowvar, colvar)
   if (!constrained) {
   qi - ifelse(rowvar==colvar, as.character(rowvar), offdiag)
   relevel(factor(qi), offdiag)
   } else qi - ifelse(rowvar==colvar, 1, 0)
 }
 
 immobile - mob.qi(OccFather, OccSon)
 glm.qi2-glm(Freq ~ OccFather + OccSon + immobile,
 family=poisson())
 
 immobile - mob.qi(OccFather, OccSon, constrained=T)
 glm.q02 -glm(Freq ~ OccFather + OccSon + immobile,
 family=poisson())
 
 Cheers,
 David
 
 On Wednesday, Feb 12, 2003, at 13:01 Europe/London, John Hendrickx 
 wrote:
 
  I've posted a sample file for estimating loglinear models for
 square
  tables (mobility models) at http://www.xs4all.nl/~jhckx/mcl/R/
  Comments and suggestions are welcome.
 
  John Hendrickx
 
  __
  [EMAIL PROTECTED] mailing list
  http://www.stat.math.ethz.ch/mailman/listinfo/r-help
 


__
[EMAIL PROTECTED] mailing list
http://www.stat.math.ethz.ch/mailman/listinfo/r-help



[R] multinomial conditional logit models

2003-02-12 Thread John Hendrickx
A little while ago, I asked for some help in estimating multinomial
conditional logit models. If you restructure your data, you can
estimate a multinomial logit as a conditional logit model. This gives
flexibility in imposing constraints on your dependent variable. One
application is to estimate a loglinear model for square tables such
as quasi-independence or quasi-symmetry with covariates.

Anyhow, I think I've sorted out most of the problems and I've posted
a sample program at
http://www.xs4all.nl/~jhckx/mcl/R/

There's also a sample program there on how to estimate models for
square tables, aka mobility models. Comments and suggestions on how
to do things more efficiently or elegantly in R are most welcome.

One problem that remains is that clogit in R doesn't produce the same
estimates as other programs like Stata. I've estimated this sample
model in several packages and they all have the same estimates to 3
decimal places accuracy. I've tried different convergence settings in
clogit but to no effect. I'd appreciate it if anyone could clarify
this.

Best regards,
John Hendrickx

__
[EMAIL PROTECTED] mailing list
http://www.stat.math.ethz.ch/mailman/listinfo/r-help



[R] models for square tables

2003-02-12 Thread John Hendrickx
I've posted a sample file for estimating loglinear models for square
tables (mobility models) at http://www.xs4all.nl/~jhckx/mcl/R/
Comments and suggestions are welcome.

John Hendrickx

__
[EMAIL PROTECTED] mailing list
http://www.stat.math.ethz.ch/mailman/listinfo/r-help



Re: [R] multinomial conditional logit models

2003-01-30 Thread John Hendrickx
Thanks, that did the trick. My mclgen function can be written as

mclgen - function (datamat,catvar) {
ncat - nlevels(catvar)
perschoice-as.data.frame(rep(datamat,ncat))
perschoice-reshape(perschoice,direction=long,
varying=lapply(names(datamat),rep,ncat),
timevar=newy)
perschoice-perschoice[sort.list(perschoice$id),]
dep-parse(text=paste(perschoice$,substitute(catvar),sep=))
perschoice$depvar-as.numeric(eval(dep)==perschoice$newy)
perschoice
}

I'd still appreciate the help of R-listers on how to use the value of
catvar on the left-hand side of an expression within a function
(perschoice$valueof(catvar)-perschoice$newy), and how to get R to
drop the reference category of a factor in an interaction effect
without the main effect (in
clogit(depvar~occ+occ:educ+occ:black+strata(id),data=pc)).

But many thanks for the help so far!

John Hendrickx

--- Thomas Lumley [EMAIL PROTECTED] wrote:
 On Thu, 30 Jan 2003, John Hendrickx wrote:
 
  The only problem though is the last step, specifying a list of
 varying
  variables. It should be possible to generate this from
 names(dset)
  but I can't get it right. as.list(rep(names(dset),2)) produces
 a
  list with four elements rather than a list with 2 components each
  containing two elements. dim() gives a matrix as a result.
 Could
  someone show me how to create
 list(x1=c(X1,X1),x2=c(X2,X2))
  from names(dset)?
 
lapply(names(dset),rep,2)
 gives
  [[1]]
 [1] X1 X1
 
 [[2]]
 [1] X2 X2
 
   -thomas
 
 __
 [EMAIL PROTECTED] mailing list
 http://www.stat.math.ethz.ch/mailman/listinfo/r-help

__
[EMAIL PROTECTED] mailing list
http://www.stat.math.ethz.ch/mailman/listinfo/r-help



[R] multinomial conditional logit models

2003-01-29 Thread John Hendrickx
A multinomial logit model can be specified as a conditional logit
model after restructuring the data. Doing so gives flexibility in
imposing restrictions on the dependent variable. One application is
to specify a loglinear model for square tables, e.g. quasi-symmetry
or quasi-independence, as a multinomial logit model with covariates.
Further details on this technique and examples with several packages
are on my homepage at http://www.xs4all.nl/~jhckx/mcl/

I've been trying to implement an MCL model in R and have been mostly
succesful. However I'm still stuck on a few points and I'm hoping
someone can point out how to do fix them.

To estimate an MCL model, the data must be restructured into a
person-choice file. 

* Each case must be duplicated ncat times (ncat is the number of
categories of the dependent variable)
* Each case must be identified by a strata variable (id)
* Each duplicate must be identified by a variable indexing the
choices (newy)
* A dichotomous variable must indicate which duplicate corresponds
with the respondent's choice (didep)

I've done this as follows:

mclgen - function (datamat,catvar) {
ncat - nlevels(catvar)
id-1:length(catvar)
datamat-cbind(id,datamat)
perschoice -NULL
for (i in 1:ncat) {
perschoice-rbind(perschoice,datamat)
}
perschoice-perschoice[sort.list(perschoice$id),]
perschoice$newy-gl(ncat,1,length=nrow(perschoice))
dep-parse(text=paste(perschoice$,substitute(catvar),sep=))
perschoice$depvar-as.numeric(eval(dep)==perschoice$newy)
perschoice
}

This works but I wonder if it's the most efficient method. I tried
using rep rather than rbind in a loop but that replicated the
dataset horizontally, not vertically. Is this the best solution?

I also finally figure out how to include the argument catvar in a
transformation involving another dataset but this solution seems very
complicated (eval+parse+substitute). Is there a simpler solution?

What I haven't figured out is how to use catvar once again but on
the righthand side of the transformation. In the second to last line,
I'd like to do perschoice$catvar-perschoice$newy. I've tried
several possibities but I can't figure out how to let R substitute
the value of catvar in that expression. eval(dep) doesn't work as
it does for the lefthand side but how should it be done? 

My solution for now is to do it afterward. My program continues as
follows:

pc-mclgen(logan,occ)
pc$occ-factor(pc$newy)
library(survival)
cl.lr-clogit(depvar~occ+occ:educ+occ:black+strata(id),data=pc)
summary(cl.lr)

The dataset logan is available from the website. occ is the
dependent variable, educ and black are independents. Once
mclgen has transformed the data into a person-choice file, a
multinomial logit model can be estimated using the dichotomous
dependent variable, the main effects of occ for the intercept term
and interactions of educ and black with occ for the effects of
these variables. 

This works more or less. What's unexpected though is that the
reference category of occ isn't dropped in the interactions with
educ and black. The highest category is dropped due to
collineairity but that's not quite what I want. Could someone explain
why this happens and how to let R drop the first category?

A final problem is that the results of clogit are only approximate (2
digits) whereas in other packages the cl model produces exactly the
same estimates as an mnl model. Is this because terms are dropped due
to collinearity or are there options I should examine. clogit issues
the following warnings which seem to be related to the collinearity
but perhaps point to something else:

Warning messages: 
1: Loglik converged before variable  10 ; beta may be infinite.  in:
fitter(X, Y, strats, offset, init, control, weights = weights,  
2: X matrix deemed to be singular; variable 9 14 in: coxph(formula =
Surv(rep(1, 4190), depvar) ~ occ + occ:educ +  

Apologies for the length of this post. Hopefully someone will have
time to read it through and help me out. As always, advance thanks
for any assistance.

John Hendrickx

__
[EMAIL PROTECTED] mailing list
http://www.stat.math.ethz.ch/mailman/listinfo/r-help