(Sorry for any obvious mistakes; I am quite a newbie with no statistics background.)
My question is: what does logistic regression gain over odds ratios when none of the input variables is continuous?

My experiment:

Outcome: ordinal scale, ``quality'' (QUA=1,2,3)
Predictors: ``segment'' (SEG) and ``stress'' (STR). SEG is nominal with 24 levels, and STR is dichotomous (0,1).

Treating the outcome as continuous, a two-way ANOVA with

    aov(as.integer(QUA) ~ SEG * STR)

finds no evidence of interaction between SEG and STR, and both are significant on their own. This is the result we would expect from clinical knowledge.

I use

    xtabs(~QUA+SEG, data=data2.df, subset=STR==0)
    xtabs(~QUA+SEG, data=data2.df, subset=STR==1)

for the contingency tables. There are zero cells, and for some values of SEG there is only one non-zero cell, i.e. some values of SEG determine the outcome with certainty.

Initially I was thinking of a proportional odds logistic regression model, but according to Hosmer and Lemeshow [1], zero cells are problematic. So I remove the deterministic values of SEG from the data table and pool QUA=2 and QUA=3. Now I have a dichotomous outcome (QUA = Good/Bad) and no zero cells.

The following model shows no evidence of interaction

    glm(QUA ~ STR * SEG, data=data3.df, family=binomial)

so I go for

    glm(QUA ~ STR + SEG, data=data3.df, family=binomial)

(I suppose that glm creates design variables for SEG, where 0 0 ... 0 stands for the first value of SEG, 1 0 ... 0 for the second value, 0 1 0 ... 0 for the third, etc.)

Coefficients:
              Estimate Std. Error   z value Pr(>|z|)
(Intercept) -1.085e+00  1.933e-01    -5.614 1.98e-08 ***
STR.L        2.112e-01  6.373e-02     3.314 0.000921 ***
SEGP2C.MI   -9.869e-01  3.286e-01    -3.004 0.002669 **
SEGP2C.AI   -1.306e+00  3.585e-01    -3.644 0.000269 ***
SEGP2C.AA   -1.743e+00  4.123e-01    -4.227 2.37e-05 ***
[shortened]
SEGP4C.ML   -5.657e-01  2.990e-01    -1.892 0.058485 .
SEGP4C.BL   -2.908e-16  2.734e-01 -1.06e-15 1.000000
SEGSAX.MS    1.092e-01  2.700e-01     0.405 0.685772
SEGSAX.MAS  -5.441e-16  2.734e-01 -1.99e-15 1.000000
SEGSAX.MA    7.130e-01  2.582e-01     2.761 0.005758 **
SEGSAX.ML    1.199e+00  2.565e-01     4.674 2.96e-06 ***
SEGSAX.MP    1.313e+00  2.570e-01     5.108 3.26e-07 ***
SEGSAX.MI    8.865e-01  2.569e-01     3.451 0.000558 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 3462.0  on 3123  degrees of freedom
Residual deviance: 3012.6  on 3101  degrees of freedom
AIC: 3058.6

Number of Fisher Scoring iterations: 6

Even though some coefficients are not statistically significant, they need to stay in the model from a clinical point of view.

At this point, the question is how to interpret these results, and what advantage they offer over odds ratios. From [1] I understand that with one dichotomous and one continuous predictor, you can adjust for the continuous variable. But when all predictors are dichotomous (because of the design variables), I don't quite see the effect of adjustment.

Wouldn't it be better to split the data into two groups (STR=0 and STR=1) and, instead of using logistic regression, compute odds ratios for each value of SEG?

Cheers,

Ramón.

[1] D.W. Hosmer and S. Lemeshow. ``Applied Logistic Regression''. John Wiley & Sons. 2000.

-- 
Ramón Casero Cañas

web: http://www.robots.ox.ac.uk/~rcasero/
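
P.S. To check my guess about the design variables, here is a minimal sketch of how the coding can be inspected with model.matrix(). The factors seg and str_ord, and all the other names, are made up for illustration and are not my real data. The ".L" suffix on STR in the output above suggests that STR is stored as an ordered factor, in which case R uses polynomial contrasts by default rather than 0/1 dummies. I am assuming the default contrast options here.

```r
# Hypothetical toy factor with three levels; the real SEG has 24.
seg <- factor(c("A", "B", "C", "A", "B", "C"))

# Default treatment contrasts for an unordered factor: the first level
# ("A") is the reference and gets 0 in every dummy column.
X_seg <- model.matrix(~ seg)
print(X_seg)

# Hypothetical two-level ordered factor, mimicking how STR seems to be stored.
str_ord <- factor(c(0, 1, 0, 1, 0, 1), ordered = TRUE)

# Ordered factors get polynomial contrasts by default, hence the ".L"
# (linear) column, coded -1/sqrt(2) and +1/sqrt(2) for two levels.
X_str <- model.matrix(~ str_ord)
print(X_str)

# With that coding the two levels are sqrt(2) units apart, so the log odds
# ratio between them is sqrt(2) times the reported STR.L coefficient
# (2.112e-01 in the output above).
or_str <- exp(sqrt(2) * 0.2112)
print(or_str)
```

If this reading is right, the STR.L estimate is not itself the log odds ratio between STR=1 and STR=0; converting STR to an unordered factor before fitting should give a coefficient that is.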