[R] Question about logistic regression with ordered factor variable using the rms package (prev.Design)

Tom Willems Tue, 11 Sep 2012 06:18:17 -0700

Dear R users,
Hopefully someone can help me, 
Maybe I just misunderstand the function in the package?
I am working with a logistic regression model.
Until now I always worked with the basic glm function, where for the model 
was:
¡§ glm( disease ~  test.value + cnct , family=binomial(link=¡¦logit¡¦) ¡¨.


This works fine when test .value and concentration (cnct) are continuous 
vairables.
However, concentration is in fact a grouping variable over 5 experiments 
with 5 concentrations
( 25, 50, 100, 200 & 400).

Therefore I believe concentration to be an ordered factor ( in model : 
cnct_o).
To make this model I used the ¡§rms¡¨ library (previously known as Design) 
and functions lrm (or Glm).
The lrm (or Glm) returns the odds for disease, the ¡§inv.logit (odds) ¡¨ 
gives the probability of disease, but I have to do this with the Predict 
function of the ¡§rms¡¨ package.

#####
The resulting model (with lrm or Glm) would be :
                Coef    S.E.    Wald Z  Pr(>|Z|)
Intercept       23.800  0.8891  2.68    0.0074
test.value      20.806  0.3409  6.10    <0.0001
cnct_o          -0.1127 0.0268  -4.21   <0.0001
cnct_o=100      77.393  17.542  4.41    <0.0001
cnct_o=200      204.291 45.080  4.53    <0.0001
cnct_o=400      427.829 98.180  4.36    <0.0001
#####
The results of the standard glm function  are very different :

#####
Standard glm
Deviance Residuals: 
    Min       1Q   Median       3Q      Max 
-2.7361  -0.2750   0.2177   0.5143   1.6897 

Coefficients:
            Estimate Std. Error z value Pr(>|z|) 
(Intercept)  -0.9022     0.3370  -2.677 0.007427 ** 
test.value    2.0806     0.3409   6.103 1.04e-09 ***
cnct_o.L      1.4363     0.4722   3.042 0.002352 ** 
cnct_o.Q      1.2208     0.4934   2.474 0.013359 * 
cnct_o.C     -2.0649     0.5610  -3.681 0.000232 ***
cnct_o^4      0.5599     0.4760   1.176 0.239485 
---
Signif. codes:  0 ¡¥***¡¦ 0.001 ¡¥**¡¦ 0.01 ¡¥*¡¦ 0.05 ¡¥.¡¦ 0.1 ¡¥ ¡¦ 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 252.32  on 220  degrees of freedom
Residual deviance: 167.68  on 215  degrees of freedom
AIC: 179.68

Number of Fisher Scoring iterations: 6
#####

As I use the model parameters of the standard glm model to calculate the 
odds and probability manualy, I believe cnct_o = 25 to be the reference 
category and that cnct_o.L = 50, cnct_o.Q=100, cnct_o.C = 200 and 
cnct_o^4= 400. But I am not sure of this. The formulas used are :
Odds <-  intercept + slope * test.value + cnct_o  , where cnct_o is the 
corresponding value for the given concentration.
Probability <- inv.logit ( Odds ),  the function inv.logit from package 
¡§car¡¨.
The results of the glm are in the table below, which are first the odds 
and then the probability¡¦s ( inv.logit (odds)).

#####
glm odds        cntc: 
test.value      25              cnct_o.L        cnct_o.Q        cnct_o.C 
cnct_o^4 
0               -0.90218        0.534169        0.318649        -2.96706 
-0.34231
0.5             0.138132        1.574477        1.358957        -1.92675 
0.698001
1               1.17844         2.614785        2.399265        -0.88644 
1.738309
1.5             2.218747        3.655093        3.439572        0.153864 
2.778616
2               3.259055        4.6954          4.47988         1.194172 
3.818924
#####
glm prob.       cntc: 
test.value      25              cnct_o.L        cnct_o.Q        cnct_o.C 
cnct_o^4 
0               0.288604        0.630455        0.578995        0.048936 
0.415249
0.5             0.534478        0.828421        0.79559         0.127111 
0.667744
1               0.764667        0.931807        0.916771        0.291844 
0.850472
1.5             0.90192         0.974793        0.968919        0.53839 
0.941509
2               0.962997        0.990946        0.988792        0.767486 
0.97852
#####

If I compare this with the result of the Predict function in rms, the 
results seem very different, it can be because I misinterpret the glm 
model parameters for the ordered factor. How can I be sure which model 
parameter corresponds to which factor in the standard glm.

#####
Results of lrm:
Predict.lrm     cntc: 
test.value      25              50              100             200 400
0               -0,43815        -3,25628        -1,15323        0,264036 
0,072751
0,50580154      0,614227        -2,2039         -0,10085        1,316414 
1,125129
1,01160308      1,666606        -1,15153        0,951526        2,368793 
2,177508
1,505068        2,693316        -0,12482        1,978236        3,395503 
3,204218
2,01086954      3,745695        0,927563        3,030615        4,447882 
4,256597
#####
Prob. (inv.logit(odds)) cntc: 
test.value      25              50              100             200 400
0               0,392182        0,037102        0,239899        0,565628 
0,51818
0,50580154      0,648904        0,0994          0,474808        0,788585 
0,754939
1,01160308      0,841123        0,24021         0,721422        0,914416 
0,898211
1,505068        0,936631        0,468837        0,878493        0,967564 
0,960993
2,01086954      0,976926        0,716581        0,953938        0,988432 
0,986028
#####

One more problem occurs when I calculate the odds and probability 
manually, using the formulas as with the standard glm I get the following 
(very different) results: 

man.lrm.odds    cntc: 
test.value      25              50              100             200 400
0               2,37998         2,267255        10,11929        22,80909 
45,16286
0,5             3,420288        3,307563        11,1596         23,8494 
46,20316
1               4,460596        4,34787         12,19991        24,8897 
47,24347
1,5             5,500903        5,388178        13,24022        25,93001 
48,28378
2               6,541211        6,428486        14,28053        26,97032 
49,32409
man.lrm.prob 
test.value      25              50              100             200 400
0               0,915288        0,906129        0,99996 1       1
0,5             0,968333        0,964687        0,999986        1       1
1               0,988577        0,987231        0,999995        1       1
1,5             0,995934        0,995451        0,999998        1       1
2               0,998559        0,998388        0,999999        1       1
#####

It is clear to me that I am missing something essential about this lrm 
function.
Question: Are the model parameters for ordered factors converted in some 
way by from the lrm and Glm function?
»      Which could expain why I get a different result for the Predict 
function, then when I calculate it manually. 
Question: Is it correct that the first level of the ordered factor is used 
as reference category by the standard glm function? It there a way of 
giving them a different name, as it does automatically for the nominal 
factor?

Thank you for reading my questions,
And please share any insight that you have.
Kind regards,
Tom 
 


Disclaimer: click here 

        [[alternative HTML version deleted]]

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Question about logistic regression with ordered factor variable using the rms package (prev.Design)

Reply via email to