Dear R users,
Hopefully someone can help me,
Maybe I just misunderstand the function in the package?
I am working with a logistic regression model.
Until now I always worked with the basic glm function, where for the model
was:
¡§ glm( disease ~ test.value + cnct , family=binomial(link=¡¦logit¡¦) ¡¨.
This works fine when test .value and concentration (cnct) are continuous
vairables.
However, concentration is in fact a grouping variable over 5 experiments
with 5 concentrations
( 25, 50, 100, 200 & 400).
Therefore I believe concentration to be an ordered factor ( in model :
cnct_o).
To make this model I used the ¡§rms¡¨ library (previously known as Design)
and functions lrm (or Glm).
The lrm (or Glm) returns the odds for disease, the ¡§inv.logit (odds) ¡¨
gives the probability of disease, but I have to do this with the Predict
function of the ¡§rms¡¨ package.
#####
The resulting model (with lrm or Glm) would be :
Coef S.E. Wald Z Pr(>|Z|)
Intercept 23.800 0.8891 2.68 0.0074
test.value 20.806 0.3409 6.10 <0.0001
cnct_o -0.1127 0.0268 -4.21 <0.0001
cnct_o=100 77.393 17.542 4.41 <0.0001
cnct_o=200 204.291 45.080 4.53 <0.0001
cnct_o=400 427.829 98.180 4.36 <0.0001
#####
The results of the standard glm function are very different :
#####
Standard glm
Deviance Residuals:
Min 1Q Median 3Q Max
-2.7361 -0.2750 0.2177 0.5143 1.6897
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.9022 0.3370 -2.677 0.007427 **
test.value 2.0806 0.3409 6.103 1.04e-09 ***
cnct_o.L 1.4363 0.4722 3.042 0.002352 **
cnct_o.Q 1.2208 0.4934 2.474 0.013359 *
cnct_o.C -2.0649 0.5610 -3.681 0.000232 ***
cnct_o^4 0.5599 0.4760 1.176 0.239485
---
Signif. codes: 0 ¡¥***¡¦ 0.001 ¡¥**¡¦ 0.01 ¡¥*¡¦ 0.05 ¡¥.¡¦ 0.1 ¡¥ ¡¦ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 252.32 on 220 degrees of freedom
Residual deviance: 167.68 on 215 degrees of freedom
AIC: 179.68
Number of Fisher Scoring iterations: 6
#####
As I use the model parameters of the standard glm model to calculate the
odds and probability manualy, I believe cnct_o = 25 to be the reference
category and that cnct_o.L = 50, cnct_o.Q=100, cnct_o.C = 200 and
cnct_o^4= 400. But I am not sure of this. The formulas used are :
Odds <- intercept + slope * test.value + cnct_o , where cnct_o is the
corresponding value for the given concentration.
Probability <- inv.logit ( Odds ), the function inv.logit from package
¡§car¡¨.
The results of the glm are in the table below, which are first the odds
and then the probability¡¦s ( inv.logit (odds)).
#####
glm odds cntc:
test.value 25 cnct_o.L cnct_o.Q cnct_o.C
cnct_o^4
0 -0.90218 0.534169 0.318649 -2.96706
-0.34231
0.5 0.138132 1.574477 1.358957 -1.92675
0.698001
1 1.17844 2.614785 2.399265 -0.88644
1.738309
1.5 2.218747 3.655093 3.439572 0.153864
2.778616
2 3.259055 4.6954 4.47988 1.194172
3.818924
#####
glm prob. cntc:
test.value 25 cnct_o.L cnct_o.Q cnct_o.C
cnct_o^4
0 0.288604 0.630455 0.578995 0.048936
0.415249
0.5 0.534478 0.828421 0.79559 0.127111
0.667744
1 0.764667 0.931807 0.916771 0.291844
0.850472
1.5 0.90192 0.974793 0.968919 0.53839
0.941509
2 0.962997 0.990946 0.988792 0.767486
0.97852
#####
If I compare this with the result of the Predict function in rms, the
results seem very different, it can be because I misinterpret the glm
model parameters for the ordered factor. How can I be sure which model
parameter corresponds to which factor in the standard glm.
#####
Results of lrm:
Predict.lrm cntc:
test.value 25 50 100 200 400
0 -0,43815 -3,25628 -1,15323 0,264036
0,072751
0,50580154 0,614227 -2,2039 -0,10085 1,316414
1,125129
1,01160308 1,666606 -1,15153 0,951526 2,368793
2,177508
1,505068 2,693316 -0,12482 1,978236 3,395503
3,204218
2,01086954 3,745695 0,927563 3,030615 4,447882
4,256597
#####
Prob. (inv.logit(odds)) cntc:
test.value 25 50 100 200 400
0 0,392182 0,037102 0,239899 0,565628
0,51818
0,50580154 0,648904 0,0994 0,474808 0,788585
0,754939
1,01160308 0,841123 0,24021 0,721422 0,914416
0,898211
1,505068 0,936631 0,468837 0,878493 0,967564
0,960993
2,01086954 0,976926 0,716581 0,953938 0,988432
0,986028
#####
One more problem occurs when I calculate the odds and probability
manually, using the formulas as with the standard glm I get the following
(very different) results:
man.lrm.odds cntc:
test.value 25 50 100 200 400
0 2,37998 2,267255 10,11929 22,80909
45,16286
0,5 3,420288 3,307563 11,1596 23,8494
46,20316
1 4,460596 4,34787 12,19991 24,8897
47,24347
1,5 5,500903 5,388178 13,24022 25,93001
48,28378
2 6,541211 6,428486 14,28053 26,97032
49,32409
man.lrm.prob
test.value 25 50 100 200 400
0 0,915288 0,906129 0,99996 1 1
0,5 0,968333 0,964687 0,999986 1 1
1 0,988577 0,987231 0,999995 1 1
1,5 0,995934 0,995451 0,999998 1 1
2 0,998559 0,998388 0,999999 1 1
#####
It is clear to me that I am missing something essential about this lrm
function.
Question: Are the model parameters for ordered factors converted in some
way by from the lrm and Glm function?
» Which could expain why I get a different result for the Predict
function, then when I calculate it manually.
Question: Is it correct that the first level of the ordered factor is used
as reference category by the standard glm function? It there a way of
giving them a different name, as it does automatically for the nominal
factor?
Thank you for reading my questions,
And please share any insight that you have.
Kind regards,
Tom
Disclaimer: click here
[[alternative HTML version deleted]]
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.