Re: [R] logistic regression

Mikhail Spivakov Thu, 26 Jun 2008 03:35:46 -0700

oh, I'm sorry...
here's my affiliation:

Mikhail Spivakov, PhD     [EMAIL PROTECTED]
Interdisciplinary Postdoctoral Fellow    http://www.ebi.ac.uk/~spivakov/
European Bioinformatics Institute   Tel: +44 1223 492660 (office)
Wellcome Trust Genome Campus        +44 7985 09 6675 (mob)
Cambridge CB10 1SD, UK


On Thu, Jun 26, 2008 at 7:50 AM, Prof Brian Ripley <[EMAIL PROTECTED]>
wrote:

> On Wed, 25 Jun 2008, Mikhail Spivakov wrote:
>
>> Sorry for flooding this forum, but I think I've realised that I need to do
>> multinomial logistic regression for my problem...
>> Would be interested in your opinion as to whether this is actually any
>> better than running three binomial logistic regressions separately.
>>
>
> Sometimes, depending on the problem.  We haven't seen a real-world problem
> in this thread, and (see the posting guide) this is not a statistical
> consultancy forum, so please ask your statistical advisor for help.
>
> You won't get consultancy help here unless you give your affiliation in a
> proper signature block (as the posting guide says).
>
>
>
>> Thanks
>> M
>>
>> On Wed, Jun 25, 2008 at 1:17 AM, <[EMAIL PROTECTED]> wrote:
>>
>>> It looks like A*B*C*D is a complete, totally saturated model (the
>>> residual deviance is effectively zero, and the residual degrees of
>>> freedom are exactly zero - this is a clue).  So when you try to put even
>>> more parameters into the model and even higher-way interactions,
>>> something has to give.
>>>
>>> I find 3-factor interactions are about as much as I can think about
>>> without getting a bit giddy.  Do you really need 4- and 5-factor
>>> interactions?  If so, your only option is to get more data.
>>>
>>>
>>> Bill Venables
>>> CSIRO Laboratories
>>> PO Box 120, Cleveland, 4163
>>> AUSTRALIA
>>> Office Phone (email preferred): +61 7 3826 7251
>>> Fax (if absolutely necessary):  +61 7 3826 7304
>>> Mobile:                         +61 4 8819 4402
>>> Home Phone:                     +61 7 3286 7700
>>> mailto:[EMAIL PROTECTED]
>>> http://www.cmis.csiro.au/bill.venables/
>>>
>>> -----Original Message-----
>>> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
>>> On Behalf Of Mikhail Spivakov
>>> Sent: Wednesday, 25 June 2008 9:31 AM
>>> To: r-help@r-project.org
>>> Subject: [R] logistic regression
>>>
>>>
>>> Hi everyone,
>>>
>>> I'm sorry if this turns out to be more a statistical question than one
>>> specifically about R - but would greatly appreciate your advice anyway.
>>>
>>> I've been using a logistic regression model to look at the relationship
>>> between a binary outcome (say, the odds of picking n white balls from a
>>> bag containing m balls in total) and a variety of other binary parameters:
>>>
>>> _________________________________________________________________
>>>
>>>   a.fit <- glm(data=a, formula=cbind(WHITE,ALL-WHITE)~A*B*C*D,
>>>                family=binomial(link="logit"))
>>>   summary(a.fit)
>>>
>>> glm(formula = cbind(SUCCESS, ALL - SUCCESS) ~ A * B * C * D,
>>>     family = binomial(link = "logit"), data = a)
>>>
>>> Deviance Residuals:
>>>  [1]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
>>>
>>> Coefficients:
>>>                  Estimate  Std. Error    z value   Pr(>|z|)
>>> (Intercept)      -0.69751     0.02697    -25.861  <2.00E-16  ***
>>> A                -0.02911     0.05451     -0.534   0.593335
>>> B                 0.39842     0.06871      5.798   6.70E-09  ***
>>> C                   0.829     0.06745      12.29  <2.00E-16  ***
>>> D                 0.05928     0.11133      0.532   0.594401
>>> A:B              -0.44053     0.13807     -3.191   0.001419  **
>>> A:C              -0.49596     0.13664      -3.63   0.000284  ***
>>> B:C              -0.62194     0.14164     -4.391   1.13E-05  ***
>>> A:D               -0.4031      0.2279     -1.769   0.076938  .
>>> B:D              -0.60238     0.25978     -2.319   0.020407  *
>>> C:D              -0.58467     0.27195      -2.15   0.031558  *
>>> A:B:C              0.5006     0.27364      1.829   0.067335  .
>>> A:B:D             0.51868      0.4683      1.108   0.268049
>>> A:C:D             0.32882     0.51226      0.642   0.520943
>>> B:C:D             0.56301     0.49903      1.128   0.259231
>>> A:B:C:D          -0.32115     0.87969     -0.365   0.715059
>>>
>>> ---
>>> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>>
>>> (Dispersion parameter for binomial family taken to be 1)
>>>
>>>   Null deviance: 2.2185e+02  on 15  degrees of freedom
>>> Residual deviance: 1.0385e-12  on  0  degrees of freedom
>>> AIC: 124.50
>>>
>>> Number of Fisher Scoring iterations: 3
>>>
>>> _________________________________________________________________
>>>
>>> This seems to produce sensible results given the actual data.
>>> However, there are actually three types of balls in the experiment and I
>>> need to model the relationship between the odds of picking each type
>>> and the parameters A, B, C, D. So what I do now is split the initial data
>>> table and just run glm three times:
>>>
>>>   all
>>>
>>> [fictional data]
>>>
>>> TYPE  WHITE  ALL  A  B  C  D
>>> a       100  400  1  0  0  0
>>> b       200  600  1  0  0  0
>>> c        10  300  1  0  0  0
>>> ....
>>> a        30  100  1  1  1  1
>>> b        50  200  1  1  1  1
>>> c        20  120  1  1  1  1
>>>
>>>   a <- all[all$TYPE=="a",]
>>>   b <- all[all$TYPE=="b",]
>>>   c <- all[all$TYPE=="c",]
>>>   a.fit <- glm(data=a, formula=cbind(WHITE,ALL-WHITE)~A*B*C*D,
>>>                family=binomial(link="logit"))
>>>   b.fit <- glm(data=b, formula=cbind(WHITE,ALL-WHITE)~A*B*C*D,
>>>                family=binomial(link="logit"))
>>>   c.fit <- glm(data=c, formula=cbind(WHITE,ALL-WHITE)~A*B*C*D,
>>>                family=binomial(link="logit"))
>>>
>>>
>>> But it seems to me that I should be able to incorporate TYPE into the model.
>>>
>>> Something like:
>>>
>>>   summary(glm(data=example2, family=binomial(link="logit"),
>>>               formula=cbind(WHITE,ALL-WHITE)~TYPE*A*B*C*D))
>>>
>>> [please see the output below]
>>>
>>> However, when I do this, it does not seem to give the expected result.
>>> Is this not the right way to do it?
>>> Or is this actually less powerful than running the three models separately?
>>>
>>> Will greatly appreciate your advice!
>>>
>>> Many thanks
>>> Mikhail
>>>
>>> -----
>>>
>>>                  Estimate  Std. Error    z value   Pr(>|z|)
>>> (Intercept)     -8.90E-01    1.91E-02    -46.553  <2.00E-16  ***
>>> TYPE1            1.93E-01    2.47E-02      7.822   5.18E-15  ***
>>> TYPE2            1.19E+00    2.42E-02     49.108  <2.00E-16  ***
>>> A                1.89E-01    3.34E-02      5.665   1.47E-08  ***
>>> B                1.60E-01    4.41E-02      3.627   0.000286  ***
>>> C                2.24E-02    4.91E-02      0.455    0.64906
>>> D                1.96E-01    6.58E-02      2.982   0.002868  **
>>> TYPE1:A         -2.19E-01    4.59E-02     -4.759   1.95E-06  ***
>>> TYPE2:A         -9.08E-01    4.50E-02    -20.178  <2.00E-16  ***
>>> TYPE1:B          2.39E-01    5.93E-02      4.022   5.77E-05  ***
>>> TYPE2:B         -1.82E+00    6.46E-02    -28.178  <2.00E-16  ***
>>> A:B             -2.26E-01    8.52E-02     -2.649   0.008066  **
>>> TYPE1:C          8.07E-01    6.27E-02      12.87  <2.00E-16  ***
>>> TYPE2:C         -2.51E+00    7.83E-02    -32.039  <2.00E-16  ***
>>> A:C             -1.70E-01    9.51E-02     -1.783   0.074512  .
>>> B:C             -3.01E-01    1.12E-01     -2.698   0.006985  **
>>> TYPE1:D         -1.37E-01    9.20E-02     -1.489   0.136548
>>> TYPE2:D         -1.13E+00    9.19E-02    -12.329  <2.00E-16  ***
>>> A:D             -2.11E-01    1.27E-01     -1.655   0.097953  .
>>> B:D             -2.15E-01    1.55E-01     -1.387   0.165472
>>> C:D             -5.51E-01    2.76E-01     -1.997   0.045829  *
>>> TYPE1:A:B       -2.15E-01    1.17E-01      -1.84   0.065714  .
>>> TYPE2:A:B        7.21E-01    1.28E-01      5.635   1.75E-08  ***
>>> TYPE1:A:C       -3.26E-01    1.24E-01     -2.643   0.008221  **
>>> TYPE2:A:C        9.70E-01    1.53E-01       6.36   2.02E-10  ***
>>> TYPE1:B:C       -3.21E-01    1.38E-01     -2.321   0.020313  *
>>> TYPE2:B:C        1.35E+00    1.89E-01      7.133   9.85E-13  ***
>>> A:B:C            1.80E-01    2.11E-01      0.852   0.394425
>>> TYPE1:A:D       -1.92E-01    1.83E-01      -1.05   0.293758
>>> TYPE2:A:D        6.76E-01    1.80E-01       3.75   0.000177  ***
>>> TYPE1:B:D       -3.87E-01    2.16E-01     -1.796   0.072443  .
>>> TYPE2:B:D        1.09E+00    2.30E-01      4.709   2.49E-06  ***
>>> A:B:D            1.92E-01    2.73E-01      0.702   0.482512
>>> TYPE1:C:D       -3.33E-02    3.18E-01     -0.105   0.916465
>>> TYPE2:C:D        1.20E-01    5.05E-01      0.238   0.811914
>>> A:C:D           -7.37E+00    1.74E+04  -4.23E-04   0.999663
>>> B:C:D            3.14E-01    4.92E-01      0.638   0.523254
>>> TYPE1:A:B:C      3.21E-01    2.64E-01      1.218   0.223336
>>> TYPE2:A:B:C     -8.43E-01    3.59E-01     -2.351   0.018747  *
>>> TYPE1:A:B:D      3.27E-01    3.84E-01       0.85     0.3952
>>> TYPE2:A:B:D     -6.59E-01    4.08E-01     -1.617   0.105883
>>> TYPE1:A:C:D      7.69E+00    1.74E+04   4.42E-04   0.999648
>>> TYPE2:A:C:D     -1.60E+01    3.48E+04  -4.58E-04   0.999634
>>> TYPE1:B:C:D      2.49E-01    5.70E-01      0.437   0.662288
>>> TYPE2:B:C:D     -7.08E-01    8.97E-01     -0.789   0.430007
>>> A:B:C:D          9.08E-03    2.47E+04   3.67E-07          1
>>> TYPE1:A:B:C:D   -3.30E-01    2.47E+04  -1.34E-05   0.999989
>>> TYPE2:A:B:C:D    1.10E+00    4.94E+04   2.22E-05   0.999982
>>>
>>>
>>
>
> --
> Brian D. Ripley,                  [EMAIL PROTECTED]
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>
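
The saturation point made in the quoted thread (16 cells, 16 coefficients, zero residual degrees of freedom) can be checked directly. The following is only a minimal sketch on simulated data: the data frame `cells`, the constant `ALL = 200`, the success probability 0.3 and the seed are all assumptions made for illustration, not the poster's data.

```r
# Hypothetical data: one row per combination of four binary factors (2^4 = 16 cells).
set.seed(1)
cells <- expand.grid(A = 0:1, B = 0:1, C = 0:1, D = 0:1)
cells$ALL   <- 200L
cells$WHITE <- rbinom(nrow(cells), size = cells$ALL, prob = 0.3)

# Full factorial model: as many coefficients as there are cells.
fit <- glm(cbind(WHITE, ALL - WHITE) ~ A * B * C * D,
           family = binomial(link = "logit"), data = cells)

length(coef(fit))   # 16 coefficients
nrow(cells)         # 16 observations
df.residual(fit)    # 0: nothing is left over to estimate lack of fit
deviance(fit)       # effectively zero

# Dropping the 4-way interaction frees one residual degree of freedom,
# so the reduced model can be compared against the saturated one.
fit3 <- update(fit, . ~ . - A:B:C:D)
df.residual(fit3)   # 1
anova(fit3, fit, test = "Chisq")
```

With zero residual degrees of freedom the model reproduces the observed proportions exactly, so adding further terms to it cannot be supported by the same 16 rows.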


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
