Re: [R] nontabular logistic regression

Marc Schwartz Fri, 13 Oct 2006 09:20:06 -0700

On Fri, 2006-10-13 at 09:28 -0500, Jeffrey Stratford wrote:
> Hi.  I'm attempting to fit a logistic/binomial model so I can determine
> the influence of landscape on the probability that a box gets used by a
> bird.  I've looked at a few sources (MASS text, Dalgaard, Fox and
> google) and the examples are almost always based on tabular predictor
> variables.  My data, however are not.  I'm not sure if that is the
> source of the problems or not because the one example that includes a
> continuous predictor looks to be coded exactly the same way.  Looking at
> the output, I get estimates for each case when I should get a single
> estimate for purbank.  Any suggestions?
> 
> Many thanks,
> 
> Jeff
> 
> 
> THE DATA: (200 boxes total, used [0 if unoccupied, 1 occupied], the rest
> are landscape variables).  
> 
> box   use     purbank purban2 purban1 pgrassk pgrass2 pgrass1 grassdist       
> grasspatchk
> 1     1       0.003813435     0.02684564      0.06896552      0.3282487       
> 0.6845638       0.7586207       0       3.73
> 2     1       0.04429451      0.1610738       0.1724138       0.1534174       
> 0.3825503       0.6551724       0       1.023261
> 3     1       0.04458785      0.06040268      0       0.1628043       
> 0.557047        0.7586207       0       0.9605769
> 4     1       0.06072162      0.2080537       0.06896552      0.01936052      
> 0       0       323.1099        0.2284615
> 5     0       0.6080962       0.6979866       0.6896552       0.03168084      
> 0.1275168       0.2413793       30      0.2627027
> 6     1       0.6060428       0.6107383       0.3448276       0.04077442      
> 0.2885906       0.4482759       30      0.2978571
> 7     1       0.3807568       0.4362416       0.6896552       0.06864183      
> 0.03355705      0       94.86833        0.468
> 8     0       0.3649164       0.3154362       0.4137931       0.06277501      
> 0.1275168       0       120     0.4585714
> 
> THE CODE:
> 
> box.use<- read.csv("c:\\eabl\\2004\\use_logistic2.csv", header=TRUE)
> attach(box.use)
> box.use <- na.omit(box.use)
> use <- factor(use, levels=0:1)
> levels(use) <- c("unused", "used")
> glm1 <- glm(use ~ purbank, binomial)
> 
> THE OUTPUT:
> 
> Coefficients:
>                      Estimate Std. Error   z value Pr(>|z|)
> (Intercept)        -4.544e-16  1.414e+00 -3.21e-16    1.000
> purbank0            2.157e+01  2.923e+04     0.001    0.999
> purbank0.001173365  2.157e+01  2.067e+04     0.001    0.999
> purbank0.001466706  2.157e+01  2.923e+04     0.001    0.999
> purbank0.001760047  6.429e-16  2.000e+00  3.21e-16    1.000
> purbank0.002346729  2.157e+01  2.923e+04     0.001    0.999
> purbank0.003813435  2.157e+01  2.923e+04     0.001    0.999
> purbank0.004106776  2.157e+01  2.067e+04     0.001    0.999
> purbank0.004693458  2.157e+01  2.067e+04     0.001    0.999



It appears that the 'purbank' variable is being imported as a factor,
hence the multiple levels indicated in the left hand column. 

Check:

  str(box.use)

right after the read.csv() step and see what it shows.


>From the sample data above, it _should_ be along the lines of:

> str(box.use)
'data.frame':   8 obs. of  10 variables:
 $ box        : int  1 2 3 4 5 6 7 8
 $ use        : int  1 1 1 1 0 1 1 0
 $ purbank    : num  0.00381 0.04429 0.04459 0.06072 0.60810 ...
 $ purban2    : num  0.0268 0.1611 0.0604 0.2081 0.6980 ...
 $ purban1    : num  0.069 0.172 0.000 0.069 0.690 ...
 $ pgrassk    : num  0.3282 0.1534 0.1628 0.0194 0.0317 ...
 $ pgrass2    : num  0.685 0.383 0.557 0.000 0.128 ...
 $ pgrass1    : num  0.759 0.655 0.759 0.000 0.241 ...
 $ grassdist  : num    0   0   0 323  30 ...
 $ grasspatchk: num  3.730 1.023 0.961 0.228 0.263 ...


Hence, you should be able to use:

> model <- glm(use ~ purbank, data = box.use, family = binomial)

> summary(model)

Call:
glm(formula = use ~ purbank, family = binomial, data = box.use)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-1.61450  -0.03098   0.31935   0.45888   1.39194

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)    3.223      2.225   1.448    0.147
purbank       -6.129      4.773  -1.284    0.199

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 8.9974  on 7  degrees of freedom
Residual deviance: 6.5741  on 6  degrees of freedom
AIC: 10.574

Number of Fisher Scoring iterations: 5


Note that na.omit() is the default operation for most R models, so is
redundant. Also, I would not attach the data frame, as you can use the
'data' argument in model related functions. This avoids the confusion of
having multiple copies of the source data set and wondering why changes
made can become confusing and problematic.

HTH,

Marc Schwartz

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] nontabular logistic regression

Reply via email to