Hi all

I'm getting a NaN returned on using dffits, as explained
below.  To me, there seems no obvious (or non-obvious reason
for that matter) reason why a  NaN  appears.

Before I start digging further, can anyone see why  dffits
might be failing?  Is there a problem with the data?


Consider:

# Load data
dep <- read.table("http://www.sci.usq.edu.au/staff/dunn/Datasets/Books/Hand/Hand-R/factor1-R.dat";,
   header=TRUE)
attach(dep)
dep

# Fit Poisson glm
dep.glm2 <- glm( Counts ~ factor(Depression) + factor(SLE) + factor(Children) + factor(Depression):factor(SLE),
   family=poisson( link=log) )

# Compute dffits
dffits( dep.glm2 )


This produces the output:
1 2 3 4 5 6 1.4207746 -0.1513808 NaN 0.9079142 -0.1032664 -1.0860289
         7          8
0.4853797  3.8560863

NaN exists for Observation 3.  I cannot understand why: there's
nothing grossly unusual or bad about it.  I look a bit closer,
and it falls over in lm.influence when computing the deletion
statistic sigma:



> lm.influence(dep.glm2)$sigma
       1        2        3        4        5        6        7        8
0.914829 2.134279      NaN 2.186707 2.224885 1.934539 2.225115 1.957111

The rest of the results from lm.influence are OK; for example:

> lm.influence(dep.glm2)$wt.res
          1           2           3           4           5           6
 2.62840627 -0.88476903 -1.09492912  0.20247856 -0.23114458 -0.95123387
          7           8
 0.07521515  0.30208051


Use of debug( lm.influence ) indicates the NaN appears in this line
of lm.influence:



res <- .Fortran("lminfl", model$qr$qr, n, n, k, as.integer(do.coef),
model$qr$qraux, wt.res = e, hat = double(n), coefficients = if (do.coef) matrix(0, n, k) else double(0), sigma = double(n), tol = 10 * .Machine$double.eps,
    DUP = FALSE, PACKAGE = "base")[c("hat", "coefficients", "sigma",
    "wt.res")]


I don't particularly wish to dig around in the Fortran if someone
else can look at it and see my problem easily.  But if I must...



The appearance of the  NaN  seems odd, since (as I understand it)
  lm.influence(dep.glm2)$sigma  computes  sigma  when each observation
is removed in turn.  So if I remove Observation 3 and try fitting the
model, there are no problems or complaints:

dep.glm3 <- glm( Counts ~ factor(Depression) + factor(SLE) +
   factor(Children) + factor(Depression):factor(SLE),
   family=poisson( link=log), subset=(-3) )


This produces:


> dep.glm3

Call: glm(formula = Counts ~ factor(Depression) + factor(SLE) + factor(Children) + factor(Depression):factor(SLE), family = poisson(link = log), subset = (-3))

Coefficients:
                     (Intercept)               factor(Depression)1
                          5.4389                           -4.1392
                    factor(SLE)1                 factor(Children)1
                         -0.6503                           -2.4036
factor(Depression)1:factor(SLE)1
                          3.9513

Degrees of Freedom: 6 Total (i.e. Null);  2 Residual
Null Deviance:      695.9
Residual Deviance: 0.8535       AIC: 41.25


No problems, errors, or any signs of potential problems.


In the changes to R 2.3.0 (in the NEWS file,
eg http://mirror.aarnet.edu.au/pub/CRAN/src/base/NEWS)
I find this:

   o    Influence measures such as rstandard() and cooks.distance()
        could return infinite values rather than NaN for a case which
        was fitted exactly.  Similarly, plot.lm() could fail on such
        examples.  plot.lm(which = 5)  had to be modified to only plot
        cases with hat < 1.  (PR#8367)

        lm.influence() was incorrectly reporting 'coefficients' and
        'sigma' as NaN for cases with hat = 1, and on some platforms
        not detecting hat = 1 correctly.

The last sentence identifies NaN being reported for sigma, as I
find with my data.  But my data do not have hat = 1, but the hat
diagonals are large.  The troublesome Observation 3 does not have the
largest hat value in the data though:

> hatvalues(dep.glm2)
1 2 3 4 5 6 7 8 0.1689061 0.1064651 0.9098542 0.9030814 0.3799079 0.6382790 0.9327408 0.9607654

And besides, I am using the most recent version of R (see below).  BTW,
the NaNs appear in the previous version of R also.

So I'm flummoxed.

As always, help appreciated.

P.



> version
               _
platform       i386-pc-linux-gnu
arch           i386
os             linux-gnu
system         i386, linux-gnu
status
major          2
minor          3.1
year           2006
month          06
day            01
svn rev        38247
language       R
version.string Version 2.3.1 (2006-06-01)


--
Dr Peter Dunn  |  Email:  dunn <at> usq.edu.au
Faculty of Sciences, University of Southern Queensland
and the Australian Centre for Sustainable Catchments
CRICOS:  QLD 00244B |  NSW 02225M |  VIC 02387D |  WA 02521C
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to