[R] NaN when using dffits, stemming from lm.influence call

Peter Dunn Wed, 30 Aug 2006 19:17:22 -0700

Hi all

I'm getting a NaN returned on using dffits, as explained
below.  To me, there seems no obvious (or non-obvious reason
for that matter) reason why a  NaN  appears.


Before I start digging further, can anyone see why  dffits
might be failing?  Is there a problem with the data?


Consider:

# Load data

dep <-read.table("http://www.sci.usq.edu.au/staff/dunn/Datasets/Books/Hand/Hand-R/factor1-R.dat";,

   header=TRUE)
attach(dep)
dep

# Fit Poisson glm

dep.glm2 <- glm( Counts ~ factor(Depression) + factor(SLE) +factor(Children) + factor(Depression):factor(SLE),

   family=poisson( link=log) )

# Compute dffits
dffits( dep.glm2 )


This produces the output:

1 2 3 4 5 61.4207746 -0.1513808 NaN 0.9079142 -0.1032664 -1.0860289

         7          8
0.4853797  3.8560863

NaN exists for Observation 3.  I cannot understand why: there's
nothing grossly unusual or bad about it.  I look a bit closer,
and it falls over in lm.influence when computing the deletion
statistic sigma:



> lm.influence(dep.glm2)$sigma
       1        2        3        4        5        6        7        8
0.914829 2.134279      NaN 2.186707 2.224885 1.934539 2.225115 1.957111

The rest of the results from lm.influence are OK; for example:

> lm.influence(dep.glm2)$wt.res
          1           2           3           4           5           6
 2.62840627 -0.88476903 -1.09492912  0.20247856 -0.23114458 -0.95123387
          7           8
 0.07521515  0.30208051


Use of debug( lm.influence ) indicates the NaN appears in this line
of lm.influence:



res <- .Fortran("lminfl", model$qr$qr, n, n, k, as.integer(do.coef),

model$qr$qraux, wt.res = e, hat = double(n), coefficients = if(do.coef) matrix(0,n, k) else double(0), sigma = double(n), tol = 10 *.Machine$double.eps,

    DUP = FALSE, PACKAGE = "base")[c("hat", "coefficients", "sigma",
    "wt.res")]


I don't particularly wish to dig around in the Fortran if someone
else can look at it and see my problem easily.  But if I must...



The appearance of the  NaN  seems odd, since (as I understand it)
  lm.influence(dep.glm2)$sigma  computes  sigma  when each observation
is removed in turn.  So if I remove Observation 3 and try fitting the
model, there are no problems or complaints:

dep.glm3 <- glm( Counts ~ factor(Depression) + factor(SLE) +
   factor(Children) + factor(Depression):factor(SLE),
   family=poisson( link=log), subset=(-3) )


This produces:


> dep.glm3

Call: glm(formula = Counts ~ factor(Depression) + factor(SLE) +factor(Children) + factor(Depression):factor(SLE), family =poisson(link = log), subset = (-3))


Coefficients:
                     (Intercept)               factor(Depression)1
                          5.4389                           -4.1392
                    factor(SLE)1                 factor(Children)1
                         -0.6503                           -2.4036
factor(Depression)1:factor(SLE)1
                          3.9513

Degrees of Freedom: 6 Total (i.e. Null);  2 Residual
Null Deviance:      695.9
Residual Deviance: 0.8535       AIC: 41.25


No problems, errors, or any signs of potential problems.


In the changes to R 2.3.0 (in the NEWS file,
eg http://mirror.aarnet.edu.au/pub/CRAN/src/base/NEWS)
I find this:

   o    Influence measures such as rstandard() and cooks.distance()
        could return infinite values rather than NaN for a case which
        was fitted exactly.  Similarly, plot.lm() could fail on such
        examples.  plot.lm(which = 5)  had to be modified to only plot
        cases with hat < 1.  (PR#8367)

        lm.influence() was incorrectly reporting 'coefficients' and
        'sigma' as NaN for cases with hat = 1, and on some platforms
        not detecting hat = 1 correctly.

The last sentence identifies NaN being reported for sigma, as I
find with my data.  But my data do not have hat = 1, but the hat
diagonals are large.  The troublesome Observation 3 does not have the
largest hat value in the data though:

> hatvalues(dep.glm2)

1 2 3 4 5 6 780.1689061 0.1064651 0.9098542 0.9030814 0.3799079 0.6382790 0.93274080.9607654


And besides, I am using the most recent version of R (see below).  BTW,
the NaNs appear in the previous version of R also.

So I'm flummoxed.

As always, help appreciated.

P.



> version
               _
platform       i386-pc-linux-gnu
arch           i386
os             linux-gnu
system         i386, linux-gnu
status
major          2
minor          3.1
year           2006
month          06
day            01
svn rev        38247
language       R
version.string Version 2.3.1 (2006-06-01)


--
Dr Peter Dunn  |  Email:  dunn <at> usq.edu.au
Faculty of Sciences, University of Southern Queensland
and the Australian Centre for Sustainable Catchments
CRICOS:  QLD 00244B |  NSW 02225M |  VIC 02387D |  WA 02521C

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] NaN when using dffits, stemming from lm.influence call

Reply via email to