Re: [R] AIC, AICc, and K

Renaud Lancelot Mon, 06 Dec 2004 04:40:48 -0800

Martin Maechler a �crit :

"Spencer" == Spencer Graves <[EMAIL PROTECTED]>
   on Sat, 04 Dec 2004 17:09:26 -0800 writes:
Spencer> I don't know the "best" way, but the following looks like it will Spencer> work:
    Spencer> tstDF <- data.frame(x=1:3, y=c(1,1,2))
    Spencer> fit0 <- lm(y~1, tstDF)
    Spencer> fitDF <- lm(y~x, tstDF)
    Spencer> AIC(fitDF,fit0)
    Spencer> df      AIC
    Spencer> fitDF  3 5.842516
    Spencer> fit0   2 8.001399
Spencer> The function AIC with only 1 argument returns only Spencer> a single number. However, given nested models, it Spencer> returns a data.frame with colums df and AIC. At Spencer> least in this example (and I would think in all Spencer> other contexts as well), "df" is the K you want. yes, but Benjamin would hardly want to invent a second model just in order to call AIC() with both models and get the data frame...
As Benjamin hoped, the "df" is part of the logLik(.) result,
and if either of you had more carefully looked at  help(logLik),
you'd have seen
Value:
    Returns an object, say 'r', of class 'logLik' which is a number
    with attributes, 'attr(r, "df")' (*d*egrees of *f*reedom) giving
    the number of parameters in the model. There's a simple 'print'
    method for 'logLik' objects.
More directly, you can use
    > str(unclass(logLik(fit0)))
     atomic [1:1] -2
     - attr(*, "nall")= int 3
     - attr(*, "nobs")= int 3
     - attr(*, "df")= num 2
or
> stats:::AIC.logLik function (object, ..., k = 2) -2 * c(object) + k * attr(object, "df")
to see how these work.
Martin Maechler, ETH Zurich


[snip]

Yes, but see the problem recently outlined by Prof. Ripley about nobs:

On Sun, 31 Oct 2004, Prof Brian Ripley wrote:

The harder task is actually to get `n', not `npar'.
length(resid(x)) may or may not include missing values, depending on the na.action used, and will include observations with weight zero. However, logLik's "lm" method returns an attribute "nobs" that is a better choice.

But only better, as it looks like it has fallen into the first trap.
AFAICS, if 'fit' is an lm fit, then

n = df.residual(fit) + fit$rank


Quick check:

x <- rnorm(10); x[2] <- NA
y <- rnorm(10); y[1] <- NA
w <- c(rep(1,9), 0)
fit <- lm(y ~x, weights = w, na.action=na.exclude)
df.residual(fit) + fit$rank  # 7, correct
attributes(logLik(fit)) # nall 9 nobs 9 df 3
# but AIC fails
length(resid(fit)) # 10

fit <- lm(y ~x, weights = w, na.action=na.omit)
df.residual(fit) + fit$rank  # 7, correct
attributes(logLik(fit)) # nall 7 nobs 7 df 3
length(resid(fit)) # 8

The problem might also become a bit more complicated when dealing with other kind of models than lm (or weighted lm, etc).

For example, what is nobs when you are using the syntax:

cbind(success, failure) ~ covariates

in a binomial glm ? Is it the number of lines in the aggregated data frame, or the total number of observations ?

Best,

Renaud

--
Dr Renaud Lancelot, v�t�rinaire
C/0 Ambassade de France - SCAC
BP 834 Antananarivo 101 - Madagascar

e-mail: [EMAIL PROTECTED]
tel.:   +261 32 40 165 53 (cell)
        +261 20 22 665 36 ext. 225 (work)
        +261 20 22 494 37 (home)

______________________________________________
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] AIC, AICc, and K

Reply via email to