[R] possible bug in "sspir" package?

Mark Scheuerell Fri, 29 May 2009 14:00:10 -0700

Greetings,

I sent the message below to the developer of the contributed R package"sspir", but have yet to receive any response. I would be very gratefulfor any advice people have on the matter.


Thanks,
Mark


-------- Original Message --------
Subject:        possible bug in sspir?
Date:   Tue, 19 May 2009 16:08:41 -0700
From:   Mark Scheuerell <[email protected]>
To:     [email protected], [email protected], [email protected]



Hi Claus,

I have been using the "sspir" package that you developed for use with Rand have found it very useful. Previously, all of the time seriesmodels I had been fitting had gaussian errors, and everything seemed towork quite well.

Recently, however, I became interested in some models for time series ofsurvival probabilities. Given the response variable is a proportion thatlies on [0,1], it seemed reasonable to fit a model with a binomial errorstructure and logit link function. It is my understanding that if I wereto fit a similar model with "glm", the response variable can take one of3 forms: (1) a vector with 2 levels (eg, 0 and 1) if the data are fromindividuals; (2) a vector of the proportions in which case theadditional function call "weights" would be needed with a vector of thenumber of trials; or (3) a 2xN matrix where the columns are successesand failures, respectively.

When I try to fit a basic random-walk model using the "ssm" function,however, I can only get option (3) above to work. That in itself is OK,but it seems as though ssm and subsequent functions (eg, "kfs") want towork on a Nx2 matrix instead. For example, the following lines shouldproduce an estimated state vector of length=20, but the result is length=2:


>y <- ts(cbind(rpois(20,20), rpois(20,100)), start=1, freq=1)
>ssm.fit <- ssm(t(y) ~ tvar(1), family=binomial(link="logit"), fit=FALSE)
>ssm.ex1.fit <- extended(ssm.ex1$ss)
>ssm.ex1.fit$m
Time Series:
Start = 1
End = 2
Frequency = 1
      Series 1
[1,] -0.0498745
[2,] -0.1357787


If I try to trick the code and pass in the transpose of y, I get this error:

>y <- t(y)
>ssm.fit <- ssm(y ~ tvar(1), family=binomial(link="logit"), fit=FALSE)
>ssm.ex1.fit <- extended(ssm.ex1$ss)
Error in ss$Fmat(tt, ss$x, ss$phi) : subscript out of bounds

When I went in and looked at the details of the ssm code, I saw that onlines 146-151, the code is trying to establish the vectors for thenumber of trials ("ntotal") and the response ("y") based on the rows ofthe input matrix y rather than based on the columns. Thus, if I changethose lines and rerun the code above, I do in fact get an estimatedstate vector of length=20 (and the estimates seem reasonable). Mymodified version of your code is attached to this msg (ssm2.r).

I also have an additional question that may or may not be related tothis. If I fit a random-walk model using the approach outlined above andmy ssm2 function (assuming it is correct), the point estimates of thestate vector are nearly identical to the "observed" data. However, if Ifirst do a logit transform of the proportions and then use ssm (or myversion) with gaussian errors, the point estimates of the state vectorare not nearly as close as those obtained with the binomial errorstructure (see my attached code "ssm_bin_ex.r"). I can understand whythe variance of the estimates would differ, but why the means? If I fitsome dummy models for non-time series models with glm, the estimates ofthe means using the 2 approaches just outlined are nearly identical (seemy attached code "glm_bin_ex.r").

I would be sincerely grateful for any help or insights you could offerwith respect to my problems.


Best wishes,
Mark

_____________________________

Mark D. Scheuerell, Ph.D.
National Marine Fisheries Service
Northwest Fisheries Science Center
2725 Montlake Blvd E
Seattle, WA 98112
USA

206.302.2437 tel
206.860.3400 fax

http://www.nwfsc.noaa.gov/mark.scheuerell

# this script uses a modification of the function "ssm" from the package "sspir"
# to compare differences in estimates of states in models with similar form, but
# different calls with respect to the glm-esque interface

library(sspir)
library(boot)
source("ssm2")

# generate some data
# 2 cols are "successes" and "failures"
y <- ts(cbind(rpois(20,20), rpois(20,100)), start=1, freq=1)

#------------------------------
# use ssm with binomial errors
#------------------------------

# set the ts model via ssm
ssm.ex1 <- ssm2(y ~ tvar(1), family=binomial(link="logit"), 
m0=c(y[1,1]/(y[1,1]+y[1,2])), fit=TRUE)
# get the model fit
ssm.ex1.fit <- getFit(ssm.ex1)

# this bit uses another approach, but returns same results
# set the ts model via ssm
#ssm.ex1 <- ssm2(y ~ tvar(1), family=binomial(link="logit"), 
m0=c(y[1,1]/(y[1,1]+y[1,2])), fit=FALSE)
# get the model fit
#ssm.ex1.fit <- extended(ssm.ex1$ss)
#ssm.ex1.fit <- getFit(ssm.ex1.fit)

# back-transform the estimates
ssm.ex1.pre <- inv.logit(ssm.ex1.fit$m)
ssm.ex1.up <- inv.logit(ssm.ex1.fit$m-unlist(ssm.ex1.fit$C))
ssm.ex1.lo <- inv.logit(ssm.ex1.fit$m+unlist(ssm.ex1.fit$C))


#------------------------------
# use ssm with gaussian errors
#------------------------------

# ts of the logit(p)
y2 <- ts(logit(y[,1]/(y[,1]+y[,2])), start=1, freq=1)

# set the ts model via ssm
ssm.ex2 <- ssm2(y2 ~ tvar(1), family=gaussian, m0=y2[1], fit=TRUE)
# get the model fit
ssm.ex2.fit <- getFit(ssm.ex2)

# back-transform the estimates
ssm.ex2.pre <- inv.logit(ssm.ex2.fit$m)
ssm.ex2.up <- inv.logit(ssm.ex2.fit$m-unlist(ssm.ex2.fit$C))
ssm.ex2.lo <- inv.logit(ssm.ex2.fit$m+unlist(ssm.ex2.fit$C))


#-----------------
# draw some plots
#-----------------

# plot the ts of logit(p) for the data
dev.new()
plot(y2, type="b", pch=16, ylab="logit(p)")
# plot the ts of estimated logit(p)
points(ssm.ex1.fit$m, type="b", pch=16, col="red")
points(ssm.ex2.fit$m, type="b", pch=16, col="blue")

# plot the ts of p for data (black) & model (blue)
dev.new()
par(mfrow = c(2, 1))
plot(y[,1]/(y[,1]+y[,2]), type="b", pch=16, ylim=c(0.1, 0.3), ylab="p")
# plot the estimated RW
points(ssm.ex1.pre, type="b", pch=16, col="blue")
points(ssm.ex1.up, type="l", pch=16, col="red")
points(ssm.ex1.lo, type="l", pch=16, col="red")

plot(y[,1]/(y[,1]+y[,2]), type="b", pch=16, ylim=c(0.1, 0.3), ylab="p")
# plot the estimated RW
points(ssm.ex2.pre, type="b", pch=16, col="blue")
points(ssm.ex2.up, type="l", pch=16, col="red")
points(ssm.ex2.lo, type="l", pch=16, col="red")

library(boot)

y <- cbind(rpois(20,20), rpois(20,100))
p <- y[,1]/(y[,1]+y[,2])

fit.1 <- glm(y ~ 1, family=binomial)
fit.1.pre <- predict(fit.1, se=TRUE, type="response")
fit.1.se <- fit.1.pre$se.fit
fitted(fit.1)

fit.2 <- glm(p ~ 1, family=binomial, weights=(y[,1]+y[,2]))
fit.2.pre <- predict(fit.2, se=TRUE, type="response")
fit.2.se <- fit.2.pre$se.fit
fitted(fit.2)

fit.3 <- glm(logit(p) ~ 1, family=gaussian)
fit.3.pre <- predict(fit.3, se=TRUE, type="response")
fit.3.se <- inv.logit(fit.3.pre$se.fit)
inv.logit(fitted(fit.3))

# this function is a copy of the "ssm" function from the package "sspir"
# with the exception that lines 
ssm2 <- function (formula, family = gaussian, data = list(), subset = NULL, 
    fit = TRUE, phi = NULL, m0 = NULL, C0 = NULL, Fmat = NULL, 
    Gmat = NULL, Vmat = NULL, Wmat = NULL) 
{
    noTvar <- function(e) {
        if (is.call(e)) 
            if (e[[1]] == as.name("tvar")) 
                return(e[[2]])
            else {
                for (i in 2:length(e)) e[[i]] <- noTvar(e[[i]])
                return(e)
            }
        else e
    }
    removeSpace <- function(string) gsub("[[:space:]]", "", string)
    makeSS <- function(x, y, tvar, tvarNames, ...) {
        dt = rep(1, nrow(x))
        p <- ncol(x)
        ntvar <- length(unique(tvar[tvar != 0]))
        phi <- c(1, rep(1, ntvar))
        names(phi) <- c("epsilon", tvarNames)
        res <- SS(y = ts(matrix(y, ncol = 1), start = start(y), 
            end = end(y), frequency = frequency(y)), x = list(x = x, 
            dt = dt, tvar = tvar), phi = phi, Fmat = function(tt, 
            x, phi) {
            x$x[tt, ]
        }, Gmat = function(tt, x, phi) {
            diag(p)
        }, Vmat = function(tt, x, phi) {
            matrix(phi["epsilon"], 1, 1)
        }, Wmat = function(tt, x, phi) {
            W <- matrix(0, p, p)
            diag(W)[tvar != 0] <- phi[-1][tvar]
            return(W)
        }, C0 = diag(p) * 1e+06, m0 = matrix(0, 1, p))
        return(res)
    }
    cl <- match.call()
    if (is.character(family)) 
        family <- get(family, mode = "function", envir = parent.frame())
    if (is.function(family)) 
        family <- family()
    if (is.null(family$family)) {
        print(family)
        stop("`family' not recognized")
    }
    if (missing(data)) 
        data <- environment(formula)
    mf <- match.call(expand.dots = FALSE)
    formula.notvar <- noTvar(formula)
    mf$formula <- formula.notvar
    mf$family <- mf$start <- NULL
    mf$drop.unused.levels <- TRUE
    mf$na.action <- "na.pass"
    mf$fit <- NULL
    mf$phi <- NULL
    mf$m0 <- NULL
    mf$C0 <- NULL
    mf$Fmat <- NULL
    mf$Gmat <- NULL
    mf$Wmat <- NULL
    mf$Vmat <- NULL
    mf[[1]] <- as.name("model.frame")
    y <- eval(mf[[2]][[2]], data)
    time <- time(ts(start = start(y), end = end(y), frequency = frequency(y)))
    assign("time", time, env = .GlobalEnv)
    mf <- eval(mf, .GlobalEnv)
    mt <- attr(mf, "terms")
    xvars <- as.character(attr(mt, "variables"))[-1]
    if ((yvar <- attr(mt, "response")) > 0) 
        xvars <- xvars[-yvar]
    xlev <- if (length(xvars) > 0) {
        xlev <- lapply(mf[xvars], levels)
        xlev[!sapply(xlev, is.null)]
    }
    y <- model.response(mf, "numeric")
    x <- if (!is.empty.model(mt)) 
        model.matrix(mt, mf)
    else matrix(, NROW(y), 0)
    term.labels <- unname(sapply(attr(terms(formula.notvar), 
        "term.labels"), removeSpace))
    pAssign <- attr(x, "assign")
    if (attr(terms(formula), "intercept")) {
        pAssign <- pAssign + 1
        pAssign[1] <- 1
        term.labels <- c("(Intercept)", term.labels)
    }
    tvar.terms <- terms(formula, specials = "tvar", keep.order = TRUE)
    idx <- attr(tvar.terms, "specials")$tvar
    if (attr(tvar.terms, "response")) 
        idx <- idx - 1
    if (attr(tvar.terms, "intercept")) 
        if (!any(attr(tvar.terms, "term.labels") == "tvar(1)")) 
            idx <- idx + 1
    tvar <- attr(terms(formula.notvar, keep.order = TRUE), "term.labels")
    if (attr(tvar.terms, "intercept")) 
        tvar <- c("(Intercept)", tvar)
    tvar <- tvar[idx]
    tvar <- unname(sapply(tvar, removeSpace))
    tvarAssign <- match(pAssign, sort(match(tvar, term.labels)))
    tvarAssign[is.na(tvarAssign)] <- 0
    if (any(unlist(lapply(term.labels, substr, 1, 8)) == "polytime")) {
        d <- sum(unlist(lapply(colnames(x), substr, 1, 8)) == 
            "polytime")
        nm <- unlist(lapply(colnames(x), substr, 1, 8))
        polytime.idx <- (1:ncol(x))[nm == "polytime"]
        Pmat <- function(deg) {
            res <- matrix(0, deg + 1, deg + 1)
            diag(res) <- 1
            for (i in 1:deg) {
                res[1:i, i + 1] <- 1/factorial(i:1)
            }
            res
        }
        if (any(unlist(lapply(tvar, substr, 1, 8)) == "polytime")) {
            polnum <- pmatch("polytime", tvar)
            nytvarAssign <- tvarAssign
            nytvarAssign[tvarAssign > polnum] <- tvarAssign[tvarAssign > 
                polnum] + d - 1
            nytvarAssign[tvarAssign == polnum] <- polnum:(polnum + 
                d - 1)
            tvarAssign <- nytvarAssign
            if (polnum > 1) 
                nytvar <- tvar[1:(polnum - 1)]
            else nytvar <- c()
            if (polnum < length(tvar)) 
                nytvar <- c(nytvar, colnames(x)[polytime.idx], 
                  tvar[(polnum + 1):length(tvar)])
            tvar <- nytvar
        }
    }
    if (any(unlist(lapply(term.labels, substr, 1, 9)) == "sumseason")) {
        period <- 1 + sum(unlist(lapply(colnames(x), substr, 
            1, 9)) == "sumseason")
        Cmat <- function(ncol) rbind(-1, cbind(diag(ncol - 1), 
            0))
        nm <- unlist(lapply(colnames(x), substr, 1, 9))
        sumseason.idx <- (1:ncol(x))[nm == "sumseason"]
    }
    if (is.empty.model(mt)) {
        x <- NULL
        z <- list(coefficients = numeric(0), residuals = y, fitted.values = 0 * 
            y, rank = 0, df.residual = length(y))
    }
    else {
#-------------------------
# begin code modification
#-------------------------
# original code in ssm:
#        if (family$family == "binomial") {
#            ntotal <- y[1, ] + y[2, ]
#            y <- y[1, ]
#        }
#        else {
#            ntotal <- NA
#
# revised code to work on cols of a matrix rather than rows:
        if (family$family == "binomial") {
            ntotal <- y[,1] + y[,2]
            y <- y[,1]
        }
        else {
            ntotal <- NA
        }
#-----------------------
# end code modification
#-----------------------
        z <- list()
        z$ss <- makeSS(x, y, w, tvar = tvarAssign, tvarNames = tvar)
        if (any(unlist(lapply(term.labels, substr, 1, 9)) == 
            "sumseason")) {
            oldG <- z$ss$Gmat
            newG <- function(tt, x, phi) {
                Gmat <- oldG(tt, x, phi)
                Gmat[sumseason.idx, sumseason.idx] <- Cmat(period - 
                  1)
                return(Gmat)
            }
            z$ss$Gmat <- newG
            if (any(unlist(lapply(tvar, substr, 1, 9)) == "sumseason")) {
                oldW <- z$ss$Wmat
                newW <- function(tt, x, phi) {
                  Wmat <- oldW(tt, x, phi)
                  diag(Wmat)[sumseason.idx[-1]] <- 0
                  return(Wmat)
                }
                z$ss$Wmat <- newW
            }
        }
        if (any(unlist(lapply(term.labels, substr, 1, 8)) == 
            "polytime")) {
            oldG2 <- z$ss$Gmat
            newG2 <- function(tt, x, phi) {
                Gmat <- oldG2(tt, x, phi)
                Gmat[polytime.idx, polytime.idx] <- Pmat(d - 
                  1)
                return(Gmat)
            }
            z$ss$Gmat <- newG2
        }
    }
    class(z) <- c("ssm", "lm")
    z$contrasts <- attr(x, "contrasts")
    z$xlevels <- xlev
    z$call <- cl
    z$terms <- mt
    z$data <- data
    z$ss$family <- family
    z$ss$ntotal <- ntotal
    if (!is.null(phi)) 
        z$ss$phi[1:length(phi)] <- phi
    if (!is.null(m0)) 
        z$ss$m0 <- m0
    if (!is.null(C0)) 
        z$ss$C0 <- C0
    if (!is.null(Fmat)) 
        z$ss$Fmat <- Fmat
    if (!is.null(Gmat)) 
        z$ss$Gmat <- Gmat
    if (!is.null(Vmat)) 
        z$ss$Vmat <- Vmat
    if (!is.null(Wmat)) 
        z$ss$Wmat <- Wmat
    if (fit) 
        z$ss <- kfs(z)
    z
}

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] possible bug in "sspir" package?

Reply via email to