[R] MITOOLS: Error in eval(expr, envir, enclos) : invalid 'envir' argument
R-users & helpers: I am using Amelia, mitools and cmprsk to fit cumulative incidence curves to multiply imputed datasets. The error message that I get "Error in eval(expr, envir, enclos) : invalid 'envir' argument" occurs when I try to fit models to the 50 imputed datasets using the "with.imputationList" function of mitools. The problem seems to occur intermittently, depending on the type of model that I try to fit to the datasets as well as the previous code that has been executed during the R session. I have read the previous postings for similar problems and have tried renaming many of my objects which has not solved the problem. What is weird is that I have not been able to reproduce the problem using other standard survival datasets (like pbc). It therefore seems to have something to do with my particular analysis, likely the names of my objects. I cannot find the source of the problem and would greatly appreciate any help. Brant Below is my session information and some code demonstrating the issue occuring with coxph. > sessionInfo() R version 2.5.0 (2007-04-23) i386-pc-mingw32 locale: LC_COLLATE=English_United States.1252;LC_CTYPE=English_United attached base packages: [1] "splines" "grid" "stats" "graphics" "grDevices" "utils" [7] "datasets" "methods" "base" other attached packages: cmprsk mitools Amelia survivalRGraphics latticeExtra "2.1-7""1.0" "1.1-23" "2.31" "1.0-6" "0.2-1" lattice foreign MASS "0.15-8" "0.8-20" "7.2-34" > str(utt.mi)# My dataset 'data.frame': 168 obs. of 25 variables: $ age : num 79.5 67.1 63.7 76.9 69.0 ... $ gender : Factor w/ 2 levels "0","1": 1 2 2 2 1 2 2 2 2 2 ... $ symptoms : Factor w/ 2 levels "0","1": 1 2 1 1 2 1 2 1 1 2 ... $ site : Factor w/ 3 levels "1","2","3": 1 1 2 1 2 1 1 2 1 3 ... $ multifoc : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 2 ... $ ctnm : Factor w/ 2 levels "1","2": 1 NA 2 1 2 2 1 NA 1 2 ... $ prebca : Factor w/ 2 levels "0","1": 1 1 1 1 2 1 2 1 1 1 ... $ precystec: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ... $ surgery : Factor w/ 2 levels "1","2": 1 1 2 1 2 1 1 1 1 1 ... $ ptnm.t : Factor w/ 5 levels "0","1","2","3",..: 3 3 5 1 2 4 2 1 1 5 ... $ grade: Factor w/ 3 levels "1","2","3": 2 2 3 2 2 3 2 1 1 3 ... $ histol : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ... $ postbca : Factor w/ 2 levels "0","1": 2 1 2 2 2 NA 2 1 1 1 ... $ postcyst : Factor w/ 2 levels "0","1": 1 1 1 2 1 1 1 1 1 1 ... $ chemo: Factor w/ 2 levels "0","1": 1 1 2 1 1 2 2 1 1 2 ... $ mets : Factor w/ 2 levels "0","1": 1 2 2 1 2 2 2 1 1 2 ... $ status : Factor w/ 4 levels "1","2","3","4": 1 3 2 1 3 3 3 1 1 3 ... $ futime : num 10.46 1.15 2.43 2.83 6.82 ... $ smk : Factor w/ 2 levels "0","1": 2 2 2 1 2 1 2 1 2 2 ... $ surg.yr : int 88 94 92 93 86 85 95 98 91 85 ... $ nodes: Factor w/ 2 levels "0","1": 1 1 2 1 1 2 1 1 1 2 ... $ os : num 0 1 0 0 1 1 1 0 0 1 ... $ css : num 0 1 0 0 1 1 1 0 0 1 ... $ rfs : num 0 1 1 0 1 1 1 0 0 1 ... $ comp : num 0 1 1 0 1 1 1 0 0 1 ... > set.seed(200) > M <- 50 # Number of imputations > am.imp <- amelia(utt.mi, m=M, p2s=1, startvals=1, write.out=F, + idvars=c('os','css','rfs','comp'), + noms=c('gender','symptoms','site','multifoc','ctnm','prebca','precystec' , + 'smk','surgery','ptnm.t','nodes','grade','histol','postbca', + 'postcyst','chemo','mets','status'), + sqrts=c('futime')) -- Imputation 1 -- 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 -- Imputation 50 -- 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 > MIset <- imputationList(am.imp[1:M]) > mifit <- with(MIset, + coxph(Surv(futime, os) ~ age + symptoms + ctnm + smk)) Error in eval(expr, envir, enclos) : invalid 'envir' argument __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Using MIcombine for coxph fits
R-helpers: I am using R 2.5 on Windows XP, packages all up to date. I have run into an issue with the MIcombine function of the mitools package that I hoped some of you might be able to help with. I will work through a reproducible example to demonstrate the issue. First, make a dataset from the pbc dataset in the survival package --- # Make a dataset library(survival) d <- pbc[,c('time','status','age','sex','hepmeg','platelet', 'trt', 'trig')] d[d==-9] <- NA d[,c(4,5,7)] <- lapply(d[,c(4,5,7)], FUN=as.factor) str(d) summary(d) --- Second, since there is missing data for several (but not all) of the variables, investigate the patterns. --- library(Hmisc) na.pattern(d) clus <- naclus(d, method='complete') par(mfrow=c(2,2)) naplot(clus, which='all') print(clus) detach(package: Hmisc) --- After examining the missing data patterns, impute 10 datasets using the amelia function from the Amelia package. Check the densities of the continuous variables to make sure they make sense. --- library(Amelia) am.imp <- amelia(d, m=10, p2s=1, startvals=1, write.out=F, noms=c('status','sex','hepmeg','trt')) compare.density(data=d, output=am.imp, var='trig') compare.density(data=d, output=am.imp, var='platelet') --- Since everything looks ok, fit Cox models to each of the 10 imputed datasets using the functions of the mitools package. --- library(mitools) miset <- imputationList(list(am.imp[[1]],am.imp[[2]],am.imp[[3]], am.imp[[4]],am.imp[[5]],am.imp[[6]],am.imp[[7]],am.imp[[8]], am.imp[[9]],am.imp[[10]])) mifit <- with(miset, coxph(Surv(time, status) ~ age + sex + hepmeg + platelet + trt + trig)) mifit --- The "mifit" object shows the individual coxph fits for each imputed dataset. Now we have to pool these fitted models to obtain an overall model. The code and result are shown below. --- fit.am <- MIcombine(mifit) summary(fit.am) Multiple imputation results: with.imputationList(miset, coxph(Surv(time, status) ~ age + sex + hepmeg + platelet + trt + trig)) MIcombine.default(mifit) results se (lower upper) missInfo age 0.035548792 0.0082506946 0.019373545 0.0517240397 4 % sex1 -0.070760613 0.2563372831 -0.580309741 0.4387885156 34 % hepmeg1 0.932378808 0.2026274576 0.532555416 1.3322021993 23 % platelet -0.001757899 0.0009480636 -0.003620446 0.0001046469 14 % trt2 0.137413162 0.1924230007 -0.243815288 0.5186416117 29 % trig 0.003979287 0.0012010053 0.001607266 0.0063513078 25 % --- The question I have concerns the meaning of this result. The missInfo column of the summary function suggests that age was missing data, when in fact it was not. Is there a different interpretation for the missInfo column? Thanks in advance for any help. Brant Inman __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] MICE for Cox model
Thanks Adai, I got it to work. You were right, I had called the wrong pool function. Brant -Original Message- From: Adaikalavan Ramasamy [mailto:[EMAIL PROTECTED] Sent: Thursday, May 17, 2007 1:56 PM To: Inman, Brant A. M.D. Cc: r-help@stat.math.ethz.ch Subject: Re: [R] MICE for Cox model Are you sure you used my pool function? Because as I just have discovered, it had a minor typo in the code. After replacing " df <- (m - 1) * (1 + 1/r)2" with "df <- (m - 1) * (1 + 1/r)^2" in my pool() function, I get library(survival); data(pbc) d <- pbc[,c('time', 'status', 'age', 'sex', 'hepmeg', 'platelet', 'trt', 'trig')] d[d==-9] <- NA d[,c(4,5,7)] <- lapply(d[,c(4,5,7)], FUN=as.factor) library(mice) imp <- mice(d, m=10, maxit=10, diagnostics=T, seed=500, defaultImputationMethod=c('norm', 'logreg', 'polyreg')) fit <- coxph.mids( Surv(time,status) ~ age + sex + hepmeg + platelet + trt + trig, imp) pool(fit) Call: pool(object = fit) Pooled coefficients: age sex1 hepmeg1 platelet trt2 trig 0.034924182 -0.208897827 0.987641362 -0.001559426 0.070124108 0.004122421 Fraction of information about the coefficients missing due to nonresponse: age sex1hepmeg1 platelet trt2 trig 0.06624167 0.19490517 0.27300965 0.21950332 0.27768153 0.40658964 Regards, Adai Inman, Brant A. M.D. wrote: > Adai, > > Thanks for the functions. I tried using your functions and I get the > same error message during the pooling part: > >> pool(micefit) > Error in names(df) <- names(f) <- names : 'names' attribute [5] must be > the same length as the vector [0] > > Brant > -Original Message- > From: Adaikalavan Ramasamy [mailto:[EMAIL PROTECTED] > Sent: Thursday, May 17, 2007 4:56 AM > To: Inman, Brant A. M.D. > Cc: r-help@stat.math.ethz.ch > Subject: Re: [R] MICE for Cox model > > I encountered this problem about 18 months ago. I contacted Prof. Fox > and Dr. Malewski (the R package maintainers for mice) but they referred > me to Prof. van Buuren. I wrote to Prof. van Buuren but am unable to > find his reply (if he did reply). > > Here are the functions I used at that time, if you want to take it with > lots of salt. Let me know if you find anything fishy with it. > > > coxph.mids <- function (formula, data, ...) { > >call <- match.call() >if (!is.mids(data)) stop("The data must have class mids") > >analyses <- as.list(1:data$m) > >for (i in 1:data$m) { > data.i<- complete(data, i) > analyses[[i]] <- coxph(formula, data = data.i, ...) >} > >object <- list(call = call, call1 = data$call, > nmis = data$nmis, analyses = analyses) > >oldClass(object) <- if (.SV4.) "mira" else c("mira", "coxph") >return(object) > } > > > And in the function 'pool', the small sample adjustment requires > residual degrees of freedom (i.e. dfc). For a cox model, I believe that > this is simply the number of events minus the regression coefficients. > There is support for this from middle of page 149 of the book by Parmer > & Machin (ISBN 0471936405). Please correct me if I am wrong. > > Here is the slightly modified version of pool : > > > pool <- function (object, method = "smallsample") { > >call <- match.call() >if (!is.mira(object)) stop("The object must have class 'mira'") > >if ((m <- length(object$analyses)) < 2) > stop("At least two imputations are needed for pooling.\n") > >analyses <- object$analyses > >k <- length(coef(analyses[[1]])) >names <- names(coef(analyses[[1]])) >qhat <- matrix(NA, nrow = m, ncol = k, dimnames = list(1:m, names)) >u <- array(NA, dim = c(m, k, k), > dimnames = list(1:m, names, names)) > >for (i in 1:m) { > fit <- analyses[[i]] > qhat[i, ] <- coef(fit) > u[i, , ] <- vcov(fit) >} > >qbar <- apply(qhat, 2, mean) >ubar <- apply(u, c(2, 3), mean) >e <- qhat - matrix(qbar, nrow = m, ncol = k, byrow = TRUE) >b <- (t(e) %*% e)/(m - 1) >t <- ubar + (1 + 1/m) * b >r <- (1 + 1/m) * diag(b/ubar) >f <- (1 + 1/m) * diag(b/t) >df <- (m - 1) * (1 + 1/r)2 > >if (method == "smallsample") { > > if( any( class(fit) == &q
Re: [R] MICE for Cox model
Adai, Thanks for the functions. I tried using your functions and I get the same error message during the pooling part: > pool(micefit) Error in names(df) <- names(f) <- names : 'names' attribute [5] must be the same length as the vector [0] > Brant -Original Message- From: Adaikalavan Ramasamy [mailto:[EMAIL PROTECTED] Sent: Thursday, May 17, 2007 4:56 AM To: Inman, Brant A. M.D. Cc: r-help@stat.math.ethz.ch Subject: Re: [R] MICE for Cox model I encountered this problem about 18 months ago. I contacted Prof. Fox and Dr. Malewski (the R package maintainers for mice) but they referred me to Prof. van Buuren. I wrote to Prof. van Buuren but am unable to find his reply (if he did reply). Here are the functions I used at that time, if you want to take it with lots of salt. Let me know if you find anything fishy with it. coxph.mids <- function (formula, data, ...) { call <- match.call() if (!is.mids(data)) stop("The data must have class mids") analyses <- as.list(1:data$m) for (i in 1:data$m) { data.i<- complete(data, i) analyses[[i]] <- coxph(formula, data = data.i, ...) } object <- list(call = call, call1 = data$call, nmis = data$nmis, analyses = analyses) oldClass(object) <- if (.SV4.) "mira" else c("mira", "coxph") return(object) } And in the function 'pool', the small sample adjustment requires residual degrees of freedom (i.e. dfc). For a cox model, I believe that this is simply the number of events minus the regression coefficients. There is support for this from middle of page 149 of the book by Parmer & Machin (ISBN 0471936405). Please correct me if I am wrong. Here is the slightly modified version of pool : pool <- function (object, method = "smallsample") { call <- match.call() if (!is.mira(object)) stop("The object must have class 'mira'") if ((m <- length(object$analyses)) < 2) stop("At least two imputations are needed for pooling.\n") analyses <- object$analyses k <- length(coef(analyses[[1]])) names <- names(coef(analyses[[1]])) qhat <- matrix(NA, nrow = m, ncol = k, dimnames = list(1:m, names)) u <- array(NA, dim = c(m, k, k), dimnames = list(1:m, names, names)) for (i in 1:m) { fit <- analyses[[i]] qhat[i, ] <- coef(fit) u[i, , ] <- vcov(fit) } qbar <- apply(qhat, 2, mean) ubar <- apply(u, c(2, 3), mean) e <- qhat - matrix(qbar, nrow = m, ncol = k, byrow = TRUE) b <- (t(e) %*% e)/(m - 1) t <- ubar + (1 + 1/m) * b r <- (1 + 1/m) * diag(b/ubar) f <- (1 + 1/m) * diag(b/t) df <- (m - 1) * (1 + 1/r)2 if (method == "smallsample") { if( any( class(fit) == "coxph" ) ){ ### this loop is the hack for survival analysis ### status <- fit$y[ , 2] n.events <- sum(status == max(status)) p<- length( coefficients( fit ) ) dfc <- n.events - p } else { dfc <- fit$df.residual } df <- dfc/((1 - (f/(m + 1)))/(1 - f) + dfc/df) } names(r) <- names(df) <- names(f) <- names fit <- list(call = call, call1 = object$call, call2 = object$call1, nmis = object$nmis, m = m, qhat = qhat, u = u, qbar = qbar, ubar = ubar, b = b, t = t, r = r, df = df, f = f) oldClass(fit) <- if (.SV4.) "mipo" else c("mipo", oldClass(object)) return(fit) } print.miro only gives the coefficients. Often I need the standard errors as well since I want to test if each regression coefficient from multiple imputation is zero or not. Since the function summary.mipo does not exist, can I suggest the following summary.mipo <- function(object){ if (!is.null(object$call1)){ cat("Call: ") dput(object$call1) } est <- object$qbar se <- sqrt(diag(object$t)) tval <- est/se df <- object$df pval <- 2 * pt(abs(tval), df, lower.tail = FALSE) coefmat <- cbind(est, se, tval, pval) colnames(coefmat) <- c("Estimate", "Std. Error", "t value", "Pr(>|t|)") cat("\nCoefficients:\n") printCoefmat( coefmat, P.values=T, has.Pvalue=T, signif.legend=T ) cat("\nFraction of information about the coefficients missing due to nonresponse:", "\n") print(object$f) ans <- list( coefficients=coefmat, df=df, call=object$call1, fracinfo.miss=object$f ) invisible( ans ) } Hope this helps. Regards, Adai Inman, Brant A. M.D. wrote: > R-helpers: > > I have a dataset that has
[R] MICE for Cox model
R-helpers: I have a dataset that has 168 subjects and 12 variables. Some of the variables have missing data and I want to use the multiple imputation capabilities of the "mice" package to address the missing data. Given that mice only supports linear models and generalized linear models (via the lm.mids and glm.mids functions) and that I need to fit Cox models, I followed the previous suggestion of John Fox and have created my own function "cox.mids" to use coxph to fit models to the imputed datasets. (http://tolstoy.newcastle.edu.au/R/help/06/03/22295.html) The function I created is: cox.mids <- function (formula, data, ...) { call <- match.call() if (!is.mids(data)) stop("The data must have class mids") analyses <- as.list(1:data$m) for (i in 1:data$m) { data.i <- complete(data, i) analyses[[i]] <- coxph(formula, data = data.i, ...) } object <- list(call = call, call1 = data$call, nmis = data$nmis, analyses = analyses) oldClass(object) <- c("mira", "coxph") return(object) } The problem that I encounter occurs when I try to use the "pool" function to pool the fitted models into one general model. Here is some code that reproduces the error using the pbc dataset. d <- pbc[,c('time','status','age','sex','hepmeg','platelet', 'trt', 'trig')] d[d==-9] <- NA d[,c(4,5,7)] <- lapply(d[,c(4,5,7)], FUN=as.factor) str(d) imp <- mice(d, m=10, maxit=10, diagnostics=T, seed=500, defaultImputationMethod=c('norm', 'logreg', 'polyreg')) fit <- cox.mids(Surv(time,status) ~ age + sex + hepmeg + platelet + trt + trig, imp) pool(fit) I have looked at the "pool" function but cannot figure out what I have done wrong. Would really appreciate any help with this. Thanks, Brant Inman Mayo Clinic __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] ? R 2.5.0 alpha bug
This email is intended to highlight 2 problems that I encountered running R 2.5.0 alpha on a Windows XP machine. #1 - Open script error If I click the "Open folder" icon on the toolbar, R opens my script files perfectly. However, when I select "File > Open Script > MyFileLocation", I get a fatal error that causes R to close immediately. This error was reproduced on 3 consecutive occasions but has been intermittent thereafter. One of these fatal errors resulted in a typical error reporting box being generated which I sent off. I was not able to verify if this error has been reported and corrected in subsequent versions of 2.5. #2 - Bug reporting link on CRAN website broken I tried to report the bug listed above on the CRAN website but when I clicked on the bug reporting link on the left-hand side panel of the main site (http://bugs.r-project.org/cgi-bin/R) , I get an error page with the following message: The system encountered a fatal error cannot open config file /home/sfe/r-bugs/jitterbug/R : No such file or directory The last error code was: No such file or directory uid/gid=30/8 This has been submitted to r-devel. Brant Inman __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] prelim.norm() function not working
Thank you very much, that was indeed the problem. (And now that I read more carefully the help page, it did in fact say that the input was a data matrix and not a data frame.) Brant -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Prof Brian Ripley Sent: Wednesday, April 25, 2007 12:12 AM To: Brant Inman Cc: r-help@stat.math.ethz.ch Subject: Re: [R] prelim.norm() function not working Looks like you have a data frame where you need a matrix. (The same issue occurs in most of Joe Schafer's packages, e.g. mix.) Try as.matrix(usnews). On Tue, 24 Apr 2007, Brant Inman wrote: > R-experts: > I am trying to reproduce some of Paul Allison's results in his little > green book on missing data (Sage 2002). The dataset for which I am > having problems, "usnews", can be found at: > http://www.ats.ucla.edu/stat/books/md/default.htm. I am working on a > Windows machine with R 2.5 installed, all packages up-to-date. > The problem has to do with the prelim.norm() function of the package > "norm". Specifically, I need to use this pre-processing function to > later use the EM algorithm and DA procedures in the norm package. I > am getting an error with the following code. > -- >> pre <- prelim.norm(usnews) > > Error in as.double.default(list(csat = c(972L, 961L, NA, 881L, NA, NA, : >(list) object cannot be coerced to 'double' > > - > I have read the previous postings and I am wondering if the problem > with prelim.norm is the size of the usnews dataset or the amount of > missing data. > > > >> dim(usnews) > [1] 13027 > > > > > Does anyone have any ideas? If not, are there alternatives to norm > for implementing the MLE and EM methods of dealing with missing data? > > Thanks, > > Brant Inman > Mayo Clinic > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Opinion on R plots: connecting X and Y
Attention R users, especially those that are experienced enough to be opinionated, I need your input. Consider the following simple plot: x <- rnorm(100) y <- rnorm(100) plot(x, y, bty='n') A colleague (and dreaded SAS user) commented that she thought that my plots could be "cleaned up" by connecting the X and Y axes. I know that I can do that with bty='l' but I don't want to, I find that the plots look less cluttered with disjoint axes. However, I was intrigued enough by her comments that I decided to solicit the opinions of others on this issue. Are there principled reasons why one should prefer joined axes or disjoint axes? Brant Inman __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Using power.t.test over a range of conditions
Chuck Cleland and Steve Weigand both pointed out my mistake in the loop...trying to assign a list (i.e. the output from power.t.test) to a cell in a data.frame. Thanks guys. Brant __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Using power.t.test over a range of conditions
R-Helpers: I would like to perform sample size calculations for an experiment. As part of this process, I would like to know how various assumptions affect the sample size calculation. For instance, one thing that I would like to know is how the calculated sample size changes as I vary the difference that I would like to detect. I tried the following first, but got the associated error message. - > power.t.test(delta=seq(500,2000,100), sd=1000, sig.level=0.05, power=0.8, + type='two.sample', alt='two.sided') Error in uniroot(function(n) eval(p.body) - power, c(2, 1e+07)) : invalid function value in 'zeroin' In addition: Warning message: the condition has length > 1 and only the first element will be used in: if (f(lower, ...) * f(upper, ...) >= 0) stop("f() values at end points not of opposite sign") > - >From the error message I suspected that the function did not handle vectors as arguments. I therefore tried the following looping structure to solve the problem: - DELTA <- seq(500,2000,250) SD <- seq(1000,2500,250) result <- matrix(nrow=length(DELTA), ncol=length(SD)) colnames(result) <- paste('SD=',SD, sep='') rownames(result) <- paste('Delta=',DELTA, sep='') for(i in 1:length(DELTA)){ for(j in 1:length(SD)){ result[i,j] <- power.t.test(delta=DELTA[i], sd=SD[j], sig.level=0.05, power=0.8, type='two.sample', alt='two.sided') } } Error in result[i, j] <- power.t.test(delta = DELTA[i], sd = SD[j], sig.level = 0.05, : number of items to replace is not a multiple of replacement length - Can some one tell me what I am doing wrong here? Thanks in advance for your help, Brant Inman __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Putting 2 breaks on Y axis
R plotting experts: I have a bivariate dataset composed of 300 (x,y) continuous datapoints. 297 of these points are located within the y range of [0,10], while 2 are located at 20 and one at 55. No coding errors, real outliers. When plotting these data with a scatterplot, I obviously have a problem. If I plot the full dataset with ylim = c(0,55), then I cannot see the structure in the data in the [0, 10] range. If I truncate the y axis with ylim = c(0,10), then I cannot see the 3 outliers. If I break the y axis from 10 to 20 (using plotrix functions), I still do not see the data optimally because of the white space from y=20 to y=55. What I would like to do is break the y axis at 2 points, roughly 10-20 and 20-55. Is there a function that can break an axis in 2 places? Thanks in advance for any suggestions. Brant __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Freeman-Tukey arcsine transformation
The point of the given transformation is not so much for normality as it is for variance stabilization. The variance of the Freeman-Tukey transform depends only on the denominator of the proportion in question...something that can be used to advantage. Brant -Original Message- From: Peter Dalgaard [mailto:[EMAIL PROTECTED] Sent: Tuesday, March 13, 2007 3:36 PM To: Bos, Roger Cc: Inman, Brant A. M.D.; r-help@stat.math.ethz.ch Subject: Re: [R] Freeman-Tukey arcsine transformation Peter Dalgaard wrote: > Bos, Roger wrote: > >> I'm curious what this transformation does, but I am not curious enough to pay $14 to find out. Someone once told me that the arcsine was a good way to transform data and make it more 'normal'. I am wondering if this is an improved method. Anyone know of a free reference? >> >> >> >> > Well, a paper copy of the American Statistician (1978) would be free in > some sense > > In the meantime I got detached from JSTOR (i.e., I went home), and I'm > not prepared to jump through the relevant hoops for remote access at > this point, but AFAIR it was a relatively trivial version of the simple > arcsine transform, something like replacing asin(r/n) with the average > of asin(r/(n+1)) and asin((r+1)/(n+1)). The point of the paper is that > you can invert explicitly for r/n if n is known. > > Well, except for a couple of sqrt() it seems >> /The American Statistician/, Vol. 32, No. 4. (Nov., 1978), >> p. 138. >> >> >> Stable URL: >> http://links.jstor.org/sici?sici=0003-1305%28197811%2932%3A4%3C138%3ATIO TFD%3E2.0.CO%3B2-Z >> >> >> > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Freeman-Tukey arcsine transformation
R-Experts: Does anyone know if there are R functions to perform the Freeman-Tukey double arcsine transformation and then backtransform it? Thanks, Brant Inman Mayo Clinic __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Mixed effects multinomial regression and meta-analysis
R-Experts: I just realized that the example I used in my previous posting today is incorrect because it is a binary response, not a multilevel response (small, medium, large) such as my real life problem has. I apologize for the confusion. The example is incorrect, but the multinomial problem is real. Brant [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Mixed effects multinomial regression and meta-analysis
R Experts: I am conducting a meta-analysis where the effect measures to be pooled are simple proportions. For example, consider this data from Fleiss/Levin/Paik's Statistical methods for rates and proportions (2003, p189) on smokers: Study N Event P(Event) 1 86 830.965 2 93 900.968 3 136 1290.949 4 82 700.854 Total397 372 A test of heterogeneity for a table like this could simply be Pearson' chi-square test. -- smoke.data <- matrix(c(83,90,129,70,3,3,7,12), ncol=2, byrow=F) chisq.test(smoke.data, correct=T) > X-squared = 12.6004, df = 3, p-value = 0.005585 -- Now this test implies that the data is heterogenous and that pooling might be inappropriate. This type of analysis could be considered a fixed effects analysis because it assumes that the 4 studies are all coming from one underlying population. But what if I wanted to do a mixed effects (fixed + random) analysis of data like this, possibly adjusting for an important covariate or two (assuming I had more studies, of course)...how would I go about doing it? One thought that I had would be to use a mixed effects multinomial logistic regression model, such as that reported by Hedeker (Stat Med 2003, 22: 1433), though I don't know if (or where) it is implemented in R. I am certain there are also other ways... So, my questions to the R experts are: 1) What method would you use to estimate or account for the between study variance in a dataset like the one above that would also allow you to adjust for a variable that might explain the heterogeneity? 2) Is it implemented in R? Brant Inman Mayo Clinic __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Sampling distribution of the range statistic
R-helpers: In the construction of control charts for statistical quality control objectives, one might choose to estimate the control limits for the mean using the mean range of the samples. This requires multiplying the mean range by a correction factor, often called "d2", that is tabulated in many books. The origin of "d2" seems to be table 2 of the following paper: Harter, HL. Ann Math Stat (1960) 31: 1122. In Table 2 of this paper, the mean of the sampling distribution of the range statistic is calculated for various sample sizes. Does anyone know whether R has a function that can reproduce the sampling distribution of the range represented in the Harter paper so that I don't have to look at tables in old books? Brant __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Making TIFF images with rtiff
Several R Helpers pointed out that I forgot to include the dev.off() statement in my code. This solved my problem with one caviat: the output file address cannot have any spaces in it (as pointed out by Chuck Cleland). For instance: # This file location works great bitmap(file='C:\\temp\\test.tif', type = "tifflzw", res = 1200) plot(speed ~ dist) dev.off() # This file location does not work, despite being accurate bitmap(file='C:\\Documents and Settings\\m007704\\Desktop\\test.tif', type = "tifflzw", res = 1200) plot(speed ~ dist) dev.off() For the benefit of those that may need to make TIFF files in the future and don't know how to do it, I will recapitulate below the steps required to produce a TIFF file using R on a Windows XP machine. 1) Download and install a current version of Ghostscript. You probably need GPL(GNU General Public License) version of Ghostscript. For Windows, the correct file to download is called: gs854w32-gpl.exe. To download this file, go to one of the following websites: http://www.cs.wisc.edu/~ghost/ http://sourceforge.net/projects/ghostscript/ 2) Add Ghostscript to a Windows environmental variable. Right click on the My Computer icon on your desktop. Select: Properties > Advanced > Environmental Variables. You will see 2 boxes, one for user variables and one for system variables. In the user variables section, highlight the variable called "PATH" and then click edit. Click on the variable value box and go to the end of whatever is written there (don't erase it). Enter the following after the last bit of text: ";C:\Program Files\gs\gs8.54\bin\". This is the location on you computer where it can find the gswin32c.exe file that it needs to start Ghostscript. 3) Use the bitmap function to produce a TIFF Now you should be ready to make a TIFF. The following code is a simple example that you can use to see if everything is set up right on your PC. attach(cars) bitmap(file='C:\\temp\\test.tif', type = "tifflzw", res = 1200) plot(speed ~ dist) dev.off() This should produce a TIFF file called 'test.tif' in the 'temp' directory of your PC. If you do not have a directory of this name, substitute for one that exists (or create one). Note that the file argument does not seem to handle and spaces in the directory address, so select an address without spaces in it. Note also that, as pointed out by Ripley in this thread, there are different TIFF formats which can be made with R. My understanding is that the different formats have to do with different image compression algorithms, but you can read more about these details (and clues to why TIFF files seem to be prefered by some publishers and imaging software makers) at: http://en.wikipedia.org/wiki/Comparison_of_graphics_file_formats You can also type ?bitmap to see what R can output for you and read more about the TIFF file format at: http://en.wikipedia.org/wiki/Tiff I regret that I cannot comment on Unix or Mac computers as it has been nearly 15 years since I have used these types of machines and I therefore have no knowledge whatsoever that might be of use for users of these systems. Brant Inman __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Making TIFF images with rtiff
Thank you Peter Dalgaard. When I open a DOS box and type gswin32c, I do indeed get an error message saying that it can't find the program. I edited the Windows system environmental variable "Path" and the user environmental variable "PATH" (wasn't sure which to edit), to contain the follwing after a semicolon "C:\Program Files\gs\gs8.54\bin\". This effectively fixed the Dos box problem. I now get a GS prompt when I type gswin32c. When I restart R and use the following code, I no longer get an error message. > attach(cars) > bitmap(file='C:\\Documents and Settings\\m007704\\Desktop\\test.tif', + type = "tifflzw", res = 1200) > plot(speed ~ dist) Alas, if it was only that easy! When I look on my desktop (to which the file address above correctly refers to), there is no image file of any sort to be found. Any ideas as to what I am doing wrong? Brant Inman - It needs to be in your Path. If you open up a "DOS box" and type gswin32c, I bet you get the same error message. You can fix this by editing the Path (via My Computer/Properties/Advanced/Environment variables, as you seem to know). If you use the R_GSCMD route, you may get in trouble with the embedded space in "Program Files" ("dir/x c:" will tell you the equivalent space-free name). Also, remember that environment changes do not affect running programs so you may need to exit R and restart. -- O__ Peter Dalgaard Ă˜ster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Making TIFF images with rtiff
Thanks to all the responders. Here are some replies to the comments: 1) Concerning the term TIFF "format". It may be that the journals are misusing the term TIFF, but it would also appear that wikipedia is as well. The first sentence in the wiki link sent below states: "Tagged Image File FORMAT (abbreviated TIFF) is a file FORMAT for mainly storing images, including photographs and line art." Either way, the semantics of the word TIFF are probably not that important for the current query. If a publisher wants images in TIFF, I would like to provide them in that format, regardless of whether or not I deem the request proper. After all they are the publishing experts! 2) Converting PNGs to TIFFs with Photoshop. This is principally what I have done in the past that has given the poor results that I have noted. I thought that it could be something that I was specifically doing wrong so I consulted the medical imaging and design department of my institution (Mayo Clinic) which informed me that there is often a loss of information, some times quite large, in these types of image format conversions. They suggested that it is best to work with the TIFFs from the start if possible, which is why I am trying to explore this option in R. It is interesting that my imaging department was able to convert the WMF format to TIFF with much better success. However, since Photoshop does not support WMFs, I am unable to do this myself. I have downloaded ImageMagick and will try that. 3)Lack of gratitude by R users. It is interesting to note upon reviewing the R-help files that many queries (perhaps even the majority?) do, in fact, convey gratitude. Unfortunately, I have also noted that there are several messages from R developers that appear to feel underappreciated. I suspect that one reason that R is experiencing an explosion of users is precisely that people appreciate and value the donation of free time provided by statistical experts--such as Harrell, Weigand, Ripley and Kort in this thread--for the development of accurate and powerful statistical software. Furthermore, the support provided for the software in the form of R-help is outstanding, again something that I think is part of the package deal that is attracting new users to R. In other words, one should not assume a general lack of gratitude on behalf of R-help users but should see the growth of R as evidence that the software and its developpers/supporters are indeed greatly valued. I do not think that R would be used much if people did not appreciate the nice packages, functions and help provided. Indeed, those of us that have access to multiple software programs (I have access to JMP, SPSS, SPLUS and SAS) choose R as our primary method of analysis because we feel that the sharing environment supported by CRAN is a better way of doing statistical computing. Enough said. 4)Flexible journal policies. Of 4 papers that I have submitted in the last 3 months for publication in 3 journals (all to cancer related journals), all were subjected to online file checkers that forced me to upload TIFF files instead of PDFs. Not only do they check the format, but also various other resolution related items. In other words, I would not have even made it past the online submission stage if I only PDFs to work with. 5)Using the bitmap function to make TIFFs. This sounds like a very attractive option. I have tried this option using the simple code below: - attach(cars) plot(speed ~ dist) # Simple plot to test bitmap(file='C:\\...\\test.tif',type = "tifflzw", res = 1200) Error in system(paste(gsexe, "-help"), intern = TRUE, invisible = TRUE) : gswin32c.exe not found - Despite what the error message suggests, I do have a functional Ghostscript 8.54 program installed on my Windows XP machine with the executable found in the directory: C:\Program Files\gs\gs8.54\bin\gswin32c. I am not sure why R is not finding the program. I tried making a Windows environmental variable, R_GSCMD, with this system address but that did not have any success. Does the gswin32c file need to be in my R PATH? Brant Inman Mayo Clinic __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Making TIFF images with rtiff
Many medical journals and publishers require that images, whether photographs or line art, be submitted as high resolution .TIFF images. One option for R users is to produce an image in one format and to convert it to a .TIFF file using a second software program. My experience has been that this option often results in images of poorer quality, often with blurry contours, and a loss of resolution. A second and better option would be to make .TIFF files directly from the graphic output of R. I recently noticed that there is a library called "rtiff" that may be able to do this. However, I have not been able to get it to work, principally because I do not know how to install the required supporting software, libtiff and tiffio.h, correctly on my computer. I am running R 2.4.0 on a Windows XP machine. So far I have done the following: 1) Loaded the rtiff library 2) Downloaded and installed the TIFF library 3.8.2 (complete package and sources) from the following website: http://gnuwin32.sourceforge.net/packages/tiff.htm I would like to ask the R experts for help with the following things: 1) Where do I get the tiffio.h file? 2) Where do I install or relocate the tiffio.h and TIFF library to so that rtiff will work? Thanks for your help. Brant Inman Mayo Clinic __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Partial proportional odds logistic regression
Just a follow-up note on my last posting. I still have not had any replies from the R-experts our there that use partial proportional odds regression (and I have to hope that there are some of you!) but I do think that I have figured out how to perform the unconstrained partial proportional odds model using vglm. I show this code below for the benefit of others that may want to try it (or point out my errors) using one of the datasets in Petersen and Harrell's paper (Appl Stat 1990). However, I remain open for suggestions on how to implement the unconstrained partial proportional odds model. -- library(VGAM) library(MASS) library(Design) ### # Nausea dataset # Peterson and Harrell. Applied Statistics 1990, 39(2): 205-217 nausea.short <- data.frame(matrix(nrow=12, ncol=3)) #Table 2 colnames(nausea.short) <- c('nausea', 'cisplatin', 'freq') nausea.short[,1] <- ordered(rep(seq(0,5,1),2), labels=seq(0,5,1)) nausea.short[,2] <- c(rep(0,6), rep(1,6)) nausea.short[,3] <- c(43,39,13,22,15,29,7,7,3,12,15,14) # Proportional odds ordinal logistic regression: 3 options polr(nausea ~ cisplatin, weights=freq, data=nausea.short, method='logistic') lrm(nausea ~ cisplatin, weights=freq, data=nausea.short) vglm(nausea ~ cisplatin, weights=freq, data=nausea.short, family=cumulative(parallel=T, reverse=T)) # Unconstrained partial proportional odds ordinal logistic regression vglm(nausea ~ cisplatin, weights=freq, data=nausea.short, family=cumulative(parallel=F, reverse=T)) -- The results obtained with this approach appear consistent with those presented in Table 3 of the paper. However, the code for the unconstrained partial proportional odds model is so simple (just one letter is different than in the proportional odds model!) that I wonder if there is not room for error here that I am too inexperienced to identify. Again, help with the constrained model would be greatly appreciated. Brant Inman __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Partial proportional odds logistic regression
R-experts: I would like to explore the partial proportional odds models of Peterson and Harrell (Applied Statistics 1990, 39(2): 205-217) for a dataset that I am analyzing. I have not been able to locate a R package that implements these models. Is anyone aware of existing R functions, packages, etc... that might be used to implement the partial proportional odds models? Brant Inman __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Using VGAM's vglm function for ordinal logistic regression
Thank you for the help. Indeed, the differences in the results that I noted were due to the incorrect ordering of the response variable that resulted from my unattentive use of as.ordered on a character vector. Brant -Original Message- From: Prof Brian Ripley [mailto:[EMAIL PROTECTED] Sent: Sunday, January 07, 2007 2:52 AM To: Inman, Brant A. M.D. Cc: r-help@stat.math.ethz.ch; [EMAIL PROTECTED] Subject: Re: [R] Using VGAM's vglm function for ordinal logistic regression On Sat, 6 Jan 2007, Inman, Brant A. M.D. wrote: > > R-Experts: > > I am using the vglm function of the VGAM library to perform proportional > odds ordinal logistic regression. The issue that I would like help with > concerns the format in which the response variable must be provided for > this function to work correctly. > pneumo2$severity [1] normal normal normal normal normal normal normal normal mild mild [11] mild mild mild mild mild mild severe severe severe severe [21] severe severe severe severe Levels: mild < normal < severe is different from the ordering in the first example. The difference between vglm and polr is that the latter uses the conventional sign for the coefficients: see MASS4 p.204. I would never use as.ordered on a character vector, as this leaves it to R to choose the ordering. (Even if you think you intended alphabetical order, that depends on the locale: see the warnings on the help page.) > Consider the following example: > > -- > > library(VGAM) > library(MASS) > > attach(pneumo) > pneumo# Inspect the format of the original dataset > > # Restructure the pneumo dataset into a different format > pneumo2 <- data.frame(matrix(ncol=3, nrow=24)) > colnames(pneumo2) <- c('exposure.time', 'severity', 'freq') > pneumo2[,1] <- rep(pneumo[,1],3) > pneumo2[,2] <- > as.ordered(c(rep('normal',8),rep('mild',8),rep('severe',8))) > pneumo2[1:8,3] <- pneumo[,2] > pneumo2[9:16,3] <- pneumo[,3] > pneumo2[17:24,3] <- pneumo[,4] > pneumo2 # Inspect the format of the new modified dataset > > -- > > The problem occurs when I try to analyze these two datasets, which are > identical in content, with the proportional odds model using vglm: > > -- > > # Analyze the original dataset with vglm, get one set of results > >> vglm(vglm(cbind(normal, mild, severe) ~ log(exposure.time), > data=pneumo, > + family=cumulative(parallel=T)) > > Coefficients: > (Intercept):1 (Intercept):2 log(exposure.time) > 9.676093 10.581725 -2.596807 > > Degrees of Freedom: 16 Total; 13 Residual > Residual Deviance: 5.026826 > Log-likelihood: -204.2742 > > # Analyzing the modified dataset with vglm gives another set of results > >> vglm(severity ~ log(exposure.time), weights=freq, data=pneumo2, > + family=cumulative(parallel=T)) > > Coefficients: > (Intercept):1 (Intercept):2 log(exposure.time) >-1.6306499 2.5630170 -0.1937956 > > Degrees of Freedom: 48 Total; 45 Residual > Residual Deviance: 503.7251 > Log-likelihood: -251.8626 > Warning messages: > 1: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace = > trace, wzeps = control$wzepsilon) > 2: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace = > trace, wzeps = control$wzepsilon) > 3: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace = > trace, wzeps = control$wzepsilon) > 4: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace = > trace, wzeps = control$wzepsilon) > 5: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace = > trace, wzeps = control$wzepsilon) > > # Analyzing the modified dataset with polr reproduces these second > results > >> polr(severity ~ log(exposure.time), weights=freq, data=pneumo2) > > Coefficients: > log(exposure.time) > 0.1938154 > > Intercepts: > mild|normal normal|severe >-1.630612 2.563055 > > Residual Deviance: 503.7251 > AIC: 509.7251 > > -- > > Can someone explain what I am doing wrong when using vglm and polr with > the modified dataset? I do not understand why these two formulations > should give different results. > > Brant Inman > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford,
[R] Using VGAM's vglm function for ordinal logistic regression
R-Experts: I am using the vglm function of the VGAM library to perform proportional odds ordinal logistic regression. The issue that I would like help with concerns the format in which the response variable must be provided for this function to work correctly. Consider the following example: -- library(VGAM) library(MASS) attach(pneumo) pneumo # Inspect the format of the original dataset # Restructure the pneumo dataset into a different format pneumo2 <- data.frame(matrix(ncol=3, nrow=24)) colnames(pneumo2) <- c('exposure.time', 'severity', 'freq') pneumo2[,1] <- rep(pneumo[,1],3) pneumo2[,2] <- as.ordered(c(rep('normal',8),rep('mild',8),rep('severe',8))) pneumo2[1:8,3] <- pneumo[,2] pneumo2[9:16,3] <- pneumo[,3] pneumo2[17:24,3] <- pneumo[,4] pneumo2 # Inspect the format of the new modified dataset -- The problem occurs when I try to analyze these two datasets, which are identical in content, with the proportional odds model using vglm: -- # Analyze the original dataset with vglm, get one set of results > vglm(vglm(cbind(normal, mild, severe) ~ log(exposure.time), data=pneumo, + family=cumulative(parallel=T)) Coefficients: (Intercept):1 (Intercept):2 log(exposure.time) 9.676093 10.581725 -2.596807 Degrees of Freedom: 16 Total; 13 Residual Residual Deviance: 5.026826 Log-likelihood: -204.2742 # Analyzing the modified dataset with vglm gives another set of results > vglm(severity ~ log(exposure.time), weights=freq, data=pneumo2, + family=cumulative(parallel=T)) Coefficients: (Intercept):1 (Intercept):2 log(exposure.time) -1.6306499 2.5630170 -0.1937956 Degrees of Freedom: 48 Total; 45 Residual Residual Deviance: 503.7251 Log-likelihood: -251.8626 Warning messages: 1: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace = trace, wzeps = control$wzepsilon) 2: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace = trace, wzeps = control$wzepsilon) 3: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace = trace, wzeps = control$wzepsilon) 4: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace = trace, wzeps = control$wzepsilon) 5: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace = trace, wzeps = control$wzepsilon) # Analyzing the modified dataset with polr reproduces these second results > polr(severity ~ log(exposure.time), weights=freq, data=pneumo2) Coefficients: log(exposure.time) 0.1938154 Intercepts: mild|normal normal|severe -1.630612 2.563055 Residual Deviance: 503.7251 AIC: 509.7251 -- Can someone explain what I am doing wrong when using vglm and polr with the modified dataset? I do not understand why these two formulations should give different results. Brant Inman __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Calling "confint.glm" from within another function
On July 12, 2004 Spencer Graves wrote an email describing essentially the same issue that I would like help on: calling the confint function from within another homemade function. Because he provided many good examples of the problem, I will not reproduce them here but will instead refer readers to the original posting: http://www.mail-archive.com/r-help@stat.math.ethz.ch/msg23826.html It is not clear from the replies to his posting how he managed to solve the problem, so I am reposting a similar note again in hopes of guidance. In particular, I would like to know if and how I need to modify the confint function call so that I can access it from within another function that I have written. Thanks, Brant Inman __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Profile confidence intervals and LR chi-square test
Thank you to Prof Ripley and Henric Nilsson for their observation that I was using "anova" inappropriately for the question that I was trying to answer. Note that the Wald statistics and confidence interval were calculable in the previous email but that I prefered using the more accurate deviance statistics. I will demonstrate my error for the benefit of those new users of R (like me) that may not have appreciated how the "anova" function is working SEQUENTIALLY, and what SEQUENTIALLY actually means in this context. Since the "anova" function is a sequential test of the current model, only the statistic for the last term in the model formula is a true deviance chi-square statistic for (full model) .vs. (full model - term). For instance, using the data upon which this question was based, consider the following 2 models: -- > fit0 <- glm(y ~ 1, data=d, family=binomial(link=logit), na.action=na.omit) > fit1 <- update(fit0, . ~ . + x1 + x2 + x3) -- Here, fit0 is the null (intercept-only) model and fit1 is the full model (without interactions because interactions are not biologically plausible in this medical dataset). Now notice the result of the "anova" function for the full model: -- > anova(fit1, test='Chisq') ... Df Deviance Resid. Df Resid. Dev P(>|Chi|) NULL99137.989 x118.26798129.721 0.004 x215.63997124.083 0.018 x313.43396120.650 0.064 --- It is incorrect to interpret the deviance chi-square test presented above for x1 (P=0.004) as the deviance chi-square statistic comparing (y~x1+x2+x3) .vs. (y~x2+x3) as the statistic calculated is for (y~1) .vs. (y~x1). Similarly, the deviance chi-square statistic calculated for x2 (P=0.018) is NOT for (y~x1+x2+x3) .vs. (y~x1+x3) but for (y~x1) .vs. (y~x1+x2). Lastly, the deviance chi-square statistic for x3(P=0.064) is probably the most intuitive because it is for the comparison of (y~x1+x2+x3) .vs. (y~x1+x2), the result we would typically want to present for x3 in the full model. So how do you get the correct deviance chi-square statistics for x1 and x2 in the full model? There are a couple of ways of arriving at the same answer as I will demonstrate for the case of x1. Option#1: Reorder the full model so that x1 is the last term in the model formula --- > fit2 <- glm(y ~ x2 + x3 + x1, data=d, family=binomial(link=logit), na.action=na.omit) > anova(fit2, test='Chisq') ... Df Deviance Resid. Df Resid. Dev P(>|Chi|) NULL99137.989 x217.30598130.683 0.007 x313.82197126.862 0.051 x116.21296120.650 0.013 --- Notice that the deviance chi-square statistics for all of the variables have changed, despite fit2 being identical in content to fit1. Just the order of the terms in the model formula have changed from fit1 to fit2. The deviance statistic for x1 is now the correct one for the full model, that is for the comparison (y~x1+x2+x3) .vs. (y~x2+x3). Option#2: Compare the full model to a reduced model that does not include x1. --- > fit3 <- update(fit1, . ~ . - x1) > anova(fit1, fit3, test='Chisq') ... Model 1: y ~ x1 + x2 + x3 Model 2: y ~ x2 + x3 Resid. Df Resid. Dev Df Deviance P(>|Chi|) 196120.650 297126.862 -1 -6.212 0.013 --- Fit3 is the same model as fit1 except that it is missing the x1 term. Therefore, any change in the deviance chi-square statistic is due to the deletion of x1 from full model. Notice that the difference in residual deviances between fit3 and fit1 (126.862 - 120.650 = 6.212) is the same the difference b/t x1 and x3 in option#1. Brant __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Profile confidence intervals and LR chi-square test
System: R 2.3.1 on Windows XP machine. I am building a logistic regression model for a sample of 100 cases in dataframe "d", in which there are 3 binary covariates: x1, x2 and x3. > summary(d) y x1 x2 x3 0:54 0:50 0:64 0:78 1:46 1:50 1:36 1:22 > fit <- glm(y ~ x1 + x2 + x3, data=d, family=binomial(link=logit)) > summary(fit) Call: glm(formula = y ~ x1 + x2 + x3, family = binomial(link = logit), data = d) Deviance Residuals: Min 1Q Median 3Q Max -1.6503 -1.0220 -0.7284 0.9965 1.7069 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -0.3772 0.3721 -1.014 0.3107 x11 -0.8144 0.4422 -1.842 0.0655 . x21 0.9226 0.4609 2.002 0.0453 * x31 1.3347 0.5576 2.394 0.0167 * --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Dispersion parameter for binomial family taken to be 1) Null deviance: 137.99 on 99 degrees of freedom Residual deviance: 120.65 on 96 degrees of freedom AIC: 128.65 Number of Fisher Scoring iterations: 4 > exp(fit$coef) (Intercept) x11 x21 x31 0.6858006 0.4429233 2.5157321 3.7989873 --- After reading the appropriate sections in MASS4 (7.2 and 8.4 in particular), I decided to estimate the 95% confidence intervals for the odds ratios using the profile method implemented in the "confint" function. I then used the "anova" function to perform the deviance chi-square tests for each covariate. --- > ci <- confint(fit); exp(ci) Waiting for profiling to be done... 2.5 %97.5 % (Intercept) 0.3246680 1.413684 x11 0.1834819 1.048154 x21 1.0256096 6.314473 x31 1.3221533 12.129210 > anova(fit, test='Chisq') Analysis of Deviance Table Model: binomial, link: logit Response: y Terms added sequentially (first to last) Df Deviance Resid. Df Resid. Dev P(>|Chi|) NULL99137.989 x115.85698132.133 0.016 x215.27197126.862 0.022 x316.21296120.650 0.013 My question relates to the interpretation of the significance of variable x1. The OR for x1 is 0.443 and its profile confidence interval is 0.183-1.048. If a type I error rate of 5% is assumed, this result would tend to suggest that x1 is NOT a significant predictor of y. However, the deviance chi-square test has a P-value of 0.016, which suggests that x1 is indeed a significant predictor of y. How do I reconcile these two differing messages? I do recognize that the upper bound of the confidence interval is pretty close to 1, but I am certain that some journal reviewer will point out the problem as inconsistent. Brant Inman __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Cox model using mfp library
I am running R 2.3.1 on a Windows XP machine. I have a large dataset of over 13 000 cases of a disease for which I am attempting to build a prognostic model using Cox proportional hazards regression. Some of the continuous covariates are skewed and therefore require transformation for use in the Cox model. I have empirically selected some simple normalizing transformations (based on previous knowledge and the ladder of powers) and wanted to compare these selections to those produced by the fractional polynomial technique of Royston and Altman. When I attempt to fit a Cox model with the mfp library for overall survival using a single left-skewed covariate called "age", I get the following output: > library(mfp) > fit<- mfp(Surv(time.dead,dead.any)~ fp(age), family=cox, verbose=T, + na.action=na.exclude) VariableDeviancePower(s) Cycle 1 age 27071.72 26704.421 26701.832 26695.2 -2 -2 Tansformation shift scale age 0 100 Fractional polynomials df.initial select alpha df.final power1 power2 age 4 1 0.054 -2 -2 Null model: 27071.72 Linear model: 26704.42 Final model: 26695.2 Error in "[.data.frame"(list("Surv(time.dead, dead.any)" = c(NA, NA, NA, : invalid subscript type -- Can anyone tell me what this error message means and why I am getting it? I did not get this error when I tried similar code using the GBSG dataset provided with the mfp library. Brant Inman Mayo Clinic __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Import problem with S-Plus 7.0 dataset
Thanks to Brian Ripley and Richard Heiberger for the solution to my problem. I will spell it out below in steps so that (hopefully) nobody else will need to ask the same question in the future (if they search the R help archive). HOW TO GET S-PLUS 7.0 DATA into R __ STEP 0 (in S-Plus 7.0): Generation of a fake dataframe: > x <- c(1:10) > y <- rep(c(0,1),5) > mydata <- data.frame(cbind(x,y)) > mydata x y 1 1 0 2 2 1 3 3 0 4 4 1 5 5 0 6 6 1 7 7 0 8 8 1 9 9 0 10 10 1 STEP 1 (in S-Plus 7.0): Save the dataframe as earlier versions of S-Plus did (note: I called the output file "ddump.sdd" instead of "mydata.sdd" to demonstrate a point later on) >data.dump('mydata', file='C:\\temp\\ddump.sdd', oldStyle=T) STEP 2 (In R): Restore the saved dataframe >data.restore('C:\\temp\\ddump.sdd') [1] "C:\\temp\\ddump.sdd" STEP 3 (In R): View/use the first few cases of dataset) > head(mydata) x y 1 1 0 2 2 1 3 3 0 4 4 1 5 5 0 6 6 1 There are 2 important points to note: 1) The dataframe keeps the name it had in S-Plus, not the name of the file you saved it in. Using the example above, R never creates an object called "ddump": > data.restore('C:\\temp\\ddump.sdd') [1] "C:\\temp\\ddump.sdd" > ddump Error: object "ddump" not found 2) Saving the data.restore result into a new variable is not that useful because it will NOT contain your dataframe, only the name of the file that contained your dataframe: >test <- data.restore('C:\\temp\\ddump.sdd') >test [1] "C:\\temp\\ddump.sdd" Brant Inman -Original Message- From: Richard M. Heiberger [mailto:[EMAIL PROTECTED] Sent: Friday, November 03, 2006 3:07 PM To: Inman, Brant A. M.D.; Brian Ripley Cc: r-help@stat.math.ethz.ch Subject: Re: [R] Import problem with S-Plus 7.0 dataset It looks like it works. The result you printed is the name of the file that data.store read. The name of the variable is the same as the name you called it in S-Plus. type data[1:5,] and you should see your data.frame. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Import problem with S-Plus 7.0 dataset
As recommended, I have tried the following code to solve my problem importing data to R from S-Plus 7.0. Unfortunately, I have not had success. In S-Plus: > data.dump('data', file='C:\\temp\\ddump.sdd', oldStyle=T) This resulted in the production of a file called "ddump.sdd" that I can import into S-Plus without any problem. However, when I use the following code in R ... In R: > d <- data.restore(file='C:\\temp\\ddump.sdd') > d [1] "C:\\temp\\ddump.sdd" > d <- data.restore('C:\\temp\\ddump.sdd') > d [1] "C:\\temp\\ddump.sdd" > I also tried the following: > d <- read.S(file='C:\\temp\\ddump.sdd') Error in read.S(file = "C:\\temp\\ddump.sdd") : not an S object I thought that S-Plus might be doing something funny when I save as a .sdd file, so I tried saving as "ddump.dat" but got exactly the same results. I have been able to save my original datafile as a csv and import it, but I had hoped that it would be possible to keep my variable formats by importing the S dataframe into R directly. Brant Inman [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Import problem with S-Plus 7.0 dataset
I am running R 2.3.1 on a Windows XP machine. I am trying to import some data into R that is stored as an S-Plus 7.0 .sdd file. When I run the following command, I get this error: > library(foreign) > d <- read.S(file='H:\\Research\\data.sdd') Error in read.S(file = "H:\\Research\\data.sdd") : not an S object The dataset is fairly large, roughly 13000 rows and 100 columns, but within S-Plus 7.0 it is stored as a normal dataframe (not a bd dataframe). Any ideas? Brant Inman [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Confidence intervals for Brier score
I am using the ipred library to calculate the censored Brier score for a Cox proportional hazards model. I would like to know if anyone has developed a method of calculating confidence intervals for the various forms of the Brier score that are used in the analysis of survival/censored data. If so, are these implemented in S? Thank you Brant Inman [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Missing data analysis in R
I am looking for a book that discusses the theory of multiple imputation (and other methods of dealing with missing data) and, just as importantly, how to implement these methods in R or S-Plus. Ideally, the book would have a structure similar to Faraway (Regression), Pinheiro&Bates (Mixed Effects) and Wood (GAMs) and would be very modern (i.e. published within the last couple of years). Any ideas? If such a book does not exist, one of the experts on this help list should write it! (I will gladly buy the first copy.) Brant Inman Mayo Clinic [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Calculating the probability of an event a timeoint "t" from a Cox model fit
I would like to determine the probability of an event at a specific timepoint given the linear predictor of a given Cox model. For instance, assume that I fit the following model: data(pbc) fit <- coxph(Surv(time, status)~ age, data=pbc) To extract the value of the linear predictor for each patient in the dataset: prd <- predict(fit, newdata=pbc, type="lp") However, what I am really interested in is the predicted probability from the Cox model that an individual will experience an event by some time t (say 2000 days in this example). To my knowledge, calculating this probability requires knowing one of the 3 baseline functions-the survivor function or the hazard function or the cumulative hazard function-estimated from the Cox model. I would like to know : 1) How do I extract the correct baseline hazard so that I can predict the individual probablility of failure at 2000 days, given a particular patient covariate pattern (in this case, a particular patient's age? Is the baseline hazard for the average patient covariates the ideal one? 2) Is there a better way of directly extracting these probabilities? Brant Inman [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Censored Brier Score and Royston/Sauerbrei's D
System: R 2.3.1 on a Windows XP computer. I am validating several cancer prognostic models that have been published with a large independent dataset. Some of the models report a probability of survival at a specified timepoint, usually at 5 and 10 years. Others report only the linear predictor of the Cox model. I have used Harrell's c index for censored data (rcorr.cens) as a measure of discrimination and have constructed smoothed calibration plots. I would like to include some measures of overall model performance, such as the censored Brier score and Royston/Sauerbrei's D statistic (Stat Med 2004). With this in mind, I have 3 questions: 1) Can the "sbrier" function of the "ipred" library be used to calculate the censored Brier score for a specific time point given known predictions for that timepoint? library(ipred) data <- read.csv(file='c:\\ time<- data$time# The time in years from diagnosis of cancer to death status <- data$status # The status at last follow-up: 1=dead, 0=alive pred<- data$pred# The predicted probability of surviving 5 years after cancer from external Cox model A linp <- data$linp # The linear predictor of external Cox model B predicting survival after cancer s <- Surv(time, status) test <- sbrier(s, pred, btime=5) # I get this error message that I can't seem to solve Error in Surv(time, 1 - cens) : Time variable is not numeric In addition: Warning message: is.na() applied to non-(list or vector) in: is.na(time) 2) Can the linear predictor (linp) be used in sbrier in the same way as a probability might? 3) Has anyone implemented Royston/Sauerbrei's D? Brant Inman Mayo Clinic [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.