[R] MITOOLS: Error in eval(expr, envir, enclos) : invalid 'envir' argument

2007-06-07 Thread Inman, Brant A. M.D.

R-users & helpers:

I am using Amelia, mitools and cmprsk to fit cumulative incidence curves
to multiply imputed datasets.  The error message that I get 

"Error in eval(expr, envir, enclos) : invalid 'envir' argument"

occurs when I try to fit models to the 50 imputed datasets using the
"with.imputationList" function of mitools.  The problem seems to occur
intermittently, depending on the type of model that I try to fit to the
datasets as well as the previous code that has been executed during the
R session.  I have read the previous postings for similar problems and
have tried renaming many of my objects, which has not solved the problem.


What is weird is that I have not been able to reproduce the problem
using other standard survival datasets (like pbc). It therefore seems to
have something to do with my particular analysis, likely the names of my
objects.  I cannot find the source of the problem and would greatly
appreciate any help.

Brant

Below is my session information and some code demonstrating the issue
occurring with coxph.

> sessionInfo()
R version 2.5.0 (2007-04-23) 
i386-pc-mingw32 

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United 

attached base packages:
[1] "splines"   "grid"  "stats" "graphics"  "grDevices" "utils"

[7] "datasets"  "methods"   "base" 

other attached packages:
      cmprsk      mitools       Amelia     survival    RGraphics latticeExtra
     "2.1-7"        "1.0"     "1.1-23"       "2.31"      "1.0-6"      "0.2-1"
     lattice      foreign         MASS
    "0.15-8"     "0.8-20"     "7.2-34"


> str(utt.mi)# My dataset
'data.frame':   168 obs. of  25 variables:
 $ age  : num  79.5 67.1 63.7 76.9 69.0 ...
 $ gender   : Factor w/ 2 levels "0","1": 1 2 2 2 1 2 2 2 2 2 ...
 $ symptoms : Factor w/ 2 levels "0","1": 1 2 1 1 2 1 2 1 1 2 ...
 $ site : Factor w/ 3 levels "1","2","3": 1 1 2 1 2 1 1 2 1 3 ...
 $ multifoc : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 2 ...
 $ ctnm : Factor w/ 2 levels "1","2": 1 NA 2 1 2 2 1 NA 1 2 ...
 $ prebca   : Factor w/ 2 levels "0","1": 1 1 1 1 2 1 2 1 1 1 ...
 $ precystec: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ surgery  : Factor w/ 2 levels "1","2": 1 1 2 1 2 1 1 1 1 1 ...
 $ ptnm.t   : Factor w/ 5 levels "0","1","2","3",..: 3 3 5 1 2 4 2 1 1 5 ...
 $ grade: Factor w/ 3 levels "1","2","3": 2 2 3 2 2 3 2 1 1 3 ...
 $ histol   : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
 $ postbca  : Factor w/ 2 levels "0","1": 2 1 2 2 2 NA 2 1 1 1 ...
 $ postcyst : Factor w/ 2 levels "0","1": 1 1 1 2 1 1 1 1 1 1 ...
 $ chemo: Factor w/ 2 levels "0","1": 1 1 2 1 1 2 2 1 1 2 ...
 $ mets : Factor w/ 2 levels "0","1": 1 2 2 1 2 2 2 1 1 2 ...
 $ status   : Factor w/ 4 levels "1","2","3","4": 1 3 2 1 3 3 3 1 1 3 ...
 $ futime   : num  10.46  1.15  2.43  2.83  6.82 ...
 $ smk  : Factor w/ 2 levels "0","1": 2 2 2 1 2 1 2 1 2 2 ...
 $ surg.yr  : int  88 94 92 93 86 85 95 98 91 85 ...
 $ nodes: Factor w/ 2 levels "0","1": 1 1 2 1 1 2 1 1 1 2 ...
 $ os   : num  0 1 0 0 1 1 1 0 0 1 ...
 $ css  : num  0 1 0 0 1 1 1 0 0 1 ...
 $ rfs  : num  0 1 1 0 1 1 1 0 0 1 ...
 $ comp : num  0 1 1 0 1 1 1 0 0 1 ...

> set.seed(200)
> M <- 50   # Number of imputations
> am.imp <- amelia(utt.mi, m=M, p2s=1, startvals=1, write.out=F,
+ idvars=c('os','css','rfs','comp'),
+ noms=c('gender','symptoms','site','multifoc','ctnm','prebca','precystec',
+ 'smk','surgery','ptnm.t','nodes','grade','histol','postbca',
+ 'postcyst','chemo','mets','status'),
+ sqrts=c('futime'))
-- Imputation 1 --

 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 
21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 



-- Imputation 50 --

 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 
21 22 23 24 25 26 

> MIset <- imputationList(am.imp[1:M])
> mifit <- with(MIset, 
+ coxph(Surv(futime, os) ~ age + symptoms + ctnm + smk))

Error in eval(expr, envir, enclos) : invalid 'envir' argument
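One debugging thought, in case it helps others reproduce this: as far as I
can tell, with.imputationList eval()s the model call using each element of
the imputation list in turn as the environment, so a single element that is
not a data frame (e.g. a stray non-data component of the amelia output
ending up in am.imp[1:M]) would produce exactly this "invalid 'envir'
argument" error.  A quick (hypothetical) check of the element classes:

---
sapply(MIset$imputations, class)                  # should all be "data.frame"
table(sapply(MIset$imputations, is.data.frame))
---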



[R] Using MIcombine for coxph fits

2007-05-31 Thread Inman, Brant A. M.D.

R-helpers:

I am using R 2.5 on Windows XP, packages all up to date.  I have run
into an issue with the MIcombine function of the mitools package that I
hoped some of you might be able to help with.  I will work through a
reproducible example to demonstrate the issue.

First, make a dataset from the pbc dataset in the survival package

---
# Make a dataset
library(survival)
d <- pbc[,c('time','status','age','sex','hepmeg','platelet', 
'trt', 'trig')]
d[d==-9] <- NA 
d[,c(4,5,7)] <- lapply(d[,c(4,5,7)], FUN=as.factor)
str(d)
summary(d)
---

Second, since there is missing data for several (but not all) of the
variables, investigate the patterns.

---
library(Hmisc)
na.pattern(d)   
clus <- naclus(d, method='complete')
par(mfrow=c(2,2))
naplot(clus, which='all')
print(clus)
detach("package:Hmisc")
---

After examining the missing data patterns, impute 10 datasets using the
amelia function from the Amelia package. Check the densities of the
continuous variables to make sure they make sense. 

---
library(Amelia)
am.imp <- amelia(d, m=10, p2s=1, startvals=1, write.out=F,
noms=c('status','sex','hepmeg','trt'))
compare.density(data=d, output=am.imp, var='trig')
compare.density(data=d, output=am.imp, var='platelet')
---

Since everything looks ok, fit Cox models to each of the 10 imputed
datasets using the functions of the mitools package.

---
library(mitools)
miset  <- imputationList(list(am.imp[[1]],am.imp[[2]],am.imp[[3]],
am.imp[[4]],am.imp[[5]],am.imp[[6]],am.imp[[7]],am.imp[[8]],
am.imp[[9]],am.imp[[10]]))
mifit  <- with(miset, coxph(Surv(time, status) ~ age + sex + 
hepmeg + platelet + trt + trig))
mifit
---
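As an aside, an equivalent and more compact construction (assuming, as
above, that the first 10 components of the amelia output are the imputed
data frames) is:

---
miset <- imputationList(lapply(1:10, function(i) am.imp[[i]]))
---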

The "mifit" object shows the individual coxph fits for each imputed
dataset.  Now we have to pool these fitted models  to obtain an overall
model. The code and result are shown below.

---
fit.am <- MIcombine(mifit)
summary(fit.am) 

Multiple imputation results:
  with.imputationList(miset, coxph(Surv(time, status) ~ age + sex + 
hepmeg + platelet + trt + trig))
  MIcombine.default(mifit)
  results   se   (lower   upper) missInfo
age   0.035548792 0.0082506946  0.019373545 0.0517240397  4 %
sex1 -0.070760613 0.2563372831 -0.580309741 0.4387885156 34 %
hepmeg1   0.932378808 0.2026274576  0.532555416 1.3322021993 23 %
platelet -0.001757899 0.0009480636 -0.003620446 0.0001046469 14 %
trt2  0.137413162 0.1924230007 -0.243815288 0.5186416117 29 %
trig  0.003979287 0.0012010053  0.001607266 0.0063513078 25 %
---

The question I have concerns the meaning of this result.  The missInfo
column of the summary function suggests that age was missing data, when
in fact it was not. Is there a different interpretation for the missInfo
column?
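For reference, my possibly mistaken understanding of where missInfo comes
from: the pooled variance of each coefficient has a within-imputation part
and a between-imputation part, and the between part is nonzero for every
coefficient (age included) whenever any variable in the model was imputed.
A hand calculation along the lines of Rubin's rules, from the
per-imputation fits in mifit:

---
betas <- sapply(mifit, coef)                       # p x m coefficient matrix
vars  <- sapply(mifit, function(f) diag(vcov(f)))  # within-imputation variances
m     <- ncol(betas)
ubar  <- rowMeans(vars)          # average within-imputation variance
b     <- apply(betas, 1, var)    # between-imputation variance
r     <- (1 + 1/m) * b / ubar
round(100 * r / (1 + r))         # approx. missInfo, ignoring df corrections
---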

Thanks in advance for any help.

Brant Inman



Re: [R] MICE for Cox model

2007-05-17 Thread Inman, Brant A. M.D.

Thanks Adai, I got it to work.  You were right, I had called the wrong
pool function.

Brant

-Original Message-
From: Adaikalavan Ramasamy [mailto:[EMAIL PROTECTED] 
Sent: Thursday, May 17, 2007 1:56 PM
To: Inman, Brant A. M.D.
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] MICE for Cox model

Are you sure you used my pool function? Because as I just have 
discovered, it had a minor typo in the code. After replacing "
df <- (m - 1) * (1 + 1/r)2" with "df <- (m - 1) * (1 + 1/r)^2" in my 
pool() function, I get


  library(survival); data(pbc)
  d <- pbc[,c('time', 'status', 'age', 'sex',
  'hepmeg', 'platelet', 'trt', 'trig')]
  d[d==-9] <- NA
  d[,c(4,5,7)] <- lapply(d[,c(4,5,7)], FUN=as.factor)

  library(mice)
  imp <- mice(d, m=10, maxit=10, diagnostics=T, seed=500,
   defaultImputationMethod=c('norm', 'logreg', 'polyreg'))
  fit <- coxph.mids( Surv(time,status) ~ age + sex + hepmeg + platelet
  + trt + trig, imp)

  pool(fit)

Call: pool(object = fit)

Pooled coefficients:
          age         sex1      hepmeg1     platelet         trt2         trig
  0.034924182 -0.208897827  0.987641362 -0.001559426  0.070124108  0.004122421

Fraction of information about the coefficients missing due to
nonresponse:
       age       sex1    hepmeg1   platelet       trt2       trig
0.06624167 0.19490517 0.27300965 0.21950332 0.27768153 0.40658964

Regards, Adai




Re: [R] MICE for Cox model

2007-05-17 Thread Inman, Brant A. M.D.

Adai,

Thanks for the functions.  I tried using your functions and I get the
same error message during the pooling part:

> pool(micefit)
Error in names(df) <- names(f) <- names : 'names' attribute [5] must be
the same length as the vector [0]
>

Brant
-Original Message-
From: Adaikalavan Ramasamy [mailto:[EMAIL PROTECTED] 
Sent: Thursday, May 17, 2007 4:56 AM
To: Inman, Brant A. M.D.
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] MICE for Cox model

I encountered this problem about 18 months ago. I contacted Prof. Fox 
and Dr. Malewski (the R package maintainers for mice) but they referred 
me to Prof. van Buuren. I wrote to Prof. van Buuren but am unable to 
find his reply (if he did reply).

Here are the functions I used at that time, if you want to take it with 
lots of salt. Let me know if you find anything fishy with it.


coxph.mids <- function (formula, data, ...) {

   call <- match.call()
   if (!is.mids(data)) stop("The data must have class mids")

   analyses <- as.list(1:data$m)

   for (i in 1:data$m) {
 data.i<- complete(data, i)
 analyses[[i]] <- coxph(formula, data = data.i, ...)
   }

   object <- list(call = call, call1 = data$call,
  nmis = data$nmis, analyses = analyses)

   oldClass(object) <- if (.SV4.) "mira" else c("mira", "coxph")
   return(object)
}


And in the function 'pool', the small sample adjustment requires the 
residual degrees of freedom (i.e. dfc). For a Cox model, I believe that 
this is simply the number of events minus the number of regression 
coefficients. There is support for this from the middle of page 149 of 
the book by Parmar & Machin (ISBN 0471936405). Please correct me if I 
am wrong.

Here is the slightly modified version of pool:


pool <- function (object, method = "smallsample") {

   call <- match.call()
   if (!is.mira(object)) stop("The object must have class 'mira'")

   if ((m <- length(object$analyses)) < 2)
 stop("At least two imputations are needed for pooling.\n")

   analyses <- object$analyses

   k <- length(coef(analyses[[1]]))
   names <- names(coef(analyses[[1]]))
   qhat  <- matrix(NA, nrow = m, ncol = k, dimnames = list(1:m, names))
   u <- array(NA, dim = c(m, k, k),
  dimnames = list(1:m, names, names))

   for (i in 1:m) {
 fit   <- analyses[[i]]
 qhat[i, ] <- coef(fit)
 u[i, , ]  <- vcov(fit)
   }

   qbar <- apply(qhat, 2, mean)
   ubar <- apply(u, c(2, 3), mean)
   e <- qhat - matrix(qbar, nrow = m, ncol = k, byrow = TRUE)
   b <- (t(e) %*% e)/(m - 1)
   t <- ubar + (1 + 1/m) * b
   r <- (1 + 1/m) * diag(b/ubar)
   f <- (1 + 1/m) * diag(b/t)
   df <- (m - 1) * (1 + 1/r)^2

   if (method == "smallsample") {

 if( any( class(fit) == "coxph" ) ){

   ### this loop is the hack for survival analysis ###

   status   <- fit$y[ , 2]
   n.events <- sum(status == max(status))
   p<- length( coefficients( fit )  )
   dfc  <- n.events - p

 } else {

   dfc <- fit$df.residual
 }

 df <- dfc/((1 - (f/(m + 1)))/(1 - f) + dfc/df)
   }

   names(r) <- names(df) <- names(f) <- names
   fit <- list(call = call, call1 = object$call, call2 = object$call1,
   nmis = object$nmis, m = m, qhat = qhat, u = u,
   qbar = qbar, ubar = ubar, b = b, t = t, r = r, df = df,
   f = f)
   oldClass(fit) <- if (.SV4.) "mipo" else c("mipo", oldClass(object))
   return(fit)
}


print.mipo only gives the coefficients. Often I need the standard errors
as well, since I want to test whether each regression coefficient from
multiple imputation is zero or not. Since the function summary.mipo does
not exist, can I suggest the following:


summary.mipo <- function(object){

if (!is.null(object$call1)){
  cat("Call: ")
  dput(object$call1)
}

est  <- object$qbar
se   <- sqrt(diag(object$t))
tval <- est/se
df   <- object$df
pval <- 2 * pt(abs(tval), df, lower.tail = FALSE)

coefmat <- cbind(est, se, tval, pval)
colnames(coefmat) <- c("Estimate", "Std. Error",
 "t value", "Pr(>|t|)")

cat("\nCoefficients:\n")
printCoefmat( coefmat, P.values=T, has.Pvalue=T, signif.legend=T )

cat("\nFraction of information about the coefficients
missing due to nonresponse:", "\n")
print(object$f)

ans <- list( coefficients=coefmat, df=df,
 call=object$call1, fracinfo.miss=object$f )
invisible( ans )

}
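With pool() and summary.mipo() defined as above, usage would presumably
look like this (S3 dispatch should pick summary.mipo up from the
workspace):

fit <- coxph.mids(Surv(time, status) ~ age + sex + hepmeg + platelet
  + trt + trig, imp)
summary(pool(fit))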


Hope this helps.

Regards, Adai




[R] MICE for Cox model

2007-05-16 Thread Inman, Brant A. M.D.

R-helpers:

I have a dataset that has 168 subjects and 12 variables.  Some of the
variables have missing data and I want to use the multiple imputation
capabilities of the "mice" package to address the missing data. Given
that mice only supports linear models and generalized linear models (via
the lm.mids and glm.mids functions) and that I need to fit Cox models, I
followed the previous suggestion of John Fox and have created my own
function "cox.mids" to use coxph to fit models to the imputed datasets.

(http://tolstoy.newcastle.edu.au/R/help/06/03/22295.html)

The function I created is:



cox.mids <- function (formula, data, ...)
{
call <- match.call()
if (!is.mids(data)) 
stop("The data must have class mids")
analyses <- as.list(1:data$m)
for (i in 1:data$m) {
data.i <- complete(data, i)
analyses[[i]] <- coxph(formula, data = data.i, ...)
}
object <- list(call = call, call1 = data$call, nmis = data$nmis, 
analyses = analyses)
oldClass(object) <- c("mira", "coxph")
return(object)
}



The problem that I encounter occurs when I try to use the "pool"
function to pool the fitted models into one general model. Here is some
code that reproduces the error using the pbc dataset.



d <- pbc[,c('time','status','age','sex','hepmeg','platelet', 'trt',
'trig')]
d[d==-9] <- NA 
d[,c(4,5,7)] <- lapply(d[,c(4,5,7)], FUN=as.factor)
str(d)

imp <- mice(d, m=10, maxit=10, diagnostics=T, seed=500, 
defaultImputationMethod=c('norm', 'logreg', 'polyreg'))

fit <- cox.mids(Surv(time,status) ~ age + sex + hepmeg + platelet + trt
+   trig, imp)

pool(fit)



I have looked at the "pool" function but cannot figure out what I have
done wrong.  Would really appreciate any help with this.

Thanks,

Brant Inman
Mayo Clinic



[R] ? R 2.5.0 alpha bug

2007-05-01 Thread Inman, Brant A. M.D.


This email is intended to highlight 2 problems that I encountered
running R 2.5.0 alpha on a Windows XP machine.

#1 - Open script error

If I click the "Open folder" icon on the toolbar, R opens my script
files perfectly.  However, when I select "File > Open Script >
MyFileLocation", I get a fatal error that causes R to close immediately.
This error was reproduced on 3 consecutive occasions but has been
intermittent thereafter. One of these fatal errors resulted in a typical
error reporting box being generated which I sent off.  I was not able to
verify if this error has been reported and corrected in subsequent
versions of 2.5.

#2 - Bug reporting link on CRAN website broken

I tried to report the bug listed above on the CRAN website but when I
clicked on the bug reporting link on the left-hand side panel of the
main site (http://bugs.r-project.org/cgi-bin/R), I get an error page
with the following message:

The system encountered a fatal error 
cannot open config file /home/sfe/r-bugs/jitterbug/R : No such file or
directory
The last error code was: No such file or directory 
uid/gid=30/8 


This has been submitted to r-devel.

Brant Inman



Re: [R] prelim.norm() function not working

2007-04-25 Thread Inman, Brant A. M.D.

Thank you very much, that was indeed the problem. (And now that I read
the help page more carefully, it does in fact say that the input is a
data matrix and not a data frame.)

Brant
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
Prof Brian Ripley
Sent: Wednesday, April 25, 2007 12:12 AM
To: Brant Inman
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] prelim.norm() function not working

Looks like you have a data frame where you need a matrix.  (The same 
issue occurs in most of Joe Schafer's packages, e.g. mix.)

Try as.matrix(usnews).

On Tue, 24 Apr 2007, Brant Inman wrote:

> R-experts:
> I am trying to reproduce some of Paul Allison's results in his little
> green book on missing data (Sage 2002).  The dataset for which I am
> having problems, "usnews", can be found at:
> http://www.ats.ucla.edu/stat/books/md/default.htm.  I am working on a
> Windows machine with R 2.5 installed, all packages up-to-date.
> The problem has to do with the prelim.norm() function of the package
> "norm".   Specifically, I need to use this pre-processing function to
> later use the EM algorithm and DA procedures in the norm package.  I
> am getting an error with the following code.
> --
>> pre <- prelim.norm(usnews)
>
> Error in as.double.default(list(csat = c(972L, 961L, NA, 881L, NA, NA, :
>   (list) object cannot be coerced to 'double'
>
> -
> I have read the previous postings and I am wondering if the problem
> with prelim.norm is the size of the usnews dataset or the amount of
> missing data.
>
> 
>
>> dim(usnews)
> [1] 1302    7
>
> 
>
>
> Does anyone have any ideas?  If not, are there alternatives to norm
> for implementing the MLE and EM methods of dealing with missing data?
>
> Thanks,
>
> Brant Inman
> Mayo Clinic
>

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



[R] Opinion on R plots: connecting X and Y

2007-04-20 Thread Inman, Brant A. M.D.

Attention R users, especially those that are experienced enough to be
opinionated, I need your input.

Consider the following simple plot:

x <- rnorm(100)
y <- rnorm(100)
plot(x, y, bty='n')

A colleague (and dreaded SAS user) commented that she thought that my
plots could be "cleaned up" by connecting the X and Y axes.  I know that
I can do that with bty='l', but I don't want to: I find that the plots
look less cluttered with disjoint axes.

However, I was intrigued enough by her comments that I decided to
solicit the opinions of others on this issue.  Are there principled
reasons why one should prefer joined axes or disjoint axes?

Brant Inman



Re: [R] Using power.t.test over a range of conditions

2007-04-20 Thread Inman, Brant A. M.D.
 
Chuck Cleland and Steve Weigand both pointed out my mistake in the
loop...trying to assign a list (i.e. the output from power.t.test) to a
cell in a data.frame.
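For the archives, here is a corrected version of the loop (my sketch of
their fix): power.t.test returns a "power.htest" list, so the scalar n
component has to be extracted before assignment to the matrix cell.

-----

DELTA  <- seq(500,2000,250)
SD     <- seq(1000,2500,250)
result <- matrix(nrow=length(DELTA), ncol=length(SD))
colnames(result) <- paste('SD=',SD, sep='')
rownames(result) <- paste('Delta=',DELTA, sep='')

for(i in 1:length(DELTA)){
for(j in 1:length(SD)){
result[i,j] <- power.t.test(delta=DELTA[i], sd=SD[j],
sig.level=0.05, power=0.8,
type='two.sample', alt='two.sided')$n   # extract n only
}
}
round(result, 1)

-----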

Thanks guys.


Brant



[R] Using power.t.test over a range of conditions

2007-04-20 Thread Inman, Brant A. M.D.

R-Helpers:

I would like to perform sample size calculations for an experiment.  As
part of this process, I would like to know how various assumptions
affect the sample size calculation.  For instance, one thing that I
would like to know is how the calculated sample size changes as I vary
the difference that I would like to detect.  I tried the following
first, but got the associated error message.

-

> power.t.test(delta=seq(500,2000,100), sd=1000, sig.level=0.05, power=0.8,
+ type='two.sample', alt='two.sided')

Error in uniroot(function(n) eval(p.body) - power, c(2, 1e+07)) : 
invalid function value in 'zeroin'
In addition: Warning message:
the condition has length > 1 and only the first element will be used in:
if (f(lower, ...) * f(upper, ...) >= 0) stop("f() values at end points
not of opposite sign") 
>

-

From the error message I suspected that the function did not handle
vectors as arguments.  I therefore tried the following looping structure
to solve the problem:

-


DELTA  <- seq(500,2000,250)
SD   <- seq(1000,2500,250)
result <- matrix(nrow=length(DELTA), ncol=length(SD))
colnames(result) <- paste('SD=',SD, sep='')
rownames(result) <- paste('Delta=',DELTA, sep='')

for(i in 1:length(DELTA)){
for(j in 1:length(SD)){
result[i,j] <- power.t.test(delta=DELTA[i], sd=SD[j],
sig.level=0.05, power=0.8,
type='two.sample', alt='two.sided')
}
}

Error in result[i, j] <- power.t.test(delta = DELTA[i], sd = SD[j], sig.level = 0.05,  : 
        number of items to replace is not a multiple of replacement length

-

Can someone tell me what I am doing wrong here?

Thanks in advance for your help,

Brant Inman



[R] Putting 2 breaks on Y axis

2007-04-12 Thread Inman, Brant A. M.D.

R plotting experts:

I have a bivariate dataset composed of 300 (x,y) continuous datapoints.
297 of these points are located within the y range of [0,10], while 2
are located at 20 and one at 55.  No coding errors, real outliers.

When plotting these data with a scatterplot, I obviously have a problem.
If I plot the full dataset with ylim = c(0,55), then I cannot see the
structure in the data in the [0, 10] range.  If I truncate the y axis
with ylim = c(0,10), then I cannot see the 3 outliers.  If I break the y
axis from 10 to 20 (using plotrix functions), I still do not see the
data optimally because of the white space from y=20 to y=55.

What I would like to do is break the y axis at 2 points, roughly 10-20
and 20-55. Is there a function that can break an axis in 2 places?
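One option I am considering (a sketch only; it assumes that gap.plot in
plotrix accepts two begin/end pairs in its gap argument, as its help page
suggests):

library(plotrix)
# two y-axis gaps: one below the outliers near 20, one between 20 and 55
gap.plot(x, y, gap=c(11,19,21,53), gap.axis='y')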

Thanks in advance for any suggestions.

Brant



Re: [R] Freeman-Tukey arcsine transformation

2007-03-13 Thread Inman, Brant A. M.D.

The point of the given transformation is not so much for normality as it
is for variance stabilization.  The variance of the Freeman-Tukey
transform depends only on the denominator of the proportion in
question...something that can be used to advantage. 
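For concreteness, the transform I have in mind is the standard double
arcsine form (as I understand it; corrections welcome):

# Freeman-Tukey double arcsine for r events out of n;
# its variance is approximately 1/(n + 0.5), a function of n alone
ft <- function(r, n) asin(sqrt(r/(n+1))) + asin(sqrt((r+1)/(n+1)))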


Brant

-Original Message-
From: Peter Dalgaard [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, March 13, 2007 3:36 PM
To: Bos, Roger
Cc: Inman, Brant A. M.D.; r-help@stat.math.ethz.ch
Subject: Re: [R] Freeman-Tukey arcsine transformation

Peter Dalgaard wrote:
> Bos, Roger wrote:
>   
>> I'm curious what this transformation does, but I am not curious
>> enough to pay $14 to find out.  Someone once told me that the arcsine
>> was a good way to transform data and make it more 'normal'.  I am
>> wondering if this is an improved method.  Anyone know of a free
>> reference?
>>
>>  
>>   
>> 
> Well, a paper copy of the American Statistician (1978) would be free in
> some sense
>
> In the meantime I got detached from JSTOR (i.e., I went home), and I'm
> not prepared to jump through the relevant hoops for remote access at
> this point, but AFAIR it was a relatively trivial version of the simple
> arcsine transform, something like replacing asin(r/n) with the average
> of asin(r/(n+1)) and asin((r+1)/(n+1)). The point of the paper is that
> you can invert explicitly for r/n if n is known.
>
>   
Well, except for a couple of sqrt() it seems
>> /The American Statistician/, Vol. 32, No. 4. (Nov., 1978), p. 138.
>>
>> Stable URL:
>> http://links.jstor.org/sici?sici=0003-1305%28197811%2932%3A4%3C138%3ATIOTFD%3E2.0.CO%3B2-Z
>>
>



[R] Freeman-Tukey arcsine transformation

2007-03-13 Thread Inman, Brant A. M.D.
R-Experts:

Does anyone know if there are R functions to perform the Freeman-Tukey
double arcsine transformation and then backtransform it?

Thanks,

Brant Inman
Mayo Clinic



Re: [R] Mixed effects multinomial regression and meta-analysis

2007-03-05 Thread Inman, Brant A. M.D.
 

R-Experts:

 

I just realized that the example I used in my previous posting today is
incorrect because it is a binary response, not a multilevel response
(small, medium, large) such as my real life problem has.  I apologize
for the confusion.  The example is incorrect, but the multinomial
problem is real.

 

Brant

 




[R] Mixed effects multinomial regression and meta-analysis

2007-03-05 Thread Inman, Brant A. M.D.

R Experts:

I am conducting a meta-analysis where the effect measures to be pooled
are simple proportions.  For example, consider this  data from
Fleiss/Levin/Paik's Statistical methods for rates and proportions (2003,
p189) on smokers:

Study      N   Event   P(Event)
  1       86      83      0.965
  2       93      90      0.968
  3      136     129      0.949
  4       82      70      0.854
Total    397     372

A test of heterogeneity for a table like this could simply be Pearson's
chi-square test.
--

smoke.data <- matrix(c(83,90,129,70,3,3,7,12), ncol=2, byrow=F)
chisq.test(smoke.data, correct=T)

> X-squared = 12.6004, df = 3, p-value = 0.005585

--

Now this test implies that the data is heterogeneous and that pooling
might be inappropriate. This type of analysis could be considered a
fixed effects analysis because it assumes that the 4 studies are all
coming from one underlying population.  But what if I wanted to do a
mixed effects (fixed + random) analysis of data like this, possibly
adjusting for an important covariate or two (assuming I had more
studies, of course)...how would I go about doing it? One thought that I
had would be to use a mixed effects multinomial logistic regression
model, such as that reported by Hedeker (Stat Med 2003, 22: 1433),
though I don't know if (or where) it is implemented in R.  I am certain
there are also other ways...
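To sketch the kind of model I mean (my own rough formulation, using a
binomial random-intercept model via the lme4 package rather than
Hedeker's multinomial model, so this is only an approximation to what I
am asking about):

--

library(lme4)
smoke <- data.frame(study  = factor(1:4),
                    events = c(83, 90, 129, 70),
                    n      = c(86, 93, 136, 82))
# the study-level random intercept absorbs between-study heterogeneity;
# study-level covariates could be added to try to explain it
fit <- glmer(cbind(events, n - events) ~ 1 + (1 | study),
             family = binomial, data = smoke)
summary(fit)

--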

So, my questions to the R experts are:

1) What method would you use to estimate or account for the between
study variance in a dataset like the one above that would also allow you
to adjust for a variable that might explain the heterogeneity?

2) Is it implemented in R?


Brant Inman
Mayo Clinic



[R] Sampling distribution of the range statistic

2007-02-15 Thread Inman, Brant A. M.D.

R-helpers:

In the construction of control charts for statistical quality control
objectives, one might choose to estimate the control limits for the mean
using the mean range of the samples.  This requires multiplying the mean
range by a correction factor, often called "d2", that is tabulated in
many books. The origin of "d2" seems to be table 2 of the following
paper:

Harter, HL.  Ann Math Stat (1960) 31: 1122.

In Table 2 of this paper, the mean of the sampling distribution of the
range statistic is calculated for various sample sizes.  Does anyone
know whether R has a function that can reproduce the sampling
distribution of the range represented in the Harter paper so that I
don't have to look at tables in old books?
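My current fallback (please correct me if the formula is wrong) is to
compute the expected range of n iid standard normals directly, using
E[W] = integral of {1 - pnorm(x)^n - (1 - pnorm(x))^n} dx:

d2 <- function(n)
integrate(function(x) 1 - pnorm(x)^n - (1 - pnorm(x))^n,
lower=-Inf, upper=Inf)$value
sapply(2:5, d2)   # roughly 1.128, 1.693, 2.059, 2.326, matching the tables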

Brant



Re: [R] Making TIFF images with rtiff

2007-01-12 Thread Inman, Brant A. M.D.

Several R Helpers pointed out that I forgot to include the dev.off()
statement in my code.  This solved my problem with one caveat: the
output file address cannot have any spaces in it (as pointed out by
Chuck Cleland).  For instance:


# This file location works great
bitmap(file='C:\\temp\\test.tif',
type = "tifflzw", res = 1200)
plot(speed ~ dist)  
dev.off()

# This file location does not work, despite being accurate
bitmap(file='C:\\Documents and Settings\\m007704\\Desktop\\test.tif',
type = "tifflzw", res = 1200)
plot(speed ~ dist)  
dev.off() 


For the benefit of those that may need to make TIFF files in the future
and don't know how to do it, I will recapitulate below the steps
required to produce a TIFF file using R on a Windows XP machine.

1) Download and install a current version of Ghostscript. 

You probably need GPL(GNU General Public License) version of
Ghostscript. For Windows, the correct file to download is called:
gs854w32-gpl.exe. To download this file, go to one of the following
websites:

http://www.cs.wisc.edu/~ghost/
http://sourceforge.net/projects/ghostscript/

2) Add Ghostscript to a Windows environmental variable.

Right click on the My Computer icon on your desktop.  Select: Properties
> Advanced > Environmental Variables.  You will see 2 boxes, one for
user variables and one for system variables.  In the user variables
section, highlight the variable called "PATH" and then click edit. Click
on the variable value box and go to the end of whatever is written there
(don't erase it).  Enter the following after the last bit of text:
";C:\Program Files\gs\gs8.54\bin\".  This is the location on you
computer where it can find the gswin32c.exe file that it needs to start
Ghostscript.
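An alternative to editing PATH (untested on my machine, and the 8.3-style
space-free path below is only an example) is to point the R_GSCMD
environment variable at the Ghostscript executable from within R:

Sys.setenv(R_GSCMD = 'C:/Progra~1/gs/gs8.54/bin/gswin32c.exe')
Sys.getenv('R_GSCMD')   # check that the value took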

3) Use the bitmap function to produce a TIFF

Now you should be ready to make a TIFF.  The following code is a simple
example that you can use to see if everything is set up right on your
PC.

attach(cars)
bitmap(file='C:\\temp\\test.tif',
type = "tifflzw", res = 1200)
plot(speed ~ dist)  
dev.off()

This should produce a TIFF file called 'test.tif' in the 'temp'
directory of your PC.  If you do not have a directory of this name,
substitute for one that exists (or create one).  Note that the file
argument does not seem to handle any spaces in the directory address, so
select an address without spaces in it. Note also that, as pointed out
by Ripley in this thread, there are different TIFF formats which can be
made with R.  My understanding is that the different formats have to do
with different image compression algorithms, but you can read more about
these details (and clues to why TIFF files seem to be preferred by some
publishers and imaging software makers) at:

http://en.wikipedia.org/wiki/Comparison_of_graphics_file_formats

You can also type ?bitmap to see what R can output for you and read more
about the TIFF file format at:

http://en.wikipedia.org/wiki/Tiff

I regret that I cannot comment on Unix or Mac computers as it has been
nearly 15 years since I have used these types of machines and I
therefore have no knowledge whatsoever that might be of use for users of
these systems.

Brant Inman



Re: [R] Making TIFF images with rtiff

2007-01-12 Thread Inman, Brant A. M.D.

Thank you Peter Dalgaard.

When I open a DOS box and type gswin32c, I do indeed get an error message 
saying that it can't find the program.  I edited the Windows system 
environmental variable "Path" and the user environmental variable "PATH" 
(wasn't sure which to edit) to contain the following after a semicolon: 
"C:\Program Files\gs\gs8.54\bin\". This effectively fixed the DOS box problem. 
I now get a GS prompt when I type gswin32c.

When I restart R and use the following code, I no longer get an error message.


> attach(cars)
> bitmap(file='C:\\Documents and Settings\\m007704\\Desktop\\test.tif',
+ type = "tifflzw", res = 1200)
> plot(speed ~ dist)


Alas, if only it were that easy!  When I look on my desktop (to which the file 
address above correctly refers), there is no image file of any sort to be 
found. Any ideas as to what I am doing wrong?

Brant Inman


-

It needs to be in your Path. If you open up a "DOS box" and type
gswin32c, I bet you get the same error message. You can fix this by
editing the Path (via My Computer/Properties/Advanced/Environment
variables, as you seem to know). If you use the R_GSCMD route, you may
get in trouble with the embedded space in "Program Files" ("dir/x c:"
will tell you the equivalent space-free name). Also, remember that
environment changes do not affect running programs so you may need to
exit R and restart.

-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907



Re: [R] Making TIFF images with rtiff

2007-01-12 Thread Inman, Brant A. M.D.

Thanks to all the responders.  Here are some replies to the comments:


1) Concerning the term TIFF "format".
It may be that the journals are misusing the term TIFF, but it would
also appear that Wikipedia is as well. The first sentence in the wiki
link sent below states:

"Tagged Image File FORMAT (abbreviated TIFF) is a file FORMAT for mainly
storing images, including photographs and line art."

Either way, the semantics of the word TIFF are probably not that
important for the current query.  If a publisher wants images in TIFF, I
would like to provide them in that format, regardless of whether or not
I deem the request proper.  After all they are the publishing experts!

2) Converting PNGs to TIFFs with Photoshop.
This is principally what I have done in the past that has given the poor
results that I have noted. I thought that it could be something that I
was specifically doing wrong so I consulted the medical imaging and
design department of my institution (Mayo Clinic) which informed me that
there is often a loss of information, some times quite large, in these
types of image format conversions. They suggested that it is best to
work with the TIFFs from the start if possible, which is why I am trying
to explore this option in R.  It is interesting that my imaging
department was able to convert the WMF format to TIFF with much better
success.  However, since Photoshop does not support WMFs, I am unable to
do this myself.  I have downloaded ImageMagick and will try that.

3) Lack of gratitude by R users.
It is interesting to note upon reviewing the R-help files that many
queries (perhaps even the majority?) do, in fact, convey gratitude.
Unfortunately, I have also noted that there are several messages from R
developers that appear to feel underappreciated. I suspect that one
reason that R is experiencing an explosion of users is precisely that
people appreciate and value the donation of free time provided by
statistical experts--such as Harrell, Weigand, Ripley and Kort in this
thread--for the development of accurate and powerful statistical
software.  Furthermore, the support provided for the software in the
form of R-help is outstanding, again something that I think is part of
the package deal that is attracting new users to R.  In other words, one
should not assume a general lack of gratitude on behalf of R-help users
but should see the growth of R as evidence that the software and its
developers/supporters are indeed greatly valued.  I do not think that R
would be used much if people did not appreciate the nice packages,
functions and help provided.  Indeed, those of us that have access to
multiple software programs (I have access to JMP, SPSS, SPLUS and SAS)
choose R as our primary method of analysis because we feel that the
sharing environment supported by CRAN is a better way of doing
statistical computing.  Enough said.

4) Flexible journal policies.
Of 4 papers that I have submitted in the last 3 months for publication
in 3 journals (all to cancer related journals), all were subjected to
online file checkers that forced me to upload TIFF files instead of
PDFs.  Not only do they check the format, but also various other
resolution related items. In other words, I would not have even made it
past the online submission stage if I had only PDFs to work with.

5) Using the bitmap function to make TIFFs.
This sounds like a very attractive option.  I have tried this option
using the simple code below:

-
attach(cars)
plot(speed ~ dist)  # Simple plot to test
bitmap(file='C:\\...\\test.tif',type = "tifflzw", res = 1200)

Error in system(paste(gsexe, "-help"), intern = TRUE, invisible = TRUE)
: 
gswin32c.exe not found

-

Despite what the error message suggests, I do have a functional
Ghostscript 8.54 program installed on my Windows XP machine with the
executable found in the directory: C:\Program
Files\gs\gs8.54\bin\gswin32c.  I am not sure why R is not finding the
program.  I tried making a Windows environmental variable, R_GSCMD, with
this system address but that did not have any success.  Does the
gswin32c file need to be in my R PATH?

Brant Inman
Mayo Clinic



[R] Making TIFF images with rtiff

2007-01-11 Thread Inman, Brant A. M.D.
Many medical journals and publishers require that images, whether
photographs or line art, be submitted as high resolution .TIFF images.
One option for R users is to produce an image in one format and to
convert it to a .TIFF file using a second software program.  My
experience has been that this option often results in images of poorer
quality, often with blurry contours, and a loss of resolution.  A second
and better option would be to make .TIFF files directly from the graphic
output of R.  

I recently noticed that there is a library called "rtiff" that may be
able to do this.  However, I have not been able to get it to work,
principally because I do not know how to install the required supporting
software, libtiff and tiffio.h, correctly on my computer. I am running R
2.4.0 on a Windows XP machine.  So far I have done the following:

1) Loaded the rtiff library

2) Downloaded and installed the TIFF library 3.8.2 (complete package and
sources) from the following website:
http://gnuwin32.sourceforge.net/packages/tiff.htm

I would like to ask the R experts for help with the following things:

1) Where do I get the tiffio.h file?

2) Where do I install or relocate the tiffio.h and TIFF library to so
that rtiff will work?

Thanks for your help.

Brant Inman
Mayo Clinic



[R] Partial proportional odds logistic regression

2007-01-08 Thread Inman, Brant A. M.D.

Just a follow-up note on my last posting.  I still have not had any
replies from the R-experts out there that use partial proportional odds
regression (and I have to hope that there are some of you!) but I do
think that I have figured out how to perform the unconstrained partial
proportional odds model using vglm.  I show this code below for the
benefit of others that may want to try it (or point out my errors) using
one of the datasets in Peterson and Harrell's paper (Appl Stat 1990).
However, I remain open for suggestions on how to implement the
constrained partial proportional odds model.

--

library(VGAM)
library(MASS)
library(Design)

###
# Nausea dataset
# Peterson and Harrell. Applied Statistics 1990, 39(2): 205-217

nausea.short <- data.frame(matrix(nrow=12, ncol=3)) #Table 2
colnames(nausea.short) <- c('nausea', 'cisplatin', 'freq')
nausea.short[,1] <- ordered(rep(seq(0,5,1),2),
labels=seq(0,5,1))
nausea.short[,2] <- c(rep(0,6), rep(1,6))
nausea.short[,3] <- c(43,39,13,22,15,29,7,7,3,12,15,14)

# Proportional odds ordinal logistic regression: 3 options
polr(nausea ~ cisplatin, weights=freq, data=nausea.short,
method='logistic')
lrm(nausea ~ cisplatin, weights=freq, data=nausea.short)
vglm(nausea ~ cisplatin, weights=freq, data=nausea.short,
family=cumulative(parallel=T, reverse=T))


# Unconstrained partial proportional odds ordinal logistic regression
vglm(nausea ~ cisplatin, weights=freq, data=nausea.short,
family=cumulative(parallel=F, reverse=T))

--

The results obtained with this approach appear consistent with those
presented in Table 3 of the paper.  However, the code for the
unconstrained partial proportional odds model is so simple (just one
letter is different than in the proportional odds model!) that I wonder
if there is not room for error here that I am too inexperienced to
identify.
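One avenue for the constrained model (a sketch only: I have not validated
it against Table 3, and the constraint matrices reflect my own reading of
vglm's constraints argument) would be to restrict the cutpoint-specific
cisplatin coefficients to a linear trend, beta_j = beta + (j-1)*gamma:

--

# M = 5 cumulative logits for the 6-level nausea response
clist <- list("(Intercept)" = diag(5),
cisplatin = cbind(1, 0:4))
vglm(nausea ~ cisplatin, weights=freq, data=nausea.short,
family=cumulative(parallel=F, reverse=T), constraints=clist)

--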

Again, help with the constrained model would be greatly appreciated.

Brant Inman



[R] Partial proportional odds logistic regression

2007-01-07 Thread Inman, Brant A. M.D.

R-experts:

I would like to explore the partial proportional odds models of Peterson
and Harrell (Applied Statistics 1990, 39(2): 205-217) for a dataset that
I am analyzing.  I have not been able to locate a R package that
implements these models.  Is anyone aware of existing R functions,
packages, etc... that might be used to implement the partial
proportional odds models?

Brant Inman



Re: [R] Using VGAM's vglm function for ordinal logistic regression

2007-01-07 Thread Inman, Brant A. M.D.
Thank you for the help.  Indeed, the differences in the results that I
noted were due to the incorrect ordering of the response variable that
resulted from my inattentive use of as.ordered on a character vector.

Brant


-Original Message-
From: Prof Brian Ripley [mailto:[EMAIL PROTECTED] 
Sent: Sunday, January 07, 2007 2:52 AM
To: Inman, Brant A. M.D.
Cc: r-help@stat.math.ethz.ch; [EMAIL PROTECTED]
Subject: Re: [R] Using VGAM's vglm function for ordinal logistic
regression

On Sat, 6 Jan 2007, Inman, Brant A.   M.D. wrote:

>
> R-Experts:
>
> I am using the vglm function of the VGAM library to perform
proportional
> odds ordinal logistic regression.  The issue that I would like help
with
> concerns the format in which the response variable must be provided
for
> this function to work correctly.

> pneumo2$severity
 [1] normal normal normal normal normal normal normal normal mild   mild  
[11] mild   mild   mild   mild   mild   mild   severe severe severe severe
[21] severe severe severe severe
Levels: mild < normal < severe

is different from the ordering in the first example.

The difference between vglm and polr is that the latter uses the 
conventional sign for the coefficients: see MASS4 p.204.

I would never use as.ordered on a character vector, as this leaves it to R 
to choose the ordering.  (Even if you think you intended alphabetical 
order, that depends on the locale: see the warnings on the help page.)
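
A minimal fix along those lines, assuming the intended ordering is
normal < mild < severe, is to give the levels explicitly:

pneumo2$severity <- ordered(c(rep('normal',8), rep('mild',8), rep('severe',8)),
levels=c('normal','mild','severe'))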



-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

[R] Using VGAM's vglm function for ordinal logistic regression

2007-01-06 Thread Inman, Brant A. M.D.

R-Experts:

I am using the vglm function of the VGAM library to perform proportional
odds ordinal logistic regression.  The issue that I would like help with
concerns the format in which the response variable must be provided for
this function to work correctly. Consider the following example:

--

library(VGAM)
library(MASS)

attach(pneumo)
pneumo  # Inspect the format of the original dataset

# Restructure the pneumo dataset into a different format
pneumo2 <- data.frame(matrix(ncol=3, nrow=24))
colnames(pneumo2) <- c('exposure.time', 'severity', 'freq')
pneumo2[,1] <- rep(pneumo[,1],3)
pneumo2[,2] <-
as.ordered(c(rep('normal',8),rep('mild',8),rep('severe',8)))
pneumo2[1:8,3]   <- pneumo[,2]
pneumo2[9:16,3]  <- pneumo[,3]
pneumo2[17:24,3] <- pneumo[,4]
pneumo2 # Inspect the format of the new modified dataset

--

The problem occurs when I try to analyze these two datasets, which are
identical in content, with the proportional odds model using vglm:

--

# Analyze the original dataset with vglm, get one set of results 

> vglm(cbind(normal, mild, severe) ~ log(exposure.time), data=pneumo, 
+  family=cumulative(parallel=T))

Coefficients:
 (Intercept):1  (Intercept):2 log(exposure.time) 
  9.676093  10.581725  -2.596807 

Degrees of Freedom: 16 Total; 13 Residual
Residual Deviance: 5.026826 
Log-likelihood: -204.2742

# Analyzing the modified dataset with vglm gives another set of results

> vglm(severity ~ log(exposure.time), weights=freq, data=pneumo2, 
+ family=cumulative(parallel=T))

Coefficients:
 (Intercept):1  (Intercept):2 log(exposure.time) 
-1.6306499  2.5630170 -0.1937956 

Degrees of Freedom: 48 Total; 45 Residual
Residual Deviance: 503.7251 
Log-likelihood: -251.8626 
Warning messages:
1: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace =
trace, wzeps = control$wzepsilon) 
2: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace =
trace, wzeps = control$wzepsilon) 
3: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace =
trace, wzeps = control$wzepsilon) 
4: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace =
trace, wzeps = control$wzepsilon) 
5: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace =
trace, wzeps = control$wzepsilon) 

# Analyzing the modified dataset with polr reproduces these second
results

> polr(severity ~ log(exposure.time), weights=freq, data=pneumo2)

Coefficients:
log(exposure.time) 
 0.1938154 

Intercepts:
  mild|normal normal|severe 
-1.630612  2.563055 

Residual Deviance: 503.7251 
AIC: 509.7251

--

Can someone explain what I am doing wrong when using vglm and polr with
the modified dataset?  I do not understand why these two formulations
should give different results. 

Brant Inman



[R] Calling "confint.glm" from within another function

2007-01-05 Thread Inman, Brant A. M.D.

On July 12, 2004 Spencer Graves wrote an email describing essentially
the same issue that I would like help on: calling the confint function
from within another homemade function.  Because he provided many good
examples of the problem, I will not reproduce them here but will instead
refer readers to the original posting:

http://www.mail-archive.com/r-help@stat.math.ethz.ch/msg23826.html

It is not clear from the replies to his posting how he managed to solve
the problem, so I am reposting a similar note again in hopes of
guidance.  In particular, I would like to know if and how I need to
modify the confint function call so that I can access it from within
another function that I have written.

Thanks,

Brant Inman



Re: [R] Profile confidence intervals and LR chi-square test

2006-11-14 Thread Inman, Brant A. M.D.

Thank you to Prof Ripley and Henric Nilsson for their observation that I
was using "anova" inappropriately for the question that I was trying to
answer. Note that the Wald statistics and confidence intervals were
calculable in the previous email, but I preferred using the more
accurate deviance statistics.  I will demonstrate my error for the
benefit of those new users of R (like me) that may not have appreciated
how the "anova" function is working SEQUENTIALLY, and what SEQUENTIALLY
actually means in this context.

Since the "anova" function is a sequential test of the current model,
only the statistic for the last term in the model formula is a true
deviance chi-square statistic for (full model) .vs. (full model - term).
For instance, using the data upon which this question was based,
consider the following 2 models:

--

> fit0 <- glm(y ~ 1, data=d, family=binomial(link=logit),
na.action=na.omit)
> fit1 <- update(fit0, . ~ . + x1 + x2 + x3)

--

Here, fit0 is the null (intercept-only) model and fit1 is the full model
(without interactions because interactions are not biologically
plausible in this medical dataset). Now notice the result of the "anova"
function for the full model:

--

> anova(fit1, test='Chisq')

...

     Df Deviance Resid. Df Resid. Dev P(>|Chi|)
NULL                    99    137.989
x1    1    8.267        98    129.721     0.004
x2    1    5.639        97    124.083     0.018
x3    1    3.433        96    120.650     0.064

---

It is incorrect to interpret the deviance chi-square test presented
above for x1 (P=0.004) as the deviance chi-square statistic comparing
(y~x1+x2+x3) vs. (y~x2+x3), because the statistic actually calculated is
for (y~1) vs. (y~x1). Similarly, the deviance chi-square statistic
calculated for x2 (P=0.018) is NOT for (y~x1+x2+x3) vs. (y~x1+x3) but
for (y~x1) vs. (y~x1+x2).  Lastly, the deviance chi-square statistic for
x3 (P=0.064) is probably the most intuitive because it is for the
comparison of (y~x1+x2+x3) vs. (y~x1+x2), the result we would typically
want to present for x3 in the full model.  So how do you get the correct
deviance chi-square statistics for x1 and x2 in the full model?  There
are a couple of ways of arriving at the same answer, as I will
demonstrate for the case of x1.

Option#1: Reorder the full model so that x1 is the last term in the
model formula

---

> fit2 <- glm(y ~ x2 + x3 + x1, data=d, family=binomial(link=logit),
+     na.action=na.omit)
> anova(fit2, test='Chisq')
...

     Df Deviance Resid. Df Resid. Dev P(>|Chi|)
NULL                    99    137.989
x2    1    7.305        98    130.683     0.007
x3    1    3.821        97    126.862     0.051
x1    1    6.212        96    120.650     0.013

---

Notice that the deviance chi-square statistics for all of the variables
have changed, despite fit2 being identical in content to fit1; only the
order of the terms in the model formula has changed from fit1 to fit2.
The deviance statistic for x1 is now the correct one for the full model,
that is, for the comparison (y~x1+x2+x3) vs. (y~x2+x3).

Option#2: Compare the full model to a reduced model that does not
include x1.

---

> fit3 <- update(fit1, . ~ . - x1)
> anova(fit1, fit3, test='Chisq')
...

Model 1: y ~ x1 + x2 + x3
Model 2: y ~ x2 + x3
  Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1        96    120.650
2        97    126.862 -1   -6.212     0.013

---

Fit3 is the same model as fit1 except that it is missing the x1 term.
Therefore, any change in the deviance chi-square statistic is due to the
deletion of x1 from the full model. Notice that the difference in
residual deviances between fit3 and fit1 (126.862 - 120.650 = 6.212) is
the same as the deviance statistic for x1 in Option #1.
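
A third route, sketched with the same fit1: drop1() computes the
single-term deletion test for every term in one call, which is exactly
the (full model) vs. (full model - term) comparison discussed above.

# each row tests (full model) vs. (full model minus that term),
# the non-sequential counterpart of anova(fit1, test='Chisq')
drop1(fit1, test = "Chisq")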


Brant



[R] Profile confidence intervals and LR chi-square test

2006-11-13 Thread Inman, Brant A. M.D.

System: R 2.3.1 on Windows XP machine.

I am building a logistic regression model for a sample of 100 cases in
dataframe "d", in which there are 3 binary covariates: x1, x2 and x3.



> summary(d)
 y      x1     x2     x3
 0:54   0:50   0:64   0:78
 1:46   1:50   1:36   1:22

> fit <- glm(y ~ x1 + x2 + x3, data=d, family=binomial(link=logit))

> summary(fit)

Call:
glm(formula = y ~ x1 + x2 + x3, family = binomial(link = logit), 
data = d)

Deviance Residuals: 
Min   1Q   Median   3Q  Max  
-1.6503  -1.0220  -0.7284   0.9965   1.7069  

Coefficients:
Estimate Std. Error z value Pr(>|z|)  
(Intercept)  -0.3772 0.3721  -1.014   0.3107  
x11  -0.8144 0.4422  -1.842   0.0655 .
x21   0.9226 0.4609   2.002   0.0453 *
x31   1.3347 0.5576   2.394   0.0167 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 137.99  on 99  degrees of freedom
Residual deviance: 120.65  on 96  degrees of freedom
AIC: 128.65

Number of Fisher Scoring iterations: 4

> exp(fit$coef)
(Intercept) x11 x21 x31 
  0.6858006   0.4429233   2.5157321   3.7989873 
---

After reading the appropriate sections in MASS4 (7.2 and 8.4 in
particular), I decided to estimate the 95% confidence intervals for the
odds ratios using the profile method implemented in the "confint"
function. I then used the "anova" function to perform the deviance
chi-square tests for each covariate.

---
> ci <- confint(fit); exp(ci)
Waiting for profiling to be done...
2.5 %97.5 %
(Intercept) 0.3246680  1.413684
x11 0.1834819  1.048154
x21 1.0256096  6.314473
x31 1.3221533 12.129210

> anova(fit, test='Chisq')
Analysis of Deviance Table

Model: binomial, link: logit

Response: y

Terms added sequentially (first to last)


     Df Deviance Resid. Df Resid. Dev P(>|Chi|)
NULL                    99    137.989
x1    1    5.856        98    132.133     0.016
x2    1    5.271        97    126.862     0.022
x3    1    6.212        96    120.650     0.013


My question relates to the interpretation of the significance of
variable x1.  The OR for x1 is 0.443 and its profile confidence interval
is 0.183-1.048.  If a type I error rate of 5% is assumed, this result
would suggest that x1 is NOT a significant predictor of y. However, the
deviance chi-square test has a P-value of 0.016, which suggests that x1
is indeed a significant predictor of y. How do I reconcile these two
conflicting messages? I recognize that the upper bound of the confidence
interval is quite close to 1, but I am certain that some journal
reviewer will flag the inconsistency.
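
(For completeness, a sketch of the likelihood-ratio test that
corresponds to deleting x1 from the full model, rather than the
sequential first-row test reported above; see the 2006-11-14 follow-up
earlier in this archive for the full explanation.)

# LR test for (y ~ x1 + x2 + x3) vs. (y ~ x2 + x3)
anova(update(fit, . ~ . - x1), fit, test = "Chisq")
# equivalently: drop1(fit, test = "Chisq")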

Brant Inman



[R] Cox model using mfp library

2006-11-08 Thread Inman, Brant A. M.D.

I am running R 2.3.1 on a Windows XP machine.  I have a large dataset of
over 13 000 cases of a disease for which I am attempting to build a
prognostic model using Cox proportional hazards regression.  Some of the
continuous covariates are skewed and therefore require transformation
for use in the Cox model.  I have empirically selected some simple
normalizing transformations (based on previous knowledge and the ladder
of powers) and wanted to compare these selections to those produced by
the fractional polynomial technique of Royston and Altman.

When I attempt to fit a Cox model with the mfp library for overall
survival using a single left-skewed covariate called "age", I get the
following output:


> library(mfp)
> fit <- mfp(Surv(time.dead, dead.any) ~ fp(age), family=cox, verbose=T,
+     na.action=na.exclude)

Variable    Deviance    Power(s)

Cycle 1
age
            27071.72
            26704.42    1
            26701.83    2
            26695.2     -2 -2


Transformation
      shift scale
age       0   100

Fractional polynomials
    df.initial select alpha df.final power1 power2
age          4      1  0.05        4     -2     -2


Null model: 27071.72
Linear model: 26704.42
Final model: 26695.2
Error in "[.data.frame"(list("Surv(time.dead, dead.any)" = c(NA, NA, NA,
: 
invalid subscript type

--

Can anyone tell me what this error message means and why I am getting
it?  I did not get this error when I tried similar code using the GBSG
dataset provided with the mfp library.
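
In case others hit this: two things are worth checking, sketched below
with hypothetical object names. The call above passes no data argument,
so mfp() is resolving the variables from the search path; and whether
the na.action handling is the culprit can be tested by subsetting to
complete cases first.

library(mfp)
library(survival)

# hypothetical: d is the data frame holding time.dead, dead.any and age
d2 <- d[complete.cases(d[, c("time.dead", "dead.any", "age")]), ]

fit <- mfp(Surv(time.dead, dead.any) ~ fp(age), family = cox,
           data = d2, verbose = TRUE)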

Brant Inman
Mayo Clinic



Re: [R] Import problem with S-Plus 7.0 dataset

2006-11-03 Thread Inman, Brant A. M.D.

Thanks to Brian Ripley and Richard Heiberger for the solution to my
problem.  I will spell it out below in steps so that (hopefully) nobody
else will need to ask the same question in the future (if they search
the R help archive).

HOW TO GET S-PLUS 7.0 DATA into R
__

STEP 0  (in S-Plus 7.0): Generation of a fake dataframe:

> x <- c(1:10)
> y <- rep(c(0,1),5)
> mydata <- data.frame(cbind(x,y))
> mydata
x y
1   1 0
2   2 1
3   3 0
4   4 1
5   5 0
6   6 1
7   7 0
8   8 1
9   9 0
10 10 1

STEP 1 (in S-Plus 7.0): Save the dataframe in the old-style format used
by earlier versions of S-Plus (note: I called the output file
"ddump.sdd" instead of "mydata.sdd" to demonstrate a point later on):

> data.dump('mydata', file='C:\\temp\\ddump.sdd', oldStyle=T)

STEP 2 (In R): Restore the saved dataframe

> data.restore('C:\\temp\\ddump.sdd')
[1] "C:\\temp\\ddump.sdd"

STEP 3 (In R): View/use the first few cases of the dataset:

> head(mydata)
  x y
1 1 0
2 2 1
3 3 0
4 4 1
5 5 0
6 6 1


There are 2 important points to note:
1) The dataframe keeps the name it had in S-Plus, not the name of the
file you saved it in. Using the example above, R never creates an object
called "ddump":

> data.restore('C:\\temp\\ddump.sdd')
[1] "C:\\temp\\ddump.sdd"
> ddump
Error: object "ddump" not found

2) Saving the data.restore result into a new variable is not that useful
because it will NOT contain your dataframe, only the name of the file
that contained your dataframe:

> test <- data.restore('C:\\temp\\ddump.sdd')
> test
[1] "C:\\temp\\ddump.sdd"


Brant Inman



-Original Message-
From: Richard M. Heiberger [mailto:[EMAIL PROTECTED] 
Sent: Friday, November 03, 2006 3:07 PM
To: Inman, Brant A. M.D.; Brian Ripley
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Import problem with S-Plus 7.0 dataset

It looks like it works.  The result you printed is
the name of the file that data.restore read.  The name of the
variable is the same as the name you called it in S-Plus.

type
   data[1:5,]
and you should see your data.frame.



Re: [R] Import problem with S-Plus 7.0 dataset

2006-11-03 Thread Inman, Brant A. M.D.

As recommended, I have tried the following code to solve my problem
importing data to R from S-Plus 7.0. Unfortunately, I have not had
success.

In S-Plus:

> data.dump('data', file='C:\\temp\\ddump.sdd', oldStyle=T)

This produced a file called "ddump.sdd" that I can import back into
S-Plus without any problem.  However, when I use the following code in
R ...

In R:

> d <- data.restore(file='C:\\temp\\ddump.sdd')
> d
[1] "C:\\temp\\ddump.sdd"

> d <- data.restore('C:\\temp\\ddump.sdd')
> d
[1] "C:\\temp\\ddump.sdd"
> 

I also tried the following:

> d <- read.S(file='C:\\temp\\ddump.sdd')
Error in read.S(file = "C:\\temp\\ddump.sdd") : 
not an S object

I thought that S-Plus might be doing something funny when I save as a
.sdd file, so I tried saving as "ddump.dat" but got exactly the same
results.

I have been able to save my original datafile as a csv and import it,
but I had hoped that it would be possible to keep my variable formats by
importing the S dataframe into R directly.

Brant Inman




[R] Import problem with S-Plus 7.0 dataset

2006-11-03 Thread Inman, Brant A. M.D.

I am running R 2.3.1 on a Windows XP machine.  I am trying to import
some data into R that is stored as an S-Plus 7.0 .sdd file.

When I run the following command, I get this error:

> library(foreign)
> d <- read.S(file='H:\\Research\\data.sdd')

Error in read.S(file = "H:\\Research\\data.sdd") : 
not an S object


The dataset is fairly large, roughly 13000 rows and 100 columns, but
within S-Plus 7.0 it is stored as a normal dataframe (not a bd
dataframe).

Any ideas?

Brant Inman






[R] Confidence intervals for Brier score

2006-11-01 Thread Inman, Brant A. M.D.

I am using the ipred library to calculate the censored Brier score for a
Cox proportional hazards model.  I would like to know if anyone has
developed a method of calculating confidence intervals for the various
forms of the Brier score that are used in the analysis of
survival/censored data.  If so, are these implemented in S?
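
One generic route, in case nothing packaged exists: a nonparametric
bootstrap of sbrier(). A sketch, assuming a data frame dat with columns
time, status and pred (all hypothetical names):

library(ipred)
library(survival)
library(boot)

# statistic: censored Brier score at t = 5 on the resampled rows
brier5 <- function(d, i) {
  as.numeric(sbrier(Surv(d$time[i], d$status[i]), d$pred[i], btime = 5))
}

b <- boot(dat, brier5, R = 1000)
boot.ci(b, type = "perc")   # percentile bootstrap confidence interval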

Thank you

Brant Inman




[R] Missing data analysis in R

2006-10-31 Thread Inman, Brant A. M.D.

I am looking for a book that discusses the theory of multiple imputation
(and other methods of dealing with missing data) and, just as
importantly, how to implement these methods in R or S-Plus.  Ideally,
the book would have a structure similar to Faraway (Regression),
Pinheiro&Bates (Mixed Effects) and Wood (GAMs) and would be very modern
(i.e. published within the last couple of years).  

Any ideas?  If such a book does not exist, one of the experts on this
help list should write it! (I will gladly buy the first copy.)

Brant Inman
Mayo Clinic




[R] Calculating the probability of an event at timepoint "t" from a Cox model fit

2006-10-28 Thread Inman, Brant A. M.D.
 

I would like to determine the probability of an event at a specific
timepoint given the linear predictor of a given Cox model.  For
instance, assume that I fit the following model:

 

data(pbc)
fit <- coxph(Surv(time, status) ~ age, data=pbc)

 

To extract the value of the linear predictor for each patient in the
dataset:

 

prd <- predict(fit, newdata=pbc, type="lp")

 

However, what I am really interested in is the predicted probability
from the Cox model that an individual will experience an event by some
time t (say 2000 days in this example).  To my knowledge, calculating
this probability requires knowing one of the 3 baseline functions
estimated from the Cox model: the survivor function, the hazard
function, or the cumulative hazard function.  I would like to know:

 

1) How do I extract the correct baseline hazard so that I can
predict the individual probability of failure at 2000 days, given a
particular patient covariate pattern (in this case, a particular
patient's age)?  Is the baseline hazard at the average patient
covariates the ideal one?

2) Is there a better way of directly extracting these probabilities?
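
One standard route, sketched with the fit above: survfit() applied to a
coxph object with newdata returns each patient's estimated survival
curve, which sidesteps the "average patient" baseline question; the
event probability by day 2000 is one minus the survival at that time.

library(survival)
sf <- survfit(fit, newdata = pbc)         # one curve per patient
s2000 <- summary(sf, times = 2000)$surv   # S(2000) for each patient
p.event <- 1 - s2000                      # P(event by day 2000)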

 

 

Brant Inman

 





[R] Censored Brier Score and Royston/Sauerbrei's D

2006-10-27 Thread Inman, Brant A. M.D.
System: R 2.3.1 on a Windows XP computer. 

I am validating several cancer prognostic models that have been
published with a large independent dataset.  Some of the models report a
probability of survival at a specified timepoint, usually at 5 and 10
years.  Others report only the linear predictor of the Cox model.

I have used Harrell's c index for censored data (rcorr.cens) as a
measure of discrimination and have constructed smoothed calibration
plots.  I would like to include some measures of overall model
performance, such as the censored Brier score and Royston/Sauerbrei's D
statistic (Stat Med 2004). With this in mind, I have 3 questions:

1) Can the "sbrier" function of the "ipred" library be used to calculate
the censored Brier score for a specific time point given known
predictions for that timepoint?  

library(ipred)

data <- read.csv(file='c:\\

time   <- data$time    # Time in years from diagnosis of cancer to death
status <- data$status  # Status at last follow-up: 1=dead, 0=alive
pred   <- data$pred    # Predicted probability of surviving 5 years
                       #   after cancer, from external Cox model A
linp   <- data$linp    # Linear predictor of external Cox model B,
                       #   predicting survival after cancer

s <- Surv(time, status)
test <- sbrier(s, pred, btime=5)

# This call produces an error message that I can't seem to solve (see
# the sketch after question 3):
Error in Surv(time, 1 - cens) : Time variable is not numeric
In addition: Warning message:
is.na() applied to non-(list or vector) in: is.na(time)

2) Can the linear predictor (linp) be used in sbrier in the same way as
a probability might?

3) Has anyone implemented Royston/Sauerbrei's D?
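
Regarding the error under question 1: that message usually points at a
non-numeric time column (read.csv quietly turns a column containing any
stray text into a factor), though without the file this is only a guess.
A diagnostic sketch:

str(data[, c("time", "status", "pred")])  # all three should be numeric

# if time came in as a factor, convert via character, not directly:
time <- as.numeric(as.character(data$time))
s <- Surv(time, status)
test <- sbrier(s, pred, btime = 5)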

Brant Inman
Mayo Clinic


