[R] Package rms: c-statistic from lrm function with weights
Dear list, I am using the lrm function from the rms package to estimate a logistic model with weights. The c-statistic (or area under the curve) is part of the lrm output. To understand how the weights enter the computation of the c-statistics, I looked at the script of lrm and lrm.fit but I am out of luck because it is making a call to a Fortran routine and I don't know Fortran. z <- .Fortran("lrmfit", coef = initial, nx, 1:nx, x, y, offset, u = double(nvi), double(nvi * (nvi + 1)), double(1), n, nx, sumw, nvi, v = double(nvi * nvi), double(nvi), double(2 * nvi), double(nvi), integer(nvi), opts = opts, ftable, penmat, weights, PACKAGE = "rms") Can somebody help me figure out how the weights from the regression are used in the computation of the c-statistic? Here is a small example that shows that the c-statistic computed from the rms package and using the pROC packages are not the same (not even close) when calculated from a weighted logistic regression. set.seed(1233) x <- rnorm(100) w <- runif(100) y <- rbinom(100, 1, .5) require(rms) # unweighted model umod <- lrm(y~x) umod$stat # c-statistic is 0.5776796 # weighted model wmod <- lrm(y~x, weight = w) wmod$stat # c-statistic is 0.65625 # using pROC require(pROC) umod2 <- glm(y~x, family = binomial) auc(y, predict(umod2)) # 0.5769 wmod2 <- glm(y~x, weights = w, family = binomial) auc(y, predict(wmod2)) # 0.5769 BTW results from umod and umod2 and from wmod and wmod2 are identical so the discrepancy in c-statistics in not due to using lrm vs. glm. Best regards, MP [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Frequencies for a list of vectors
Dear R users, I have a list of vectors (list is called HTNlist). Each vector is of length 1 to 4 and takes only 0 and 1 as values. E.g. head(HTNlist) $`30008` [1] 1 0 1 0 $`60008` [1] 0 0 1 0 $`90008` [1] 0 0 1 0 $`17` [1] 1 $`130001` [1] 0 1 $`130007` [1] 1 0 1 0 I would like to obtain a frequency table for the elements of the list. I want to know how many of '1 0 0' I have in the list, how many '1 0 1 0' etc. Can you please help? Thank you in advance, MP [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] update.formula() to add interaction terms
Many thanks, that is exactly what I was looking for. I did not know about bquote, interesting! MP On Mon, Feb 3, 2014 at 6:29 PM, William Dunlap wdun...@tibco.com wrote: Is this the sort of thing you are looking for? fm - y ~ x1 + x2 + x3 + log(x4) # Use terms() instead of just all.vars() to keep log(x4) as log(x4) xVars - with(attributes(terms(fm)), as.list(variables)[-c(1,response+1)]) str(xVars) List of 4 $ : symbol x1 $ : symbol x2 $ : symbol x3 $ : language log(x4) # use bquote to make the addition to the formula update(fm, bquote( ~ . + .(xVars[[1]]) * .(xVars[[length(xVars)]]))) y ~ x1 + x2 + x3 + log(x4) + x1:log(x4) As a function it would be addInteraction - function(formula){ xVars - with(attributes(terms(formula)), as.list(variables)[-c(1,response+1)]) update(formula, bquote( ~ . + .(xVars[[1]]) * .(xVars[[length(xVars)]]))) } used as addInteraction(y~x1+x2+sqrt(x3)) y ~ x1 + x2 + sqrt(x3) + x1:sqrt(x3) If the last 'term' in the formula is a compound like x4:x5 and you want x1:x4:x5 added you will need to do more work (look at the 'factors' attribute of terms()'s output) - currently it adds x1:x5. Bill Dunlap TIBCO Software wdunlap tibco.com -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Marie-Pierre Sylvestre Sent: Monday, February 03, 2014 12:54 PM To: r-help@r-project.org Subject: [R] update.formula() to add interaction terms Hi, I have a list of formulae that I need to modify. For each formula, I need to add an interaction term between the first independent variable and the last one. I want to write a function to achieve that because the list of formulae to modify is long. For example, if the formula is y ~ x1 + x2 + x3 + x4, then I need to turn it either into y ~ x1*x4 + x2 + x3 or y ~ x1 + x2 + x3 + x4 + x4:x1 (I know they are the same, but one may be easier to work with than the other). Suppose I have the formula a which is defined as a - formula(y ~ x1+x2+x3+x4) I know how to access to the first and last independent variables of my formula: firstvar - all.vars(a[[3]])[1] lastvar - all.vars(a[[3]])[length( all.vars(a[[3]]))] What I can't figure out is how to use update.formula() to include my interaction term in order to get y ~ x1+x2+x3+x4 + x1:x4. Specifically, update.formula(a, ~ . + paste(all.vars(a[[3]])[1],firstvar, lastvar, sep = ':')) is not producing y ~ x1+x2+x3+x4 + x1:x4. Any idea? Thanks in advance, MP [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] update.formula() to add interaction terms
Hi, I have a list of formulae that I need to modify. For each formula, I need to add an interaction term between the first independent variable and the last one. I want to write a function to achieve that because the list of formulae to modify is long. For example, if the formula is y ~ x1 + x2 + x3 + x4, then I need to turn it either into y ~ x1*x4 + x2 + x3 or y ~ x1 + x2 + x3 + x4 + x4:x1 (I know they are the same, but one may be easier to work with than the other). Suppose I have the formula a which is defined as a - formula(y ~ x1+x2+x3+x4) I know how to access to the first and last independent variables of my formula: firstvar - all.vars(a[[3]])[1] lastvar - all.vars(a[[3]])[length( all.vars(a[[3]]))] What I can't figure out is how to use update.formula() to include my interaction term in order to get y ~ x1+x2+x3+x4 + x1:x4. Specifically, update.formula(a, ~ . + paste(all.vars(a[[3]])[1],firstvar, lastvar, sep = ':')) is not producing y ~ x1+x2+x3+x4 + x1:x4. Any idea? Thanks in advance, MP [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] meta-analysis, outcome = OR associated with a continuous independent variable
Hello everyone, I want to do a meta-analysis of case-control studies on which an OR was computed based on a continuous exposure. I have found several several packages (metafor, rmeta, meta) but unless I misunderstood their main functions, it seems to me that they focus on two-group comparisons (binary independent variable), and do not have the option of using a continuous independent variable. If this is right, do you have any suggestions for a meta-analysis with continuous independent variable? I using lme or lme4 with weights my only option? Thanks in advance, MP [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] pairs and panel.smooth for two groups
Hi, I have modified the USJudgeRatings data (available in R) to illustrate my question. # Use the first 4 variables of USJudgeRatings and add a group variable with two levels USJudgeRatings - USJudgeRatings[,1:4] USJudgeRatings$group - factor(c(rep(1, 22), rep(0, 21))) # I can draw a pairs graph where members of each group are drawn in different colors: pairs(USJudgeRatings[,1:4], col = c(2,3)[USJudgeRatings$group], pch = c(21,3)[USJudgeRatings$group]) # I would also like to add a smooth line to each subplot like pairs(USJudgeRatings[,1:4], panel=panel.smooth) # but I want the smooth to be done for each of the group, i.e. I want two smooths per subplot. # this creates only one smooth pairs(USJudgeRatings[,1:4], col = c(2,3)[USJudgeRatings$group], pch = c(21,3)[USJudgeRatings$group], panel = panel.smooth) # I understand that panel.smooth is a function that is called for each subplot. I don't know how to tell it to do a smooth for each of my group in each of the subplot. Any help would be appreciated! Best, Marie-Pierre __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Converting a data set from 'long' format to 'interval' format
Hi, I have a data set in which the variable 'dose' is time-varying. Currently, the data set is in a long format, with 1 row for each time unit of follow-up for each individual Id. It looks like this: orig.data - cbind(Id = c(rep(1,4), rep(2,5)), time = c(1:4, 1:5), dose = c(1,1,1,0,1,0,1,1,0)) orig.data Id time dose [1,] 111 [2,] 121 [3,] 131 [4,] 140 [5,] 211 [6,] 220 [7,] 231 [8,] 241 [9,] 250 What I would like to do is to convert the data set into an interval format. By that I mean a data set in which each row has a 'Start' and a 'Stop' value that indicates the time units in which the 'dose' is constant. For example, my orig.data example would now be: int.data - cbind(Id = c(rep(1,2), rep(2,4)), Start = c(1,4,1,2,3,5), Stop = c(3,4,1,2,4,5), dose = c(1,0,1,0,1,0)) int.data Id Start Stop dose [1,] 1 131 [2,] 1 440 [3,] 2 111 [4,] 2 220 [5,] 2 341 [6,] 2 550 Basically, this implies collapsing rows that have the same Id and dose and creating Start and Stop to index the time. While I can write a clumsy routine with multiple loops to do it, it will be inefficient and will not work for large data set. I wonder if people know of a function that would reshape my data set from 'long' to 'interval'? Best, MP [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] gamm (mgvc) and time-varying coefficient model
Dear R users, I have repeated measurements on individuals. I want to estimate the time-varying effect of a factor variable X (taking three levels), e.g. a model in the spirit of Hastie and Tibshirani (1993). I am considering using the package mgvc which implements generalized additive models, especially the function gamm, which estimates generalized additive mixed models, and thus, can deal with the correlated repeated measures within individuals. However, I am confused as to how to specify the time-varying coefficient part of the formula. According to the mgvc documentation (p. 35): by variables are the means for constructing 'varying-coefficient models' (geographic regression models) and for letting smooths 'interact' with factors or parametric terms. Suppose that y is the response variable, id identifies individuals, x is the three-level factor variable and time indexes the chronology of responses. Is this model estimating the time-varying coefficient of x? If it is not, how should I specify the model? mod - gamm( y ~ factor(x) + s(time, by=factor(x)), data=mydata, random=list(patient=~ 1), correlation = corAR1()) Best, MP [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] gamm (mgvc) and time-varying coefficient model
Dear R users, I have repeated measurements on individuals. I want to estimate the time-varying effect of a factor variable X (taking three levels), e.g. a model in the spirit of Hastie and Tibshirani (1993). I am considering using the package mgvc which implements generalized additive models, especially the function gamm, which estimates generalized additive mixed models, and thus, can deal with the correlated repeated measures within individuals. However, I am confused as to how to specify the time-varying coefficient part of the formula. According to the mgvc documentation (p. 35): by variables are the means for constructing 'varying-coefficient models' (geographic regression models) and for letting smooths 'interact' with factors or parametric terms. Suppose that y is the response variable, id identifies individuals, x is the three-level factor variable and time indexes the chronology of responses. Is this model estimating the time-varying coefficient of x? If it is not, how should I specify the model? mod - gamm( y ~ factor(x) + s(time, by=factor(x)), data=mydata, random=list(patient=~ 1), correlation = corAR1()) Best, MP [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] creating a package with internal (hiden) S4 classes and methods
Dear R users, I am writing my first R package and it finally past the R CMD check. Before sending it to CRAN, I would like to check a few issues. 1. I created s4 classes and methods in the package but I want only 1 function (foo) to be available to users and documented in the manual that goes with the package. I put foo in 1 file and the rest (internal functions, classes and methods) in a file called mypackage-internal.R. For the .Rd file, I created one for each class, method and internal functions, and I wrote \keyword{internal} and the end. Finally, I have created a NAMESPACE that export my function f00. I thought it would work but it created a problem with the classes and methods not being declared so I added Classes(''myclass) and Methods(my method) to the NAMESPACE file. Does this make sense as to a way to hide classes and methods from users, and avoid describing them? 2. Everything goes smoothly when I R CMD build and R CMD check. However, when I look at the mypackage-manual.tex file that the R CMD check created, I see all my internal classes, methods and functions listed. This is not what I want, I only want foo. I know that I need to send the tar.gz file to CRAN but I am clueless as to how the package manual (pdf) will be created. Will it look like the mypackage-manual.tex? If so, how do I make sure that only the package DESCRIPTION and the foo function are listed? Please do not simply refer me to the manual 'Writing R extensions!'. I have read it carefully but I have to admit that some of the stuff explained there is beyond me. I am lost. many thanks in advance, MP __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] building a package that contains S4 classes and methods
Hello R users, I am trying to make a my first package and I get an error that I can understand. The package is build out of three files (one for functions, 1 for s4 classes and 1 for s4 methods). Once I source them I run package.skeleton( name=TDC ) within a R session and I get Creating directories ... Creating DESCRIPTION ... Creating Read-and-delete-me ... Saving functions and data ... Making help files ... Done. Further steps are described in './TDC/Read-and-delete-me'. Warning messages: 1: In dump(internalObjs, file = file.path(code_dir, sprintf(%s-internal.R, : deparse of an S4 object will not be source()able 2: In dump(internalObjs, file = file.path(code_dir, sprintf(%s-internal.R, : deparse of an S4 object will not be source()able 3: In dump(internalObjs, file = file.path(code_dir, sprintf(%s-internal.R, : deparse of an S4 object will not be source()able 4: In dump(internalObjs, file = file.path(code_dir, sprintf(%s-internal.R, : deparse may be incomplete I keep going in spite of the warnings with R CMD check --no-examples TDC and I get * checking for working pdflatex ... OK * using log directory '/home/mariepierre/Packages/PermAlgo/PermAlgo/PermAlgo2/TDC.Rcheck' * using R version 2.7.1 (2008-06-23) * using session charset: UTF-8 * checking for file 'TDC/DESCRIPTION' ... OK * checking extension type ... Package * this is package 'TDC' version '1.0' * checking package dependencies ... OK * checking if this is a source package ... OK * checking whether package 'TDC' can be installed ... ERROR Installation failed. The error file says: * Installing *source* package 'TDC' ... ** R ** preparing package for lazy loading Error in parse(n = -1, file = file) : unexpected '' at 102: `.__C__BindArgs` - 103: Calls: Anonymous - code2LazyLoadDB - sys.source - parse Execution halted ERROR: lazy loading failed for package 'TDC' ** Removing '/home/mariepierre/Packages/PermAlgo/PermAlgo/PermAlgo2/TDC.Rcheck/TDC' The problem is with my classes and methods. The respective files contain: setClass(BindArgs, signature( function )) setClass(BindArgs2, signature( function )) and setMethod(initialize, BindArgs, function( .Object, f, ... ) callNextMethod( .Object, function( x ) f( x, ... ) )) setMethod(initialize, BindArgs2, function( .Object, f, ...) callNextMethod( .Object, function( x, y ) f( x, y, ... ) )) Everything works well within a R session but I can build the package. If I look at the internal R file that this created I get `.__C__BindArgs` - S4 object of class structure(classRepresentation, package = methods) `.__C__BindArgs2` - S4 object of class structure(classRepresentation, package = methods) `.__M__initialize:methods` - S4 object of class structure(MethodsList, package = methods) `.__T__initialize:methods` - environment Well, let just say that I am new to classes so this confuses me greatly. I have checked the documentation and tried a few things but I reached my personal limits! I am using R 2.7.1 on Linux Fedora 8. Any comments on what is happening and/or help would be greatly appreciated. MP __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Tidying up code - Warning message: deparse may be incomplete
Dear R users, I am currently writing a R package and to do so I am following the guidelines in manual 'Writing R extensions'. In Section 3.1, it is suggested to tidy up the code using a file containing the following: options(keep.source = FALSE) source(myfuns..R) dump(ls(all = TRUE), file = new.myfuns.R) I have done this for my own packages and although it runs, I get several warnings of the type: Warning message: In dump(ls(all = TRUE), file = PermAlgo.R) : deparse may be incomplete I am clueless as to what this means. Even if I try to tidy only one function from my code, I get the warning. E.g. the file lala.R contains only this: partialHazards - function(t, v, covArray, betas){ exp( covArray[v,t,] %*% betas ) } the file tidylala.R contains: options(keep.source = FALSE) source(lala.R) dump(ls(all = TRUE), file = newlala.R) On Linux I run: R --vanilla tidylala.R Then I obtain: Warning message: In dump(ls(all = TRUE), file = newlala.R) : deparse may be incomplete The file newlala.R looks like this: `partialHazards` - function (t, v, covArray, betas) { exp(covArray[v, t, ] %*% betas) } What does the warning mean? Can I simply ignore it? thanks, MP __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Fixing the coefficient of a regressor in formula
Dear R users, I want to estimate a Cox PH model with time-dependent covariates so I am using a counting process format with the following formula: Surv(data$start, data$stop, data$event.time) ~ cluster(data$id) + G1 + G2 + G3 + G4 + G5 +G6 Gs represent a B-spline basis functions so they sum to 1 and I can't estimate the model as is without getting the last coefficient to be NA, which makes sense given the perfect collinearity. without getting in lengthy details about my code, let me just say that to avoid the colinearity problem,. I do not want to omit G1 from the regression. Instead, I want to fix the regression coefficient of one of the regressors, G1, to 1. I have read the R manual section on formulae but I have not found how to do fix a regression coefficient. Conceptually speaking it seems to me that it should be simple, and I am sure that someone explained it somewhere, but I did not find the proper keywords to find it! So, does someone know how to fix the coefficient of a regressor in the formula for Cox model so that the coefficient is not estimated but still taken into account? Thank you in advance, MP __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] assigning and saving datasets in a loop, with names changing with i
Dear R users, I am analysing a very large data set and I need to perform several data manipulations. The dataset is so big that the only way I can play with it without having memory problems (E.g. cannot allocate vectors of size...) is to write a batch script to: 1. cut the data into pieces 2. save the pieces in seperate .RData files 3. Remove everything from the environment 4. load one of the piece 5. perform the manipulations on it 6. save it and remove it from the environment 7. Redo 4-6 for every piece 8. Merge everything together at the end It works if coded line by line but since I'll have to perform these tasks on other data sets, I am trying to automate this as much as I can. I am using a loop in which I used 'assign' and 'get' (pseudo code below). My problem is when I use 'get', it prints the whole object on the screen. I am wondering whether there is a more efficient way to do what I need to do. Any help would be appreciated. Please keep in mind that the whole process is quite computer-intensive, so I can't keep everything in the environment while R performs calculations. Say I have 1 big dataframe called data. I use 'split' to divide it into a list of 12 dataframes (call this list my.list) my.fun is a function that takes a dataframe, performs several manipulations on it and returns a dataframe. for (i in 1:12){ assign( paste( data, i, sep=), my.fun(my.list[i])) # this works # now I need to save this new object as a RData. # The following line does not work save(paste(data, i, sep = ), file = paste( paste(data, i, sep = ), RData, sep=.)) } # This works but it is a bit convoluted!!! temp - get(paste(data, i, sep = )) save(temp, file = lala.RData) } I am *sure* there is something more clever to do but I can't find it. Any help would be appreciated. best regards, MP __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Identifying and characterizing strings of NA in a vector
Dear R users, I was wondering if someone could suggest a few lines of code for my problem. I want to count the number and the length of strings of NA in a vector. For example: vec - c(1, 2, 1, NA, NA, 1, 2, NA, NA, NA, 3, 4, NA, NA) has 2 strings of NA's of length 2 and 1 string of NA' of length 3. I can easily count the number of NA's per vector, but I am having a hard time counting the number and length of strings of NA's per vector without relying heavily on loops. I will have to perform this task for many vectors. Can somebody help? many thanks, Marie-Pierre __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.