Re: [Rd] R CMD check issue with R 3.0.2
On 10/28/2013 06:00 AM, r-devel-requ...@r-project.org wrote:
> On 13-10-26 9:49 PM, Simon Urbanek wrote:
>> On Oct 25, 2013, at 12:12 PM, Yihui Xie wrote:
>>> This has been asked so many times that I think it may be a good idea for R CMD check to just stop when the user passes a directory instead of a tarball to it, or automatically run R CMD build before moving on. In my opinion, sometimes an FAQ and a bug are not entirely different.
>> +1 -- and I'd do the same for R CMD INSTALL. If someone insists, there could be --force or something like that for those who really want to work on directories despite all the issues with that, but IMHO the default should be for both INSTALL and check to bail out if not presented with a file -- it would save a lot of people a lot of time spent chasing ghost issues.
> That seems like a reasonable suggestion. I wouldn't want to lose the ability to install or check a directory; for development of packages like rgl which have a lot of compiled code, installing from a tarball takes a lot longer than installing when all of the code has already been compiled.

I use R CMD check on directories often. The survival and coxme packages have large test suites, and before things are packaged up for R-forge there may be multiple iterations to get past all of them. (Add one new idea, break something old.) Creating the tarball is slow due to vignettes. Thus I would hate to see it outlawed. Of course I know enough to ignore many of the warnings during this testing stage, I do use the tarball for my final run, and I understand the noise level that this option incurs on R-devel.

Terry T.
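For reference, the two invocation styles under discussion look like this (package name and version illustrative):

    R CMD build mypkg                # builds mypkg_1.0.tar.gz, rebuilding vignettes
    R CMD check mypkg_1.0.tar.gz     # the recommended workflow: check the tarball
    R CMD check mypkg                # checking the source directory -- the usage at issue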
Re: [Rd] Multiple return values / bug in rpart?
I don't remember what rpartpl once did myself; as you point out, it is a routine that is no longer used and should be removed. I've cc'd Brian since he maintains the rpart code.

Long ago, return() with multiple arguments was a legal shorthand for returning a list. This feature was deprecated in S-plus, I think even before R rose to prominence. I vaguely remember a time when its usage generated a warning. The fact that I've never noticed this unused routine is somewhat embarrassing. Perhaps I need a "not documented, never called" addition to R CMD check to help me along.

Terry Therneau

> In the recommended package rpart (version 4.1-1), the file rpartpl.R contains the following line:
>     return(x = x[!erase], y = y[!erase])
> AFAIK, returning multiple values like this is not valid R. Is that correct? I can't seem to make it work in my own code. It doesn't appear that rpartpl.R is used anywhere, so this may have never caused an issue. But it's tripping up my R compiler.
> Thanks, Justin Talbot
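For reference, the deprecated multi-argument return() is written today as an explicit named list; a minimal sketch:

    # Modern replacement for the old S shorthand return(x = ..., y = ...):
    f <- function(x, y, erase) {
        list(x = x[!erase], y = y[!erase])
    }
    f(1:4, 5:8, c(TRUE, FALSE, FALSE, TRUE))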
[Rd] Problem with anova and the new abbreviation restrictions
An unwanted side effect of the new restrictions on abbreviated names. The anova.coxph command, in a slavish copy of anova.lm etc., returns a data frame with column labels of

    loglik  Chisq  Df  Pr(>|Chi|)

If one tries to extract the final column of the table, errors result, since it is not a standard R variable name.

    > afit <- anova(lm(conc ~ uptake, CO2))
    > afit$P
    [1] 2.905607e-06           NA
    Warning message:
    In `$.data.frame`(afit, P) : Name partially matched in data frame
    > afit$Pr(>F)
    Error: unexpected '>' in "afit$Pr(>"

The second is a result of allowing TAB completion of the name. Yes, experienced folks can sneak around it, but should this be upgraded to match an ease-of-use criterion? Add a [.anova subscript method that allows for shortened names?

I'm still in favor of an option() that controls this new behavior with values of "allow", "warn", and "error", so that I can at least turn it off for me. (I work on many data sets created by SAS programmers who are in love with excessively long names, and I use batch scripts, so name completion isn't available during script creation.) But this is more local to me, and does not address the primary question above in a general way.

Terry T
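For reference, a non-syntactic column name can be extracted exactly, with no partial matching, by quoting it; a minimal sketch:

    afit <- anova(lm(conc ~ uptake, CO2))
    afit$"Pr(>F)"       # quoted name after $
    afit[["Pr(>F)"]]    # [[ requires an exact match
    afit[, "Pr(>F)"]    # ordinary data frame column extraction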
Re: [Rd] Calling an array in a struct in C to R
Another solution is the one used for a long time in the rpart code. The R code called rpart1, which does the work, keeps a static pointer to the object, does NOT release its memory, and returns the size of the object. Then the R code allocates appropriate vectors and calls rpart2, which fills in the result and releases the dynamic memory. This was written before .Call was available. It works, but I agree with Bill Dunlap that you should use .Call.

Terry T.

On 06/20/2013 05:00 AM, r-devel-requ...@r-project.org wrote:
> Hi there,
> Although I'm a quite experienced R user and know my way in C, I stumbled upon a problem I don't know how to solve. Therefore, I hope someone can provide me with the information or pointers I need in order to understand the way in which the communication between R and C occurs. I have the following C code which basically reflects what I want:
>
>     typedef struct {
>         float *array;
>         size_t used;
>         size_t size;
>     } Array;
>
>     void main2R() {
>         Array a;
>         examplefunction(&a);   /* fills and dynamically grows a->array */
>     }
>
> Now I would want to return a.array or a->array to R. According to the R manuals, the compiled C code should not return anything except through its arguments. The problem here is, I have a dynamically growing array, and I have no idea what to pass on to C from R in order to let that work. The word 'should' indicates that the compiled code may return something in special circumstances, but I have no idea how to get a.array into R.
> So my question is simply this: how on earth do I get the information in the float array 'a.array' into R? Is it even possible, or should I rewrite my C code using .Call in R? Another, not preferred, option is to pre-allocate the array/vector in R at a fixed (large-enough) size. Or do I miss something here?
> Regards, Tee-Jay-Ardie
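The two-call pattern Terry describes looks roughly like this from the R side (a sketch only: the routine names and arguments are invented, not the actual rpart interface, and the corresponding C code must already be loaded with dyn.load):

    # First call does the work, keeps a static pointer to the grown array,
    # and returns only its length; the second call copies the result into
    # R-allocated memory and frees the C-side storage.
    n   <- .C("mywork1", nresult = integer(1))$nresult
    ans <- .C("mywork2", result = double(n))$result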
Re: [Rd] loading of unwanted namespace
Martin,
Your suggestion below did the trick. The issue was obvious once this pointed me to the correct bit of code. Thanks much.

Terry T.

--- begin included text ---
    trace(loadNamespace, quote(if (package == "survival") recover()))
will break into recover() when survival is being loaded.
Martin
Re: [Rd] Patch proposal for R style consistency (concerning deparse.c)
I'll be the "anybody" to argue that "} else {" is an ugly kludge which you will never find in my source code. Yes, it's necessary at the command line because the parser needs help in guessing when an expression is finished, but it is only needed in that case. Since I can hardly imagine typing an else clause at the command line (that many correct characters in a row exceeds my typing skill), it's not an issue for me. I most certainly would not inflict this construction on my pupils when teaching a class, nor insist that any break of a long line has to be after a "+" but never before, nor other crutches for the parser's sake. Let them know about the special case, of course, but don't sacrifice good coding style to the deficiency.

That said, I am completely ambivalent about the result of deparse. I'm just throwing up an objection to the purity argument: things were beginning to sound a bit too bombastic :-)

Terry T.

On 05/02/2013 05:00 AM, r-devel-requ...@r-project.org wrote:
> I want "} else {". Yihui wants "} else {". And I have not heard anybody say they prefer the other way, unless you interpret Duncan's comment "that's nonsense" as a blanket defense of the status quo. But I don't think he meant that. This is a matter of style consistency and avoidance of new R-user confusion and error.
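The parser point at issue, in a minimal illustration: at the top level R finishes the expression at the closing brace, so an else on its own line is a syntax error there, while inside a braced function body it is legal.

    # Typed at the top-level prompt, this fails with "unexpected 'else'",
    # because the expression is already complete at the closing brace:
    #   if (TRUE) {
    #       1
    #   }
    #   else {
    #       2
    #   }

    # Inside a braced body the parser keeps reading, so this style works:
    f <- function(x) {
        if (x > 0) {
            "positive"
        }
        else {
            "not positive"
        }
    }
    f(1)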
[Rd] loading of an unwanted namespace
I have a debugging environment for the survival package, perhaps unique to me, but I find it works very well. To wit: a separate directory with copies of the source code but none of the package accoutrements of DESCRIPTION, NAMESPACE, etc. This separate space does NOT contain a copy of src/init.c. Within this I use R --vanilla, attach my .RData file and survival.so file, and away we go.

That is, until my first usage of it today under R 3.0. My runs get into trouble with messages about conflicts with the survival namespace. My problem is that I can't figure out where or how the namespace is being searched out and attached. Any hints on where to look would be appreciated. This magical load is kind of spooky.

Here is a session that displays it. The setup.s file does the dyn.load and attaches some data sets.

    tmt-local3499% R --vanilla

    R version 3.0.0 (2013-04-03) -- "Masked Marvel"
    Copyright (C) 2013 The R Foundation for Statistical Computing
    Platform: i686-pc-linux-gnu (32-bit)

    R is free software and comes with ABSOLUTELY NO WARRANTY.
    You are welcome to redistribute it under certain conditions.
    Type 'license()' or 'licence()' for distribution details.

    Natural language support but running in an English locale

    R is a collaborative project with many contributors.
    Type 'contributors()' for more information and
    'citation()' on how to cite R or R packages in publications.

    Type 'demo()' for some demos, 'help()' for on-line help, or
    'help.start()' for an HTML browser interface to help.
    Type 'q()' to quit R.

    > source('setup.s')
    > options(warn=2)
    > sessionInfo()
    R version 3.0.0 (2013-04-03)
    Platform: i686-pc-linux-gnu (32-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=C                 LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] splines   stats     graphics  grDevices utils     datasets  methods
    [8] base

    > search()
     [1] ".GlobalEnv"          "file:../.RData"      "file:../data/.RData"
     [4] "package:splines"     "package:stats"       "package:graphics"
     [7] "package:grDevices"   "package:utils"       "package:datasets"
    [10] "package:methods"     "Autoloads"           "package:base"

    > coxph(Surv(time, status) ~ age, data=lung)
    Call:
    coxph(formula = Surv(time, status) ~ age, data = lung)

           coef exp(coef) se(coef)    z     p
    age  0.0187      1.02   0.0092 2.03 0.042

    Likelihood ratio test=4.24  on 1 df, p=0.0395  n= 228, number of events= 165

    # That worked fine, but the next fails
    > coxph(Surv(time, status) ~ pspline(age), data=lung)
    Error in FUN(X[[2L]], ...) : (converted from warning) failed to assign
      RegisteredNativeSymbol for "Cagfit5a" to "Cagfit5a" since "Cagfit5a"
      is already defined in the 'survival' namespace
[Rd] Loss of identified points
Minimizing and then restoring a window that contains identified points leads to an error on my linux box.

    > sessionInfo()
    R version 3.0.0 (2013-04-03)
    Platform: i686-pc-linux-gnu (32-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=C                 LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    > plot(1:5, 1:5)
    > identify(1:5, 1:5, letters[2:6])
    # Identify a point or two, then minimize the graph window, then restore it
    Error: first argument must be a string (of length 1) or native symbol reference

The graph returns, but the labels are lost.

Terry T.
[Rd] Timing of SET_VECTOR_ELT
Assume a C program invoked by .Call that returns a list. Near the top of the program we allocate space for all the list elements. (It is my habit to use xyz2 for the name of the R object and xyz for the pointer to its contents.)

    PROTECT(means2 = allocVector(REALSXP, nvar));
    means = REAL(means2);
    PROTECT(u2 = allocVector(REALSXP, nvar));
    u = REAL(u2);
    PROTECT(loglik2 = allocVector(REALSXP, 2));
    loglik = REAL(loglik2);
    PROTECT(rlist = mkNamed(VECSXP, outnames));

Can I assign the individual elements into rlist using SET_VECTOR_ELT at this point, or do I need to wait until the end of the program, after I've filled in means[i], u[i], etc.? It likely depends on whether I'm assigning a pointer or a copy.

Terry T.
Re: [Rd] Timing of SET_VECTOR_ELT
On 04/01/2013 12:44 PM, Simon Urbanek wrote:
> On Apr 1, 2013, at 1:10 PM, Terry Therneau wrote:
>> [the SET_VECTOR_ELT question from the previous message]
> You're assigning a pointer, so it doesn't matter. FWIW, you can avoid all the PROTECTion mess if you alloc+assign, e.g.
>
>     SEXP rlist = PROTECT(mkNamed(VECSXP, outnames));
>     SEXP means = SET_VECTOR_ELT(rlist, 0, allocVector(REALSXP, nvar));
>     ...
>
> since you only need to protect the parent object.
> Cheers, Simon

Neat trick. I take it that SET_VECTOR_ELT returns a pointer to the object that was just created? I haven't found a description of the function in any of the documents, only examples of its use, so this is a surprise to me. Lacking documentation, can I count on it in the future?

Terry T.
[Rd] ess completion
The thread is strange to me as well, since completion is logically impossible for my Sweave files:
- an emacs buffer is open, working on an .Rnw file
- there is no copy of R running anywhere on the machine
- the code chunk in the .Rnw file refers to an R data object saved somewhere else

Of course it cannot do name completion on a bit of code I'm writing, lacking omniscient powers. Emacs is good, but not that good :-)

The ESS manual section 12.1 states that for .R files it has completion of object names and file names. I'm puzzled by exactly what this means, since it is logically impossible (without a running R session) to do this in general. Linking an .Rnw file to an inferior R process doesn't make sense to me.

However, I think it's time to move this sub-thread off of R-devel. Respond to me privately about the answer or the proper list for this discussion.

Terry T

On 03/22/2013 06:00 AM, r-devel-requ...@r-project.org wrote:
> This thread is strange for me to read, as I've been getting completion of object names, function argument names, and whatnot in ESS buffers for as long as I have been using it. And I'm only on ESS 12.09. Perhaps you need to set `ess-use-R-completion` to non-nil. Or check the value of `completion-at-point-functions`. Mine is
>     '(ess-roxy-tag-completion ess-filename-completion ess-object-completion t)
> Peter
Re: [Rd] R-devel Digest, Vol 121, Issue 20
I am not in favor of the change, which is a choice of rigor over usability.

When I am developing code or functions I agree with this, and I view any warnings from R CMD check about shortened arguments as positive feedback. But 90% of my usage of R is day-to-day data analysis, interactive, at the keyboard. A lot of data sets that come to me have long variable names. What this change will mean to me is that I'll either spend a lot of time cursing at these annoying frickin warnings, or have to take the time to make new data sets (several times a week) that have abbreviated names. Of course when I come back to that project 8 weeks later I'll need to recreate the short-long mapping in my head...

Let's think about WHY abbreviated names were allowed in the first place. Usability is worth a lot more than purity to the actual users of a package. S had the rare advantage of serious users just down the hall from the developers, which I hypothesise is one of the true foundations of its success. (What user would have invented the hiding aspect of S4 classes --- let's put the results of the fit into a steel box so they can't see the parts --- which is actually touted as a virtue? I cringe every time a new method I need is sewed up this way.)

    "C is a remarkable language. ... The success of C is due to a number of
    factors, none of them key, but all of them important. Perhaps the most
    significant of all is that C was developed by real practitioners of
    programming and was designed for practical day-to-day use, not for show
    or for demonstration. Like any well-designed tool, it falls easily to
    the hand and feels good to use. Instead of providing constraints, checks
    and rigorous boundaries, it concentrates on providing you with power and
    on not getting in your way."
        Preface to The C Book, M. Banahan et al.

Terry Therneau

On 03/21/2013 06:00 AM, r-devel-requ...@r-project.org wrote:
> Allowing partial matching on $-extraction has always been a source of accidents. Recently, someone who shall remain nameless tried names(mydata) <- "d^2" followed by mydata$d^2. As variables in a data frame are generally considered similar to variables in, say, the global environment, it seems strange that foo$bar can give you the content of foo$bartender. In R-devel (i.e., *not* R-3.0.0 beta, but 3.1.0-to-be) partial matches now give a warning. Of course, it is inevitable that lazy programmers will have been using code like
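The accident mode described above, in a minimal illustration (older R did this silently; the R-devel change adds a warning, and [[ has always required an exact match by default):

    foo <- list(bartender = 42)
    foo$bar        # 42, found by partial matching against "bartender"
    foo[["bar"]]   # NULL: [[ uses exact matching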
[Rd] Re Deprecating partial matching in $.data.frame
As a follow-up to my previous, let me make a concrete suggestion: add this as one of the options

    df-partial-match = "allowed", "warn", or "fail"

Set the default to "warn" for the current R-devel, and migrate it to "fail" at a later date of your choosing. I expect that this is very little more work for the development team; it provides an extended grace period to those running old code that depends on partial matching (I sometimes have to recreate a 4-5 year old analysis); it will let those who strongly agree migrate to "fail" immediately; and it will save you from the flames of those who disagree.

Some things that look good at first are not: examples from the past are na.fail as the default for missing values (with no global option to override) and strings as factors (with no option to override, global or otherwise). There are other options that settle into their default and never change. "It's tough to make predictions, especially about the future" -- Yogi Berra.

On 03/21/2013 06:00 AM, r-devel-requ...@r-project.org wrote:
> As variables in a data frame are generally considered similar to variables in, say, the global environment, it seems strange that foo$bar can give you the content of foo$bartender. In R-devel (i.e., *not* R-3.0.0 beta, but 3.1.0-to-be) partial matches now give a warning. Of course, it is inevitable that lazy programmers will have been using code like
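A sketch of how the proposed option might be consulted; the option name and its values are the poster's suggestion, not an actual R option:

    # Hypothetical helper: act on a partial match according to a global option.
    partial_match_action <- function(name, matched) {
        switch(getOption("df.partial.match", "warn"),
               allowed = invisible(NULL),
               warn    = warning(sprintf("'%s' partially matched '%s'",
                                         name, matched)),
               fail    = stop(sprintf("no column named '%s'", name)))
    }
    options(df.partial.match = "warn")
    partial_match_action("bar", "bartender")   # warns under the default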
Re: [Rd] Depreciating partial matching
Note: my apologies for the Subject in the original post.

On 03/21/2013 08:59 AM, Milan Bouchet-Valat wrote:
> Le jeudi 21 mars 2013 à 08:51 -0500, Terry Therneau a écrit :
>> [Terry's "rigor over usability" message, quoted in full in the previous post]
> I would think that the ability to hit the Tab key to trigger name completion in your R GUI makes partial matching almost useless. The advantage of interactive completion in the GUI is that you immediately see the result of the partial matching. So you get the best of both worlds: no need to type long variable names in full, but no traps when a match is not what you would expect. Doesn't this suit your use case?

Good point. This works well at the command line. However, not when interacting between emacs and R in the way I do. For reproducibility I use an emacs file that is being corrected and massaged, with chunks submitted to R; at the end I have a clean record of how the result was obtained.

Regards,
Terry Therneau
Re: [Rd] Depreciating partial matching
On 03/21/2013 10:00 AM, Simon Urbanek wrote:
>>> I would think that the ability to hit the Tab key to trigger name completion in your R GUI makes partial matching almost useless. [...] Doesn't this suit your use case?
>> Good point. This works well at the command line. However, not when interacting between emacs and R in the way I do. For reproducibility I use an emacs file that is being corrected and massaged, with chunks submitted to R; at the end I have a clean record of how the result was obtained.
> If this is really true (that ESS doesn't complete in R files) then this seems more like a bug (or wish?) report for ESS - other editors correctly support code completion in R documents - after all, this is a feature of R, so they don't need to re-invent the wheel.
> Cheers, Simon

If you are running the R process inside ESS then there is matching -- it is R. Doing this, keeping a log file, and then post-hoc cleaning up all the cruft from that file is one way to keep documentation. But since for my analyses the number of models/plots/etc. that turn out to be detours or dead ends on the way to a solution is larger than the worthwhile part (typos alone are lots larger), I prefer to keep the file(s) as their own buffers and submit bits of them to an R process, either by cut-paste to a separate window or ess-submit to an inferior process. Emacs can't do name completion in that case. Nor could it do so in an Sweave file, unless you were to keep a live R process in hand to pre-test chunks as you wrote them. (One could reasonably argue that by the time one gets to the Sweave stage the names should be expanded.)

To summarize: my own interactive mix of emacs/R may be unusual. For purely interactive folks, completion does most of the work. I hadn't tried the newest ESS interactive-within-emacs till today; it's slick as well. The number of people howling will be less than my original thought, though not zero.

Still, this change could cause a lot of grief for saved R scripts. In our group the code + data directory is archived whenever a medical paper is submitted (close to 500/year), and it is very common to pull one back as-is 1-4 years later for further exploration. A very small subset of those are in a legal context where exact reproducibility is paramount.
Re: [Rd] numerics from a factor
Thanks, Martin. I had already changed the second argument to getOption. By the way, per an off-list comment from Brian R, the bug I was addressing won't affect anyone using R as shipped; the default decimal separator is "." whatever the region. It only bit those who set the OutDec option themselves.

Terry T.

On 03/19/2013 11:30 AM, Martin Maechler wrote:
>> Hi Terry,
>> you can use type.convert instead of as.numeric for numbers with decimals:
>>     type.convert(levels(factor(1:6/2)), dec = unlist(options("OutDec")))
>> Best, Ulrike
> A late and minor remark: if you use the above, a slightly more expressive preferred way is
>     type.convert(levels(factor(1:6/2)), dec = getOption("OutDec"))
> Martin
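For reference, type.convert's dec argument handles a non-default decimal mark directly; a minimal sketch:

    type.convert(c("0,5", "1", "1,5"), dec = ",", as.is = TRUE)
    # [1] 0.5 1.0 1.5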
[Rd] numerics from a factor
A problem has been pointed out by a French user of the survival package, and I'm looking for a pointer.

    > options(OutDec = ",")
    > fit <- survfit(Surv(1:6/2) ~ 1)
    > fit$time
    [1] NA  1 NA  2 NA  3

A year or two ago, some test cases that broke survfit were presented to me. The heart of the problem was numbers that were almost identical, where table(x) and unique(x) gave different counts of distinct values. The solution was to use ftime <- factor(time) at the top of the code, and do all the calculations using the integer levels of the factor as the unique time points. At the very end, the numeric component "time" of the result is created using as.numeric(levels(ftime)). It's this last line that breaks.

I could set the OutDec option within survfit and reset it when I leave using on.exit. Any other simple solutions? Any other ways I could get caught by this issue?

Terry Therneau
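The factor trick described above, in isolation (a minimal sketch): the integer codes drive the internal computations, and the numeric values are recovered only at the end.

    time  <- c(1, 1, 2, 2.5, 2.5, 3)
    ftime <- factor(time)
    as.integer(ftime)           # 1 1 2 3 3 4: codes used internally
    as.numeric(levels(ftime))   # 1.0 2.0 2.5 3.0: the unique time points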
Re: [Rd] R-devel Digest, Vol 121, Issue 13
I think it would be a good idea. Several versions of the survival package had a duplicate line among the S3method entries, and were missing a line that should have been there, due to a cut/paste error.

Terry T.

On 03/13/2013 06:00 AM, r-devel-requ...@r-project.org wrote:
> Circa 80 CRAN and core-R packages have duplicate export entries in their NAMESPACE files. E.g.,
>     bit 1.1.9 : c("as.bit", "as.bitwhich", "as.which", "physical", "virtual")
>     forecast 4.1 : "forecast.lm"
>     graphics 2.15.3 : "barplot"
>     mcmc 0.9.1 : "morph"
>     RCurl 1.95.3 : "curlOptions"
>     utils 2.15.3 : "RweaveLatexOptions"
> Would it be helpful for 'check' to alert package writers to this? I made the list using f():
>
>     f <- function() {
>         for (pkg in installed.packages()[, "Package"]) {
>             try({
>                 exports <- parseNamespaceFile(pkg, R.home("library"))$exports
>                 if (any(dup <- duplicated(exports))) {
>                     cat(pkg, format(packageVersion(pkg)), ":",
>                         deparse(exports[dup]), "\n")
>                 }
>             }, silent = TRUE)
>         }
>     }
>
> I suppose it should also check for duplicates in the S3method component, etc.
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
Re: [Rd] Registering native routines
That was the correct direction: I changed the earlier line to

    routines <- list(Ccoxfit5a, ...

and the later one to

    .C(routines[[1]], ...)

and now it works as desired.

Terry T.

On 02/23/2013 03:09 AM, Duncan Murdoch wrote:
> On 13-02-22 2:59 PM, Terry Therneau wrote:
>> [the registration question, reproduced in the next message]
> I imagine routines is a list in this case, so you should be using routines[[1]] to extract the element, rather than subsetting the list.
> Duncan Murdoch
[Rd] Registering native routines
I'm working on registering all the routines in the survival package, per a request from R-core. A few questions:

1. In the coxph routine I have this type of structure:

    if (survival has 2 columns) routines <- c("coxfit5_a", "coxfit5_b", "coxfit5_c")
    else                        routines <- c("agfit5_a", "agfit5_b", "agfit5_c")
    ...
    .C(routines[1], arg1, etc.)

I tried replacing routines with a vector of native symbol references, but it doesn't work:

    Error in .C(routines[1], as.integer(n), as.integer(nvar), as.double(y), :
      first argument must be a string (of length 1) or native symbol reference

I had fixed up all the other .C and .Call statements first (28 of them) and that worked, so the problem is not with finding the set of symbol references.

2. In the R-exts manual it mentions another argument style for C calls, to specify whether an argument is for input, output, or both. However, I can find no details on how to use it.

3. A few of my routines still had a COPY argument. I assume that is simply ignored?

Terry T.

R Under development (unstable) (2013-02-11 r61902)
Platform: i686-pc-linux-gnu (32-bit)
[Rd] Small suggestion for termplot
Brian,
I used termplot(..., plot=FALSE) recently in R-devel: works like a charm. Thanks much for the update.

Our in-house gamterms function, which this obviates, would also return the "constant" attribute from the underlying predict(..., type="terms") call. I have occasionally found this useful, and so it would be a worthwhile addition to termplot. Currently

    fit <- coxph(Surv(time, status) ~ pspline(age) + sex + ns(wt.loss), data=lung)
    zed <- termplot(fit, se=TRUE, plot=FALSE)

returns a list with components zed$age, zed$sex, and zed$wt.loss. The constant could be added as another element of the list or as an attribute; I don't have an opinion either way, so have not suggested a patch. You may have a reason for preferring one or the other. Clearly this is not critical for the version 3.0 release.

I sent this to you since you implemented the plot=FALSE change, cc to the list in case someone else is appropriate.

For those on the list, the recent change has three nice features:
a. Use of predict(..., type='terms') is a nuisance because the result is in data set order rather than x order, so a lines() call becomes a scribble. The new code reduces each term to the set of unique x values, in order, just what you need for a plot.
b. In the coxph example above I use plot(zed$age$x, exp(zed$age$y), log='y') to get a better y-axis label. For all the developers, this is a nice way to deflect requests for yet-another-plotting-option addition to termplot.
c. It is easy to overlay results from two separate fits.

Terry T.
Re: [Rd] stringsAsFactors
I think your idea to remove the warnings is excellent, and a good compromise. Characters already work fine in modeling functions except for the silly warning.

It is interesting how often the defaults for a program reflect the data sets in use at the time the defaults were chosen. There are some such in my own survival package whose proper value is no longer as obvious as it was when I chose them. Factors are very handy for variables which have only a few levels and will be used in modeling. Every character variable of every dataset in Statistical Models in S, which introduced factors, is of this type, so auto-transformation made a lot of sense. The solder data set there is one for which Helmert contrasts are proper, so guess what the default contrast option was? (I think there are only a few data sets in the world for which Helmert makes sense, however, and R eventually changed the default.) For character variables that should not be factors, such as a street address, stringsAsFactors can be a real PITA, and I expect that people's preference for the option depends almost entirely on how often these arise in their own work.

As long as there is an option that can be overridden, I'm okay. Yes, I'd prefer FALSE as the default, partly because the current value is a tripwire in the hallway that eventually catches every new user.

Terry Therneau

On 02/11/2013 05:00 AM, r-devel-requ...@r-project.org wrote:
> Both of these were discussed by R Core. I think it's unlikely the default for stringsAsFactors will be changed (some R Core members like the current behaviour), but it's fairly likely the show.signif.stars default will change. (That's if someone gets around to it: I personally don't care about that one. P-values are commonly used statistics, and the stars are just a simple graphical display of them. I find some p-values to be useful, and the display to be harmless.) I think it's really unlikely the more extreme changes (i.e. dropping show.signif.stars completely, or dropping p-values) will happen.
> Regarding stringsAsFactors: I'm not going to defend keeping it as is, I'll let the people who like it defend it. What I will likely do is make a few changes so that character vectors are automatically changed to factors in modelling functions, so that operating with stringsAsFactors=FALSE doesn't trigger silly warnings.
Re: [Rd] stringsAsFactors
The root of this problem is that the .getXlevels function does not return the levels for character variables. Future predictions depend on that information.

On 02/11/2013 11:50 AM, Duncan Murdoch wrote:
> On 11/02/2013 12:13 PM, William Dunlap wrote:
>> Note that changing this does not just mean getting rid of silly warnings. Currently, predict.lm() can give wrong answers when stringsAsFactors is FALSE.
>>
>>     d <- data.frame(x = 1:10,
>>                     f = rep(c("A", "B", "C"), c(4, 3, 3)),
>>                     y = c(1:4, 15:17, 28.1, 28.8, 30.1))
>>     fit_ab <- lm(y ~ x + f, data = d, subset = f != "B")
>>     Warning message:
>>     In model.matrix.default(mt, mf, contrasts) :
>>       variable 'f' converted to a factor
>>     predict(fit_ab, newdata = d)
>>      1  2  3  4  5  6  7  8  9 10
>>      1  2  3  4 25 26 27  8  9 10
>>     Warning messages:
>>     1: In model.matrix.default(Terms, m, contrasts.arg = object$contrasts) :
>>       variable 'f' converted to a factor
>>     2: In predict.lm(fit_ab, newdata = d) :
>>       prediction from a rank-deficient fit may be misleading
>>
>> fit_ab is not rank-deficient, and the predict should report
>>      1  2  3  4 NA NA NA 28 29 30
>> In R-devel, the two warnings about factor conversions are no longer given, but the predictions are the same and the warning about rank deficiency still shows up. If f is set to be a factor, an error is generated:
>>     Error in model.frame.default(Terms, newdata, na.action = na.action,
>>       xlev = object$xlevels) : factor f has new levels B
> I think both the warning and error are somewhat reasonable responses. The fit is rank deficient relative to the model that includes f == "B", because the column of the design matrix corresponding to f level B would be completely zero. In this particular model we could still do predictions for the other levels, but it also seems reasonable to quit, given that clearly something has gone wrong.
> I do think that it's unfortunate that we don't get the same result in both cases, and I'd like to have gotten the predictions you suggested, but I don't think that's going to happen. The reason for the difference is that the subsetting is done before the conversion to a factor, but I think that is unavoidable without really big changes.
> Duncan Murdoch
>> Bill Dunlap
>> Spotfire, TIBCO Software
>> wdunlap tibco.com
>> [Bill's message then quoted Terry Therneau's "stringsAsFactors" post of Feb 11, reproduced in full above.]
Re: [Rd] stringsAsFactors
Peter,
I had an earlier response to Duncan that I should have copied to the list. The subset issue can be fixed. When the model changes character to factor, it needs to remember the levels, just as it does with factors. We are simply seeing a reprise of the problems that occurred when models didn't remember factor levels -- I've been down this road before. Lots of ideas and workarounds were tried, none of which worked until that memory was added ($xlevels in lm, glm, coxph, etc. fits).

Can everything be fixed, in the sense that R always makes the right choices for my data? I seriously doubt it.

As to stringsAsFactors -- the right answer is the one that causes each person the least bother. For me that is stringsAsFactors = "some", which means that I turn it off and build the ones I need. The right global default, likely, is whichever one causes the members of R Core the least bother :-)

On 02/11/2013 04:46 PM, peter dalgaard wrote:
> It's logically impossible, I'd say. If you want to do conversion from character to factor on an as-needed basis, you _will_ have issues with subsetting operations affecting the set of levels. The logical way out is to define factors before subsetting. As far as possible, create them up front. Doing it automagically in read.table is far from infallible, but at least has some chance of getting it roughly right. In my view, this is actually a pretty strong argument for keeping stringsAsFactors==TRUE.
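The "memory" referred to above is visible on any lm fit; a minimal sketch:

    # A fitted model stores the factor levels seen at fit time, which is
    # what makes prediction on new data well defined:
    fit <- lm(Sepal.Length ~ Species, data = iris)
    fit$xlevels
    # $Species
    # [1] "setosa"     "versicolor" "virginica"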
Re: [Rd] regression stars
There are only a few things in R where we override the global defaults on a departmental level -- we really don't like to do so. But show.signif.stars is one of the 3. The other 2, if you are curious: set stringsAsFactors=FALSE, and make NA included by default in the output of table. We've been overriding both of these for 10+ years.

Terry Therneau
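A sketch of how such site-wide overrides might look in an Rprofile. The first two are real options (stringsAsFactors was honored by R before 4.0.0); table() has no such option, so the wrapper shown is one hypothetical mechanism, not necessarily what Terry's group used:

    options(show.signif.stars = FALSE)
    options(stringsAsFactors = FALSE)    # honored by R < 4.0.0
    # table() has no global option for NA handling; a masking wrapper works:
    table <- function(..., useNA = "ifany") base::table(..., useNA = useNA)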
[Rd] Why can't I unclass an array?
In a real example I was trying to remove the class from the result of table, just because it was to be used as a building block for other things, and a simple integer vector seemed likely to be most efficient. I'm puzzled as to why unclass doesn't work.

    > zed <- table(1:5)
    > class(zed)
    [1] "table"
    > class(unclass(zed))
    [1] "array"
    > class(unclass(unclass(unclass(zed))))
    [1] "array"
    > class(as.vector(zed))
    [1] "integer"

    > sessionInfo()
    R Under development (unstable) (2012-11-28 r61176)
    Platform: i686-pc-linux-gnu (32-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=C                 LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base
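What is happening, in a minimal illustration: "array" here is the implicit class computed from the dim attribute, not a class attribute, so unclass() has nothing left to remove; it is dropping the dim that changes the answer.

    zed <- table(1:5)
    attributes(unclass(zed))   # dim and dimnames remain; no class attribute
    x <- unclass(zed)
    dim(x) <- NULL             # drop the dim ...
    class(x)                   # ... and the implicit class becomes "integer"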
[Rd] CMD check and .Rbuildignore
I often run R CMD check on my package source. I've noted of late that this process gives warning messages about files listed in my .Rbuildignore file, whereas it once ignored these.

Terry T.
[Rd] as.matrix.Surv -- R core question/opinions
1. A Surv object is a matrix with some extra attributes. The as.matrix.Surv function removes the extras but otherwise leaves it as is.

2. The last several versions of the survival library were accidentally missing the S3method('as.matrix', 'Surv') line from their NAMESPACE file. (Instead, its position is held by a duplicate of the line just above it in the NAMESPACE file, suggesting a copy/paste error.) As a consequence, the as.matrix.Surv function was effectively ignored, and the default method was being used. The as.matrix.default function leaves anything with a dim attribute alone.

3. In my current about-to-submit-to-CRAN version of survival the missing NAMESPACE line was restored. This breaks one function in one package (rms) which calls as.matrix(y) on a Surv object but then later looks at the "type" attribute of y.

So now to the design question: should the as.matrix.Surv function sanitize the result by removing the extra attributes, or should it leave them alone? The first seems cleaner; my accidental multi-year test of leaving them in, however, clearly shows that it does no harm.

Terry T.
[Rd] Baffled with as.matrix
I'm puzzled by as.matrix. It appears to work differently for Surv objects. Here is a session from my computer:

    tmt% R --vanilla
    > library(survival)
    Loading required package: splines
    > ytest <- Surv(1:3, c(1,0,1))
    > is.matrix(ytest)
    [1] TRUE
    > attr(ytest, 'type')
    [1] "right"
    > attr(as.matrix(ytest), 'type')
    [1] "right"

    > y2 <- ytest
    > class(y2) <- "charlie"
    > as.matrix.charlie <- survival:::as.matrix.Surv
    > attr(y2, 'type')
    [1] "right"
    > attr(as.matrix(y2), 'type')
    NULL

    > survival:::as.matrix.Surv
    function (x)
    {
        y <- unclass(x)
        attr(y, "type") <- NULL
        y
    }
    <bytecode: 0x91c1610>
    <environment: namespace:survival>

It appears that Surv objects are being processed by as.matrix.default, but charlie objects by the actual method. One more verification:

    > attr(survival:::as.matrix.Surv(ytest), 'type')
    NULL
    > attr(as.matrix.default(y2), 'type')
    [1] "right"

Context: in testing the next survival release (2.37), it has lost this special behavior. One package that depends on survival expects this behavior and thus fails. I'm at a loss to figure out how my package got this attribute in the first place, or how it lost it. Can anyone shed light?

Terry Therneau

PS I'm on vacation for the next few days so will be intermittent with email. (Off to see my first grandchild!)

-----
    > sessionInfo()
    R version 2.15.2 (2012-10-26)
    Platform: i686-pc-linux-gnu (32-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=C                 LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] splines   stats     graphics  grDevices utils     datasets  methods
    [8] base

    other attached packages:
    [1] survival_2.36-14
[Rd] Puzzling behavior while testing
I'm testing a new release of survival, executing the following piece of code:

    for (testpkg in survdep) {
        z <- testInstalledPackage(testpkg, outDir = "tests")
        cat(testpkg, c("Ok", "Failed")[z + 1], "\n", file = progress, append = TRUE)
    }

The vector survdep contains the names of all 156 packages listed as reverse depends on the CRAN page for survival. All works well except for the packages coin and compareGroups. If we make the survdep list contain only those 2, then the last line above (the cat call) fails with "testpkg" not found. Running ls() at that point shows a set of variables I didn't define, and no evidence of any of testpkg, survdep, or anything else I'd defined. My prompt has been changed to "R" as well. Any idea what is happening?

Terry Therneau

Here is the text just before starting the loop:

    > sessionInfo()
    R version 2.15.2 (2012-10-26)
    Platform: i686-pc-linux-gnu (32-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=C                 LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] tools     stats     graphics  grDevices utils     datasets  methods
    [8] base
[Rd] Addition of plot=F argument to termplot
I have a suggested addition to termplot. We have a local mod that is used whenever none of the termplot options is quite right. It is used here almost daily for Cox models, in order to put the y-axis on a risk scale:

    fit <- coxph(Surv(time, status) ~ ph.ecog + pspline(age), data=lung)
    zz <- termplot(fit, se=TRUE, plot=FALSE)
    yy <- zz$age$y + outer(zz$age$se, c(0, -2, 2), '*')
    matplot(zz$age$x, exp(yy), log='y', type='l', lty=c(1,2,2),
            xlab="Age", ylab="Relative risk")

The return value is a list, each element of which is a data frame. Another use is to overlay the fits from two different models.

The changes to termplot.R and .Rd are small. Since we're not supposed to add attachments to these emails, where do I send the updated files?

Terry T
Re: [Rd] Fastest non-overlapping binning mean function out there?
Look at rowsum. It's pretty fast C code.

Terry T

On 10/03/2012 05:00 AM, r-devel-requ...@r-project.org wrote:
> Hi, I'm looking for a super-duper fast mean/sum binning implementation available in R, and before implementing z = binnedMeans(x, y) in native code myself, does anyone know of an existing function/package for this? I'm sure it already exists.
> So, given data (x, y) and B bins bx[1] < bx[2] < ... < bx[B] < bx[B+1], I'd like to calculate the binned means (or sums) 'z' such that
>     z[1] = mean(x[bx[1] <= x & x < bx[2]]),
>     z[2] = mean(x[bx[2] <= x & x < bx[3]]),
>     ..., z[B].
> Let's assume there are no missing values, and 'x' and 'bx' are already ordered. The length of 'x' is on the order of 10,000 to millions. The number of elements in each bin varies.
> Thanks, Henrik
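A sketch of the rowsum() approach, here binning y by x with half-open bins as in the question (it assumes every bin is non-empty, since rowsum drops empty groups):

    binned_means <- function(x, y, bx) {
        bin <- cut(x, breaks = bx, right = FALSE)   # bins [bx[i], bx[i+1])
        rowsum(y, bin) / as.vector(table(bin))      # sum per bin / count per bin
    }
    binned_means(x = 1:10, y = (1:10)^2, bx = c(1, 4, 7, 11))
    #             [,1]
    # [1,4)   4.666667
    # [4,7)  25.666667
    # [7,11) 73.500000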
[Rd] Rbuildignore question
I'm touching up changes to rpart, and have a question with .Rbuildignore. Here is my file:

    tmt1014% more .Rbuildignore
    test.local
    \.hg
    src/print_tree.c

The source code includes a module print_tree.c, used for debugging. Commented-out calls to it can be found here and there. I want to leave it in the source tree even though no submitted copy of rpart will use it. Even with the ignore line above, R CMD check still compiles it, and gives a bad-boy NOTE about use of printf. Can I / should I / how do I get rid of this? (R 2.15.1)
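As an aside, .Rbuildignore entries are Perl-style regular expressions matched against file paths, so anchored entries avoid accidental matches; a sketch of the same file written that way (Duncan's reply below addresses why check on a directory still sees the file):

    ^test\.local$
    ^\.hg$
    ^src/print_tree\.c$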
Re: [Rd] Rbuildignore question
Thanks, that answers my question.

1. I was running on the source tree. And by "leave in the source tree" I mean just that, with local source code control management. It's in a place I can find it when needed. It would appear on R-forge but, per your comment, not in the CRAN source. Ok.

2. I'll check the tarball soon, but I'm guessing you are right about the error going away.

On 09/20/2012 12:57 PM, Duncan Murdoch wrote:
> On 20/09/2012 1:43 PM, Terry Therneau wrote:
>> [the .Rbuildignore question above]
> What do you mean, leave it in the source tree? Since you're telling build to ignore it, I assume that's just for your own use, not for users of your package. And what did you run check on, the tarball or the directory? If you ran it on the tarball, then there's something wrong with your tarball, because it shouldn't be there (you said to ignore it). If you're running check on the directory, then ignore the NOTE, because it shouldn't appear when you run it on the tarball.
> Duncan Murdoch
Re: [Rd] R-devel Digest, Vol 115, Issue 18
> In general, as a package user, I don't want people to be able to suppress checks on CRAN. I want things fixed. So I am pretty sure there won't ever be a reliable CRAN-detector put into R. It would devalue the brand.
> Duncan Murdoch

My problem is that CRAN demands that I suppress a large fraction of my checks in order to fit within time constraints. This leaves me with 3 choices.

1. Add lines to my code that try to guess whether CRAN is the invoker. A cat and mouse game, per your desire above.

2. Remove large portions of my test suite. I consider the survival package to be one of the pre-eminent current code sets in the world precisely because of the extensive validations; this action would change it to a second-class citizen.

3. Add a magic environment variable to my local world, only do the full tests if it is present, and make the dumbed-down version the default. Others who want to run the full set are then SOL, which I very much don't like.

I agree that CRAN avoidance, other than the time constraint, should be verboten. But I don't think that security through obscurity is the answer. And note that under scenario 3, which is essentially what is currently being forced on us, I can do such mischief as easily as under number 1.

Terry Therneau
Re: [Rd] R-devel Digest, Vol 115, Issue 18
On 09/19/2012 09:22 AM, Duncan Murdoch wrote:
> I understand the issue with time constraints on checks, and I think there are discussions in progress about that. My suggestion has been to put in place a way for a tester to say that checks need to be run within a tight time limit, and CRAN as tester will do that in cases where it cares about the timing.

Sounds good.

> But even with the current (or past, it may have changed already) behaviour of tight limits for CRAN testing, you can put in code in your package that allows for longer tests in certain conditions. You'll run them, and if you advertise them, others will run them too.
> Duncan Murdoch

For me this applies to certain vignettes. Thanks to a pointer from K. Hansen, I'm told I can handle this by putting the long one in inst/doc and the short ones in vignettes. I'll be trying this out soon.
[Rd] CRAN package checks
Some questions motivated by this discussion. From the CRAN policy page:

    "Checking the package should take as little CPU time as possible, as the
    CRAN check farm is a very limited resource and there are thousands of
    packages. Long-running tests and vignette code can be made optional for
    checking, but do ensure that the checks that are left do exercise all
    the features of the package."

Is there a further document that elucidates more on this?

"As little CPU time as possible": what counts as little? Is there a documented variable that I can check and then reduce the test set for CRAN? Duncan mentioned one in the discussion, but I'll end up forgetting the details. A static reference would help.

"Test all the features": the tests directory for survival has 75 files and still doesn't test everything; I'm about to add a new one today in response to a bug report. How do I adjudicate between "little time" and "test all"?

Footnote: the manual page for R CMD check (?check) has the line "Many of the checks in R CMD check can be turned off or on by environment variables: see Chapter 6 of the 'R Internals' manual." The reference should be chapter 7. That chapter does document the use of _R_CHECK_TIMINGS_=10. But it seems dangerous to use this as an indirect test, i.e. "if timings is 10 then this must be CRAN running."

My biggest time offender is in the vignettes. After multiple readings of the docs I still can't quite figure out how to specify
- pdf files that should be in the vignettes index, but are not built from .Rnw source
- how to tell R which ones to redo, and which to just accept as pdf
- per the policy line, all source for a pdf should be available, so non-inclusion of the Rnw file isn't an answer.

Again, is there another document I'm missing?

Terry T.
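One common pattern for the "optional for checking" advice is to gate long-running tests on an environment variable that only the maintainer sets; a minimal sketch (the variable name is illustrative, not a standard one):

    # In a file under tests/: run the expensive checks only when asked.
    if (identical(Sys.getenv("SURVIVAL_FULL_TESTS"), "true")) {
        # long-running validation suite goes here
    }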
Re: [Rd] if(--as-cran)?
On 09/04/2012 05:00 AM, r-devel-requ...@r-project.org wrote:
> The issue is not just about CRAN vs off-CRAN. It is good to think about a more general scheme of light testing vs normal testing vs extensive testing, e.g., for the situation where the package implements (simulation/bootstrap/...) based inference, and the developer (but not only) should be able to run the extensive tests.
> Martin

I agree with Martin. A mechanism to specify testing level would be the best. Then CRAN can choose to set that variable to 3, say, with level 1 for extensive and 2 for usual. I'm quite willing to put up with the nuisance of print() enclosures; I prefer it to having yet another way to subvert the evaluation model.

I'm a believer in testing everything possible in my packages, and wear it as a badge of honor that the survival package has 4 lines of R code in the tests directory for every 3 in the R directory. But CRAN only needs to run a small subset of this.

Terry T
Re: [Rd] if(--as-cran)?
On 09/04/2012 01:57 PM, Duncan Murdoch wrote:

   On 04/09/2012 2:36 PM, Warnes, Gregory wrote:

      On 9/4/12 8:38 AM, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:

         We have a mechanism to specify testing level: the --as-cran flag.
         We could presumably make it more elaborate by adding other flags,
         or option levels, or whatever. What I think we shouldn't do is try
         to create an R-level test that says

            if (testingLevel() > 3) { doSomething }

         because tests can be turned on and off, individually. If
         testingLevel 3 specified tests (A, B, C), then is our testingLevel
         higher if we are running tests (A, B, D, E, F, G)? Why not just
         test for the presence of whichever test is most relevant to that
         particular code block, e.g.

            if ("D" %in% tests()) { doSomething }

      I would prefer the testingLevel() approach over the "D %in% tests()"
      approach, since testingLevel() provides a natural way to add
      successively greater test details without having to dig into the
      code to determine the set of tests.

   I don't see how you could possibly calculate a single number in a
   reasonable way. What is the number that should be returned for
   (A, B, D, E, F, G)?

   Duncan Murdoch

Duncan is leapfrogging ahead to another level that I hadn't thought of. An example would be to divide my survival package as cox, parametric, survfit, all, some of which overlap. Interesting idea, but beyond what I'd use. When I'm focused on a detail I run the test of interest directly, not through CMD check. For me low, med, high intensity suffices, with as-cran invoking the first level and high including those that take an exceptionally long time. If you went to an A, B, C, ... approach, what would be the as-cran default?

Of course there is then the furthest level, which I recently instituted for survival due to the large number of dependencies: before posting a change, download all the other dependent packages and run their tests too. It's an overnighter, and in that case I'd want level=high. Forewarned is forearmed.

Terry T.

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Numerics behind splineDesign
On 08/02/2012 05:00 AM, r-devel-requ...@r-project.org wrote:

   Now I just have to grovel over the R code in ns() and bs() to figure
   out how exactly they pick knots and handle boundary conditions, plus
   there is some code that I don't understand in ns() that uses qr() to
   postprocess the output from spline.des. I assume this is involved
   somehow in imposing the boundary conditions... Thanks again everyone
   for your help, -- Nathaniel

The ns and bs functions post-process the spline bases to get an orthogonal basis matrix; this is the use of qr. I think this causes much more grief than it is worth, for the sake of a small increase in numeric stability. For instance, when you plot the spline bases, they don't look anything like the basis functions one would expect. (Perhaps my background in numerical analysis was a hindrance here, since I know something about splines and thus have an expectation.)

Terry Therneau

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Numerics behind splineDesign
Clearly, I got it wrong. Thanks to others for the clearer and correct message.

Terry T

On 08/02/2012 11:59 AM, William Dunlap wrote:

   If R's bs() and ns() are like S+'s (they do give very similar results*
   and S+'s were written by Doug Bates), then bs() does not do any linear
   algebra (like qr()) on splineDesign's output. bs() needs to come up
   with a default set of knots (if the user didn't supply them), combine
   the Boundary.knots (repeated ord times) and knots into one vector to
   pass to splineDesign, and, if there are any x's outside of
   range(Boundary.knots), treat them specially.

   ns() needs to project splineDesign's basis functions onto the subspace
   of functions whose 2nd derivatives are zero at the endpoints. The
   projection can be done with qr() and friends (S+ uses qr()). (If you
   want basis functions for a periodic spline you could use a similar
   procedure to project to the subspace of functions whose first and
   second derivatives match at the upper and lower Boundary.knots.)

   * The only difference between the R and S+ versions of bs() that I've
   noticed is in how they deal with the x's that are outside of
   range(Boundary.knots). S+ extrapolates with cubics both above and
   below that range, while R extrapolates with a cubic below the range
   and with a quadratic above the range. I don't know what the rationale
   for this is.

   Bill Dunlap
   Spotfire, TIBCO Software
   wdunlap tibco.com

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
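For readers who want to see the difference being discussed, a small sketch comparing the raw B-spline basis from splineDesign() with the post-processed ns() basis (knot placement here is illustrative):

    library(splines)
    x <- seq(0, 1, length.out = 201)
    ## raw cubic B-spline basis: boundary knots repeated ord = 4 times
    kn  <- c(rep(0, 4), 0.5, rep(1, 4))
    raw <- splineDesign(kn, x)
    ## natural-spline basis: same function space, projected (via qr) onto
    ## functions with zero second derivative at the boundaries
    nat <- ns(x, knots = 0.5, Boundary.knots = c(0, 1))
    op <- par(mfrow = c(1, 2))
    matplot(x, raw, type = "l", main = "splineDesign() basis")
    matplot(x, nat, type = "l", main = "ns() basis")
    par(op)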
Re: [Rd] Fast Kendall's tau
Note that the survConcordance function, which is equivalent to Kendall's tau, is also O(n log n), and it does compute a variance. The variance is about 4/5 of the work. Using R 2.15.0 on an older Linux box:

   > require(survival)
   > require(pcaPP)
   > tfun <- function(n) {
   +     x <- 1:n + runif(n)*n
   +     y <- 1:n
   +     t1 <- system.time(cor.test(x, y, method = "kendall"))
   +     t2 <- system.time(cor.fk(x, y))
   +     t3 <- system.time(survConcordance(Surv(y) ~ x))
   +     rbind(cor.test = t1, cor.fk = t2, survConcordance = t3)
   + }
   > tfun(1e2)
                   user.self sys.self elapsed user.child sys.child
   cor.test            0.000    0.000   0.004          0         0
   cor.fk              0.000    0.000   0.001          0         0
   survConcordance     0.004    0.000   0.006          0         0
   > tfun(1e3)
                   user.self sys.self elapsed user.child sys.child
   cor.test            0.024    0.000   0.026          0         0
   cor.fk              0.000    0.000   0.000          0         0
   survConcordance     0.004    0.000   0.004          0         0
   > tfun(1e4)
                   user.self sys.self elapsed user.child sys.child
   cor.test            2.224    0.004   2.227          0         0
   cor.fk              0.004    0.000   0.003          0         0
   survConcordance     0.028    0.000   0.028          0         0
   > tfun(5e4)
                   user.self sys.self elapsed user.child sys.child
   cor.test           55.551    0.008  55.574          0         0
   cor.fk              0.016    0.000   0.018          0         0
   survConcordance     0.204    0.016   0.221          0         0

I agree with Brian, especially since the Spearman and Kendall results rarely (ever?) disagree on their main message for n > 50. At the very most, one might add a footnote to the help page for cor.test pointing to the faster codes.

Terry T.

Brian R wrote:

   On 12-06-25 2:48 PM, Adler, Avraham wrote:

      Hello. Has any further action been taken regarding implementing
      David Simcha's fast Kendall tau code (now found in the package pcaPP
      as cor.fk) into R-base? It is literally hundreds of times faster,
      although I am uncertain as to whether he wrote code for testing the
      significance of the parameter. The last mention I have seen of this
      was in 2010:
      https://stat.ethz.ch/pipermail/r-devel/2010-February/056745.html

   You could check the NEWS file, but I don't remember anything being done
   along these lines. If the code is in a CRAN package, there doesn't seem
   to be any need to move it to base R. In addition, this is something
   very specialized, and the code in R is fast enough for all but the most
   unusual instances of that specialized task. example(cor.fk) shows the R
   implementation takes well under a second for 2000 cases (a far higher
   value than is usual).

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] model.frame and predvars
I was looking at how the model.frame method for lm works and comparing it to my own for coxph. The big difference is that I try to retain xlevels and predvars information for a new model frame, and lm does not. I use a call to model.frame in predict.coxph, which is why I went that route, but never noted the difference till now (preparing for my course in Nashville). Could someone shed light on the rationale for non-preservation?

Terry T.

Simple example:

   library(survival)
   lfit <- lm(time ~ factor(ph.ecog) + ns(age, 3), data=lung)
   ltemp <- model.frame(lfit, data=lung[1:2,])
   ltemp
     time factor(ph.ecog) ns(age, 3).1 ns(age, 3).2 ns(age, 3).3
   1  306               1   -0.1428571    0.4285714    0.7142857
   2  455               0    0.0000000    0.0000000    0.0000000
   lfit$model[1:2,]
     time factor(ph.ecog) ns(age, 3).1 ns(age, 3).2 ns(age, 3).3
   1  306               1    0.4443546    0.3286161    0.1900511
   2  455               0    0.5697239    0.3618440   -0.1297479
   levels(ltemp[[2]])
   [1] "0" "1"
   levels(lfit$model[[2]])
   [1] "0" "1" "2" "3"

   cfit <- coxph(Surv(time, status) ~ factor(ph.ecog) + ns(age,3), lung)
   model.frame(cfit, data= lung[1:2,])
     Surv(time, status) factor(ph.ecog) ns(age, 3).1 ns(age, 3).2 ns(age, 3).3
   1                306               1    0.4443546    0.3286161    0.1900511
   2                455               0    0.5697239    0.3618440   -0.1297479

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
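The machinery at issue can be inspected directly: after a fit, the terms object carries a "predvars" attribute recording, e.g., the knots that ns() chose, and lm keeps the factor levels in $xlevels. A short illustration (output elided):

    library(survival); library(splines)
    lfit <- lm(time ~ factor(ph.ecog) + ns(age, 3), data = lung)
    attr(terms(lfit), "predvars")
    ## e.g. list(time, factor(ph.ecog), ns(age, knots = c(...), ...))
    lfit$xlevels
    ## $`factor(ph.ecog)`: "0" "1" "2" "3"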
Re: [Rd] model.frame and predvars
On 06/05/2012 11:32 AM, Prof Brian Ripley wrote:

   On 05/06/2012 16:17, Terry Therneau wrote:

      I was looking at how the model.frame method for lm works and
      comparing it to my own for coxph. The big difference is that I try
      to retain xlevels and predvars information for a new model frame,
      and lm does not. I use a call to model.frame in predict.coxph, which
      is why I went that route, but never noted the difference till now
      (preparing for my course in Nashville). Could someone shed light on
      the rationale for non-preservation?

   T'other way round ... it would have needed a conscious decision to
   preserve them: these all predate xlevels and predvars.

That's true, but lots of other things have been changed, e.g. model.frame=T. I was wondering if there was a positive reason not to do it; this answers the question.

   model.matrix.lm does make use of xlevels, and I think that explains the
   difference, as most lm() auxiliaries use the model matrix. But it does
   not keep predvars, which is interesting.

So it fixes one issue but leaves another undone.

   And I don't see predvars used in survival:::model.frame.coxph.

Actually it does, as is proven by the example I showed (the ns reconstruction of the first two lines was the same). Having commented source code helps to see it, of course: I replace formula in the call with the terms object from the prior fit before invoking model.frame. This causes inheritance of both the specials and predvars.

So, should the survival library change to the old behavior? I prefer my current one, in that the only time I would do model.frame(prior.fit, data=new) would be if I want the same treatment of special variables. If I wanted a new, de novo decision on contrasts or codings I would have just fit a new model. Is there a plausible situation where the old behavior is preferable? Discounting backwards compatibility of course, which is always a thorny issue.

Terry T.

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
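A minimal sketch of the trick described above -- not the verbatim survival code, and the class name is invented:

    ## reuse the fit's terms object in place of the raw formula, so that
    ## specials and predvars are inherited by the new model frame
    model.frame.mycox <- function(formula, data, ...) {
        fit  <- formula                 # S3 dispatch: 'formula' is the fit
        Call <- fit$call
        Call[[1L]] <- quote(stats::model.frame)
        Call$formula <- terms(fit)      # carries specials and predvars forward
        if (!missing(data)) Call$data <- data
        eval(Call, parent.frame())
    }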
Re: [Rd] Evaluation without the parent frame
Duncan,

I agree completely with "don't use attach"; if I could get all the users of the survival package to agree as well, the problem in question would go away :-) I'm thinking about ways to add more effective error surveillance. Your suggestion was not horribly complex and I'll look into it further. My first foray failed because what I want (I think) is the environment that they had when they first came up. I tried baseenv in a spot, but then my code couldn't find the model.frame function.

Terry T.

On 05/17/2012 05:00 AM, Duncan wrote:

   On 12-05-16 4:59 PM, Terry Therneau wrote:

      I've been tracking down a survival problem from R-help today. A
      short version of the primary issue is reconstructed by the following
      simple example:

         library(survival)
         attach(lung)
         fit <- coxph(Surv(time, status) ~ log(age))
         predict(fit, newdata=data.frame(abe=45))

      Note the typo in the last line of "abe" instead of "age". Instead of
      an error message, this returns predictions for all the subjects,
      since model.frame matches age by searching more widely. I'd prefer
      the error. I suspect this is hard -- I'd like it to not see the
      attached lung data set, but still be able to find the log function.
      Is there a not-horribly-complex solution?

   The best solution is to not use attach(): use data=lung in the fit. I
   think if you want to use attach but limit the search, you need
   something like

      predict(fit, newdata=list2env(data.frame(abe=45), parent=baseenv()))

   but I don't think that meets your "not horribly complex" criterion.

   Duncan Murdoch

I also tried to change the primary function to lm instead of coxph. It has the same problem, but does print a warning that the newdata and results have different lengths (which I will incorporate).

Terry T.

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
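A sketch of the restricted-lookup idea; one way to build such an environment is via new.env(), since the parent must be set when the environment is created:

    ## an environment that can see base functions such as log(),
    ## but not the attached data set
    nd  <- data.frame(abe = 45)
    env <- list2env(as.list(nd), envir = new.env(parent = baseenv()))
    exists("log", envir = env)    # TRUE,  found in baseenv()
    exists("age", envir = env)    # FALSE, the attached 'lung' is not visible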
[Rd] Evaluation without using the parent frame
I've been tracking down a survival problem from R-help today. A short version of the primary issue is reconstructed by the following simple example:

   library(survival)
   attach(lung)
   fit <- coxph(Surv(time, status) ~ log(age))
   predict(fit, newdata=data.frame(abe=45))

Note the typo in the last line of "abe" instead of "age". Instead of an error message, this returns predictions for all the subjects, since model.frame matches age by searching more widely. I'd prefer the error. I suspect this is hard -- I'd like it to not see the attached lung data set, but still be able to find the log function. Is there a not-horribly-complex solution?

I also tried to change the primary function to lm instead of coxph. It has the same problem, but does print a warning that the newdata and results have different lengths (which I will incorporate).

Terry T.

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Vignette problem
Duncan,

Thanks for the ideas. I checked them out and none seem to be the culprit. My original message was wrong in one detail, BTW, as I was already using graphicx, not graphics.

I switched over to the central server machine (CentOS) to finish up the coxme submission until I could figure this out. Ran CMD check, tidied up the NAMESPACE, DESCRIPTION, NEWS files, fixed a missing $ sign in an Rnw, etc. Submitted the final to CRAN, then updated all the files on the Linux box from which the error below came -- both boxes point to the same Mercurial master -- and then tried again. Now it works! And I have no idea what could have prompted the change. It will remain a mystery, since I'm not going to try to bring the error back. :-)

Terry T.

On 05/14/2012 01:19 PM, Duncan Murdoch wrote:

   Do you have an explicit \usepackage{Sweave} in your file? If not,
   Sweave will add one, and that might explain the difference between your
   two tests. Another possibility is that you have a copy of Sweave.sty
   that is not the same as the one being used in one run or the other. The
   checks will try to tell pdflatex to use the one that comes with your R
   version, but other local ones can sometimes have higher precedence in
   pdflatex. And one idea what the problem might be: Sweave.sty uses the
   graphicx package, and it may conflict with the graphics package.

   Duncan Murdoch

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Vignette problem
I'm having a problem rebuilding a package, new to me in R 2.15.0 (Linux). It hits all that contain the line

   \usepackage[pdftex]{graphics}

and leads to the following when running R CMD check on the directory. (I do this often; a final run on the tar.gz file will happen before submission.) Since I float and resize my figures, removing the line is fatal in other ways.

   * checking re-building of vignette PDFs ... NOTE
   Error in re-building vignettes:
   ...
   /usr/share/texmf-texlive/tex/latex/pdftex-def/pdftex.def:414: Package
   pdftex.def Error: PDF mode expected, but DVI mode detected!
   (pdftex.def)    If you are using `latex', then call `pdflatex'.
   (pdftex.def)    Otherwise check and correct the driver options.
   (pdftex.def)    Error recovery by switching to PDF mode.
   See the pdftex.def package documentation for explanation.
   Type H <return> for immediate help.
   ...
   l.414 }\@ehc
   ?
   /usr/share/texmf-texlive/tex/latex/pdftex-def/pdftex.def:414: Emergency stop.
   ...
   l.414 }\@ehc
   No pages of output.
   Transcript written on lmekin.log.
   /usr/bin/texi2dvi: latex exited with bad status, quitting.
   make: *** [lmekin.pdf] Error 1
   Error in buildVignettes(dir = "/home/therneau/research/surv/Hg/coxme.Rcheck/vign_test/coxme") :
     running 'make' failed
   Execution halted

The resulting .tex file works just fine with pdflatex, however. I haven't found any reference to this elsewhere, but my guess is that it is something simple that I've missed.

Terry T.

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Method=df for coxph in survival package
In that particular example the value of 4 was pulled out of the air; there is no particular justification. There is a strong relationship between the effective degrees of freedom and the variance of the random effect, and I often find the df scale easier to interpret. See the Hodges and Sargent paper in Biometrika (2001) for a nice explanation of the connection in linear models.

Terry T.

=== begin included message ===

   I've been following the example in the R help page:
   http://stat.ethz.ch/R-manual/R-devel/library/survival/html/frailty.html

      library(survival)
      coxph(Surv(time, status) ~ age + frailty(inst, df=4), lung)

   Here, in this particular example they fixed the degrees of freedom for
   the random institution effects to be 4. But how did they decide?

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
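For reference, the two ways of pinning down the same penalty; the numeric values here are illustrative only:

    library(survival)
    ## fix the random effect's variance directly ...
    fit1 <- coxph(Surv(time, status) ~ age + frailty(inst, theta = 0.1), lung)
    ## ... or ask for a given effective degrees of freedom, letting the
    ## code solve for the variance that produces it
    fit2 <- coxph(Surv(time, status) ~ age + frailty(inst, df = 4), lung)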
Re: [Rd] Vignette questions
On 04/12/2012 02:15 AM, Uwe Ligges wrote:

   On 12.04.2012 01:16, Paul Gilbert wrote:

      On 12-04-11 04:41 PM, Terry Therneau wrote:

         Context: R 2.15.0 on Ubuntu. 1. I get a WARNING from CMD check
         for "Package vignette(s) without corresponding PDF". In this case
         the vignettes directory had both the pdf and Rnw; do I need to
         move the pdf to inst/doc?

      Yes, you need to put the pdf in the inst/doc directory if it cannot
      be built by R-forge and CRAN check machines, but leave the Rnw in
      the vignettes directory.

   No, this is all done automatically by R CMD build, hence you do not
   need to worry.

Suddenly it becomes clear: the warning will disappear on its own when I apply CMD check to the tarball. I was running it on the directory, as is my habit when I've just added a new feature and want to make sure I didn't break anything old, and had wandered down the "CRAN doesn't like warnings so I better fix this somehow" path.

Terry Therneau

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Vignette questions
Context: R 2.15.0 on Ubuntu.

1. I get a WARNING from CMD check for "Package vignette(s) without corresponding PDF". In this case the vignettes directory had both the pdf and Rnw; do I need to move the pdf to inst/doc? I'm reluctant to add the pdf to the svn source on Rforge, per the usual rule that a code management system should not have both a primary source and an object derived from it under version control. However, if this is the suggested norm I could do so.

2. Close reading of the paragraph about vignette sources shows the following -- I think? If I have a vignette that should not be rebuilt by check or BUILD, I should put the .Rnw source and pdf in /inst/doc, and have the others that should be rebuilt in /vignettes. This would include any that use private R packages, screen snapshots, ..., or in my case one that takes just a little short of forever to run.

3. Do these unprocessed vignettes also contribute to the index via \VignetteIndexEntry lines, or will I need to create a custom index?

Terry Therneau

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
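My reading of point 2 as a package layout, treating inst/doc as ship-as-is and vignettes/ as rebuild (the file names are invented):

    mypkg/
        vignettes/          # .Rnw sources that check/BUILD may rebuild
            howto.Rnw
        inst/doc/           # shipped as-is: pre-built pdf plus its source
            slowpaper.Rnw
            slowpaper.pdf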
Re: [Rd] CRAN policies
On 03/29/2012 05:00 AM, r-devel-requ...@r-project.org wrote:

   The 'No visible binding for global variable' is a good example. This
   found some bugs in my 'survey' package, which I removed. There is still
   one note of this type, which arises when I have to handle two different
   versions of the hexbin package with different internal structures. The
   note is a false positive because the use is guarded by an if(), but CMD
   check can't tell this. So, it's a good idea to remove all Notes that
   can be removed without introducing other code problems, which is nearly
   all of them, but occasionally there may be a good reason for code that
   produces a Note.

The survival package has a similar special case: the routines for expected population survival are set up to accept multiple types of date format, so they have lines like

   if (class(x) == 'chron') {
       y <- as.numeric(x - chron("01/01/1960"))
   }

This leaves me with two extraneous "no visible binding" messages. There used to be half a dozen, but I've tried to remove as many as possible, for all the good reasons already articulated by the maintainers. It still remains that 99/100 of the "no visible binding" messages I've seen over the years were misspelled variable names, and the message is a very welcome check.

Terry Therneau

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Vignettes and CRAN checks
I'd like to chime in on the subject of vignette checks. I have one vignette in the coxme library that would be better described as a white paper. It discusses the adequacy of the Laplace approximation under various scenarios. It contains some substantial computations, so I'd like to mark it as "never ever run this" for both CRAN and my local builds; my next update of it will turn it into a 30+ minute computation. Almost all users will need only the pdf; however, my master file for creating it is in .Rnw form and I'd like to make it available to those who might want to dig deeper.

I (and every other program I know of) had always assumed that the Laplace approximation was excellent for a mixed effects Cox model; I'm finding out that there are a few cases where this isn't so. Luckily only a few, but this is important documentation.

I also have more conventional vignettes of the "how to use this package" variety; the standard CRAN check process for those is both welcome and appropriate.

Terry T.

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Problem with table
On 03/27/2012 02:05 AM, Prof Brian Ripley wrote:

   On 19/03/2012 17:01, Terry Therneau wrote:

      R version 2.14.0, started with --vanilla:

         > table(c(1,2,3,4,NA), exclude=2, useNA='ifany')
            1    3    4 <NA>
            1    1    1    2

      This came from a local user who wanted to remove one particular
      response from some tables, but also wants to have NA always reported
      for data checking purposes. I don't think the above is what anyone
      would want.

   You have not told us what you want!

Want: that the resulting table exclude values of 2 from the printout, while still reporting NA. This is what the local user expected, the one who came to me with their query. There are lots of ways to get the program to do the right thing; the simplest is

   table(c(1,2,3,4,NA), exclude=2)   # keeping the default for useNA

You show another below.

   Try

      > table(as.factor(c(1,2,3,4,NA)), exclude=2, useNA='ifany')
         1    3    4 <NA>
         1    1    1    1

   Note carefully how 'exclude' is defined:

      exclude: levels to remove from all factors in '...'. If set to
      'NULL', it implies 'useNA = "always"'.

   As you did not specify a factor, 'exclude' was used in forming the
   'levels'.

That is almost a "legal loophole" reading of the manual: I would never have seen through to that level of subtlety. A primary reason is that a simple test shows that exclude works on non-factors.

I'm not sure what the best course of action is. What I've reported is a case where use of the options in a fairly obvious way gives an unexpected answer. On the other hand, I have never before seen or considered the case where someone wanted to exclude an actual data level from table: I myself would always have removed a column from the result. If fixing this causes other problems, then perhaps we just give up on this rare case.

As to our local choices, we figured out a way to make display of NA the default without causing the above problem. As is often the case, a fairly simple solution became obvious to us about 30 minutes after submitting a question to the list.

Terry T.

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
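The "remove a column from the result" route, for comparison; note the logical index has to keep the NA cell, whose name really is NA:

    tab <- table(c(1, 2, 3, 4, NA), useNA = "ifany")
    tab[is.na(names(tab)) | names(tab) != "2"]
    ##    1    3    4 <NA>
    ##    1    1    1    1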
[Rd] PROTECT help
I received the following note this AM. The problem is, I'm not quite sure how to fix it. Can one nest the calls as PROTECT(coxlist=eval(PROTECT(...))), or do I create an intermediate variable, or otherwise? I'm willing to update the code if someone will give me a pointer to the right documentation. This particular chunk was written when there was a lot of change going on in the callback mechanism, so there might be a safer and/or simpler and/or more standard approach by now.

The routine in question has to do with penalized Cox models; the C code needs to get the value of the penalty, and the penalty is an arbitrary S expression passed down from top level.

Terry T

   In survival_2.36-12 (and earlier), in the function cox_callback() at
   cox_Rcallback.c:40:

      PROTECT(coxlist=eval(lang2(fexpr,data),rho));

   the return value of the call to lang2() is vulnerable if allocations
   within eval() give rise to garbage collection. (Discovered during CXXR
   development.)

   Andrew

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] PROTECT help
Brian, Duncan:

Thanks. This was exactly what I needed to know.

Terry

On 03/27/2012 08:41 AM, Prof Brian Ripley wrote:

   You can, but I find it easiest to follow if you create an intermediate
   variable. Look for example at unique.c:

      SEXP call, r;
      PROTECT(call = lang2(install("as.character"), s));
      PROTECT(r = eval(call, env));
      UNPROTECT(2);
      return r;

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] R-devel Digest, Vol 109, Issue 22
   strongly disagree. I'm appalled to see that sentence here. Come on! The
   overhead is significant for any large vector and it is in particular
   unnecessary since in .C you have to allocate *and copy* space even for
   results (twice!). Also it is very error-prone, because you have no
   information about the length of vectors so it's easy to run out of
   bounds and there is no way to check. IMHO .C should not be used for
   any code written in this century (the only exception may be if you are
   passing no data, e.g. if all you do is to pass a flag and expect no
   result, you can get away with it even if it is more dangerous). It is
   a legacy interface that dates way back and is essentially just a
   re-named .Fortran interface. Again, I would strongly recommend the use
   of .Call in any recent code because it is safer and more efficient (if
   you don't care about either attribute, well, feel free ;)).

      So aleph will not support the .C interface? ;-)

   It will look at the timestamp of the source file and delete the
   package if it is not before 1980 ;). Otherwise it will send a request
   for punch cards with ".C is deprecated, please upgrade to .Call"
   stamped out :P At that point I'll be flaming about using the native
   Aleph interface and not the R compatibility layer ;)

   Cheers, S

I'll dissent -- I don't think .C is inherently any more dangerous than .Call, and I prefer its simplicity in many cases. Calling C at all is what is inherently dangerous -- I can reference beyond the end of a vector, write over objects that should be read only, and branch to random places using either interface. If you are dealing with large objects and worry about memory efficiency then .Call puts more tools at your disposal and is worth the effort. However, I did not find the .Call interface at all easy to use at first, and we should keep that in mind before getting too pompous in our lectures to the sinners of .C. (Mostly because the things I needed to know are scattered about in multiple places.)

I might have to ask for an exemption on that timestamp -- the first bits of the survival package only reach back to 1986. And I've had to change source code systems multiple times, which plays hob with the file times, though I did try to preserve the changelog history to forestall some future litigious soul who claims they wrote it first (sccs -> rcs -> cvs -> svn -> mercurial). :-)

Terry T

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] R-devel Digest, Vol 109, Issue 22
On 03/22/2012 09:38 AM, Simon Urbanek wrote:

   On Mar 22, 2012, at 9:45 AM, Terry Therneau <thern...@mayo.edu> wrote:

      I can reference beyond the end of a vector, write over objects that
      should be read only, and branch to random places using either
      interface.

   You can always do so deliberately, but with .C you have no way of
   preventing it, since you don't even know what the length is! That is
   certainly far more dangerous than .Call, where you can simply loop over
   the length, check that the lengths are compatible, etc. Also, for types
   like strings .C is a minefield that is hard not to blow up, whereas
   with .Call it is even more safe than scalar arrays. You can do none of
   that with .C, which relies entirely on conventions with no recorded
   semantics.

I've overrun arrays in both .C and .Call routines, and I assure you that it was never deliberate. Very effective at crashing R with strange error messages, though. I will have .C("somefun", as.integer(length(x)), x); the .Call version will skip the second argument and add a line in the C code; no real difference from my point of view. (Though the spelling is harder to remember in .Call. Does R core use dice to decide which things are upper, lower, and mixed case: LENGTH, asInteger, ncols?) R strings are a PITA in C and I mostly avoid them, so I have no arguments about C vs Call there. Much of the survival library is .C for historical reasons of course, but I think it shows that you can be safe in C; though you can't be sloppy.

      If you are dealing with large objects and worry about memory
      efficiency then .Call puts more tools at your disposal and is worth
      the effort. However, I did not find the .Call interface at all easy
      to use at first

   I guess this depends on the developer and is certainly a factor.
   Personally, I find the subset of the R API needed for .Call fairly
   small and intuitive (in particular when you are just writing a safer
   replacement for .C), but I'm obviously biased. Maybe in a separate
   thread we could discuss this - I'd be happy to write a ref card or
   cheat sheet if I find out what people find challenging on .Call.
   Nonetheless, my point is that it is more than worth investing the
   effort, both in safety and performance.

I'm giving a short course at UseR on the design of the survival packages which promises a document containing all the details, currently being written. It has examples of .Call with discussion of what each action is doing. The final result will certainly be added to the survival package; hopefully it will be useful enough to earn a place on the CRAN documentation page as well.

      I might have to ask for an exemption on that timestamp -- the first
      bits of the survival package only reach back to 1986.

   ;) Maybe the rule should be based on the date of the first appearance
   of the package, fair enough :)

   Cheers, Simon

I think (but am not sure) that the first version to get outside our local group was around 1989.

Terry
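From the R side, the difference under discussion looks roughly like this; "somefun" and "somefun2" are stand-ins for registered routines, not real entry points:

    n <- length(x)
    ## .C: lengths and result space must be passed explicitly
    fit1 <- .C("somefun", as.integer(n), as.double(x), ans = double(n))
    fit1$ans                 # results come back via the named argument
    ## .Call: the C code receives the SEXP itself, so it can query
    ## LENGTH(x) and allocate its own result
    fit2 <- .Call("somefun2", x)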
Re: [Rd] .Call ref card
On 03/22/2012 11:03 AM, peter dalgaard wrote:

   Don't know how useful it is any more, but back in the days, I gave this
   talk in Vienna:
   http://www.ci.tuwien.ac.at/Conferences/useR-2004/Keynotes/Dalgaard.pdf
   Looking at it now, perhaps it moves a little too quickly into the hairy
   stuff. On the other hand, those were the things that I had found
   important to figure out at the time. At a quick glance, I didn't spot
   anything obviously outdated.

Peter, I just looked at this, and I'd say that it moved into the hairy stuff way too quickly. Much of what it covered I would never expect to use. Some I didn't understand. Part of this, of course, is that slides for a talk are rarely very useful without the talker. Something simpler for the layman would be good.

Terry T.

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] .C clarification
Does .C duplicate unnecessary arguments? For instance

   fit <- .C("xxx", as.integer(n), x, y, z=double(15))

The first and fourth arguments would have NAMED = 0. Is my guess correct that .C won't make yet one more (unnecessary) copy? (Just trying to understand.)

Terry T

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Sweave driver extension
Three things -

My original question to R-help was "who do I talk to". That was answered by Brian R, and the discussion of how to change Sweave moved offline. FYI, I have a recode in hand that allows arbitrary reordering of chunks; but changes to code used by hundreds need to be approached cautiously. Like the witch says in the Wizard of Oz: "... But that's not what's worrying me, it's how to do it. These things must be done delicately, or you hurt the spell."

A few emails have made me aware of others who use noweb. Most of them, as I have, use the original Unix utility. But since survival is so interwoven with R, I am trying to implement that functionality entirely in R to make the code self contained. Just working out how best to do so.

Yihui: with respect to the note below, I don't see why you want to add new syntax. Why add run_chunk('a') when it is a synonym for <<a>>?

Terry T.

On Mon, 2012-01-30 at 20:41 -0600, Yihui Xie wrote:

   OK, I did not realize the overhead problem is so overwhelming in your
   situation. Therefore I re-implemented the chunk reference in the knitr
   package in another way. In Sweave we use

      <<a>>=
      # code in chunk a
      @
      <<b>>=
      # use code in a
      <<a>>
      @

   And in knitr, we can use real R code:

      <<a>>=
      # code in chunk a
      @
      <<b>>=
      # use code in a
      run_chunk('a')
      @

   This also allows arbitrary levels of recursion, e.g. I add another
   chunk called 'c':

      <<c>>=
      run_chunk('b')
      @

   Because b uses a, when c calls b it will consequently call a as well.
   The function run_chunk() will not bring overhead problems, because it
   simply extracts the code from other chunks and evaluates it here. It
   is not a function call. This feature is still in the development
   version (well, I did it this afternoon): https://github.com/yihui/knitr.

   Talking about Knuth's original idea, I do not know as much as you, but
   under knitr's design you can arrange code freely, since the code is
   stored in a named list after the input document is parsed. You can
   define code before using it, or use it before defining it (later); it
   is indexed by the chunk label. Top-down or bottom-up, in whatever
   order you want. And you are right; it requires a major rewrite, and
   that is exactly what I tried to do. I appreciate your feedback because
   I know you have very rich experience in reproducible research.

   Regards, Yihui
   --
   Yihui Xie xieyi...@gmail.com
   Phone: 515-294-2465 Web: http://yihui.name
   Department of Statistics, Iowa State University
   2215 Snedecor Hall, Ames, IA

   On Mon, Jan 30, 2012 at 12:07 PM, Kevin R. Coombes
   kevin.r.coom...@gmail.com wrote:

      I prefer the code chunks myself. Function calls have overhead. In a
      bioinformatics world with large datasets and an R default that uses
      call-by-value rather than call-by-reference, the function calls may
      have a _lot_ of overhead. Writing the functions to make sure they
      use call-by-reference for the large objects instead has a different
      kind of overhead, in the stress it puts on the writers and
      maintainers of code.

      But then, I'm old enough to have looked at some of Knuth's source
      code for TeX and read his book on Literate Programming, where the
      ideas of weave and tangle were created for exactly the kind of
      application that Terry asked about. Knuth's fundamental idea here
      is that the documentation (mainly the stuff processed through
      weave) is created for humans, while the executable code (in Knuth's
      view, the stuff created by tangle) is intended for computers.
      If you want people to understand the code, then you often want to
      use a top-down approach that outlines the structure -- code chunks
      with forward references work perfectly for this purpose.

      One of the difficulties in mapping Knuth's idea over to R and
      Sweave is that the operations of weave and tangle have gotten,
      well, tangled. Sweave does not just prepare the documentation; it
      also executes the code in order to put the results of the
      computation into the documentation. In order to get the forward
      references to work with Sweave, you would have to make two passes
      through the file: one to make sure you know where each named chunk
      is and build a cross-reference table, and one to actually execute
      the code in the correct order. That would presumably also require a
      major rewrite of Sweave.

      The solution I use is to cheat: hide the chunks initially and
      reveal them later to get the output that I want. This comes down to
      combining eval, echo, keep.source, and expand in the right
      combinations. Something like:

         % set up a prologue that contains the code chunks.
         % Do not evaluate or display them.
         <<coxme-check-arguments,echo=FALSE,eval=FALSE>>=
         # do something sensible. If multiple steps, define them above
         # here using the same idea.
         @
         % also define the other code chunks here

         \section{Start the First Section}
         The \texttt{coxme} function is defined as follows:
[Rd] Sweave driver extension
Almost all of the coxme package and an increasing amount of the survival package are now written in noweb, i.e., .Rnw files. It would be nice to process these using the Sweave function + a special driver, which I can do using a modified version of Sweave. The primary change is to allow the following type of construction:

   <<coxme>>=
   coxme <- function(formula, data, subset, blah blah ) {
       <<coxme-check-arguments>>
       <<coxme-build>>
       <<coxme-compute>>
       <<coxme-finish>>
   }
   @

where the parts referred to come later, and will themselves be made up of other parts. Since the point of this file is to document source code, the order in which chunks are defined is driven by "create a textbook" thoughts and won't match the final code order for R. The standard noweb driver only allows one level of recursion, and no references to things defined further down in the file.

The primary change to the function simply breaks the main loop into two parts: first read through all the lines and create a list of code chunks (some with names), then go through the list of chunks and call driver routines. There are a couple of other minor details, e.g. a precheck for infinite recursions, but no change to what is passed to the driver routines, nor to anything but the Sweave function itself.

Primary question: who on the core team should I be holding this conversation with?

Secondary: Testing level? I have a few vignettes but not many. I'll need a noweb package anyway to contain the drivers -- should we just duplicate the modified Sweave under another name? Call the package noweb, Rnoweb, ...?

And before someone asks: Roxygen is a completely different animal and doesn't address what I need. I have latex equations just above the code that implements them, an annotated graph of the call tree next to the section parsing a formula, etc. This is stuff that doesn't fit in comment lines. The text/code ratio is > 1. On the other hand, I've thought very little about integration of manual pages and description files with the code, issues which Roxygen addresses.

Terry Therneau

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
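A sketch of the two-pass idea; the function and variable names here are mine, not the eventual package's:

    ## pass 1 stores each chunk's lines in a named list; pass 2 expands
    ## references recursively, with a guard against infinite recursion
    expand_chunk <- function(name, chunks, seen = character()) {
        if (name %in% seen)
            stop("infinite recursion at chunk '", name, "'")
        lines <- chunks[[name]]
        if (is.null(lines)) stop("undefined chunk '", name, "'")
        out <- lapply(lines, function(line) {
            if (grepl("^\\s*<<(.*)>>\\s*$", line)) {
                ref <- sub("^\\s*<<(.*)>>\\s*$", "\\1", line)
                expand_chunk(ref, chunks, c(seen, name))
            } else line
        })
        unlist(out)
    }

    ## usage: two chunks, one referencing the other
    chunks <- list(coxme = c("coxme <- function(...) {", "<<coxme-build>>", "}"),
                   `coxme-build` = "    fit <- 42  # real work here")
    cat(expand_chunk("coxme", chunks), sep = "\n")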
[Rd] Rd error message
I get the following error from one of my Rd files in R CMD check (R 2.14.0):

   * checking Rd files ... WARNING
   Error in switch(attr(block, "Rd_tag"), TEXT = if (!grepl("^[[:space:]]*$",  :
     EXPR must be a length 1 vector
   problem found in ‘backsolve.Rd’

This is likely something that will be glaringly obvious once it's pointed out, but without a line number I can't seem to find it. I've been counting braces but don't see a mismatch. FYI, the file is below. (It is modeled on chol.Rd from the Matrix package.)

Terry Therneau

--

\name{backsolve}
\alias{backsolve-methods}
\title{Solve an Upper or Lower Triangular System}
\alias{backsolve}
\alias{backsolve,gchol-method}
\alias{backsolve,gchol.bdsmatrix-method}
\description{
  Solves a system of linear equations where the coefficient matrix is
  upper (or \sQuote{right}, \sQuote{R}) or lower (\sQuote{left},
  \sQuote{L}) triangular.\cr
  \code{x <- backsolve(R, b)} solves \eqn{R x = b}.
}
\usage{
backsolve(r, \dots)
\S4method{backsolve}{gchol}(r, x, k=ncol(r), upper.tri=TRUE, \dots)
\S4method{backsolve}{gchol.bdsmatrix}(r, x, k=ncol(r), upper.tri=TRUE, \dots)
}
\arguments{
  \item{r}{a matrix or matrix-like object}
  \item{x}{a vector or a matrix whose columns give the right-hand sides
    for the equations.}
  \item{k}{The number of columns of \code{r} and rows of \code{x} to use.}
  \item{upper.tri}{logical; if \code{TRUE} (default), the \emph{upper}
    \emph{tri}angular part of \code{r} is used.  Otherwise, the lower one.}
  \item{\dots}{further arguments passed to other methods}
}
\section{Methods}{
  Use \code{\link{showMethods}(backsolve)} to see all the defined methods;
  the two created by the bdsmatrix library are described here:
  \describe{
    \item{bdsmatrix}{\code{signature=(r= "gchol")} for a generalized
      cholesky decomposition}
    \item{bdsmatrix}{\code{signature=(r= "gchol.bdsmatrix")} for the
      generalized cholesky decomposition of a bdsmatrix object}
  }
}
\value{
  The solution of the triangular system.  The result will be a vector if
  \code{x} is a vector and a matrix if \code{x} is a matrix.

  Note that \code{forwardsolve(L, b)} is just a wrapper for
  \code{backsolve(L, b, upper.tri=FALSE)}.
}
\description{
  The generalized Cholesky decomposition of a symmetric matrix A is
  \eqn{A = LDL'}{A= LD t(L)} where D is diagonal, L is lower triangular,
  and \eqn{L'}{t(L)} is the transpose of L.  These functions solve either
  \eqn{L\sqrt{D} x =b}{L sqrt(D) x=b} (when \code{upper.tri=FALSE}) or
  \eqn{\sqrt{D}L' x=b}{sqrt(D) t(L) x=b}.
}
\note{
  The \code{bdsmatrix} package promotes the base R \code{backsolve}
  function to a generic.  To see the full documentation for the default
  method, view \code{backsolve} from the \code{base} package.
}
\seealso{
  \code{\link{forwardsolve}}, \code{\link{gchol}}
}
\keyword{array}
\keyword{algebra}

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] last message -- I've answered my own question
Yes, it was glaring and obvious: I had the label \description a second time, when I really meant \details. Still, I had to delete sections of the file one by one until it slapped me in the face. Sorry for any bother.

Terry T.

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] class extension and documentation
I've added a backsolve method to the bdsmatrix library. Per the Extending manual, section 7.1, I've also added the following 3 lines along with my setMethod definitions for 2 classes:

   backsolve <- function(r, ...) UseMethod("backsolve")
   backsolve.default <- base:::backsolve
   formals(backsolve.default) <- c(formals(backsolve.default), alist(... = ))

I've also added a backsolve-methods.Rd page, though since my arguments are identical to the default it doesn't say much. And, after a test failed, added the new backsolve.default routine to my export list.

Now R CMD check claims that I need Rd pages for backsolve and backsolve.default. I don't think I should rewrite those. How do I sidestep this and/or what other manuals should I read? Perhaps do

   setMethod("backsolve", signature(r="ANY"), base:::backsolve(r, ...))

instead?

Terry Therneau

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] PS to prior note
Sorry to forget this:

   > sessionInfo()
   R version 2.14.0 (2011-10-31)
   Platform: x86_64-unknown-linux-gnu (64-bit)

   locale:
    [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
    [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C
    [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
    [7] LC_PAPER=C                 LC_NAME=C
    [9] LC_ADDRESS=C               LC_TELEPHONE=C
   [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

   attached base packages:
   [1] stats     graphics  grDevices utils     datasets  methods   base

Further note: I started out by just having a new setMethod, and let R do all the behind-the-scenes bookkeeping as it usually does. But then backsolve(cmat, x) fails, where cmat is an ordinary cholesky. It doesn't find the default value for ncol.

Terry T.

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] class extension and documentation
Duncan's reply to my query:

      Now R CMD check claims that I need Rd pages for backsolve and
      backsolve.default. I don't think I should rewrite those. How do I
      sidestep this and/or what other manuals should I read?

   Even though your change is subtle, I'd say it's still a change
   (backsolve is now a generic, not a simple closure; it now has a
   different argument list), so I'd like to see a new man page added. It
   would be quite reasonable to list the new interface and then refer to
   base::backsolve for the details of what the default method does.

Fair enough. Let's push this a little harder.

1. Additions to the manual, section 7.1:
   a. Warn that foo.default must now be exported. (I don't seem to need to
      export foo; exportMethods(foo) seems to be enough?)
   b. Warn that package creation will demand a manual page for foo and
      foo.default.
   c. Give hints on how to do b.

2. More information on the setMethod page about the fragility of "just add
   one line to make a generic". I can't make this work if any of the
   arguments have defaults.

Trying to implement this idea is turning into a quagmire. Here is my current code:

   backsolve <- function(r, x, k=ncol(r), upper.tri=TRUE, ...)
       UseMethod("backsolve")
   backsolve.default <- base:::backsolve
   formals(backsolve.default) <- c(formals(backsolve.default), alist(... = ))

   setMethod("backsolve",
             signature(r="gchol", x="ANY", k="ANY", upper.tri="ANY"),
             function(r, x, k=ncol(r), upper.tri=TRUE, ...) {
                 if (any(diag(r) < 0))
                     stop("Argument has a negative diagonal, cannot backsolve")
                 if (!is.numeric(x)) stop("Invalid data type for x")
                 x <- as.matrix(x)
                 ...

First, if I was going to have a new backsolve manual page, it should mention the arguments. This meant adding x, k, and upper.tri to the generic above. And they needed defaults to make the Rd page be both correct and pass muster. Now, however, the default method doesn't work. backsolve of an ordinary matrix leads to

   Error in k != floor(k) : 'k' is missing
   Calls: backsolve -> backsolve -> .local

What's the trick?

The gchol method for backsolve was documented via promptMethods. How do I refer to this object in a \link from backsolve? Accessing that documentation is certainly a challenge. I doubt I'll ever find a user to guess or remember

   methods ? backsolve("gchol", "ANY", "ANY", "ANY")

Terry T.

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] 32 vs 64 bit difference?
Thank you both for the nice explanation. I added digits=4 to my print statements to shorten the display. Mixed effects Cox models can have difficult numerical issues, as it turns out; I've added this to my collection of things to watch for.

Terry Therneau

On Sat, 2011-11-26 at 11:37 +0000, Prof Brian Ripley wrote:

   On 26/11/2011 09:23, peter dalgaard wrote:

      On Nov 26, 2011, at 05:20, Terry Therneau wrote:

         I've spent the last few hours baffled by a test suite
         inconsistency. The exact same library code gives slightly
         different answers on the home and work machines - found in my R
         CMD check run. I've recopied the entire directory to make sure
         it's really identical code. The data set and fit in question has
         a pretty flat top to the likelihood. I put print statements into
         the f() function called by optim, and the two parameters and the
         likelihood track perfectly for 48 iterations, then start to drift
         ever so slightly:

            theta= -3.254176 -6.201119 ilik= -16.64806
            theta= -3.254176 -6.201118 ilik= -16.64806

         And at the end of the iteration:

            theta= -3.207488 -8.583329 ilik= -16.70139
            theta= -3.207488 -8.58 ilik= -16.70139

         As you can see, they get to the same max, but with just a
         slightly different path. The work machine is running 64 bit Unix
         (CentOS) and the home one 32 bit Ubuntu. Could this be enough to
         cause the difference? Most of my tests are based on all.equal,
         but I also print out 1 or 2 full solutions; perhaps I'll have to
         modify that?

      We do see quite a lot of that, yes; even running 32 and 64 bit
      builds on the same machine, and sometimes to the extent that an
      algorithm converges on one architecture and diverges on the other
      (just peek over on R-sig-ME). The comparisons by make check on R
      itself also give off quite a bit of last-decimal chatter when the
      architecture is switched. For some reason, OS X builds seem more
      consistent than Windows and Linux, although I have only anecdotal
      evidence of that.

      However, the basic point is that compilers don't define the sequence
      of FPU operations down to the last detail: an internal
      extended-precision register may or may not be used, the order of
      terms in a sum may be changed, etc. Since 64 bit code has different
      performance characteristics from 32 bit code (since you shift more
      data around for pointers), the FPU instructions may be differently
      optimized too.

   However, the main difference is that all x86_64 chips have SSE2
   registers, and so gcc makes use of them. Not all i686 chips do, so
   32-bit builds on Linux and Windows only use the FPU registers. This
   matters at ABI level: arguments get passed and values returned in SSE
   registers, so we can't decide to only support later i686 cpus and make
   use of SSE2 without re-compiling all the system libraries (but a Linux
   distributor could). And the FPU registers are 80-bit and use extended
   precision (the way we set up Windows and on every Linux system I have
   seen); the SSE* registers are 2x64-bit. I believe that all Intel Macs
   are 'Core' or later and so do have SSE2, although I don't know how much
   Apple relies on that.

   (The reason I know that this is the 'main difference' is that you can
   often turn off the use of SSE2 on x86_64 and reproduce the i686
   results. But because of the ABI differences, you may get crashes: in R
   this matters most often for complex numbers, which are 128-bit C99
   double complex and passed around in an SSE register.)

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] How to deal with package conflicts
The ridge() function was put into the survival package as a simple example of what a user could do with penalized functions. It's not a serious function, and I'd be open to any suggestions for change. Actually, for any L2 penalty + Cox model one is now better off using coxme, as the maximization process is much better thought out there. I'd be happy to remove ridge from survival -- except that there are bound to be lots of folks using the function, and any such changes (even good ones) to the survival package are fraught with peril.

Duncan: this raises a larger point. I've often wished that I could have namespace-like rules apply to formulas. Using survival again, when I implemented gam-like smooths I had to create pspline() rather than use the more natural s() notation. In survival, it would be good to do this for ridge, cluster, pspline, and frailty, all of which depend deeply on a coxph context. It would also solve a frailty() problem of long standing: when used in survreg only a subset of the frailty options make sense; this is documented in the help file but catches users again and again.

Terry Therneau

On Fri, 2011-11-25 at 12:00 +0100, r-devel-requ...@r-project.org wrote:

In my genridge package, I define a function ridge() for ridge regression, creating objects of class 'ridge' that I intend to enhance. In a documentation example, I want to use some functions from the car package. However, that package requires survival, which also includes a ridge() function, for coxph models. So, once I require(car) my ridge() function is masked, which means I have to use the awkward form below in my .Rd files.

    ridgemod <- genridge::ridge(...)

I tried to detach survival, but that doesn't work:

    detach(package:survival)
    Error: package 'survival' is required by 'car' so will not be detached

I don't see any solution to this, other than (a) renaming my ridge() to something else -- don't want to do this, or (b) using \dontrun{} for the examples that use car. Or, is there some other way?

Not really. I'd say the renaming is the preferred way to go, but you might also be able to convince Terry Therneau (survival author) to make ridge() a generic, so your method is called for your objects, and his is called for others.

Duncan Murdoch

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] How to deal with package conflicts
On Fri, 2011-11-25 at 09:50 -0500, Duncan Murdoch wrote: I think the general idea in formulas is that it is up to the user to define the meaning of functions used in them. Normally the user has attached the package that is working on the formula, so the package author can provide useful things like s(), but if a user wanted to redefine s() to their own function, that should be possible. Formulas do have environments attached, so both variables and functions should be looked up there.

I don't agree that this is the best way. A function like coxph could easily have in its documentation a list of the formula specials that it defines internally. If the user wants something of their own they can easily use a different word; in fact, I would strongly recommend that they don't use one of these key names. For things that work across multiple packages like ns(), what user in his right mind would redefine it?

So I re-raise the question. Is there a reasonably simple way to make the survival ridge() function specific to survival formulas? It sets up structures that have no meaning anywhere else, and its global definition stands in the way of other sensible uses. Having it be not exported + obey namespace-type semantics would be a plus all around.

Philosophical aside: I have discovered to my dismay that formulas do have environments attached, and that variables/functions are looked up there. This made sensible semantics for predict() within a function impossible for some of the survival functions, unless I were to change all the routines to a model=TRUE default. (And a change of that magnitude to survival, with its long list of dependencies, is not fun to contemplate. A very quick survey reveals several dependent packages that would break.) So I don't agree nearly so fully with the "should" part of your last sentence. The out-of-context evaluations allowed by environments are, I find, always tricky and often lead to intricate special cases. Thus, moving back and forth between how it seems that a formula should work and how it actually does work sometimes leaves my head spinning.

Terry T.
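A small demonstration of the lookup rule under discussion, with made-up names; whatever function is stored in the formula's environment is what model.frame() will find:

    make_form <- function() {
        s <- function(x) x^2    # a user-local definition of s()
        y ~ s(x)                # the formula captures this local environment
    }
    f <- make_form()
    environment(f)$s            # the local s() is reachable via the formula
    d <- data.frame(y = 1:3, x = 1:3)
    model.frame(f, data = d)    # uses the local s(), not any package's s()

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel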
Re: [Rd] How to deal with package conflicts
On Fri, 2011-11-25 at 10:42 -0500, Michael Friendly wrote: Duncan provided one suggestion: make ridge() an S3 generic, and rename ridge() to ridge.coxph(), but this won't work, since you use ridge() inside coxph() and survreg() to add a penalty term in the model formula. Another idea might be simply to not export ridge(), but I have the feeling this will break your R CMD checks.

The S3 generic idea won't work. The argument inside ridge(x) is an ordinary variable, and it's the argument inside that a generic uses for dispatch. I want to dispatch based on the context, which is what the namespace mechanism does for a call to, for instance, coxpenal.fit, a non-exported survival function.

I suspect that not exporting ridge would work for

    coxph(Surv(time, status) ~ ph.ecog + ridge(age), data=lung)

but not for

    myform <- Surv(time, status) ~ ph.ecog + ridge(age)
    coxph(myform, data=lung)

(I haven't tested this.) This is because formulas are treated rather like functions, with bindings coming into play when they are first defined, not when they are first used.

[Michael Friendly continues:] Alternatively, my particular problem (wanting to use car::vif in my package documentation) would be solved if John Fox considered making survival a Suggests: package rather than a Depends: one. This might work, since survival is only referenced in car by providing Anova() methods for coxph models. I think all of this raises a general issue of unintended consequences of package bloat, where (a) Depends: packages are forced to load by require()/library(), whether they are really needed or not; (b) there is nothing like require(car, depends=FALSE) to circumvent this; (c) once a require()'d package is loaded, it cannot be unloaded; and (d) AFAIK, there is no way for a package author to override the masking of functions or data provided by other packages, except by using mypackage::myfun() calls. To me this seems to be a flaw in the namespace mechanism.

I will say that the long list of reverse depends on the survival package does give me pause when making changes.

Terry T.

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] How to deal with package conflicts
I like the idea of making the functions local, and will pursue it. This issue has bothered me for a long time -- I had real misgivings when I introduced cluster() to the package, but did not at that time see any way other than making it global. I might make this change soon in the ridge function, since it's a good test case -- less likely to cause downstream troubles.

Here is another possible approach: inside coxph, just before calling eval with the formula, create a new environment tempenv which consists of my handful of special functions (ridge, frailty, cluster, pspline) which have meaning only inside a coxph call, with the parent environment of tempenv being the current environment of the formula. Then set the environment of the formula to tempenv, then eval. Would this work?

Two small further questions:
1. Any special rules for the documentation? We need a page for cluster, but want to mark it almost like a method, in the sense of applying in only one context.
2. Does one scheme or another work best for downstream functions like predict or model.matrix? Duncan's idea of direct modification might have an advantage (?) in that the terms object would be permanently changed.

Terry T.
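A minimal sketch of the tempenv idea, using a dummy ridge() as a stand-in for the real internal functions; not the actual coxph code:

    add_specials <- function(formula, specials) {
        # 'specials' is a named list of functions, e.g. list(ridge=..., cluster=...)
        tempenv <- new.env(parent = environment(formula))
        for (nm in names(specials))
            assign(nm, specials[[nm]], envir = tempenv)
        environment(formula) <- tempenv   # lookups now see the specials first
        formula
    }
    # usage, with a trivial ridge() standing in for the real one:
    f <- add_specials(y ~ x + ridge(z), list(ridge = function(x, theta = 1) x))
    model.frame(f, data = data.frame(y = 1:3, x = 1:3, z = 1:3))

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel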
[Rd] 32 vs 64 bit difference?
I've spent the last few hours baffled by a test suite inconsistency. The exact same library code gives slightly different answers on the home and work machines - found in my R CMD check run. I've recopied the entire directory to make sure it's really identical code. The data set and fit in question has a pretty flat top to the likelihood. I put print statements into the f() function called by optim, and the two parameters and the likelihood track perfectly for 48 iterations, then start to drift ever so slightly:

    theta= -3.254176 -6.201119  ilik= -16.64806
    theta= -3.254176 -6.201118  ilik= -16.64806

And at the end of the iteration:

    theta= -3.207488 -8.583329  ilik= -16.70139
    theta= -3.207488 -8.58      ilik= -16.70139

As you can see, they get to the same max, but with just a slightly different path. The work machine is running 64 bit Unix (CentOS) and the home one 32 bit Ubuntu. Could this be enough to cause the difference? Most of my tests are based on all.equal, but I also print out 1 or 2 full solutions; perhaps I'll have to modify that?

Terry Therneau

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] R-devel Digest, Vol 105, Issue 22
On Tue, 2011-11-22 at 12:00 +0100, r-devel-requ...@r-project.org wrote: How are the package authors supposed to develop their own NAMESPACEd packages efficiently? And what are the directions R is taking in order to facilitate the development cycle?

This is my strategy. I have a separate directory test.local in my tree, not exported to Rforge. It has a Makefile which loads all the sources from ../R, copies the C files from ../src, and makes a local loadable S.so. I then do all my development there, without a namespace. I can overwrite functions, trace, add browser() calls --- all the things you want to do --- with standard semantics. I obviously don't load my package. I think this is the simplest route. Namespaces are a great idea, but during testing I don't want all those protections. It took me a half hour to set up the Makefile; I've recouped that effort many times over.

Terry Therneau
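Roughly the same setup can be driven from within R itself; a sketch, assuming the usual package layout and that the shared object has already been built (e.g. with R CMD SHLIB -o S.so ../src/*.c):

    # source every R file from the package directory, then load the compiled code
    for (f in list.files("../R", pattern = "\\.R$", full.names = TRUE))
        source(f)
    dyn.load("S.so")   # all functions and C entry points now live in the workspace

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel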
[Rd] Small nit in Sweave
Two small Sweave issues.

1. I had the following line in my code:

    echo=FALSE, results="hide"

resulting in the message

    Error in match.arg(options$results, c("verbatim", "tex", "hide")) :
      'arg' should be one of “verbatim”, “tex”, “hide”

I puzzled on this a bit, since my argument exactly matched the message, until I thought of trying it without the quotes.

2. Where should I have reported this? Sweave is not on CRAN, so I couldn't find the maintainer email there. Perhaps I'm missing something obvious though.

Terry T.

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] multiple defines of diag
The current coxme code has functions that depend on bdsmatrix and others that depend on Matrix; both those packages define S4 methods for diag. When loaded, the message appears:

    replacing previous import ‘diag’ when loading ‘Matrix’

Questions:
1. Do I need to worry about this? If so, what can I do about it? I suppose I could add an importFrom directive, but it will be a pain unless there is an allbut(diag) option I'm not aware of. (I assume that methods and classes can be listed in importFrom, i.e., that there are no importMethodsFrom or importClassesFrom functions.)
2. If I don't need to worry, is there a way to turn the message off? I as a developer need to see it, but users don't, and it may worry them unnecessarily. Updating all 17 of my test/*.Rout.save files is a nuisance as well, but only a nuisance.

I'd like to upload this to CRAN soon, as I have users asking for the updated lmekin function (which uses Matrix). In the long term all the bdsmatrix functions will be replaced by Matrix, but that requires major changes to C code, so "long" is the operative word.

Thanks, Terry T.
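On the parenthetical above: importMethodsFrom() and importClassesFrom() directives do in fact exist in NAMESPACE files, so methods and classes can be imported selectively, along these lines:

    ## in the package NAMESPACE file:
    importClassesFrom(Matrix, dsCMatrix)
    importMethodsFrom(Matrix, diag)

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel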
Re: [Rd] multiple defines of diag
On Thu, 2011-10-06 at 10:00 -0400, Kasper Daniel Hansen wrote: if you're using two packages that both define a diag function/method you absolutely _have_ to resolve this using your NAMESPACE. [Update: I see both are methods. I actually don't know what happens when you have the same generic in both packages]

Your response made me look further, with some surprising results.

1. Sequential loading:

    tmt226% R --vanilla
    R version 2.13.0 (2011-04-13)

    library(bdsmatrix)
    tmat <- bdsmatrix(c(3,2,2,4),
                      c(22,1,2,21,3,20,19,4,18,17,5,16,15,6,7,
                        8,14,9,10,13,11,12),
                      matrix(c(1,0,1,1,0,0,1,1,0,1,0,10,0,
                               0,1,1,0,1,1,0,1,1,0,1,0,10), ncol=2))
    tmat[1:7,1:7]
         [,1] [,2] [,3] [,4] [,5] [,6] [,7]
    [1,]   22    1    2    0    0    0    0
    [2,]    1   21    3    0    0    0    0
    [3,]    2    3   20    0    0    0    0
    [4,]    0    0    0   19    4    0    0
    [5,]    0    0    0    4   18    0    0
    [6,]    0    0    0    0    0   17    5
    [7,]    0    0    0    0    0    5   16
    diag(tmat)
     [1] 22 21 20 19 18 17 16 15 14 13 12 10 10
    library(Matrix)
    Loading required package: lattice
    Attaching package: 'Matrix'
    The following object(s) are masked from 'package:base': det
    diag(tmat)
     [1] 22 21 20 19 18 17 16 15 14 13 12 10 10

Things to note: in this case I did not get a message about overwriting the diag method, and it works. This was not my experience with ranef(), an S3 generic that coxme, nlme, and lme4 all define; there, whichever library loaded last did not discover existing methods. That is, if one loaded nlme after coxme, then ranef(a coxme object) would not dispatch to ranef.coxme. Our solution (Doug Bates and I) was to have both coxme and lme4 import ranef and fixef from the nlme library. However, per above it appears to work with S4 generics. Can I count on it, though?

Case 2:

    tmt229% R --vanilla
    R version 2.13.0 (2011-04-13)

    library(coxme)
    Loading required package: survival
    Loading required package: splines
    Loading required package: bdsmatrix
    Loading required package: nlme
    Loading required package: Matrix
    Loading required package: lattice
    Attaching package: 'Matrix'
    The following object(s) are masked from 'package:base': det
    Warning message:
    replacing previous import ‘diag’ when loading ‘Matrix’

    tmat <- bdsmatrix(c(3,2,2,4),
                      c(22,1,2,21,3,20,19,4,18,17,5,16,15,6,7,
                        8,14,9,10,13,11,12),
                      matrix(c(1,0,1,1,0,0,1,1,0,1,0,10,0,
                               0,1,1,0,1,1,0,1,1,0,1,0,10), ncol=2))
    diag(tmat)
     [1] 22 21 20 19 18 17 16 15 14 13 12 10 10

Things to note: I now get a warning message about diag. Why only here? Per the earlier comment I'm not importing all of nlme, just the two generics. It still works. (This example isn't reproducible for others: the coxme library on CRAN does not yet have a Matrix dependency.)

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] number of copies - summary
My thanks to Bill Dunlap and Simon Urbanek for clarifying many of the details. This gives me what I need to go forward. Yes, I will likely convert more and more things to .Call over time; this clearly gives the most control over excess memory copies. I am getting more comments from people using survival on huge data sets, so memory usage is an issue I'll be spending more thought on.

I'm not nearly as negative about .C as Simon. Part of this is long experience with C standalone code: one just gets used to the idea that mismatched args to a subroutine are deadly. A test of all args to .C (via insertion of a browser call) is part of initial code development. Another is that constructing the return argument from .Call (a list with names) is a bit of a pain. So I will sometimes use dup=FALSE. However, the opinion of R core about .C is clear, so it behooves me to move away from it.

Thanks again for the useful comments.

Terry Therneau

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] number of copies
I'm looking at memory efficiency for some of the survival code. The following fragment appears in coxph.fit:

    coxfit <- .C("coxfit2", iter=as.integer(maxiter),
                 as.integer(n),
                 as.integer(nvar),
                 stime,
                 sstat,
                 x= x[sorted,], ...

Does this make a second copy of x to pass to the routine (my expectation), or will I end up with 3: x and x[sorted,] in the local frame of reference, and another due to dup=TRUE?

Terry T.

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] number of copies
On Mon, 2011-10-03 at 12:31 -0400, Simon Urbanek wrote:

Thanks. I was hoping that x[sorted,] would act like double(n) does in a .C call, and have no extra copies made since it has no local assignment.

Yes it does act the same way, you get an extra copy with double(n) as well - there is no difference.

That is surprising. This is not true of Splus, and since Chambers mentions it as a specific case as well (Programming with Data, p421) I assumed that R would be at least as intelligent. He also used the unset() function to declare that something could be treated like double(n), i.e., needing no further copies. But that always looked like a dangerous assertion to me, and unset has disappeared, perhaps deservedly, from R.

Terry T.
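One way to watch the copies directly is tracemem(), provided R was built with --enable-memory-profiling; a small sketch with throwaway data:

    x <- matrix(rnorm(20), nrow = 5)
    tracemem(x)                  # reports each duplication of x from here on
    y <- x[c(2, 1, 3, 5, 4), ]   # the subscript allocates a brand-new matrix for y
    x[1, 1] <- 0                 # this duplicates x first if x is shared elsewhere

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel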
Re: [Rd] Confusing inheritance problem
I've packaged the test library up as a tar file at ftp.mayo.edu, directory therneau, file ktest.tar; login username: mayoftp, password: KPlLiFoz. This will disappear in 3 days (Mayo is very fussy about outside access).

In response to Uwe's comments:
1. "2.13.0 is not recent": it's not the latest, but it is recent. This is for machines at work, where upgrades happen more infrequently.
2. "Matrix not loaded": the sessionInfo was only to show what version we have. Forgetting to load Matrix isn't the problem -- when I do that the error is quick and obvious.

Thanks in advance for any pointers.

Terry T.

On Sat, 2011-07-16 at 19:27 +0200, Uwe Ligges wrote: On 15.07.2011 23:23, Terry Therneau wrote:

I have a library in development with a function that works when called from the top level, but fails under R CMD check. The particular line of failure is

    rsum <- rowSums(kmat0)

where kmat0 is a dsCMatrix object. I'm currently stumped and looking for some ideas. I've created a stripped-down library ktest that has only 3 functions: pedigree.R to create a pedigree or pedigreeList object, kinship.R with kinship methods for the two objects, and one small compute function called by the others, along with the minimal amount of other information such that a call to

    R --vanilla CMD check ktest

gives no errors until the fatal one. There are two test cases. A 3 line one that creates a dsCMatrix and calls rowSums at the top level works fine, but the same call inside the kmat.pedigreeList function gives an error

    'x' must be an array of at least two dimensions

Adding a print statement above the rowSums call shows that the argument is a 14 by 14 dsCMatrix. I'm happy to send the library to anyone else to try and duplicate.

Terry Therneau

    tmt% R --vanilla
    sessionInfo()
    R version 2.13.0 (2011-04-13)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C
     [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

Terry,
1. Your R is not recent.
2. You do this without having Matrix loaded (according to sessionInfo())? This may already be the cause of your problems.
3. You may want to make your package available on some website. I am sure there are people who will take a look (including me, but not today).

Best wishes, Uwe

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Confusing inheritance problem
I have a library in development with a function that works when called from the top level, but fails under R CMD check. The particular line of failure is

    rsum <- rowSums(kmat0)

where kmat0 is a dsCMatrix object. I'm currently stumped and looking for some ideas. I've created a stripped-down library ktest that has only 3 functions: pedigree.R to create a pedigree or pedigreeList object, kinship.R with kinship methods for the two objects, and one small compute function called by the others, along with the minimal amount of other information such that a call to

    R --vanilla CMD check ktest

gives no errors until the fatal one. There are two test cases. A 3 line one that creates a dsCMatrix and calls rowSums at the top level works fine, but the same call inside the kmat.pedigreeList function gives an error

    'x' must be an array of at least two dimensions

Adding a print statement above the rowSums call shows that the argument is a 14 by 14 dsCMatrix. I'm happy to send the library to anyone else to try and duplicate.

Terry Therneau

    tmt% R --vanilla
    sessionInfo()
    R version 2.13.0 (2011-04-13)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C
     [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base
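For reference, the top-level case works because Matrix's S4 method for rowSums is visible once the package is attached; a likely culprit inside a package is that base R's internal rowSums runs instead when the Matrix method table is not imported by the package namespace (that internal version produces exactly the "must be an array" error). The working top-level case in miniature:

    library(Matrix)
    m <- Matrix(c(2, 1, 1, 2), 2, 2, sparse = TRUE)  # symmetric, so a dsCMatrix
    class(m)        # "dsCMatrix"
    rowSums(m)      # dispatches to Matrix's method: 3 3

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel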
[Rd] R 2.13 segfault with range()
Running on a shared CentOS server:

    tmt711% R --vanilla
    R version 2.13.0 (2011-04-13)
    Copyright (C) 2011 The R Foundation for Statistical Computing
    ISBN 3-900051-07-0
    Platform: x86_64-unknown-linux-gnu (64-bit)

    sessionInfo()
    R version 2.13.0 (2011-04-13)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C
     [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    load('test.rda')
    y <- matrix(ifelse(tdata$dataset <= 0, NA, tdata$dataset),
                ncol=ncol(tdata$dataset))
    dim(y)
    [1] 2228335
    range(y)

    *** caught segfault ***
    address 0x2b490421f000, cause 'memory not mapped'

    Traceback:
     1: range(y)

    Possible actions:
    1: abort (with core dump, if enabled)
    2: normal R exit
    3: exit R without saving workspace
    4: exit R saving workspace
    Selection: 3

    tmt712% ls -s test.rda
    2664 test.rda

The data set is too large to attach, but I can send the test.rda file off list. The data is not confidential.

Terry Therneau

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] standard format for newdata objects
On Wed, 2011-04-27 at 12:00 +0200, Peter Dalgaard wrote: Er... No, I don't think Paul is being particularly rude here (and he has been doing us some substantial favors in the past, notably his useful Rtips page). I know the kind of functionality he is looking for; e.g., SAS JMP has some rather nice interactive displays of regression effects, for which you'll need to fill in something for the other variables. However, that being said, I agree with Duncan that we probably do not want to canonicalize any particular method of filling in average values for data frame variables. Whatever you do will be statistically dubious (in particular, using the mode of a factor variable gives me the creeps: do a subgroup analysis and your average person switches from male to female?), so I think it is one of those cases where it is best to provide mechanism, not policy.

I agree with Peter. There are two tasks in newdata: deciding what the default reference levels should be, and building the data frame with those levels. It's the first part that is hard. For survival curves from a Cox model the historical default has been to use the mean of each covariate, which can be awful (sex coded as 0/1 leads to prediction for a hermaphrodite?). Nevertheless, I've not been able to think of a strategy that would give sensible answers for most of the data I use, and coxph retains the flawed default for lack of a better idea. When teaching a class on this, I tell listeners to bite the bullet and build the newdata that makes clinical sense, because package defaults are always unwise for some of the variables. How can a package possibly know that it should use bilirubin=1.0 (upper limit of normal) and AST = 45 when the data set is one of my liver transplant studies? Frank Harrell would argue that his sometimes misguided default in cph is better than the almost always wrong one in coxph, though, and there is certainly some strength in that position.

Terry Therneau
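In practice that means constructing the newdata by hand; a sketch with invented covariate names and values, for an already-fitted coxph model:

    # explicit reference levels chosen for clinical sense, not package defaults
    nd <- data.frame(age = c(50, 65),
                     sex = factor(c("male", "female")),
                     bilirubin = 1.0,    # upper limit of normal
                     AST = 45)
    # then, e.g., survfit(fit, newdata = nd) to get curves at those values

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel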
[Rd] help with eval()
I've narrowed my scope problems with predict.coxph further. Here is a condensed example:

    fcall3 <- as.formula(time ~ age)
    dfun3 <- function(dcall) {
        fit <- lm(dcall, data=lung, model=FALSE)
        model.frame(fit)
    }
    dfun3(fcall3)

The final call fails: it can't find 'dcall'. The relevant code in model.frame.lm is:

    env <- environment(formula$terms)
    if (is.null(env)) env <- parent.frame()
    eval(fcall, env, parent.frame())

If the environment of the formula is .GlobalEnv, as it is here, the contents of parent.frame() are ignored. Adding a print(ls(parent.frame())) statement just above the final call shows that it isn't a scope issue: the variables we want are there.

I don't understand the logic behind looking for variables in the place the formula was first typed (this is not a complaint). The inability to look elsewhere, however, has stymied my efforts to fix the scoping problem in predict.coxph, unless I drop the env(formula) argument altogether. But I assume there must be good reasons for its inclusion and am reluctant to do so.

Terry Therneau

    sessionInfo()
    R version 2.13.0 RC (2011-04-12 r55424)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C
     [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

PS. This also fails:

    dfun3 <- function(dcall) {
        fit <- lm(dcall, data=lung)
        model.frame(fit, subset=1:10)
    }

You just need to force model.frame.lm to recreate data.
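One workaround, for what it's worth: keep the model frame with the fit, in which case model.frame.lm returns the stored copy directly and never needs to re-evaluate dcall at all. A sketch, assuming the survival package's lung data is available as above:

    dfun3b <- function(dcall) {
        fit <- lm(dcall, data = lung, model = TRUE)  # the default; stores the frame
        model.frame(fit)      # returned as-is, no eval() of the original call
    }
    dfun3b(fcall3)            # works

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel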
Re: [Rd] predict()
On Fri, 2011-04-15 at 09:10 +0200, peter dalgaard wrote: I couldn't reproduce it from Terry's description either, but there _is_ an issue which parallels the ...

I'll start with an apology: as a first guess to understand the problem with predict.coxph I tried something parallel to Ivo's example using lm, during a 5 minute slice of time between meetings. I got a similar error message, posted a query based on that, and ran off. The real root of my error message, however, was a simple typographical mistake. My apologies for the premature note, whose error was gently pointed out by all three of you. I should have waited till I had more time.

Now for the real problem, which is an oversight in my design. When updating the predict.coxph, residuals.coxph, survfit.coxph and survexp.coxph functions to more modern processing of the newdata argument (proper handling of factors, etc.), I made a decision to have all of them call model.frame.coxph and model.matrix.coxph. Model.matrix in particular has some special steps, and it would be better to have only one point of modification. However, this introduces one more level of indirection

    predict -> model.frame.coxph -> model.frame

and the parent.frame() reference in model.frame.coxph now points to predict, when we actually want it to point to the parent of predict. In predict.lm the parent.frame() argument lives in the predict.lm code itself.

I see three paths to correction:
1. Make model.frame.coxph smarter. Peter's example seems to show that this is hard.
2. Change the line in predict.coxph (and the others)

    mf <- model.frame(object)

to some form of eval() that causes the parent.frame() operation to reach back properly. I'm not yet sure what variant of eval or do.call or ... this is; the effect I'm looking for is similar to what NextMethod() does.
3. Change my call to

    mf <- model.frame(object$terms, ...)

to mimic the call in lm. This may be the simplest, since model.frame.coxph does not do anything special.

Last, I need to check whether any of the above issues affect model.matrix calls. Any comments on which is the better path would be welcomed.

Terry T.
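A sketch of what option 2 might look like, along the lines of the usual idiom in modelling code; hypothetical helper, not tested against the real predict.coxph:

    # rebuild the model.frame call from the original fit and evaluate it in
    # the caller's frame, one level further up than model.frame.coxph itself
    rebuild_frame <- function(object, env = parent.frame()) {
        Call <- object$call
        indx <- match(c("formula", "data", "subset", "weights", "na.action"),
                      names(Call), nomatch = 0)
        temp <- Call[c(1, indx)]           # keep only the model.frame arguments
        temp[[1]] <- as.name("model.frame")
        eval(temp, env)
    }

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel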
[Rd] Problem with dyn.load in R 2.13.0
I have a test directory for the survival suite, and dyn.load has ceased to work in it. Below shows the log:

    tmt1075% R --vanilla
    R version 2.12.2 (2011-02-25)
    Copyright (C) 2011 The R Foundation for Statistical Computing
    ISBN 3-900051-07-0
    Platform: x86_64-unknown-linux-gnu (64-bit)

    dyn.load('survival.so')
    q()

    tmt1076% R13 --vanilla
    R version 2.13.0 RC (2011-04-11 r55409)
    Copyright (C) 2011 The R Foundation for Statistical Computing
    ISBN 3-900051-07-0
    Platform: x86_64-unknown-linux-gnu (64-bit)

    dyn.load('survival.so')
    Error in dyn.load("survival.so") :
      unable to load shared object '/people/biostat2/therneau/research/surv/Rtest/survival.so':
      libR.so: cannot open shared object file: No such file or directory
    q()

Is the issue that the .so file must have been created with the R 2.13 script? That's not what the error message says, however. It almost looks like it is ignoring my first argument and looking instead for libR.

Terry Therneau

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Problem with dyn.load in R 2.13.0
On Wed, 2011-04-13 at 16:45 -0400, Simon Urbanek wrote: We have no details, but my wild guess would be that you did not re-build the package for 2.13.0 and you have static libR in 2.13.0 yet dynamic in 2.12.2. Cheers, Simon

Per my prior note, my guess at the root of the issue is use of user libraries, which is common here. Here are further details if that helps. R 2.13 was downloaded and built this AM in my ~/temp directory, using the standard

    ./configure
    make

Then a copy of the shell script was copied to ~therneau/bin/R13 for my convenience. I paid very little attention to the output of configure. This is running on a shared server using CentOS release 5.5 (98 users at the moment). R 2.12.2 is centrally available.

    sessionInfo()
    R version 2.13.0 RC (2011-04-11 r55409)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C
     [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

I can't find a libR.so anywhere -- perhaps it's static.

    tmt1119% pwd
    /people/biostat2/therneau/temp/R-rc
    tmt1120% find . -name 'libR*'
    ./src/extra/blas/libRblas.so
    ./src/main/libR.a
    ./src/modules/lapack/libRlapack.so
    ./src/nmath/standalone/libRmath.pc.in
    ./src/unix/libR.pc.in
    ./lib/libRblas.so
    ./lib/libRlapack.so

Thanks for everyone's help.

Terry
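Two quick checks from within R, assuming a Unix-alike with ldd on the path, that can help pin down this kind of linkage mismatch:

    system("ldd survival.so")  # 'libR.so => not found' means the .so wants a shared libR
    R.home("lib")              # where libR.so would live if R were built with --enable-R-shlib

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel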
[Rd] drop.unused.levels
A user sent me a query on survreg, which when boiled down is a request to drop unused levels when setting up the X matrix. I don't have a strong opinion one way or the other, but am loath to make the change: I expect that code somewhere will break, perhaps a lot, when the length of the coefficients changes. On the other hand, at some point this change was made to lm(). Does R core have an opinion on which is better, or a pointer to the discussion that underlay the change to lm()? It could be enough to push me over the edge. It would affect survreg, coxph, survfit, survdiff, coxme, ...

Terry Therneau
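The behaviour in question, in miniature: model.frame() keeps unused factor levels by default, and lm() sets drop.unused.levels = TRUE in its internal model.frame call.

    d <- data.frame(y = rnorm(3),
                    f = factor(c("a", "a", "b"), levels = c("a", "b", "c")))
    levels(model.frame(y ~ f, d)$f)                              # "a" "b" "c"
    levels(model.frame(y ~ f, d, drop.unused.levels = TRUE)$f)   # "a" "b"

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel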
[Rd] rowsum
with the entirely different rowSums, but it has been around for a long time.) A lot longer than rowSums ... Bill Dunlap, Spotfire, TIBCO Software

---

This made me smile. The rowsum function was originally an internal part of the survival package, used for fast computation of certain sums when there is a cluster() statement. It was Statistical Sciences (S-Plus) who moved it to global status. That is, they used it in enough other places that they decided to speed it up, took over the maintenance and ownership of the function (with my blessing), and ceased to label it as part of survival in the manual. This metabug can't be laid at R's feet.

Terry Therneau

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] round, unique and factor
Survfit had a bug in some prior releases due to the use of both unique(times) and table(times); I fixed it by rounding to 15 digits per the manual page for as.character. Yes, I should ferret out all the usages instead, but this was fast and it cured the user's problem.

The bug is back! A data set from a local colleague triggers it. I can send the rda file to anyone who wishes. The current code has

    digits <- floor((.Machine$double.digits) *
                    logb(.Machine$double.base, 10))   # base 10 digits
    Y[,1] <- signif(Y[,1], digits)

which gives 15 digits; should I subtract one more? Should the documentation change? In the meantime I'm looking at the more permanent fix of turning time into a factor, then back at the very end. Because it is a bigger change, the potential for breakage is higher, however.

Terry T.

    tmt45% R --vanilla
    R version 2.12.2 (2011-02-25)
    Copyright (C) 2011 The R Foundation for Statistical Computing
    ISBN 3-900051-07-0
    Platform: x86_64-unknown-linux-gnu (64-bit)

    load('test.rda')
    ls()
    [1] "temp2"
    temp2 <- round(temp2, 15)
    length(unique(temp2))
    [1] 954
    length(table(temp2))
    [1] 942
    .Machine$double.eps
    [1] 2.220446e-16
    range(temp2)
    [1]  0.0000 26.0397

Terry T.
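The mismatch in miniature: unique() compares the doubles exactly, while table() goes through as.character(), which uses up to 15 significant digits.

    x <- c(0.3, 0.1 + 0.2)
    x[1] == x[2]          # FALSE: the two doubles differ in the last bit
    length(unique(x))     # 2
    length(table(x))      # 1: both values become the string "0.3"
    length(unique(signif(x, 15)))   # 1 here, though 15 digits evidently
                                    # did not suffice for the data set above

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel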
[Rd] coxph and drop1
A recent question in r-help made me realize that I should add a drop1 method for coxph and survreg; the default does not handle strata() or cluster() properly. However, for coxph the right options for the test argument would be likelihood-ratio, score, and Wald, not "chisq" and "F"; all of them reference a chi-square distribution. My thought is to use these arguments, and add an error message of "read the help file for drop1.coxph" when the defaults appear. Any better suggestions?

Terry Therneau

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] coxph and drop1
On Mon, 2011-03-14 at 12:52 -0400, John Fox wrote: Dear Terry, Possibly I'm missing something, but since the generic drop1() doesn't have a test argument, why is there a problem?

    args(drop1)
    function (object, scope, ...)

If you use match.arg() against test, then the error message should be informative if one of the prescribed values isn't supplied. Best, John

I stand corrected. I was looking at the help page and focused my eyes on the default and lm versions, both of which have the test argument with options "none" and "chisq". I should have looked higher on the page.

Terry T.
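The match.arg() behaviour John refers to, in isolation, with made-up option names:

    f <- function(test = c("LRT", "score", "Wald")) {
        test <- match.arg(test)   # unknown values give an informative error
        test
    }
    f("score")   # "score"
    f("F")       # Error: 'arg' should be one of "LRT", "score", "Wald"

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel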
[Rd] unique.matrix issue [Was: Anomaly with unique and match]
Simon pointed out that the issue I observed was due to internal behaviour of unique.matrix. I had looked carefully at the manual pages before posting the question and this was not mentioned. Perhaps an addition could be made? Terry T. __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Anomaly with unique and match
I stumbled onto this working on an update to coxph. The last 6 lines below are the question; the rest create a test data set.

    tmt585% R
    R version 2.12.2 (2011-02-25)
    Copyright (C) 2011 The R Foundation for Statistical Computing
    ISBN 3-900051-07-0
    Platform: x86_64-unknown-linux-gnu (64-bit)

    # Lines of code from survival/tests/singtest.R
    library(survival)
    Loading required package: splines

    test1 <- data.frame(time=  c(4, 3,1,1,2,2,3),
                        status=c(1,NA,1,0,1,1,0),
                        x=     c(0, 2,1,1,1,0,0))
    temp <- rep(0:3, rep(7,4))
    stest <- data.frame(start  = 10*temp,
                        stop   = 10*temp + test1$time,
                        status = rep(test1$status,4),
                        x      = c(test1$x + 1:7, rep(test1$x,3)),
                        epoch  = rep(1:4, rep(7,4)))
    fit1 <- coxph(Surv(start, stop, status) ~ x * factor(epoch), stest)

    ## New lines
    temp1 <- fit1$linear.predictor
    temp2 <- as.matrix(temp1)
    match(temp1, unique(temp1))
     [1] 1 2 3 4 4 5 6 7 7 7 6 6 6 8 8 8 6 6 6 9 9 9 6 6
    match(temp2, unique(temp2))
     [1] 1 2 3 4 4 5 6 7 7 7 6 6 6 NA NA NA 6 6 6 8 8 8 6 6

I've solved it for my code by not calling match on a one-column matrix. In general, however, should I be using some other paradigm for this map-to-unique operation? For example match(as.character(x), unique(as.character(x)))?

Terry T.
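For the map-to-unique operation itself, dropping the matrix class first avoids unique.matrix's row-wise (character-based) semantics; a sketch using the objects above:

    ux <- unique(as.vector(temp2))   # unique on the bare numeric vector
    match(as.vector(temp2), ux)      # no NAs: both sides compare as doubles

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel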
Re: [Rd] print(...,digits=2) behavior
This affects _many_ *.Rout.save checks in packages. I assume this is in the R-devel branch.

I've got an addition to survival nearly ready to go (faster concordance calculation). At what point should I switch over to the newer version, fix up my .out files, etc., to best mesh with the automatic checks on CRAN? It's a nuisance for me to update, but only a nuisance. I've been through it twice before (the Splus version 4 -> 5 change, and the Splus -> R switch). I might as well time it so as to make life as easy as possible for you folks.

Thanks for all the hard work.

Terry T.

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] function local to a fit
I've added a time-transform ability to coxph; here is an example:

    fit <- coxph(Surv(time, status) ~ age + tt(age) + sex, data=lung,
                 tt=function(x, t, ...) x*log(t))

The only role for tt() in the formula is to be noticed as a special by terms(). I currently have tt defined as a function

    tt <- function(x) x

It has to be exported in the namespace, documented, etc. Is there a way to make tt() local to coxph, but still be found by model.frame, so that it does not have to be global? It would seem to be neater to do a marker transform like s() in gam, cluster() in coxph, etc. in this way, where the meaning is local to the enclosing function.

Terry Therneau

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Anomaly in [.terms
This arose when working on an addition to coxph, which has the features that the X matrix never has an intercept column, and we remove strata() terms before computing an X matrix. The surprise: when a terms object is subset, the intercept attribute is turned back on. My lines 2 and 3 below were being executed just before a call to model.frame. The simple solution was of course to do them in the opposite order, so I am not waiting on a fix. Not to mention that I am not sure a fix is required, though I was surprised.

Terry T.

    tmt1131% R
    R version 2.12.0 (2010-10-15)
    Copyright (C) 2010 The R Foundation for Statistical Computing
    ISBN 3-900051-07-0
    Platform: x86_64-unknown-linux-gnu (64-bit)

    test <- terms(Surv(time, status) ~ age + strata(ph.ecog),
                  specials='strata')
    attr(test, 'intercept') <- 0    # turn off intercept
    test <- test[-2]                # remove strata
    test
    Surv(time, status) ~ age
    attr(,"variables")
    list(Surv(time, status), age)
    attr(,"factors")
                       age
    Surv(time, status)   0
    age                  1
    attr(,"term.labels")
    [1] "age"
    attr(,"specials")
    attr(,"specials")$strata
    NULL
    attr(,"order")
    [1] 1
    attr(,"intercept")
    [1] 1
    attr(,"response")
    [1] 1
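The reordering that works, for the record:

    test <- terms(Surv(time, status) ~ age + strata(ph.ecog), specials = "strata")
    test <- test[-2]                 # remove strata first ...
    attr(test, "intercept") <- 0     # ... then turn the intercept off
    attr(test, "intercept")          # stays 0

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel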
[Rd] Rd2pdf error in R12.0
On the local machine the command

    R11 CMD Rd2pdf survfit.Rd

works fine, while

    R12 CMD Rd2pdf survfit.Rd

fails with the message below.

    Converting Rd files to LaTeX ...
    survfit.Rd
    Creating pdf output from LaTeX ...
    Error in texi2dvi("Rd2.tex", pdf = (out_ext == "pdf"), quiet = FALSE, :
      Running 'texi2dvi' on 'Rd2.tex' failed.
    Messages:
    sh: texi2dvi: command not found
    Output:
    Error in running tools::texi2dvi

Here is the header when invoking the newer version:

    R version 2.12.0 (2010-10-15)
    Copyright (C) 2010 The R Foundation for Statistical Computing
    ISBN 3-900051-07-0
    Platform: x86_64-unknown-linux-gnu (64-bit)

The R11 call is 2.11.0, same machine, same login session. I've looked in the manuals and didn't see anything specific about version 2.12. Our system administrator will want some hints about what to do to fix this, ere I complain. I discovered it running CMD check on a package update. Any pointers?

Terry Therneau
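A quick way to gather those hints from within the offending R version:

    Sys.which("texi2dvi")          # "" means the program is not on R's PATH
    Sys.getenv("R_TEXI2DVICMD")    # the command tools::texi2dvi will try to run

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel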
[Rd] news.Rd format
I'm converting the Changelog files that I have used in the survival package (since the 1980s) to the inst/NEWS.Rd format, and a couple of things are not clear from the help page.

1. What should I use for the name: NEWS or survival?

2. My section headers look like

    \section{Changes in version 2.36-3}{
      \itemize{

etc., and I get "cannot extract version info from the following section titles" for all of them.

I must be missing something simple. Perhaps these two points could be clarified further in the manual page.

Terry Therneau
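For what it's worth, a skeleton along these lines is my understanding of the expected conventions (not gospel, and it may not explain the version-extraction message):

    \name{NEWS}
    \title{News for package 'survival'}
    \section{Changes in version 2.36-3}{
      \itemize{
        \item First change ...
        \item Second change ...
      }
    }

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel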