Re: [Rd] Petition to set warnPartialMatch* options to TRUE

2024-04-23 Thread Therneau, Terry M., Ph.D. via R-devel
Let me give partial assent to Michael's suggestion:  a) have an easy way to 
turn this on 
and b) add a strong suggestion to do so to the WRE manual. Kurt's example in 
the email 
shows how to do (a);  but I just looked in the WRE manual and don't see any 
reference to 
it, nor any mention from R CMD check --help, so I don't know where I would have 
found out 
about it.
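
(For reference, the easy opt-in amounts to ordinary options, so it can live in a .Rprofile;
a sketch of the three warnPartialMatch options, which may or may not be exactly what Kurt's
example sets:)

    options(warnPartialMatchArgs   = TRUE,
            warnPartialMatchAttr   = TRUE,
            warnPartialMatchDollar = TRUE)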

I have gotten warnings about partial matches from CRAN wrt the survival package --- I
suspect at the behest of Kurt --- and though I muttered under my breath about it at the
time, there is no doubt that it was a good idea to fix all of them, and I'm glad for the
nudge.

Kurt: does that environment variable contain the options string itself, or does it point
to a file containing the options?

I appreciate partial matching when typing code at the terminal so want the 
feature to 
remain in that context.

Terry T

-- 
Terry M Therneau, PhD
Department of Quantitative Health Sciences
Mayo Clinic
thern...@mayo.edu

"TERR-ree THUR-noh"

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Vignettes with long compute time

2024-03-11 Thread Therneau, Terry M., Ph.D. via R-devel
Is there a way to include the compiled version of a vignette in the doc directory but mark
it to NOT be rerun by CRAN?   I think I remember that this is possible, but have forgotten
how.   (It might even be a false memory.)

Terry T.

Background:  Beth Atkinson and I are splitting out many of the vignettes from 
the survival 
package into a separate package, survivalVignettes.  There are a few reasons:

  1. Some vignettes use packages outside of the base + recommended set; pseudovalues, for
instance, are normally used as input to a subsequent GEE model.    Since survival is itself
a recommended package, it can't legally host the pseudo.Rnw vignette.
  2. The set of vignettes for survival is large, and likely to get larger.    
It makes 
sense to slim down the size of the package itself.
  3. It allows us to use Rmd.  (Again, survival can't use anything outside of 
base + 
recommended).
  4. We have a couple of 'optional' vignettes that talk about edge cases, 
useful to some 
people but not worth the size cost of cluttering up the main package.

The current submission fails due to one vignette in group 4 which takes a looong time to
run.  This vignette in particular talks about compute time, and illustrates a case where
O(n^2) behavior arises.   A sentence that warns the user "if you do this it will take
hours to run" is a perfect case for a pdf that should not be recreated by R CMD check.
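
(One hedged workaround sketch for the slow chunks themselves -- not the pre-built-pdf
mechanism asked about -- assuming a knitr-based vignette; SURVIVAL_RUN_SLOW is a made-up
switch:)

    # in the vignette's setup chunk
    run_slow <- identical(Sys.getenv("SURVIVAL_RUN_SLOW"), "true")
    # the expensive chunks then carry the chunk option  eval = run_slow,
    # so CRAN builds skip them while a local build can still run everything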

-- 
Terry M Therneau, PhD
Department of Quantitative Health Sciences
Mayo Clinic
thern...@mayo.edu

"TERR-ree THUR-noh"

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Difficult debug - follow-up

2024-02-19 Thread Therneau, Terry M., Ph.D. via R-devel
I want to thank Ivan and Bill for useful advice.   I eventually found my memory 
mistake, 
which was of the 'obvious once you see it' variety.  Ivan's note that it 
appeared to be in 
a small allocation was correct.
>   I've hit a roadblock debugging a new update to the survival package. I do 
> debugging in
> a development environment, i.e., I don't create and load a package but rather  
> source all
> the .R files and dyn.load an .so file, which makes things a bit easier.  
> etc...
>
Terry T.

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Difficult debug

2024-02-07 Thread Therneau, Terry M., Ph.D. via R-devel
  I've hit a roadblock debugging a new update to the survival package.   I do 
debugging in 
a development environment, i.e., I don't create and load a package but rather  
source all 
the .R files and dyn.load an .so file, which makes things a bit easier.

   Running with R -d "valgrind --tool=memcheck --leak-check=full" one of my 
test files 
crashes in simple R code a dozen lines after the  likely culprit has been 
called, i.e., the 
survfit function for an Aalen-Johansen, containing a .Call to the new C code.   
  The 
valgrind approach had already allowed me to find a few other (mostly dumb) 
errors that led 
to an out of bounds access, e.g., the wrong endpoint variable in a for( ) loop. 
   What 
would others advise as a next step?
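
(One generic next step, not from the thread: rerun with valgrind's origin tracking turned
on, which often reports which allocation the bad pointer came from -- a sketch, with a
made-up script name:)

    R -d "valgrind --tool=memcheck --leak-check=full --track-origins=yes" \
      --vanilla < mytest.R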

Here is the last part of the screen
 > fit2 <- coxph(list(Surv(tstart, tstop, bstat) ~ 1,
+    c(1:4):5 ~ age / common + shared), id= id, istate=bili4,
+   data=pbc2, ties='breslow', x=TRUE)
 > surv2 <- survfit(fit2, newdata=list(age=50), p0=c(.4, .3, .2, .1, 0))
 > test2 <- mysurv(fit2, pbc2$bili4, p0= 4:0/10, fit2, x0 =50)
==31730== Invalid read of size 8
==31730==    at 0x298A07: Rf_allocVector3 (memory.c:2861)
==31730==    by 0x299B2C: Rf_allocVector (Rinlinedfuns.h:595)
==31730==    by 0x299B2C: R_alloc (memory.c:2330)
==31730==    by 0x3243C6: do_which (summary.c:1152)
==31730==    by 0x23D8EF: bcEval (eval.c:7724)
==31730==    by 0x25731F: Rf_eval (eval.c:1152)
==31730==    by 0x25927D: R_execClosure (eval.c:2362)
==31730==    by 0x25A35A: R_execMethod (eval.c:2535)
==31730==    by 0x887E93F: R_dispatchGeneric (methods_list_dispatch.c:1151)
==31730==    by 0x2A0E72: do_standardGeneric (objects.c:1344)
==31730==    by 0x2577E7: Rf_eval (eval.c:1254)
==31730==    by 0x25927D: R_execClosure (eval.c:2362)
==31730==    by 0x25A01C: applyClosure_core (eval.c:2250)
==31730==  Address 0x10 is not stack'd, malloc'd or (recently) free'd
==31730==

  *** caught segfault ***
address 0x10, cause 'memory not mapped'

Traceback:
  1: which(smap == j)
  2: which(smap == j)
  3: mysurv(fit2, pbc2$bili4, p0 = 4:0/10, fit2, x0 = 50)

The offending call is almost certainly the one to survfit; mysurv() is a local function
that calculates some things 'by hand'.   It does nothing complex: counts, loops, etc.; the 
only non-base action is a call to Matrix::exp near the end, but the which() 
failure is 
well before that.

The session info just before the offending material:

 > sessionInfo()
R Under development (unstable) (2024-02-07 r85873)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.3 LTS

Matrix products: default
BLAS:   /usr/local/src/R-devel/lib/libRblas.so
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8    LC_COLLATE=C
  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
  [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: America/Chicago
tzcode source: system (glibc)

attached base packages:
[1] splines   stats graphics  grDevices utils datasets methods
[8] base

other attached packages:
[1] Matrix_1.6-0

loaded via a namespace (and not attached):
[1] compiler_4.4.0 tools_4.4.0    grid_4.4.0 lattice_0.22-5


---
Footnote.  The impetus for this is realizing that the robust variance for an 
Aalen-Johansen was incorrect when there are case weights for a subject that 
vary over 
time;  a rare case but will occur with time dependent IPC weights.  Carefully 
figuring 
this out has been all I did for the last week, leading to a new routine 
survfitaj.c and 
approx 14 pages of derivation and explanation in the methods.Rnw vignette.   
Subjects who 
"change horses in midstream", i.e., swap from one curve to another mid-followup 
make the 
code more complex.   This arises out of the "extended Kaplan-Meier"; I am not a 
fan of 
this statistically, but some will use it and expect my code to work.

-- 
Terry M Therneau, PhD
Department of Quantitative Health Sciences
Mayo Clinic
thern...@mayo.edu

"TERR-ree THUR-noh"

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] as.Date withoug "origin"

2022-11-03 Thread Therneau, Terry M., Ph.D. via R-devel
Add my name to those who think this is a good change.

As someone who works daily with medical research data, the number of times I have been
caught by the failure of statements like
    as.Date(ifelse(is.na(prog.dt), lastfu.dt, prog.dt))
is a large annoyance.    Replace ifelse by any number of selection processes for choosing
among ending dates.
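
(A minimal illustration of the failure mode, since ifelse() strips the Date class and hands
as.Date() a bare number:)

    prog.dt   <- as.Date(c(NA, "2022-01-01"))
    lastfu.dt <- as.Date(c("2022-06-30", "2022-06-30"))
    end.dt <- ifelse(is.na(prog.dt), lastfu.dt, prog.dt)
    class(end.dt)                          # "numeric": the Date class is gone
    as.Date(end.dt, origin = "1970-01-01") # the workaround needed before the change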

Terry T.

-- 
Terry M Therneau, PhD
Department of Quantitative Health Sciences
Mayo Clinic
thern...@mayo.edu

"TERR-ree THUR-noh"

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Question about the UCRT switch

2021-12-09 Thread Therneau, Terry M., Ph.D. via R-devel
The connected blog has the statement  "Most authors will not have to do 
anything as the 
number of CRAN packages that will need some attention is below 1%, but authors 
of packages 
using native (C, C++ or Fortran) code should read the following lines."

My packages do use a lot of C, but I never use Windows.   My reading of "the 
following 
lines" is that  I don't have to do anything.   Is this correct?

Terry T.


-- 
Terry M Therneau, PhD
Department of Quantitative Health Sciences
Mayo Clinic
thern...@mayo.edu

"TERR-ree THUR-noh"


[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Issues with drop.terms

2021-08-23 Thread Therneau, Terry M., Ph.D. via R-devel
This is a follow-up to my earlier note on [.terms.   Based on a couple of days' work getting
the survival package to work around the issues, this will hopefully be more concise and
better expressed than the prior note.

1.
test1 <- terms( y ~ x1:x2 + x3)
check <- drop.terms(termobj =test1, dropx = 1)
formula(check)
## ~x1:x2

The documentation for the dropx argument is "vector of positions of variables to drop from
the right hand side of the model", but it is not clear what "positions" means.   I
originally assumed "the order in the formula as typed", but was wrong.   I suggest adding
a line:  "Position refers to the order of terms in the term.labels attribute of the terms
object, which is also the order they will appear in a coefficient vector (not counting the
intercept)."

2.
library(splines)
test2 <- terms(model.frame(mpg ~  offset(cyl) + ns(hp, df=3) + disp + wt, 
data=mtcars))
check2 <- drop.terms(test2,  dropx = 2)
formula(check2)
## ~ns(hp, df=3) + wt

One side effect of how drop.terms is implemented, and one that I suspect was not intended,
is that offsets are completely ignored.    The above drops both the offset and the disp
term from the formula.   The dataClasses and predvars attributes of the result are also
incorrect: they have lost the ns() term rather than the disp term;
the results of predict will be incorrect.

attr(check2, "predvars")
##    list(offset(cyl), disp, wt)

Question: should the function be updated to not drop offsets?  If not, a line needs to be
added to the help file.   The handling of predvars needs to be fixed regardless.

3.
test3 <- terms(mpg ~ hp + (cyl==4) + disp + wt )
check3 <- drop.terms(test3, 3)
formula(check3)
lm( check3, data=mtcars)   # fails

The drop.terms action has lost the () around the logical expression, which leads to an
invalid formula.  We can argue that the user should have used I(cyl==4), but very many
won't.
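
(The parentheses are in fact already gone at the term.labels level, which is what the
rebuild works from -- an illustration of the behaviour described above:)

    test3 <- terms(mpg ~ hp + (cyl == 4) + disp + wt)
    attr(test3, "term.labels")   # the second label comes back as "cyl == 4", no parentheses
    # rebuilding a formula from those labels therefore yields something like
    #     ~ hp + cyl == 4 + wt
    # which, because == binds more loosely than +, parses as ~ ((hp + cyl) == (4 + wt))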

4. As a footnote, more confusion (for me) is generated by the fact that the 
"specials" 
attribute of a formula does not use the numbering discussed in 1 above.   I had 
solved 
this issue long ago in the untangle.specials function; long enough ago that I 
forgot I had 
solved it, and just wasted a day rediscovering that fact.

---

I can create a patch for 1 and 2 (once we answer my question), but a fix for 3 is not
clear to me.  It currently leads to a failure in a coxph call that includes a strata term,
so I am directly interested in a solution; e.g.,  coxph(Surv(time, status) ~ age +
(ph.ecog==2) + strata(inst), data=lung)

Terry T

-- 

Terry M Therneau, PhD
Department of Quantitative Health Sciences
Mayo Clinic
thern...@mayo.edu

"TERR-ree THUR-noh"


[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] problems with [.terms

2021-08-21 Thread Therneau, Terry M., Ph.D. via R-devel
The survival package uses [.terms a fair bit, in particular it makes use of the 
index 
returned in the 'specials' attribute, but the base R code has at least two 
problems.
First, it does not account for offset terms, if present, and also fails for a 
formula such 
as  y ~ age + (sex=='male').   Yes, the user should have used I(sex=='male'), 
but they 
very often don't and the formula works fine without it, but not [.terms. 
Users of 
coxph regularly remind me of these flaws :-)

I first reported this in a bugzilla in 2016.   I think that I (finally) have a 
working fix 
for all the issues.   This is found below in an Sweave document (which is short 
enough 
that I've pasted it directly in) along with some further discussion.   Three 
particular 
questions are: 1. my solution for (sex=="male") is a bit of a hack, but it works and I
don't see anything better that is simple; 2. we should then add documentation that both
[.terms and drop.terms also remove any offset() terms (this is currently true anyway);
and 3. whether this will break other code that currently works around the flaws.

If this looks good I could try to create a formal patch, though that is not 
something I 
have done before.

Terry T.

--
\documentclass [11pt]{article}
\newcommand{\code}[1]{\texttt{#1}}

\title{Subscripting a terms object}
\author{Terry Therneau}
\begin{document}
\maketitle
<>=
options(continue=' ')
@

\section{Introduction}
The survival package makes use of the the subscripting method for terms.
Consider the following code:
<>=
library(survival)
fit1 <- coxph(Surv(time, status) ~ age + strata(inst), data=lung)
Terms <- terms(fit1)
@

In a Cox model the strata term acts as a per-institution intercept, but it
is not a part of the $X$ matrix.
There are two strategies:
\begin{itemize}
   \item If there are strata by covariate interactions, the strata columns
 are removed from the $X$ matrix after the call to model.matrix.
 They (and the intercept) are necessary to correctly form interactions.
   \item If there are no strata by covariate interactions, the strata term is
 removed from the terms object before calling model.matrix.
 Some models may have thousands of strata, a nested case-control design for
 instance, so this is important for efficiency.
\end{itemize}

The approach is to add \code{specials='strata'} to our call to \code{terms},
which causes any strata call to be marked: the \code{specials} attribute
of the terms object will then contain 3 for the example above (the specials
index counts the response),
and \code{Terms[-2]} will create a version without the strata term.

\section{Errors in [.terms}
Our problem is that the subscripting operator for a terms object fails in 2
cases.
Consider the following increasingly complex formulas:

<>=
library(survival)
f1 <- model.frame(terms(Surv(time, status) ~ ph.ecog + strata(inst) + age,
specials="strata"), data=lung)
f2 <- model.frame(terms(Surv(time, status) ~ offset(sex) + age +
strata(ph.ecog), specials="strata"), data= lung)
f3 <- model.frame(terms(~ pspline(wt.loss) + (ph.ecog==1) + strata(inst) +
 (wt.loss + meal.cal)*sex, specials="strata"), lung)
test1 <- terms(f1)
test2 <- terms(f2)
test3 <- terms(f3)
@

The root problem is the need for multiple subscripts.  Consider \code{test1}.
\begin{itemize}
   \item The object itself is a formula of length 5, with `~' as the first
 element.
   \item Attributes are
 \begin{itemize}
   \item The variables and predvars attributes are call objects, each a 
list()
 with 4 elements: the response and all 3 predictors.
   \item The factors attribute has 4 rows and 2 columns, with row labels
 corresponding to the \code{variables} list.
   \item The specials, response, and offset (if present) attributes give
 the index of those terms in the variables attribute.
   \item The term.labels and order attributes omit the response and the 
offset,
 so have length 2.
   \item The dataClasses attribute is a character vector of length 4.
 \end{itemize}
\end{itemize}
So the ideal result of  test1[remove the specials] would use subscript of
\begin{itemize}
   \item \code{[-5]} on the formula itself, variables and predvars attributes
   \item \code{[-2]} for term.labels
   \item \code{[-4 , -2, drop=FALSE]} for the factor attribute
   \item \code{[-2]} for order attribute
   \item \code{[-4]} for the dataClasses attribute
\end{itemize}

That will recreate the formula that ``would have been'' had there been no
strata term.  Now look at the first portion of the code in models.R

<<>>=
`[.terms` <- function (termobj, i)
{
 resp <- if (attr(termobj, "response")) termobj[[2L]]
 newformula <- attr(termobj, "term.labels")[i]
 if (length(newformula) == 0L) newformula <- "1"
 newformula <- reformulate(newformula, resp, attr(termobj, "intercept"),
                           environment(termobj))
 result <- terms(newformula, specials = names(attr(termobj, "specials")))

 # Edit the optional attributes
}
@

Re: [Rd] [EXTERNAL] Re: inheritance and attach

2021-03-15 Thread Therneau, Terry M., Ph.D. via R-devel
Thanks Simon.  I missed that.   It is a sensible change.
I had trouble because I had just changed computing environments this weekend (a 
forced 
change due to an institutional directive), and this caught me right after that 
so I spent 
some time chasing my tail.  Murphy's law...

Terry T.


On 3/15/21 4:45 PM, Simon Urbanek wrote:
> Terry,
>
> NEWS: CHANGES IN R 4.0.0 NEW FEATURES
>
>   \item S3 method lookup now by default skips the elements of the
>search path between the global and base environments.
>
> If you use attach(), S3 methods are hence no longer dispatched to (because it 
> is between global and base) unless you register them using .S3method(). 
> Without registration you have to load them into the global env for them to 
> work since this is now the only environment that doesn't require registration.
>
> Cheers,
> Simon
>
>
>
>> On Mar 16, 2021, at 7:19 AM, Therneau, Terry M., Ph.D. via R-devel 
>>  wrote:
>>
>> This change in R-devel just bit me.   Under the newest release, if I 
>> attach() another
>> .RData directory, the methods are not detected.
>> Was it intentional?   Running in Linux.   Here is a script of an example 
>> that works fine
>> under 3.6.2. but fails in R-devel.
>>
>> tmt% mkdir temp1
>> tmt% cd temp1
>> tmt% R
>>   # define a silly method, just for testing
>>
>> charlie <- function(x, ...)
>>  UseMethod("charlie")
>>
>>
>> charlie.default <- function(x, ...) {
>>  cat("default method ", x, "\n")
>>  x +2
>> }
>>
>> charlie.character <- function(x, ...) {
>>  cat("character method ", x, "\n")
>>  as.character(as.numeric(x) + 2)
>> }
>>
>>> quit("yes")
>> tmt% cd ..
>> tmt% R
>>> attach("temp1/.RData")
>>> charlie( 4)
>> Error in UseMethod("charlie") :
>>no applicable method for 'charlie' applied to an object of class 
>> "c('double', 'numeric')"
>>
>> 
>>
>> The use case was my local test environment for the survival package.  I can 
>> work around it.
>>
>> -- 
>> Terry M Therneau, PhD
>> Department of Health Science Research
>> Mayo Clinic
>> thern...@mayo.edu
>>
>> "TERR-ree THUR-noh"
>>
>>
>>  [[alternative HTML version deleted]]
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
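
(A sketch of the registration route Simon describes, run in the session that does the
attach(); .S3method() is the base-R registration helper:)

    attach("temp1/.RData")
    .S3method("charlie", "default",   charlie.default)
    .S3method("charlie", "character", charlie.character)
    charlie(4)      # dispatch now finds the attached methods again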


[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] inheritance and attach

2021-03-15 Thread Therneau, Terry M., Ph.D. via R-devel
This change in R-devel just bit me.   Under the newest release, if I attach() 
another 
.RData directory, the methods are not detected.
Was it intentional?   Running in Linux.   Here is a script of an example that 
works fine 
under 3.6.2. but fails in R-devel.

tmt% mkdir temp1
tmt% cd temp1
tmt% R
  # define a silly method, just for testing

charlie <- function(x, ...)
     UseMethod("charlie")


charlie.default <- function(x, ...) {
     cat("default method ", x, "\n")
     x +2
}

charlie.character <- function(x, ...) {
     cat("character method ", x, "\n")
     as.character(as.numeric(x) + 2)
}

 > quit("yes")

tmt% cd ..
tmt% R
 > attach("temp1/.RData")
 > charlie( 4)
Error in UseMethod("charlie") :
   no applicable method for 'charlie' applied to an object of class 
"c('double', 'numeric')"



The use case was my local test environment for the survival package.  I can 
work around it.

-- 
Terry M Therneau, PhD
Department of Health Science Research
Mayo Clinic
thern...@mayo.edu

"TERR-ree THUR-noh"


[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Compression (really about LazyData)

2021-02-19 Thread Therneau, Terry M., Ph.D. via R-devel
Thank you Brian.   I had not quite grasped how the process works, now the 
descriptions and 
usage make sense.

Terry


On 2/19/21 4:28 AM, Prof Brian Ripley wrote:
> On 18/02/2021 18:30, Therneau, Terry M., Ph.D. via R-devel wrote:
>> This is a CRAN question:
>>
>> I have taken care to compress files in the data directory using "xz" (and 
>> checked that it
>> is the best).  Is there then any impact or use for the LazyDataCompression 
>> option in the
>> DESCRIPTION file?
>>
>
> I have difficulty comprehending that, so I will try to answer my guess at 
> what you meant 
> to ask.
>
> What LazyDataCompression does is completely separate from the contents of the 
> data 
> directory.  As the manual say
>
> 
> Some packages using ‘LazyData’ will benefit from using a form of compression 
> other than 
> gzip in the installed lazy-loading database. This can be selected by the 
> --data-compress 
> option to R CMD INSTALL or by using the ‘LazyDataCompression’ field in the 
> DESCRIPTION 
> file. Useful values are bzip2, xz and the default, gzip. The only way to 
> discover which 
> is best is to try them all and look at the size of the pkgname/data/Rdata.rdb 
> file.
> 
>
> When a package is installed with LazyData (and you neglected to tell us if 
> that is the 
> case), the datasets in the data directory are loaded (and hence 
> decompressed), and 
> stored in a database. For a LazyData package the compression used in the data 
> directory 
> only affects the source package size (I guess your criterion for 'best') and 
> how fast it 
> is installed (rarely a consideration but there have been LazyData packages 
> where 
> installing the data takes most of the time).  At run-time only the 
> compression specified 
> by LazyDataCompression is relevant.
>
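
(So in DESCRIPTION the combination under discussion amounts to something like the lines
below -- with xz only if it actually wins for the installed Rdata.rdb:)

    LazyData: true
    LazyDataCompression: xz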


[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] reverse dependency checks

2021-02-18 Thread Therneau, Terry M., Ph.D. via R-devel
There are some nice tools to automate reverse dependency checks, but for a large package
the real issue is the environment.  The description of the crandalf site on github has a
nice summary.  One package uses crypto libraries (oops, install those), another uses some
latex macros I've never heard of, etc.    In any case, I am seeing some failures in my
reverse checks for survival that I'm not sure how to fix.   If anyone has a hint or two
I'd be grateful.

1. A half dozen fail with "graphics API mismatch"

2. Over a dozen have a latex failure of
! pdfTeX error (\pdfsetmatrix): Unrecognized format..
\AtBegShi@Output ...ipout \box \AtBeginShipoutBox
   \fi \fi

A check of kpsewhich shows I have all the packages, and I don't see a pattern 
in the .tex 
files at the point of failure.

Terry T.

-- 
Terry M Therneau, PhD
Department of Health Science Research
Mayo Clinic
thern...@mayo.edu

"TERR-ree THUR-noh"


[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Compression

2021-02-18 Thread Therneau, Terry M., Ph.D. via R-devel
This is a CRAN question:

I have taken care to compress files in the data directory using "xz" (and 
checked that it 
is the best).  Is there then any impact or use for the LazyDataCompression 
option in the 
DESCRIPTION file?
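
(For reference, a sketch of one way to compare compressions for the files under data/,
using the tools helpers that report and resave .rda files:)

    tools::checkRdaFiles("data")                    # current size and compression per file
    tools::resaveRdaFiles("data", compress = "xz")  # rewrite; compare sizes afterwards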


-- 
Terry M Therneau, PhD
Department of Health Science Research
Mayo Clinic
thern...@mayo.edu

"TERR-ree THUR-noh"


[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] issue with data()

2021-02-16 Thread Therneau, Terry M., Ph.D. via R-devel
I am testing out the next release of survival, which involves running R CMD 
check on 868 
CRAN packages that import, depend or suggest it.

The survival package has a lot of data sets, most of which are non-trivial real 
examples 
(something I'm proud of).  To save space I've bundled many of them, e.g., 
data/cancer.rda 
has 19 different dataframes.

This caused failures in 4 packages, each because they have a line such as 
"data(lung)"  or 
data(breast, package= "survival"); and the data() command looks for a file name.

This is a question about which option is considered the best (perhaps more of a 
poll), 
between two choices

1. unbundle them again  (it does save 1/3 of the space, and I do get complaints 
from R CMD 
build about size)
2. send notes to the 4 maintainers.  The help files for the data sets have the 
usage 
documented as  "lung" or "breast", and not data(lung), so I am technically 
legal to claim 
they have a mistake.

A third option to make the data sets a separate package is not on the table.  I 
use them 
heavily in my help files and test suite, and since survival is a recommended 
package I 
can't add library(x) statements for  !(x %in% recommended).   I am guessing 
that this 
would also break many dependent packages.
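
(For concreteness -- data() resolves the file name, not the object name, so with the
bundled layout the calls look like this:)

    data(lung, package = "survival")     # fails once lung lives inside data/cancer.rda
    data(cancer, package = "survival")   # loads the whole bundle, lung included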

Terry T.

-- 
Terry M Therneau, PhD
Department of Health Science Research
Mayo Clinic
thern...@mayo.edu

"TERR-ree THUR-noh"


[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] .Fortran to .Call

2020-12-28 Thread Therneau, Terry M., Ph.D. via R-devel
Roger,
  Over the years I have converted many of the .C calls in the survival package 
to .Call.  
As others have said, the big advantage is memory footprint.  I did it because 
there are a 
few users who call survfit or coxph with really large data sets, and not 
copying the data 
can be the difference between success and failure (run out of memory).  Like 
you, my 
experience has been that if there is enough memory, then the time required for 
.C or 
.Fortran to make the data copy is minimal.   Don't make the change in order to 
gain 
compute speed.

The downside to .Call is that R then makes the (dangerous) assumption that you 
know what 
you are doing.  That is, vectors/matrices that are passed in to the function 
will NOT have 
new data values written into them, unless you have taken the necessary steps.   
Breaking 
the promise can lead to program failures that are very hard to track down.

-- 
Terry M Therneau, PhD
Department of Health Science Research
Mayo Clinic
thern...@mayo.edu

"TERR-ree THUR-noh"


[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [R/S-PLUS] [EXTERNAL] Re: [External] anonymous functions

2020-12-07 Thread Therneau, Terry M., Ph.D. via R-devel
Luke,
   Mostly an aside.  I think that pipes are a good addition, and it is clear that you and
other R-core thought through many of the details.   Congratulations on what appears to be
solid work.  I've used Unix since '79, so it is almost guaranteed that I like the basic
idiom, and I expect to make use of it.  That some users think pipes -- or any other code --
are so clear that comments are superfluous is no reflection on R core; it is also a bit of
a hobby horse for me.

I am a bit bemused by the flood of change suggestions, before people have had a 
chance to 
fully exercise the new code.   I'd suggest waiting several months, or a year, 
before major 
updates, straight up bugs excepted.   The same advice holds when moving into a 
new house.
One  experience with the survival package has been that most new ideas have 
been 
implemented locally, and we run with them for half a year before submission to 
CRAN.  I've 
had a few "really great" modifications that, thankfully, were never inflicted 
on the rest 
of the R community.

Terry T.

On 12/7/20 11:26 AM, luke-tier...@uiowa.edu wrote:
> I don't disagree in principle, but the reality is users want shortcuts
> and as a result various packages, in particular tidyverse, have been
> providing them. Mostly based on formulas, mostly with significant
> issues since formulas weren't designed for this, and mostly
> incompatible (tidyverse ones are compatible within tidyverse but not
> with others). And of course none work in sapply or lapply. Providing a
> shorthand in base may help to improve this. You don't have to use it
> if you don't want to, and you can establish coding standards that
> disallow it if you like.
>
> Best,
>
> luke
>
> On Mon, 7 Dec 2020, Therneau, Terry M., Ph.D. via R-devel wrote:
>
>> “The shorthand form \(x) x + 1 is parsed as function(x) x + 1. It may be 
>> helpful in 
>> making code containing simple function expressions more readable.”
>>
>> Color me unimpressed.
>> Over the decades I've seen several "who can write the shortest code" 
>> threads: in 
>> Fortran, in C, in Splus, ...   The same old idea that "short" is a synonym 
>> for either 
>> elegant, readable, or efficient is now being recycled in the tidyverse.   
>> The truth is 
>> that "short" is actually an antonym for all of these things, at least for 
>> anyone else 
>> reading the code; or for the original coder 30-60 minutes after the "clever" 
>> lines were 
>> written. Minimal use of the spacebar and/or the return key isn't usually 
>> held up as a 
>> goal, but creeps into many practitioners' code as well.
>>
>> People are excited by replacing "function(" with "\("? Really?   Are people 
>> typing code 
>> with their thumbs?
>> I am ambivalent about pipes: I think it is a great concept, but too many of 
>> my 
>> colleagues think that using pipes = no need for any comments.
>>
>> As time goes on, I find my goal is to make my code less compact and more 
>> readable.  
>> Every bug fix or new feature in the survival package now adds more lines of 
>> comments or 
>> other documentation than lines of code.  If I have to puzzle out what a line 
>> does, what 
>> about the poor sod who inherits the maintenance?
>>
>>
>>
>


[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] anonymous functions

2020-12-07 Thread Therneau, Terry M., Ph.D. via R-devel
“The shorthand form \(x) x + 1 is parsed as function(x) x + 1. It may be helpful in making 
code containing simple function expressions more readable.”


Color me unimpressed.
Over the decades I've seen several "who can write the shortest code" threads: in Fortran, 
in C, in Splus, ...   The same old idea that "short" is a synonym for either elegant, 
readable, or efficient is now being recycled in the tidyverse.   The truth is that "short" 
is actually an antonym for all of these things, at least for anyone else reading the code; 
or for the original coder 30-60 minutes after the "clever" lines were written.  Minimal 
use of the spacebar and/or the return key isn't usually held up as a goal, but creeps into 
many practitioners' code as well.


People are excited by replacing "function(" with "\("?  Really?   Are people typing code 
with their thumbs?
I am ambivalent about pipes: I think it is a great concept, but too many of my colleagues 
think that using pipes = no need for any comments.


As time goes on, I find my goal is to make my code less compact and more readable.  Every 
bug fix or new feature in the survival package now adds more lines of comments or other 
documentation than lines of code.  If I have to puzzle out what a line does, what about 
the poor sod who inherits the maintenance?



--
Terry M Therneau, PhD
Department of Health Science Research
Mayo Clinic
thern...@mayo.edu

"TERR-ree THUR-noh"

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Issue with data() function

2020-10-25 Thread Therneau, Terry M., Ph.D. via R-devel
Duncan and others:  I was not being careful with my description.  This 
concerned tests of 
version 3.2-8, not yet on CRAN, in which I was trying some size-limiting 
measures.   My 
apologies for not making this clear.

   - I feel mild pressure to make the survival package smaller, per CRAN 
guidelines, and 
shrinking the data appears to be one way to approach that.  So a real point of 
the query 
is my attempts to do so.   (I am much more resistant to shrinking the extensive 
test suite 
or the vignettes.)
   -  The survival package has a lot of small data sets, and bundling them up 
into a 
single .rda file does save space, but it causes some issues with data().   The 
overall 
tarball goes from 7480 to 6100 in size (ls -s).

   Terry

On 10/24/20 4:28 AM, Duncan Murdoch wrote:
> On 23/10/2020 9:25 p.m., Therneau, Terry M., Ph.D. via R-devel wrote:
>> I found an issue with the data() command this evening when working on the 
>> survival 
>> package.
>>
>> 1. I have a lot of data sets in the package, almost all used in at least one 
>> vignette,
>> help file, or test.  As a space saving measure, I have bundled many of them 
>> together,
>> i.e., the file data/cancer.rda contains 19 data sets, many of them small. 
>> The resulting
>> file (using xz compression) is quite a bit smaller than the individual ones. 
>>  (I still get
>> a warning note about size from R CMD check, but I'm no longer 2x the limit.)
>>
>> 2. Consider the lung data set.  All of these fail:
>>      data(lung)
>>      data("lung")
>>      data(lung, package="survival")
>>
>>    a. The lung.Rd file had \usage{data(lung)}; that error was not caught by 
>> R CMD check.
>> (Several other .Rd files as well.)
>>
>>    b. In broader examples for teaching, I sometimes load data from other 
>> packages, e.g
>> data(aidssi, package="mstate").  But this does not work for survival.  (The 
>> larger
>> survival data sets that are in separate .rda files can be found.)
>>
>>    c. What does work is survival::lung.  Might it be useful to add a comment 
>> to data.Rd to
>> this effect?
>
> You don't describe how this dataset is being included in your package. Have 
> you moved it 
> from data/lung.rda to data/cancer.rda? Currently (in survival 3.2-7) each of 
> these works 
> for me:
>
>  library(survival); data(lung)
>
>  library(survival); data("lung")
>
>  # Without library(survival):
>  data(lung, package="survival")
>
> I think if the lung dataset is now being included in cancer.rda, you'd need
>
>   data(cancer, package="survival")
>
> or equivalent to load it (and the rest of the datasets there).
>
>>
>>
>> 3. Creating a separate package 'survivaldata' is of course one route, and is 
>> suggested in
>> the "Writing R Extensions" guide.  But this is not possible since survival 
>> is a
>> recommended package: it can't load any non-recommended package for its 
>> tests or
>> vignettes.  Longer term, perhaps there is a way around this constraint?
>
> Maybe the solution is to put your datasets into the "datasets" package, or 
> make 
> "survivaldata" a recommended package, or just leave things as they are and 
> ignore the 
> warnings about package size.  I think that's a negotiation you should have 
> with R Core.
>
> Duncan Murdoch


[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Issue with data() function

2020-10-23 Thread Therneau, Terry M., Ph.D. via R-devel
I found an issue with the data() command this evening when working on the 
survival package.

1. I have a lot of data sets in the package, almost all used in at least one 
vignette, 
help file, or test.  As a space saving measure, I have bundled many of them 
together, 
i.e., the file data/cancer.rda contains 19 data sets, many of them small. The 
resulting 
file (using xz compression) is quite a bit smaller than the individual ones.  
(I still get 
a warning note about size from R CMD check, but I'm no longer 2x the limit.)

2. Consider the lung data set.  All of these fail:
    data(lung)
    data("lung")
    data(lung, package="survival")

  a. The lung.Rd file had \usage{data(lung)}; that error was not caught by R 
CMD check.  
(Several other .Rd files as well.)

  b. In broader examples for teaching, I sometimes load data from other 
packages, e.g 
data(aidssi, package="mstate").  But this does not work for survival.  (The 
larger 
survival data sets that are in separate .rda files can be found.)

  c. What does work is survival::lung.  Might it be useful to add a comment to 
data.Rd to 
this effect?


3. Creating a separate package 'survivaldata' is of course one route, and is 
suggested in 
the "Writing R Extensions" guide.  But this is not possible since survival is a 
recommended package: it can't load any non-recommended package for its tests 
or 
vignettes.  Longer term, perhaps there is a way around this constraint?

Terry T.

-- 
Terry M Therneau, PhD
Department of Health Science Research
Mayo Clinic
thern...@mayo.edu

"TERR-ree THUR-noh"


[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Extra "Note" in CRAN submission

2020-09-25 Thread Therneau, Terry M., Ph.D. via R-devel
When I run R CMD check on the survival package I invariably get a note:
...
* checking for file ‘survival/DESCRIPTION’ ... OK
* this is package ‘survival’ version ‘3.2-6’
* checking CRAN incoming feasibility ... NOTE
Maintainer: ‘Terry M Therneau ’
...

This is sufficient for the auto-check process to return the following failure 
message:

Dear maintainer,

  
package survival_3.2-6.tar.gz does not pass the incoming checks automatically, 
please see the following pre-tests:
Windows:
Status: 1 NOTE
Debian:
Status: 1 NOTE


--

In the interest of smoothing things out for the CRAN maintainers I would like to make 
this message go away, but I don't see how.  Below is the DESCRIPTION file.   
Thanks in advance for any hints.

Terry T.

--

Title: Survival Analysis
Maintainer: Terry M Therneau 
Priority: recommended
Package: survival
Version: 3.2-6
Date: 2020-09-24
Depends: R (>= 3.4.0)
Imports: graphics, Matrix, methods, splines, stats, utils
LazyData: Yes
LazyLoad: Yes
ByteCompile: Yes
Authors@R: c(person(c("Terry", "M"), "Therneau",
 email="therneau.te...@mayo.edu",
role=c("aut", "cre")),
person("Thomas", "Lumley", role=c("ctb", "trl"),
   comment="original S->R port and R maintainer until 2009"),
   person("Atkinson", "Elizabeth", role="ctb"),
   person("Crowson", "Cynthia", role="ctb"))
Description: Contains the core survival analysis routines, including
 definition of Surv objects,
 Kaplan-Meier and Aalen-Johansen (multi-state) curves, Cox models,
 and parametric accelerated failure time models.
License: LGPL (>=2)
URL: https://github.com/therneau/survival


[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] CRAN checks and ASAN

2020-06-11 Thread Therneau, Terry M., Ph.D. via R-devel
I have a version of R-devel on my development box that has the address 
sanitizer turned 
on.   This was instrumental in finding a pair of subtle memory issues.  (I had 
read, but 
never written, one element past the end of an array, which caused issues on 
some 
architectures.)

1. I now get end-of-job messages from R CMD check survival_3.2-3.tar.gz about leaks in
main/eval.c.    They don't appear in 00config.out or 00install.out.
  I assume that I can ignore these?

2. When I run my long 'check all packages that depend on survival' job, a lot of packages
fail with sanitizer leaks.   Again, not my problem?
If so, I just need to recompile R without the ASAN flags and try again.


Terry T.



[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] ASAN

2020-04-07 Thread Therneau, Terry M., Ph.D. via R-devel
I'm trying to chase down a possible issue with the survival package, and 
so was trying to resurrect my ASAN version of r-devel.   I added the 
lines to config.site for CC, CFLAGS and MAIN_LDFLAGS per section 4.3.3 
of the  'packages' documentation, and lines to ~/.R/Makevars.
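
(From memory, those config.site additions look roughly like the lines below; check the
current WRE section for the exact recommended flags, since these are assumptions rather
than a copy of the manual:)

    CC="gcc -fsanitize=address -fno-omit-frame-pointer"
    CFLAGS="-g -O2 -Wall"
    MAIN_LDFLAGS="-fsanitize=address"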

I followed that with  svn up,  tools/rsync-recommended, make distclean, 
./configure, then make.

It fails on the rebuild of MASS with the message

==15755==ASan runtime does not come first in initial library list; you 
should either link runtime to your application or manually preload it 
with LD_PRELOAD.
ERROR: loading failed
* removing ‘/usr/local/src/R-devel/library/MASS’

It's not quite clear to me what to do next.

Terry Therneau



[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] survival bug? - solved

2020-03-05 Thread Therneau, Terry M., Ph.D. via R-devel
  I ended up finding the issue by a focused code review.

Once in the past, I had a version that would fail under one architecture but 
not another, 
in that case some help from Brian Ripley pointed me to the offending line of C 
code.   
That line read, but did not write, at an invalid memory location.   Starting 
with the 
question of "what C routines have I added or modified most recently" along with 
where the 
fault appeared to occur in my test suite, I started reading C code and found 
one.   
Revised code passes tests on the winbuilder site.

For the curious, I had a line asking "is this patient id different than the 
last patient 
id" in the C routine underneath survcheck(); I'm making sure that patients 
don't go 
backwards in time. Essentially
  for (i=0; i<n; i++) {
      if (id[i] != id[i-1]) { ...}

It is still a surprise to me that just LOOKING at this out-of-range element would cause a
failure; id[i-1] never appears on the left hand side of any expression in the ... chunk
above.  Nevertheless, it was an error.   Que sera sera.

A strong thanks to those who gave solid suggestions for bringing up a local 
copy of Windows.

Terry T

>>> My latest submission of survival3.1-10 to CRAN fails  a check, but only on
>>> windows, which
>>> I don't use.
>>> How do I track this down?
>>> The test in question works fine on my Linux box.
>>>
>>> Terry
>>>
>>>
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> __
>>> R-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>  [[alternative HTML version deleted]]
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>


[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] rounding change

2020-03-05 Thread Therneau, Terry M., Ph.D. via R-devel
This is a small heads up for package maintainers.   Under the more recent 
R-devel, R CMD 
check turned up some changes in the *.out files.   The simple demonstration is 
to type  
"round(51/80, 3)", which gives .638 under the old and .637 under the new.   
(One of my 
coxph test cases has a concordance of exactly 51/80).

In this particular case 51/80 is exactly .6375, but that value does not 
have an exact 
representation in base 2.  The line below would argue that the new version is 
correct, at 
least with respect to the internal representation.

 > print(51/80, digits = 20)
[1] 0.63749999999999995559

This is not a bug or problem, it just means that whichever version I put into 
my 
survival/tests/book6.Rout.save file, one of R-devel or R-current will flag an 
issue.



[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] survival bug?

2020-03-03 Thread Therneau, Terry M., Ph.D. via R-devel
My latest submission of survival3.1-10 to CRAN fails  a check, but only on 
windows, which 
I don't use.
How do I track this down?
The test in question works fine on my Linux box.

Terry



[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] specials issue, a heads up

2020-02-24 Thread Therneau, Terry M., Ph.D. via R-devel
Thanks to all who commented.   In some defense of the person who reported the 
"bug", it 
appeared to be a company policy from above, influenced by the fact that they 
had been 
often burned by not using :: when multiple packages use the same symbol.

Some further technical detail:  coxph has 3 specials: strata, cluster, and tt.  For the
last of these I took a different route, and did not make it an exported symbol in the
survival package. (I try to be smarter over time.)  Instead, coxph first looks at the
formula.  If that formula contains tt() then it creates an environment that contains the
function "tt <- function(x) x", whose parent is the environment of the formula, changes
the formula's environment to this, and then proceeds as usual.    As a curiosity, I'm
going to change cluster to the same form, and then run my script that executes R CMD
check on every package that has survival as a depend/import/suggest and see how many
break.   I'm guessing I'll find several that used cluster for their own ends.  The strata
function will be worse.
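
(A minimal sketch of that environment trick, with a made-up helper name; the real coxph
code differs in detail:)

    add_formula_special <- function(formula, name = "tt") {
        e <- new.env(parent = environment(formula))
        assign(name, function(x) x, envir = e)   # identity placeholder, found first
        environment(formula) <- e
        formula
    }
    # e.g. f <- add_formula_special(y ~ x + tt(z)); model.frame() evaluating tt(z)
    # now finds the placeholder without the package having to export a tt() function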

Duncan -- to use your approach, I'd instead hack the formula before calling 
model.frame, 
so that it does not try to call cluster?

Another approach I tried was not exporting cluster(), but that fails when 
model.frame 
tries to call it.


Terry


On 2/24/20 12:21 PM, Duncan Murdoch wrote:
> On 24/02/2020 8:55 a.m., Therneau, Terry M., Ph.D. via R-devel wrote:
>> I recently had a long argument wrt the survival package, namely that the 
>> following code
>> didn't do what they expected, and so they reported it as a bug
>>
>>     survival::coxph( survival::Surv(time, status) ~ age + sex + 
>> survival::strata(inst),
>> data=lung)
>>
>> a. The Google R style guide  recommends that one put :: everywhere
>> b. This breaks the recognition of cluster as a "special" in the terms 
>> function.
>>
>> I've been stubborn and said that their misunderstanding of how formulas work 
>> is not my
>> problem.   But I'm sure that the issue will come up again, and multiple 
>> other packages
>> will break.
>>
>> A big problem is that the code runs, it just gives the wrong answer.
>>
>> Suggestions?
>
> I don't know how widely used survival::strata is versus the special strata 
> (or cluster, 
> or other specials).  If you were just introducing this now, I'd try to make 
> sure that 
> only one of those worked:  don't have any functions matching the names of 
> specials, or 
> have functions that generate an error if you call them.  I did that in the 
> much less 
> widely used "tables" package, e.g. Heading() has special interpretation, and 
> the Heading 
> function is defined as
>
> Heading <- function(name = NULL, override = TRUE,
>     character.only = FALSE,
>     nearData = TRUE)
>     stop("This is a pseudo-function, not meant to be called.")
>
> However, survival has far more users than tables does, so changing the name 
> of your 
> special functions or the corresponding regular functions could be a huge 
> headache.
>
> Perhaps there's a way to set a flag before evaluating the function in the 
> formula, and 
> generate a warning if survival::strata is called when it looks like the 
> special function 
> is intended.
>
> Duncan Murdoch 


[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] specials issue, a heads up

2020-02-24 Thread Therneau, Terry M., Ph.D. via R-devel
I recently had a long argument wrt the survival package, namely that the 
following code 
didn't do what they expected, and so they reported it as a bug

   survival::coxph( survival::Surv(time, status) ~ age + sex + 
survival::strata(inst), 
data=lung)

a. The Google R style guide  recommends that one put :: everywhere
b. This breaks the recognition of cluster as a "special" in the terms function.
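
(The mechanics, for the record: specials are matched by the bare function name in the
formula, so the :: form is simply not seen -- an illustration:)

    attr(terms(y ~ x + strata(g), specials = "strata"), "specials")$strata
    # 3: strata() was seen (index within the variables attribute)
    attr(terms(y ~ x + survival::strata(g), specials = "strata"), "specials")$strata
    # NULL: the :: form is not recognized, so coxph never learns a strata is present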

I've been stubborn and said that their misunderstanding of how formulas work is 
not my 
problem.   But I'm sure that the issue will come up again, and multiple other 
packages 
will break.

A big problem is that the code runs, it just gives the wrong answer.

Suggestions?

Terry T.


[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] as-cran issue, SOLVED

2020-01-13 Thread Therneau, Terry M., Ph.D. via R-devel
Thank you to all who replied with helpful suggestions.   I had to run off to 
meetings and 
talks for a bit so am now processing it all.

1. It turns out that the issue was not with coxme, but with bdsmatrix, a package that
coxme calls.  It just happens to have a function ismat() with the same general purpose and
some similar variable names, which led me down the rabbit hole.   That package contained a
"class(x) == " flaw, now fixed.   (The fact that bdsmatrix has been stable and unchanged
for nearly a decade helped with the deception.)

2. As pointed out by Duncan and Kurt, the coxme function also had a class(x)== 
flaw.  None 
of my test cases triggered this, but since 'x' is an argument that can be 
supplied by a 
user, it certainly would have happened in package use.  Good catch.

3. Dirk gave good input about the flags in R CMD check and how to find them.   
One more 
line in the "Writing R Extensions" manual would have been helpful, namely that 
many of the 
options are NOT available in the options() command nor as arguments to R.    As 
near as I 
can tell, there is no way to turn on these logic checks within a standard R 
session.   A 
desire to do this is where I started: I would have set options(warn=2, 
error=recover) and 
found the actual offender in a few minutes; and never had to bother all you 
worthy readers.
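
(For the record, the opt-in Dirk alludes to appears to be driven by environment variables
rather than options(); a sketch, assuming an R version that still consults them -- newer R
makes the if() case an unconditional error anyway:)

    # e.g. in ~/.Renviron, or exported in the shell before starting R
    _R_CHECK_LENGTH_1_CONDITION_=true    # length > 1 in an if() condition becomes an error
    _R_CHECK_LENGTH_1_LOGIC2_=true       # likewise for && and ||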

4. I agree completely with Martin that errors like this should not be ignored.  
In fact, 
except for  "variable may be used before initialized" messages from the C 
compiler, I have 
become grateful for EVERY complaint that comes out R CMD check.   Notice the 
verb "have 
become" -- I did not start out so enthusiastic.

Again, thanks for the help.

Terry T.




[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] as-cran issue

2020-01-13 Thread Therneau, Terry M., Ph.D. via R-devel

Thanks for the feedback Dirk.   I sent my follow-up before I saw it.

Looking at the source code, it appears that there is no options() call to turn this on. 
Nor does "R --help" reveal a command line option.
How then does a user turn this on outside of the R CMD check environment, so as to chase 
things like this down?


The fact that 1. renaming my function makes the error go away, 2. my function is just a 
wrapper to inherits(), and 3. its a new error in code that hasn't changed, all point me 
towards some oddity with the check function.


Terry


On 1/13/20 10:22 AM, Dirk Eddelbuettel wrote:


On 13 January 2020 at 10:02, Therneau, Terry M., Ph.D. via R-devel wrote:
| Where can I find out (and replicate) what options as-cran turns on?

See the file src/library/tools/R/check.R in the R sources, and grep for
as_cran which is the internal variable controlled by the --as-cran option

[...]

| The check log contains multiple instances of the lines below:
|
| < Warning message:
| < In if (ismat(kmat)) { :
| <   the condition has length > 1 and only the first element will be used
|
| I don't see how the error could arise, but if I know what as-cran is doing 
perhaps I can
| replicate it.

This was widely discussed on this list and should also be in the NEWS file.

The change is about what the message says: the if () tests a scalar logical,
it appears that ismat(kmat) returns more than a scalar.

There has always been an opt-in for this to error -- cf many messages by Henrik
over the years as he tried to convince us all to use it more.


Dirk



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Error in R CMD check --as-cran ?

2020-01-13 Thread Therneau, Terry M., Ph.D. via R-devel
I've been fighting a CMD check error for coxme this morning.   I thought I had 
it fixed, 
but instead I had forgotten --as-cran on my last test run.  So the version just 
submitted 
to CRAN has warning messages in the log.

I think it is an issue with CRAN.   I've sent a message to R-devel asking for 
help.   
Since then, as a guess, I renamed my internal "ismat" function to something 
else and the 
error went away.

Here is the code block in coxme; it is part of the "check user arguments for validity"
block at the start of the function.   Run coxme in a terminal and all is well, run R CMD
check without as-cran and all is well, but with --as-cran the ismat function gives
warnings.  Changing the name to something else fixes the issue.   The function exists only
to save some typing.  (No need to really read the block; just notice the multiple calls to
ismat().)

     ismat <- function (x) {
     inherits(x, c("matrix", "bdsmatrix", "Matrix"), which=FALSE)
     }
     if (missing(varlist) || is.null(varlist)) {
     varlist <- vector('list', nrandom)
     for (i in 1:nrandom) varlist[[i]] <- coxmeFull() #default
     }
     else {
     if (is.function(varlist)) varlist <- varlist()
     if (inherits(varlist, 'coxmevar')) varlist <- list(varlist)
     else if (ismat(varlist))
     varlist <- list(coxmeMlist(list(varlist)))
     else {
     if (!is.list(varlist)) stop("Invalid varlist argument")
     if (all(sapply(varlist, ismat))) {
     # A list of matrices
     if (nrandom >1)
     stop(paste("An unlabeled list of matrices is",
    "ambiguous when there are multiple random 
terms"))
     else varlist <- list(coxmeMlist(varlist))
     }
     else {  #the user gave me a list, not all matrices
     for (i in 1:length(varlist)) {
     if (is.function(varlist[[i]]))
     varlist[[i]] <-varlist[[i]]()
     if (ismat(varlist[[i]]))
     varlist[[i]] <- coxmeMlist(list(varlist[[i]]))
     if (class(varlist[[i]]) != 'coxmevar') {
     if (is.list(varlist[[i]])) {
     if (all(sapply(varlist[[i]], ismat)))
     varlist[[i]] <- coxmeMlist(varlist[[i]])
     else stop("Invalid varlist element")
     }
     else stop("Invalid varlist element")
     }
     }
     }
     }
     while(length(varlist) < nrandom) varlist <- c(varlist, 
list(coxmeFull()))
     }


[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] as-cran issue

2020-01-13 Thread Therneau, Terry M., Ph.D. via R-devel
Where can I find out (and replicate) what options as-cran turns on?

The issue: the following lines generate an error in R CMD check --as-cran  for 
coxme.  But 
there is no error without as-cran nor is there one when I run the code in a 
terminal window.

ismat <- function(x)  inherits(x, "matrix") || inherits(x, "bdsmatrix") ||
    inherits(x, "Matrix")
if (ismat(kmat)) { ... }

(The second line is repeated multiple times for multiple arguments.  The ismat 
function is 
defined simply to save typing.)

The check log contains multiple instances of the lines below:

< Warning message:
< In if (ismat(kmat)) { :
<   the condition has length > 1 and only the first element will be used

I don't see how the error could arise, but if I know what as-cran is doing 
perhaps I can 
replicate it.

 >sessionInfo()
R Under development (unstable) (2020-01-13 r77659)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS:   /usr/local/src/R-devel/lib/libRblas.so
LAPACK: /usr/local/src/R-devel/lib/libRlapack.so

locale:
  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8    LC_COLLATE=C
  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
  [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods base

loaded via a namespace (and not attached):
[1] compiler_4.0.0 tools_4.0.0
 >



[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Error in [.terms

2019-10-04 Thread Therneau, Terry M., Ph.D. via R-devel
Martin,

   There are a couple of issues with [.terms that have bitten my survival code. 
 At the 
useR conference I promised you a detailed (readable) explanation, and have been 
lax in 
getting it to you. The error was first pointed out in a bugzilla note from 
2016, by the 
way.  The current survival code works around these.

Consider the following formula:

<<>>=
library(survival)  # only to get access to the lung data set
test <- Surv(time, status) ~  age + offset(ph.ecog) + strata(inst)
tform <- terms(test, specials="strata")
mf <- model.frame(tform, data=lung)
mterm <- terms(mf)
@

The strata term is handled in a special way by coxph, and then needs to be 
removed from 
the model formula before calling model.matrix.
To do this the code uses essentially the following, which fails for the formula 
above.

<<>>=
strata <- attr(mterm, "specials")$strata - attr(mterm, "response")
X <- model.matrix(mterm[-strata], mf)
@

The root problem is the need for multiple subscripts.
\begin{itemize}
   \item The formula itself has length 5, with `~' as the first element
   \item The variables and predvars attributes are call objects, each a list() 
with 4 
elments: the response and all 3 predictors
   \item The term.labels attribute omits the response and the offset, so has
length 2
   \item The factors attribute has 4 rows and 2 columns
   \item The dataClasses attribute is a character vector of length 4
\end{itemize}

So the ideal result of  mterm[remove the specials] would use subscript of
\begin{itemize}
   \item [-5] on the formula itself, variables and predvars attributes
   \item [-2] for term.labels
   \item [-4 , -2, drop=FALSE] for factor attribute
   \item [-2] for order attribute
   \item [-4] for the dataClasses attribute
\end{itemize}

That will recreate the formula that ``would have been'' had there been no 
strata term.  
Now look at the first portion of the code in models.R
<<>>=
`[.terms` <- function (termobj, i)
{
     resp <- if (attr(termobj, "response")) termobj[[2L]]
     newformula <- attr(termobj, "term.labels")[i]
     if (length(newformula) == 0L) newformula <- "1"
     newformula <- reformulate(newformula, resp, attr(termobj, "intercept"), 
environment(termobj))
     result <- terms(newformula, specials = names(attr(termobj, "specials")))

     # Edit the optional attributes
}
@

The use of reformulate() is a nice trick.  However, the index reported in the 
specials 
attribute is generated with reference to the variables
attribute, or equivalently the row names of the factors attribute, not with 
respect to the 
term.labels attribute. For consistency the second line should instead be
<<>>=
newformula <- row.names(attr(termobj, "factors"))[i]
@
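
As a concrete illustration of that index mismatch (a sketch; nothing here is evaluated,
so no data set or package is needed beyond parsing the formula):

<<>>=
tt <- terms(Surv(time, status) ~ age + offset(ph.ecog) + strata(inst),
            specials = "strata")
attr(tt, "term.labels")          # "age" "strata(inst)": no response, no offset
row.names(attr(tt, "factors"))   # response, age, offset(ph.ecog), strata(inst)
attr(tt, "specials")$strata      # 4: an index into the rows/variables, not term.labels
@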

Of course, this will break code for anyone else who has used [.terms and, like 
me, has 
been adjusting for the ``response is counted in specials but
not in term.labels'' feature.  R core will have to discuss/decide what is the 
right thing 
to do, and I'll adapt.

The reformulate trick breaks in another way, one that only appeared on my radar 
this week 
via a formula like the following.

<<>>=
Surv(time, status) ~ age + (sex=='male') + strata(inst)
@

In both the term.labels attribute and the row/col names of the factors 
attribute the 
parentheses disappear, and the result of the reformulate call is not a proper 
formula.  
The + binds tighter than == leading to an error message that will confuse most 
users. We 
can argue, and I probably would, that the user should have used I(sex=='male'). 
 But they 
didn't, and without the I() it is a legal formula, or at least one that 
currently works.  
Fixing this issue is a lot harder.

An offset term causes issues in the 'Edit the optional attributes' part of the 
routine as 
well.  If you and/or R core will tell me what you think
the code should do, I'll create a patch.  My vote would be to use rownames per 
the above 
and ignore the () edge case.

The same basic code appears in drop.terms, by the way.

Terry T.




Re: [Rd] [EXTERNAL] Re: RE: install_github and survival

2019-09-06 Thread Therneau, Terry M., Ph.D. via R-devel
Many thanks.  Something obvious is sometimes the hardest thing for me to see.

Terry


On 9/6/19 7:10 AM, Iñaki Ucar wrote:
> On Fri, 6 Sep 2019 at 14:08, Therneau, Terry M., Ph.D. via R-devel
>  wrote:
>> Yes, that is exactly the problem.  The code found in the "config" script is 
>> never run.
>> But why doesn't it get run?
> It should be called "configure", not "config".
>
> Iñaki




Re: [Rd] [EXTERNAL] RE: install_github and survival

2019-09-06 Thread Therneau, Terry M., Ph.D. via R-devel
Yes, that is exactly the problem.  The code found in the "config" script is 
never run.  
But why doesn't it get run?

On 9/6/19 5:44 AM, Georgi Boshnakov wrote:
> I cloned therneau/survival and the installation failed since there is no 
> definition for exported function survfit().
> A file seems to be missing - there is survfit0() and survfit0.R but, compared 
> to CRAN, no survfit.R.
>
> Georgi Boshnakov
>
>
> --
>
> Message: 1
> Date: Thu, 05 Sep 2019 12:53:11 -0500
> From: "Therneau, Terry M., Ph.D." 
> To: "r-devel@r-project.org" 
> Subject: [Rd] install_github and survival
> Message-ID: <771925$cbi...@ironport10.mayo.edu>
> Content-Type: text/plain; charset="utf-8"
>
> I treat CRAN as the main repository for survival, but I have also had a github
> (therneau/survival) version for a couple of years.  It has a vignette2 
> directory, for
> instance, that contains extra vignettes that either take too long to run or 
> depend on
> other packages.  It also gets updated more often than CRAN (though those 
> updates might not
> be as well tested yet).
>
> In any case, since it is there, people will of course run install_github 
> against it.
> I've added a config script to do the one extra step necessary, but when I try
> install_github it fails.   I'm clearly doing something wrong.  If someone 
> were willing to
> contribute a fix I would be most grateful.
>
> survival3.1-0 is almost ready for CRAN, by the way.   Reverse dependency 
> checks of hdnom
> turned up one last thing to repair...
>
>
> Terry Therneau
>


[Rd] install_github and survival

2019-09-05 Thread Therneau, Terry M., Ph.D. via R-devel
I treat CRAN as the main repository for survival, but I have also had a github 
(therneau/survival) version for a couple of years.  It has a vignette2 
directory, for 
instance, that contains extra vignettes that either take too long to run or 
depend on 
other packages.  It also gets updated more often than CRAN (though those 
updates might not 
be as well tested yet).

In any case, since it is there, people will of course run install_github 
against it.   
I've added a config script to do the one extra step necessary, but when I try 
install_github it fails.   I'm clearly doing something wrong.  If someone were 
willing to 
contribute a fix I would be most grateful.

survival3.1-0 is almost ready for CRAN, by the way.   Reverse dependency checks 
of hdnom 
turned up one last thing to repair...


Terry Therneau




[Rd] reverse dependency checks

2019-09-03 Thread Therneau, Terry M., Ph.D. via R-devel
I remember there was advice about a server that one could use for reverse 
dependency 
checks, but I forgot to write it down.  (Or I did save the info and forgot 
where I saved 
it...)   I have been doing the checks for survival myself, but the count is 
getting out of 
hand (663, not counting bioconductor).

Any pointers?

Terry Therneau




[Rd] problem with R CMD check

2019-08-23 Thread Therneau, Terry M., Ph.D. via R-devel
For a new release of survival, I normally run  R CMD check for every one of the 
packages 
that depend on survival  (forewarned is forearmed).

I updated R-devel on the test machine this morning, and R CMD check --as-cran 
survival_3.0-9.tar.gz works as desired.

When I run the check routine on any other source file, however, downloaded from 
CRAN, it 
fails nearly immediately with a message like the following.

* installing *source* package ‘addhazard’ ...
** package ‘addhazard’ successfully unpacked and MD5 sums checked
** using staged installation
** R
** data
*** moving datasets to lazyload DB
** byte-compile and prepare package for lazy loading
==5550==ASan runtime does not come first in initial library list; you should
either link runtime to your application or manually preload it with LD_PRELOAD.
ERROR: lazy loading failed for package ‘addhazard’
*

---

Here is my Makevars file.  (The commented out lines are from an earlier date 
when I did 
use ASAN to find a memory leak.)
CFLAGS=  -g -O2 -Wall -pedantic -mtune=native

# Use the lines below for ASAN code
#CC= gcc -std=gnu99 -fsanitize=address -fno-omit-frame-pointer
#CFLAGS=  -fno-omit-frame-pointer -g -O2 -Wall -pedantic -mtune=native
#CXX = g++ -fsanitize=address -fno-omit-frame-pointer

---
Any hints?   If all packages failed I would figure I had a global mistake, but 
why most?

Terry T.

 > sessionInfo()
R Under development (unstable) (2019-08-23 r77061)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS:   /usr/local/src/R-devel/lib/libRblas.so
LAPACK: /usr/local/src/R-devel/lib/libRlapack.so

locale:
  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8    LC_COLLATE=C
  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
  [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods base

loaded via a namespace (and not attached):
[1] compiler_3.7.0





Re: [Rd] [EXTERNAL] Re: Potential bug in update.formula when updating offsets

2019-08-06 Thread Therneau, Terry M., Ph.D. via R-devel
Yes, it is almost certainly the same issue.  At useR I promised Martin that I 
would put 
together a clear example and fix for him and I have not yet done so.  I will 
try to do 
that this week.

   The heart of the issue is that in a terms object the offset expression will 
appear in 
the 'variables' attribute but not in the 'term.labels' or 'order' attributes, 
and the base 
code tries to use the same subscripting vector for all 3 of them.   The same 
bit of code 
shows up in update.formula and in [.formula; a fix for one can be applied to 
both.
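
A small sketch of that mismatch (the variable names are arbitrary):

    z <- terms(~ x1 + offset(x2) + x3)
    attr(z, "variables")     # list(x1, offset(x2), x3) -- the offset is present
    attr(z, "term.labels")   # "x1" "x3"                -- and absent here
    attr(z, "order")         # 1 1, parallel to term.labels
    attr(z, "offset")        # 2, an index into 'variables'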

I had all this worked out, then had some problems logging into bugzilla, then sent it to 
sent it to 
Martin about the same time 2-3 more urgent things got dumped on him, and then 
we've both 
let it lie. At useR he said (and I've no reason to disagree) that my prior note 
was 
unclear.  Let me recreate the example and fix, more carefully.

Terry T.


On 8/6/19 3:21 AM, peter dalgaard wrote:
> Terry, Martin
>
> Would this happen to be related to the "indexing term objects" issue that has 
> been bothering you?
>
> -pd
>
>> On 5 Aug 2019, at 21:44 , Paul Buerkner  wrote:
>>
>> Hi all,
>>
>> update.formula does not seem to correctly update (i.e. remove in my case)
>> offset terms.
>>
>> Here is an example:
>>
>> update(~x + offset(z), ~ . - offset(z))
>>> ~x + offset(z)
>> Also:
>> update(~x, ~ . - offset(z))
>>> ~x + offset(z)
>> In both cases, I would expect the result
>>> ~x
>> as subtracting a term should remove that term from the formula, as happens for instance
>> in:
>>
>> update(~x + z, ~ . - z)
>>> ~x
>> I don't know if this behavior is intentional but I would say it is at least
>> unfortunate.
>>
>> Paul
>>


[Rd] ASAN error with R-devel

2019-07-01 Thread Therneau, Terry M., Ph.D. via R-devel
I have an ASAN enabled version of R-devel on my test machine, and can get it to 
reliably 
crash.  Here is the first part of the session:

tmt-local2434% R --vanilla

R Under development (unstable) (2019-06-28 r76752) -- "Unsuffered Consequences"
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

   Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
 > install.packages('kinship2')
Installing package into ‘/home/therneau/Rlib’
(as ‘lib’ is unspecified)
--- Please select a CRAN mirror for use in this session ---
=
==21606==ERROR: AddressSanitizer: heap-use-after-free on address 0x60242830 
at pc 
0x7f9f834fa0f7 bp 0x7fff24ab9050 sp 0x7fff24ab87f8
READ of size 1 at 0x60242830 thread T0
     #0 0x7f9f834fa0f6 (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x5a0f6)
     #1 0x7f787837d683 in _XimUnRegisterIMInstantiateCallback 
(/usr/lib/x86_64-linux-gnu/libX11.so.6+0x64683)
     #2 0x7f7878364d32 in XUnregisterIMInstantiateCallback 
(/usr/lib/x86_64-linux-gnu/libX11.so.6+0x4bd32)
     #3 0x7f787837d566 in _XimRegisterIMInstantiateCallback 
(/usr/lib/x86_64-linux-gnu/libX11.so.6+0x64566)
     #4 0x7f7878364cba in XRegisterIMInstantiateCallback 
(/usr/lib/x86_64-linux-gnu/libX11.so.6+0x4bcba)
     #5 0x7f78745cddfb in TkpOpenDisplay 
(/usr/lib/x86_64-linux-gnu/libtk8.6.so+0xf2dfb)
     #6 0x7f787453c769 (/usr/lib/x86_64-linux-gnu/libtk8.6.so+0x61769)
     #7 0x7f787453d0ab in TkCreateMainWindow 
(/usr/lib/x86_64-linux-gnu/libtk8.6.so+0x620ab)
     #8 0x7f7874548cc6 (/usr/lib/x86_64-linux-gnu/libtk8.6.so+0x6dcc6)
     #9 0x7f7874546c7a (/usr/lib/x86_64-linux-gnu/libtk8.6.so+0x6bc7a)
     #10 0x7f787453f6c4 (/usr/lib/x86_64-linux-gnu/libtk8.6.so+0x646c4)
     #11 0x7f7874bf0c0b in tcltk_init 
/usr/local/src/R-devel/src/library/tcltk/src/tcltk.c:697
     #12 0x5582e3aa93c1 in do_dotCode 
/usr/local/src/R-devel/src/main/dotcode.c:1743
     #13 0x5582e3b41f79 in bcEval /usr/local/src/R-devel/src/main/eval.c:6775
     #14 0x5582e3b6a5df in Rf_eval /usr/local/src/R-devel/src/main/eval.c:620
     #15 0x5582e3b702b9 in R_execClosure 
/usr/local/src/R-devel/src/main/eval.c:1780
     #16 0x5582e3b72777 in Rf_applyClosure 
/usr/local/src/R-devel/src/main/eval.c:1706
     #17 0x5582e3b4ab0a in bcEval /usr/local/src/R-devel/src/main/eval.c:6743
     #18 0x5582e3b6a5df in Rf_eval /usr/local/src/R-devel/src/main/eval.c:620
     #19 0x5582e3b6c55e in forcePromise 
/usr/local/src/R-devel/src/main/eval.c:516
     #20 0x5582e3b6d22f in FORCE_PROMISE 
/usr/local/src/R-devel/src/main/eval.c:4900
  ...
     #133 0x7f788f5e6b96 in __libc_start_main 
(/lib/x86_64-linux-gnu/libc.so.6+0x21b96)
     #134 0x5582e392b999 in _start (/usr/local/src/R-devel/bin/exec/R+0x135999)

0x60255af0 is located 0 bytes inside of 1-byte region 
[0x60255af0,0x60255af1)
freed by thread T0 here:
     #0 0x7f7891a157b8 in __interceptor_free 
(/usr/lib/x86_64-linux-gnu/libasan.so.4+0xde7b8)
     #1 0x7f7878373d8c in XSetLocaleModifiers 
(/usr/lib/x86_64-linux-gnu/libX11.so.6+0x5ad8c)

previously allocated by thread T0 here:
     #0 0x7f7891a15b50 in __interceptor_malloc 
(/usr/lib/x86_64-linux-gnu/libasan.so.4+0xdeb50)
     #1 0x7f7878373984 in _XlcDefaultMapModifiers 
(/usr/lib/x86_64-linux-gnu/libX11.so.6+0x5a984)

SUMMARY: AddressSanitizer: heap-use-after-free 
(/usr/lib/x86_64-linux-gnu/libasan.so.4+0x5a0f6)

--
The window with a choice of machines never appeared.

Terry T.






[Rd] memory access question

2019-06-29 Thread Therneau, Terry M., Ph.D. via R-devel
I had a problem with the latest iteration of the survival package  (one that I 
hope to 
post to Github next week) where it would die in strange ways, i.e., work on one 
box and 
not on another, a vignette would compile if I invoked Sweave myself but fail in 
R CMD 
build, etc.   The type of thing that says quite loudly that there is a memory 
issue 
somewhere in a C routine.   The kind that has potential for making you tear 
your hair out.

In any case, I finally built an ASAN aware version of R on my test box, and it 
failed on 
something that looked minor.  I was reading one element past one of the input 
vectors, 
though I never used the value.   In essence I had  "temp = input[i]" one line 
in front of 
the "if() break" test for i.   (The while loop for i was running from n-1 to 0; 
one often 
goes from largest to smallest time value in survival code, so i was -1 at the 
failure).    
I repaired this of course, but with no real hope that it could be the actual 
issue causing 
my errors.   And now the weird behavior seems to have gone away!  The argument 
in question 
was about midway on the argument list BTW.

My question is, should I have been as surprised as I am?

And let me give a big thank you to the authors of the "debugging" section of 
the packages 
guide.  Things that reliably die are one thing, but I don't know how I would 
have found 
this one without the help.

Terry T.




[Rd] making a vignette optional

2019-06-18 Thread Therneau, Terry M., Ph.D. via R-devel
I had added a vignette to the coxme package and all worked well locally, but it 
failed at 
CRAN. The issue is that the vignette involves using coxme for pedigree 
data, it 
doesn't work without the kinship2 package, and I hadn't put in the necessary 
"if 
(require(" logic.

The question is, how do I make the entire vignette conditional? If the package 
isn't 
available, there is nothing to run.  The latex itself will fail when it can't 
find the 
figures  (I float them), and the parts that don't will end up as inane 
discussion of 
material that isn't there.

Terry T.





Re: [Rd] [EXTERNAL] Re: survival changes

2019-06-01 Thread Therneau, Terry M., Ph.D. via R-devel




On 6/1/19 1:32 PM, Marc Schwartz wrote:



On Jun 1, 2019, at 12:59 PM, Peter Langfelder  
wrote:

On Sat, Jun 1, 2019 at 3:22 AM Therneau, Terry M., Ph.D. via R-devel
 wrote:

In the next version of the survival package I intend to make a non-upwardly 
compatable
change to the survfit object.  With over 600 dependent packages this is not 
something to
take lightly, and I am currently undecided about the best way to go about it.  
I'm looking
for advice.

The change: 20+ years ago I had decided not to include the initial x=0,y=1 data 
point in
the survfit object itself.  It was not formally an estimand and the 
plot/points/lines etc
routines could add this on themselves.  That turns out to have been a mistake, 
and has led
to a steady proliferation of extra bits as I realized that the time axis 
doesn't always
start at 0, and later (with multi state) that y does not always start at 1 
(though the
states sum to 1), and later the error doesn't always start at 0, and another
realization with cumulative hazard, and ...
The new survfit method for multi-state coxph models was going to add yet 
another special
case.  Basically every component is turning into a duplicate of "row 1" vs "all 
the
others".  (And inconsistently named.)

Three possible solutions
1. Current working draft of survival_3.0.3:  Add a 'version' element to the 
survfit object
and a 'survfit2.3' function that converts old to new.  All my downstream 
functions (print,
plot,...) start with an "if (old) update to new" line.  This has allowed me to 
stage
updates to the functions that create survfit objects -- I expect it to happen 
slowly.
There will also be a survfit3.2 function to go backwards. Both the forward and 
backwards
functions leave objects alone if they are currently in the desired format.

2. Make a new class "survfit3" and the necessary 'as' functions. The package 
would contain
plot.survfit and plot.survfit3 methods, the former a two line "convert and call 
the
second" function.

3. Something I haven't thought of.

A more "clean break" solution would be to start a whole new package
(call it survival2) that would make these changes, and deprecate the
current survival. You could add warnings about deprecation and urging
users to switch in existing survival functions. You could continue
bugfixes for survival but only add new features to survival2. The new
survival2 and the current survival could live side by side on CRAN for
quite some time, giving maintainers of dependent packages (and just
plain users) enough time to switch. This could allow you to
change/clean up other parts of the package that you could perhaps also
use a rethink/rewrite, without too much concern for backward
compatibility.

Peter


Hi,

I would be cautious in going in that direction, bearing in mind that survival 
is a Recommended package, therefore included in the default R distribution from 
the R Foundation and other parties. To have two versions can/will result in 
substantial confusion, and I would argue against that approach.

There is language in the CRAN submission policy that covers API changes, which 
strictly speaking, may or may not be the case here, depending upon which 
direction Terry elects to go:

"If an update will change the package’s API and hence affect packages depending on 
it, it is expected that you will contact the maintainers of affected packages and suggest 
changes, and give them time (at least 2 weeks, ideally more) to prepare updates before 
submitting your updated package. Do mention in the submission email which packages are 
affected and that their maintainers have been informed. In order to derive the reverse 
dependencies of a package including the addresses of maintainers who have to be notified 
upon changes, the function reverse_dependencies_with_maintainers is available from the 
developer website."


Given the potential extent and impact of the changes being considered, it would 
seem reasonable to:

1. Post a note to R-Devel (possibly R-Help to cover a larger useR base) 
regarding whatever changes are finalized and formally announce them. The 
changes are likely to affect end useRs as well as package maintainers.

2. Send communications directly via e-mail to the relevant package maintainers 
that have dependencies on survival.

3. Consider a longer deprecation time frame for relevant functions, to raise 
awareness and allow for changes to be made by package maintainers and useRs as 
may be apropos. Perhaps post reminders to R-Help at relevant time points in 
advance as you approach the formal deprecation and release of the updated 
package.


Terry, if you have not used it yet and/or are not aware of it, take a look at 
?Deprecated in base:

   https://stat.ethz.ch/R-manual/R-devel/library/base/html/Deprecated.html

which is helpful in setting up a deprecation process. If you Google "deprecating 
functions in R", there are numero

[Rd] survival changes

2019-06-01 Thread Therneau, Terry M., Ph.D. via R-devel
In the next version of the survival package I intend to make a non-upwardly 
compatable 
change to the survfit object.  With over 600 dependent packages this is not 
something to 
take lightly, and I am currently undecided about the best way to go about it.  
I'm looking 
for advice.

The change: 20+ years ago I had decided not to include the initial x=0,y=1 data 
point in 
the survfit object itself.  It was not formally an estimand and the 
plot/points/lines etc 
routines could add this on themselves.  That turns out to have been a mistake, 
and has led 
to a steady proliferation of extra bits as I realized that the time axis 
doesn't always 
start at 0, and later (with multi state) that y does not always start at 1 
(though the 
states sum to 1), and later the error doesn't always start at 0, and 
another 
realization with cumulative hazard, and ...
The new survfit method for multi-state coxph models was going to add yet 
another special 
case.  Basically every component is turning into a duplicate of "row 1" vs "all 
the 
others".  (And inconsistently named.)

Three possible solutions
1. Current working draft of survival_3.0.3:  Add a 'version' element to the 
survfit object 
and a 'survfit2.3' function that converts old to new.  All my downstream 
functions (print, 
plot,...) start with an "if (old) update to new" line.  This has allowed me to 
stage 
updates to the functions that create survfit objects -- I expect it to happen 
slowly.  
There will also be a survfit3.2 function to go backwards. Both the forward and 
backwards 
functions leave objects alone if they are currently in the desired format.

2. Make a new class "survfit3" and the necessary 'as' functions. The package 
would contain 
plot.survfit and plot.survfit3 methods, the former a two line "convert and call 
the 
second" function.

3. Something I haven't thought of.

Number 2 has a cleanness about it, but there is a long term nuisance about it 
wrt 
documentation.  Users, not unreasonably, expect the survfit function to produce 
a survfit 
object, and that is what they look for in the help pages.

I plan to have 3.0-x on github before useR so that users can begin to play 
with it (and 
to get feedback before pushing to CRAN), so need to make a decision.

Terry T.





Re: [Rd] all.equal failure and [.terms

2019-04-05 Thread Therneau, Terry M., Ph.D. via R-devel
The all.equal was a side issue for me; I don't have strong opinions one way or 
the other.  
You are welcome to leave me out of the loop on that.  (Or leave me on the cc, 
whatever is 
easiest).

  I will update the survival package once the [.terms issues are addressed.

  One debatable issue is the choice of change vs document for the offset() 
issue.  With 
my proposed fix or without it, offsets are completely ignored by [.terms and 
dropterms.  
So with a formula of

    z <- terms(y~ x1 + offset(x2) + x3)

the 2 in drop.terms(z, 2) or z[-2] refers to x3, and the result will drop both 
the offset 
and x3.  For the use cases that I can think of the two functions are used at 
the 'build 
the X matrix' stage, offsets have already been accounted for, and the present 
behavior is 
fine.  My vote would be to document it with a few lines in the help file since 
that is the 
easiest.  Offsets don't count as a 'term' in the assign attribute either so the 
current 
behavior is consistent in that respect.

Terry T.




Re: [Rd] [EXTERNAL] Re: Re: all.equal failure

2019-04-05 Thread Therneau, Terry M., Ph.D. via R-devel




On 4/5/19 9:39 AM, Duncan Murdoch wrote:

On 05/04/2019 10:19 a.m., Therneau, Terry M., Ph.D. wrote:

Duncan,
   I should have included it in my original note, but

      all.equal(unclass(t0x), unclass(t1x))

returns TRUE as well.  I had tried that as well.   But a further look at 
all.equal.default shows the following line right near the top:

 if (is.language(target) || is.function(target))
 return(all.equal.language(target, current, ...))

and that path explicitly ignores attributes.


Which R version are you using?  I see deparse(target) and deparse(current) in 
all.equal.language(), and those should not be ignoring attributes according to the 
documentation.



I'm using today's version of R-devel on Ubuntu.  (svn up this AM)
But I agree, both target and current appear.


Duncan Murdoch



I'll change my original title to "all.equal was not a good tool for testing 
certain code issues".


Thanks for the pointer,

Terry



On 4/5/19 9:00 AM, Duncan Murdoch wrote:

On 05/04/2019 9:03 a.m., Therneau, Terry M., Ph.D. via R-devel wrote:

This arose in testing [.terms and has me confused.

data(esoph)   # use a standard data set

t0x <- terms(model.frame( ~ tobgp, data=esoph))
t1 <-  terms(model.frame(ncases ~ agegp + tobgp, data=esoph))
t1x <- (delete.response(t1))[-1]

  > all.equal(t0x, t1x)
[1] TRUE

# the above is wrong, because they actually are not the same

  > all.equal(attr(t0x, 'dataClasses'), attr(t1x, 'dataClasses'))
[1] "Names: 1 string mismatch"
[2] "Lengths (1, 2) differ (string compare on first 1)"


As documented, all.equal() is generic, with methods for different classes.  The 
classes of both t0x and t1x are


 c("terms","formula")

with no all.equal.terms method, so all.equal.formula is called. That method isn't 
specifically documented, but you can see its definition as


function (target, current, ...)
{
    if (length(target) != length(current))
    return(paste0("target, current differ in having response: ",
    length(target) == 3L, ", ", length(current) == 3L))
    if (!identical(deparse(target), deparse(current)))
    "formulas differ in contents"
    else TRUE
}

So the issue is that deparse(t0x) and deparse(t1x) give the same strings with no 
attributes shown, even though "showAttributes" is set by default.   I haven't traced 
through the C code to see where things are going wrong.


Duncan Murdoch



  > sessionInfo()
R Under development (unstable) (2019-04-05 r76323)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS:   /usr/local/src/R-devel/lib/libRblas.so
LAPACK: /usr/local/src/R-devel/lib/libRlapack.so

locale:
   [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
   [3] LC_TIME=en_US.UTF-8    LC_COLLATE=C
   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
   [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
   [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_3.7.0 tools_3.7.0




Re: [Rd] [EXTERNAL] Re: all.equal failure

2019-04-05 Thread Therneau, Terry M., Ph.D. via R-devel
Duncan,
   I should have included it in my original note, but

      all.equal(unclass(t0x), unclass(t1x))

returns TRUE as well.  I had tried that as well.   But a further look at 
all.equal.default 
shows the following line right near the top:
     if (is.language(target) || is.function(target))
     return(all.equal.language(target, current, ...))

and that path explicitly ignores attributes.

I'll change my original title to "all.equal was not a good tool for 
testing 
certain code issues".

Thanks for the pointer,

Terry



On 4/5/19 9:00 AM, Duncan Murdoch wrote:
> On 05/04/2019 9:03 a.m., Therneau, Terry M., Ph.D. via R-devel wrote:
>> This arose in testing [.terms and has me confused.
>>
>> data(esoph)   # use a standard data set
>>
>> t0x <- terms(model.frame( ~ tobgp, data=esoph))
>> t1 <-  terms(model.frame(ncases ~ agegp + tobgp, data=esoph))
>> t1x <- (delete.response(t1))[-1]
>>
>>   > all.equal(t0x, t1x)
>> [1] TRUE
>>
>> # the above is wrong, because they actually are not the same
>>
>>   > all.equal(attr(t0x, 'dataClasses'), attr(t1x, 'dataClasses'))
>> [1] "Names: 1 string mismatch"
>> [2] "Lengths (1, 2) differ (string compare on first 1)"
>
> As documented, all.equal() is generic, with methods for different classes.  
> The classes 
> of both t0x and t1x are
>
>  c("terms","formula")
>
> with no all.equal.terms method, so all.equal.formula is called. That method 
> isn't 
> specifically documented, but you can see its definition as
>
> function (target, current, ...)
> {
>     if (length(target) != length(current))
>     return(paste0("target, current differ in having response: ",
>     length(target) == 3L, ", ", length(current) == 3L))
>     if (!identical(deparse(target), deparse(current)))
>     "formulas differ in contents"
>     else TRUE
> }
>
> So the issue is that deparse(t0x) and deparse(t1x) give the same strings with 
> no 
> attributes shown, even though "showAttributes" is set by default.   I haven't 
> traced 
> through the C code to see where things are going wrong.
>
> Duncan Murdoch
>
>>
>>   > sessionInfo()
>> R Under development (unstable) (2019-04-05 r76323)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>> Running under: Ubuntu 18.04.2 LTS
>>
>> Matrix products: default
>> BLAS:   /usr/local/src/R-devel/lib/libRblas.so
>> LAPACK: /usr/local/src/R-devel/lib/libRlapack.so
>>
>> locale:
>>    [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
>>    [3] LC_TIME=en_US.UTF-8    LC_COLLATE=C
>>    [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>    [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
>>    [9] LC_ADDRESS=C   LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics  grDevices utils datasets  methods base
>>
>> loaded via a namespace (and not attached):
>> [1] compiler_3.7.0 tools_3.7.0
>>
>>


[Rd] [.terms issue

2019-04-05 Thread Therneau, Terry M., Ph.D. via R-devel
As  footnote, the error in survival:::untangle.specials is that it assumed that 
if 
attr(myterms, 'specials')[['strata']] was equal to 4, then one could use 
myterms[-4] to remove 
the strata term.   Not so.   In the model  y ~  x1 + x2 + strata(x3)  the 
attribute will 
be 4 -- the response counts -- but to remove it I need to use [-3] since the 
response 
does not count in [.terms or drop.terms.
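
A short illustration (nothing is evaluated here, so no data or packages are needed):

    myterms <- terms(y ~ x1 + x2 + strata(x3), specials = "strata")
    attr(myterms, "specials")$strata   # 4: the response is counted
    attr(myterms, "term.labels")       # "x1" "x2" "strata(x3)": it is not
    ## so myterms[-4] silently drops nothing, while myterms[-3] is the
    ## subscript that actually removes the strata term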

Is this an inconsistency that should be documented and/or repaired?   What 
would break if 
we did?    I don't know.  By chance, all the usages in the survival package 
happened after 
a call to delete.response so it would be immune to such a change.

Terry T.




[Rd] all.equal failure

2019-04-05 Thread Therneau, Terry M., Ph.D. via R-devel
This arose in testing [.terms and has me confused.

data(esoph)   # use a standard data set

t0x <- terms(model.frame( ~ tobgp, data=esoph))
t1 <-  terms(model.frame(ncases ~ agegp + tobgp, data=esoph))
t1x <- (delete.response(t1))[-1]

 > all.equal(t0x, t1x)
[1] TRUE

# the above is wrong, because they actually are not the same

 > all.equal(attr(t0x, 'dataClasses'), attr(t1x, 'dataClasses'))
[1] "Names: 1 string mismatch"
[2] "Lengths (1, 2) differ (string compare on first 1)"

 > sessionInfo()
R Under development (unstable) (2019-04-05 r76323)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS:   /usr/local/src/R-devel/lib/libRblas.so
LAPACK: /usr/local/src/R-devel/lib/libRlapack.so

locale:
  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8    LC_COLLATE=C
  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
  [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods base

loaded via a namespace (and not attached):
[1] compiler_3.7.0 tools_3.7.0




[Rd] subscripting a terms object

2019-04-04 Thread Therneau, Terry M., Ph.D. via R-devel
Someone sent me a bug report for survival2.44.1-1 that involves a model with 
both cluster 
and offset.  It turns out to be a 3 part issue with [.terms and my own 
untangle.specials 
routine.   I've spent an evening sorting out the details.


   1. The delete.response() function doesn't remove the response from the 
dataClasses 
attribute, which leads to a later failure in [.terms for no-response models.  
Is there a 
reason for this, or can I make my patch include this oversight as well?

  2. [.terms messes up predvars and dataClasses if the model has an offset term 
in it.  
(In select cases 1 and 2 can cancel out and give the correct dataClasses 
attribute.)

  3. The survival::untangle.specials routine assumed that you can use the same 
subscripting for the terms of a model and the term() object itself, which turns 
out to be 
almost always true, but only almost.

   The failure turns out to have probably been there since the Splus days, 
which tells one 
just how often such a model is used. (One of two edge case bugs sent to me in 
the first 
days after I pushed it to CRAN: a new release seems to attract them.)   I'm 
willing to put 
together a patch, but given the rarity of these would folks prefer to wait 
until after the 
April release?   I'm fine with that.  I need the answer to 1 though.

Terry T.




Re: [Rd] [EXTERNAL] Re: issue with latest release of R-devel

2019-03-28 Thread Therneau, Terry M., Ph.D. via R-devel
I have not yet checked all instances, but Henrik's suggestion is 3 for 3 so far.
Should I send notes to the packages in question, or will they get some from 
CRAN?
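
For reference, a small sketch of the change Henrik points to (the sample.kind argument
of set.seed appeared in R 3.6.0; none of this is survival-specific):

    set.seed(42, sample.kind = "Rounding")    # pre-3.6.0 behavior; warns that it is old
    sample(10, 3)
    set.seed(42, sample.kind = "Rejection")   # the new default
    sample(10, 3)                             # may differ from the draw above
    ## RNGversion("3.5.0") is the documented way to reproduce old results in scripts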

Terry


On 3/27/19 10:44 PM, Henrik Bengtsson wrote:
> Could this be related to
>
> "SIGNIFICANT USER-VISIBLE CHANGES
>
> The default method for generating from a discrete uniform distribution
> (used in sample(), for instance) has been changed. This addresses the
> fact, pointed out by Ottoboni and Stark, that the previous method made
> sample() noticeably non-uniform on large populations. See PR#17494 for
> a discussion. The previous method can be requested using RNGkind() or
> RNGversion() if necessary for reproduction of old results. Thanks to
> Duncan Murdoch for contributing the patch and Gabe Becker for further
> assistance."
>
> If so, testing with
>
> export _R_RNG_VERSION_=3.5.0
>
> might remove/explain those errors.
>
> Just a thought
>
> Henrik
>
> On Wed, Mar 27, 2019 at 8:16 PM Therneau, Terry M., Ph.D. via R-devel
>  wrote:
>> I'm getting ready to submit an update of survival, and as is my habit I run the 
>> checks on all
>> packages that depend/import/suggest  survival.  I am getting some very odd 
>> behaviour wrt
>> non-reproducability.  It came to a head when some things failed on one 
>> machine and worked
>> on another.   I found that the difference was that the failure was using the 
>> 3/27 release
>> and the success was still on a late Jan release.   When I updated R on the 
>> latter machine
>> it now fails too.
>>
>> An example is the test cases in genfrail.Rd, in the frailtySurv package.   
>> (The package
>> depends on survival, but I'm fairly sure that this function does not.)   
>> It's a fairly
>> simple function to generate test data sets, with a half dozen calls in the 
>> test file.  If
>> you cut and paste the whole batch into an R session, the last one of them 
>> fails.  But if
>> you run that call by itself it works.   This yes/no behavior is reproducible.
>>
>> Another puzzler was the ranger package.  In the tests/testthat directory,
>> source('test_maxstat') fails if it is preceded by source('test_jackknife'), 
>> but not
>> otherwise.  Again, I don't think the survival package is implicated in 
>> either of these tests.
>>
>> Another package that succeeded under the older r-devel and now fails is 
>> arsenal, but I
>> haven't looked deeply at that.
>>
>> Any insight would be appreciated.
>>
>> Terry T.
>> 
>>
>>
>> Here is the sessionInfo() for one of the machines.  The other is running 
>> xubuntu 18 LTS.
>> (It's at the office, and I can send that tomorrow when I get in.)
>>
>> R Under development (unstable) (2019-03-28 r76277)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>> Running under: Ubuntu 16.04.6 LTS
>>
>> Matrix products: default
>> BLAS:   /usr/local/src/R-devel/lib/libRblas.so
>> LAPACK: /usr/local/src/R-devel/lib/libRlapack.so
>>
>> locale:
>>[1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
>>[3] LC_TIME=en_US.UTF-8LC_COLLATE=C
>>[5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
>>[7] LC_PAPER=en_US.UTF-8   LC_NAME=C
>>[9] LC_ADDRESS=C   LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics  grDevices utils datasets  methods base
>>
>> loaded via a namespace (and not attached):
>> [1] compiler_3.6.0
>>
>>


Re: [Rd] default for 'signif.stars'

2019-03-28 Thread Therneau, Terry M., Ph.D. via R-devel
The addition of significant stars was, in my opinion, one of the worst defaults 
ever added 
to R.   I would be delighted to see it removed, or at least change the default. 
 It is one 
of the few overrides that I have argued to add to our site-wide defaults file.

My bias comes from 30+ years in a medical statistics career where fighting the 
disease of 
"dichotomania" has been an eternal struggle.  Continuous covariates are split 
in two, 
nuanced risk scores are thresholded, decisions become yes/no, ...     Adding 
stars to 
output is, to me, simply a gateway drug to this pernicious addiction.   We 
shouldn't 
encourage it.

Wrt Abe's rant about the Nature article:  I've read the article and found it to 
be well 
reasoned, and I can't say the same about the rant.   The issue in biomedical 
science is 
that the p-value has fallen victim to Goodhart's law: "When a measure becomes a 
target, it 
ceases to be a good measure."  The article argues, and I would agree, that the 
.05 yes/no 
decision rule is currently doing more harm than good in biomedical research.   
What to do 
instead of this is a tough question, but it is fairly clear that the current 
plan isn't 
working.   I have seen many cases of two papers which both found a risk 
increase of 1.9 
for something where one paper claimed "smoking gun" and the other "completely 
exonerated".   Do YOU want to take a drug with 2x risk and a p= 0.2 'proof' 
that it is 
okay?   Of course, if there is too much to do and too little time, people will 
find a way 
to create a shortcut yes/no rule no matter what we preach.   (We statisticians 
will do it 
too.)

Terry T.






[Rd] issue with latest release of R-devel

2019-03-27 Thread Therneau, Terry M., Ph.D. via R-devel
I'm getting ready to submit an update of survival, and as is my habit I run the 
checks on all 
packages that depend/import/suggest  survival.  I am getting some very odd 
behaviour wrt 
non-reproducability.  It came to a head when some things failed on one machine 
and worked 
on another.   I found that the difference was that the failure was using the 
3/27 release 
and the success was still on a late Jan release.   When I updated R on the 
latter machine 
it now fails too.

An example is the test cases in genfrail.Rd, in the frailtySurv package.   (The 
package 
depends on survival, but I'm fairly sure that this function does not.)   It's a 
fairly 
simple function to generate test data sets, with a half dozen calls in the test 
file.  If 
you cut and paste the whole batch into an R session, the last one of them 
fails.  But if 
you run that call by itself it works.   This yes/no behavior is reproducible.

Another puzzler was the ranger package.  In the tests/testthat directory,  
source('test_maxstat') fails if it is preceded by source('test_jackknife'), 
but not 
otherwise.  Again, I don't think the survival package is implicated in either 
of these tests.

Another package that succeeded under the older r-devel and now fails is 
arsenal, but I 
haven't looked deeply at that.

Any insight would be appreciated.

Terry T.



Here is the sessionInfo() for one of the machines.  The other is running 
xubuntu 18 LTS.  
(It's at the office, and I can send that tomorrow when I get in.)

R Under development (unstable) (2019-03-28 r76277)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS

Matrix products: default
BLAS:   /usr/local/src/R-devel/lib/libRblas.so
LAPACK: /usr/local/src/R-devel/lib/libRlapack.so

locale:
  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8    LC_COLLATE=C
  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
  [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods base

loaded via a namespace (and not attached):
[1] compiler_3.6.0




Re: [Rd] as.data.frame.table() does not recognize default.stringsAsFactors()

2019-03-15 Thread Therneau, Terry M., Ph.D. via R-devel
I have to disagree with both Peter and Martin on this.

The underneath issue is that the automatic conversion of characters to factors 
by the 
data.frame functions was the single most egregious design blunder in the 
Statistical 
Models in S book, and we are still living with it.  The stringsAsFactors option 
was a 
compromise to let users opt out of that mistake (one I had to fight hard for).  
  In that 
light I read Peter's defense as "but in this case we really DO know better than 
the user, 
and won't let them opt out", and Martin's as "they shouldn't have been able to 
opt out in 
the first place, so weaken it at every opportunity".

I generally agree that global options should be minimal.  But if one exists, 
let's be 
consistent and listen to it.
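
To make the behavior under discussion concrete (a small sketch; as.data.frame.table()
has its own stringsAsFactors argument, defaulting to TRUE, and it is that default --
not the global option -- which decides):

    tab <- table(warpbreaks$wool, warpbreaks$tension)
    sapply(as.data.frame(tab), class)                            # Var1, Var2 are factors
    sapply(as.data.frame(tab, stringsAsFactors = FALSE), class)  # explicit opt-out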

(Footnote: In the Mayo Biostat group, stringsAsFactors=FALSE is the recommended 
global 
option for all users.  It's a pure cost/productivity thing.  We work on 
thousands of data 
sets in a year, and the errors and misunderstandings that silent conversions 
generate far 
outweigh any benefits. )

Terry T.


On 3/15/19 6:00 AM, r-devel-requ...@r-project.org wrote:
>  > I have no recollection of the original rationale for 
> as.data.frame.table, but I actually think it is fine as it is:
>  > The classifying _factors_ of a crosstable should be factors unless 
> very specifically directed otherwise and that should not depend on the 
> setting of an option that controls the conversion of character data.
>
>  > For as.data.frame.matrix, in contrast, it is the _content_ of the 
> matrix that is being converted, and it seems much more reasonable to follow 
> the same path as for other character data.
>
>  > -pd
>
> I very strongly agree that as.data.frame.table() should not be
> changed to follow a global option.
>
> To the contrary: I've repeatedly mentioned that in my view it
> has been a design mistake to allow data.frame() and as.data.frame() be 
> influenced
> by a global option
>   [and we should've tried harder to keep things purely functional
> (R remaining as closely as possible a "functional language"),
>e.g. by providing wrapper functions the same way we have such
>wrappers for versions of read.table() with different defaults
>for some of the arguments
>   ]




[Rd] Hex sticker

2019-01-22 Thread Therneau, Terry M., Ph.D. via R-devel
Is there a canonical place to add a hex sticker to a package?    I've found use 
of 
man/figures and inst/.
A nice sticker has been made for survival and since it is a required package I 
don't want 
to mess it up.

Terry T.




[Rd] issue with testInstalledPackage

2018-11-26 Thread Therneau, Terry M., Ph.D. via R-devel
Background: I run tools::testInstalledPackage on all packages that depend on 
survival 
(605 as of today) before sending a new release to CRAN. It has a few false 
positives which 
I then follow up on.  (Mostly packages with as-yet-incomplete tests in their 
inst directory).

  Issue: testInstalledPackage("mets")  generates an  "Error in 
checkVignettes(pkg, 
lib.loc, latex = FALSE, weave = TRUE)" message, which stops my script.  The 
source code 
for that package passes R CMD check.
I can easily work around this in the short term by adding mets to my 
do-not-check list.
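
For anyone doing something similar, a sketch of the kind of wrapper loop I mean (the
skip list and the tryCatch are the point; dependsOnPkgs and testInstalledPackage are in
the tools package, the rest of the details will vary):

    library(tools)
    deps <- dependsOnPkgs("survival")      # reverse dependencies that are installed
    skip <- c("mets")                      # the do-not-check list
    for (p in setdiff(deps, skip)) {
        tryCatch(testInstalledPackage(p, types = "tests"),
                 error = function(e) message(p, ": ", conditionMessage(e)))
    }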

Footnote: the newer "check_packages_in_dir" routine doesn't work for me.   The 
biggest 
reason is that it doesn't have a way to give a list of packages to skip.  My 
little 
desktop box doesn't have every linux library (cryptography, geospatial, etc.), 
nor do I 
load up bioconductor, which leads to a boatload of false positives.  I keep 
adding things 
but the packages add faster.

Terry T.




[Rd] invisible functions

2018-10-16 Thread Therneau, Terry M., Ph.D. via R-devel
The survival package, like many others, has several helper functions that are 
not declared 
in the namespace, since their only use is to be called by other "main" 
functions of the 
package.  This works well since the functions in the survival namespace can see 
them --- 
without ::: arguments --- and others don't.

Until a situation I ran into this week, for which I solicit comments or advice. 
  The 
concordance function is a new addition, and it has one case where the same 
underlying 
helper function is called multiple times, with many arguments passed through 
from the 
parent.  I thought that this would be a good use for the trick we use for 
model.frame, so 
I have code like this:

concordance.coxph <- function(fit, ..., newdata, group, ymin, ymax,
    timewt=c("n", "S", "S/G", "n/G", "n/G2"),
    influence=0, ranks=FALSE, timefix=TRUE) {
     Call <- match.call()
     .
     .
     .
     cargs <- c("ymin", "ymax","influence", "ranks", "timewt", "timefix")
     cfun <- Call[c(1, match(cargs, names(Call), nomatch=0))]
     cfun[[1]] <- quote(cord.work)
     cfun$reverse <- TRUE
     rval <- eval(cfun, parent.frame())

This worked fine in my not-in-a-namespace test bed, but then fails when 
packaged up for 
real: the code can't find the helper function cord.work!  The rule that 
survival package 
functions can "see" their undeclared helpers fails.

I got it working by changing parent.frame() to environment(concordance) in the 
eval() 
call.   Since everything used by cord.work is explicitly passed in its argument 
list this 
does work.
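
A stripped-down sketch of the pattern and of why the change works (the names helper and
caller are invented stand-ins; in a package, environment(caller) is the package
namespace, which is where an unexported helper lives):

    helper <- function(x, ymin = 0) x + ymin        # stands in for cord.work

    caller <- function(x, ..., ymin) {
        Call  <- match.call()
        cargs <- c("x", "ymin")
        cfun  <- Call[c(1, match(cargs, names(Call), nomatch = 0))]
        cfun[[1]] <- quote(helper)
        ## eval(cfun, parent.frame()) looks where caller() was called from;
        ## eval(cfun, environment(caller)) looks where caller() was defined.
        ## (The argument values here are literals; in the real code they are
        ## expressions that must also be findable in the chosen environment.)
        eval(cfun, environment(caller))
    }
    caller(x = 3, ymin = 2)    # 5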

Comments or suggestions?   (I avoid having survival:: in the survival package 
because it 
messes up my particular test bed.)

Terry




Re: [Rd] maximum matrix size

2018-10-03 Thread Therneau, Terry M., Ph.D. via R-devel
That is indeed helpful; reading the sections around it largely answered my 
questions. 
Rinternals.h has the definitions

#define allocMatrix Rf_allocMatrix
SEXP Rf_allocMatrix(SEXPTYPE, int, int);
#define allocVector        Rf_allocVector
SEXP Rf_allocVector(SEXPTYPE, R_xlen_t);

Which answers the further question of what to expect inside C routines invoked 
by .Call().

It looks like the internal C routines for coxph work on large matrices by pure 
serendipity 
(nrow and ncol each less than 2^31 but with the product  > 2^31), but 
residuals.coxph 
fails with an allocation error on the same data.  A slight change and it could 
just as 
easily have led to a hard crash.    Sigh...   I'll need to do a complete code 
review.   
I've been converting .C routines to .Call  as convenient, this will force 
conversion of 
many of the rest as a side effect (20 done, 23 to go).  As a statsitician my 
overall 
response is "haven't they ever heard of sampling"?  But as I said earlier, it 
isn't just 
one user.

Terry T.

On 10/02/2018 12:22 PM, Peter Langfelder wrote:
> Does this help a little?
>
> https://cran.r-project.org/doc/manuals/r-release/R-ints.html#Long-vectors
>
> One thing I seem to remember but cannot find a reference for is that
> long vectors can only be passed to .Call calls, not C/Fortran. I
> remember rewriting .C() in my WGCNA package to .Call for this very
> reason but perhaps the restriction has been removed.
>
> Peter
> On Tue, Oct 2, 2018 at 9:43 AM Therneau, Terry M., Ph.D. via R-devel
>  wrote:
>> I am now getting the occasional complaint about survival routines that are 
>> not able to
>> handle big data.   I looked in the manuals to try and update my 
>> understanding of max
>> vector size, max matrix, max data set, etc; but it is either not there or I 
>> missed it (the
>> latter more likely).   Is it still .Machine$integer.max for everything?   
>> Will that
>> change?   Found where?
>>
>> I am going to need to go through the survival package and put specific 
>> checks in front
>> some or all of my .Call() statements, in order to give a sensible message 
>> whenever a
>> bounday is struck.  A well meaning person just posted a suggested "bug fix" 
>> to the github
>> source of one routine where my .C call allocates a scratch vector, 
>> suggesting  "resid =
>> double( as.double(n) *nvar)" to prevent a "NA produced by integer overflow" 
>> message,  in
>> the code below.   A fix is obviously not quite that easy :-)
>>
>>   resid <- .C(Ccoxscore, as.integer(n),
>>   as.integer(nvar),
>>   as.double(y),
>>   x=as.double(x),
>>   as.integer(newstrat),
>>   as.double(score),
>>   as.double(weights[ord]),
>>   as.integer(method=='efron'),
>>   resid= double(n*nvar),
>>   double(2*nvar))$resid
>>
>> Terry T.
>>
>>


[Rd] maximum matrix size

2018-10-02 Thread Therneau, Terry M., Ph.D. via R-devel
I am now getting the occasional complaint about survival routines that are not 
able to 
handle big data.   I looked in the manuals to try and update my understanding 
of max 
vector size, max matrix, max data set, etc; but it is either not there or I 
missed it (the 
latter more likely).   Is it still .Machine$integer.max for everything?   Will 
that 
change?   Found where?

I am going to need to go through the survival package and put specific checks 
in front of 
some or all of my .Call() statements, in order to give a sensible message 
whenever a 
boundary is struck.  A well meaning person just posted a suggested "bug fix" to 
the github 
source of one routine where my .C call allocates a scratch vector, suggesting  
"resid = 
double( as.double(n) *nvar)" to prevent a "NA produced by integer overflow" 
message,  in 
the code below.   A fix is obviously not quite that easy :-)

         resid <- .C(Ccoxscore, as.integer(n),
                 as.integer(nvar),
                 as.double(y),
                 x=as.double(x),
                 as.integer(newstrat),
                 as.double(score),
                 as.double(weights[ord]),
                 as.integer(method=='efron'),
                 resid= double(n*nvar),
                 double(2*nvar))$resid
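
The overflow itself is easy to show; a minimal sketch of why the suggested one-line
change is not, by itself, a fix:

    n <- 50000L; nvar <- 50000L
    n * nvar               # NA, with a warning: NAs produced by integer overflow
    as.double(n) * nvar    # 2.5e+09, fine as a double -- but the .C() interface
                           # still needs explicit length checks in front of it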

Terry T.




Re: [Rd] new behavior in model.response -- Solved

2018-06-28 Thread Therneau, Terry M., Ph.D. via R-devel
Thanks to multiple readers for comments and patience as I sorted this out.  I now have
working length and names methods for Surv objects, which do not seem to break anything.
I just ran the test suites for 471 packages that depend on survival, but I don't test
against Bioconductor, so I cannot speak to that corpus.

Essentially, if I want to think of Surv(1:6, rep(1:0, 3)) = 1 2+ 3  4+ 5  6+ as a vector
of length 6, although it is stored in a longer representation, I needed to
   a. add a length method
   b. add names and names<- methods
   c. But the names method cannot create a length-6 names attribute.  Handling of a names
      attribute is baked into the low-level code, and that code treats the number of
      stored elements as the length, no matter what you say.
   d. The solution was to make names.Surv store the results in the rownames instead;
      a sketch of the idea is below.
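
Not necessarily the exact code that ended up in the package, but the gist (a Surv object
is stored as a matrix with one row per observation, so the logical length is the number
of rows and the "names" live in the rownames):

    length.Surv <- function(x) nrow(unclass(x))
    names.Surv  <- function(x) rownames(unclass(x))
    "names<-.Surv" <- function(x, value) {
        temp <- unclass(x)
        rownames(temp) <- value
        class(temp) <- "Surv"
        temp
    }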

Terry T.




Re: [Rd] new behavior in model.response

2018-06-27 Thread Therneau, Terry M., Ph.D. via R-devel
I now understand the issue, which leads to a different and deeper question: how do we
assign a proper length to Surv objects?

 > Surv(c(1,2,3), c(1,0,1))
[1] 1  2+ 3

The above prints as 3 elements and is conceptually 3 elements.  But if I give it a length
method that returns 3, then I need a names method, and names<- pays no attention to my
defined length.  How do we conceive of and manage a vector whose elements happen to
require more than one storage slot for their representation?  An obvious example is the
complex type, but it seems that had to be baked right down into the core.
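
For concreteness (with the survival package loaded), the two notions of length look like
this:

    x <- Surv(c(1, 2, 3), c(1, 0, 1))
    nrow(x)              # 3: the conceptual length, one row per observation
    length(unclass(x))   # 6: the storage length, a 3 x 2 matrix of (time, status)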





Re: [Rd] new behavior of model.response

2018-06-27 Thread Therneau, Terry M., Ph.D. via R-devel
Charles Berry pointed out an error in my reasoning.  In the current survival I forgot the
S3method line for length in the NAMESPACE file, so the behavior is really not new.
Nonetheless it remains surprising and non-intuitive.  Why does model.response sometimes
attach spurious names, when the Surv object itself does not have them?

Terry


tmt% R --vanilla
R version 3.4.2 (2017-09-28) -- "Short Summer"

test <- data.frame(time=1:8, status=rep(0:1, 4), age=60:67)
row.names(test) <- letters[1:8]

library(survival)

mf2 <- model.frame(Surv(time, status) ~ age, data=test)
names(mf2[[1]])
# NULL
names(model.response(mf2))
# NULL

length.Surv <- survival:::length.Surv   # expose the length method so S3 dispatch finds it

names(model.response(mf2))
  # [1] "a" "b" "c" "d" "e" "f" "g" "h" NA  NA  NA  NA  NA  NA  NA NA




[Rd] new behavior of model.response

2018-06-27 Thread Therneau, Terry M., Ph.D. via R-devel
I am getting some unexplained changes in the latest version of survival, and finally
traced it down to this: model.response acts differently for Surv objects.
Here is a self-contained example using a made-up class Durv (= "diagnose survival").  I
tracked it down by removing methods one by one from Surv; I had just added some new ones,
so they were my suspects.

test <- data.frame(time=1:8, status=rep(0:1, 4), age=60:67)
row.names(test) <- letters[1:8]

Durv <- function(...) {
     temp <- cbind(...)
     class(temp) <- "Durv"
     temp
}
mf1 <- model.frame(Durv(time, status) ~ age, data=test)
names(model.response(mf1))
#  NULL

length.Durv <- function(x) nrow(x)   # give Durv a "logical" length
names(model.response(mf1))           # re-extracting the response now picks up names
#  [1] "a" "b" "c" "d" "e" "f" "g" "h" NA  NA  NA  NA  NA  NA NA  NA

The length method for Surv objects has been around for some while; this behavior is new.
It caused the 'time' component of survfit objects to suddenly have names, and so it was
discovered in my test suite.  I had planned to submit an update today, but now need to
delay it.

The length of the Surv (Durv) object above is 8, BTW; the fact that its representation
requires either 16 elements (right censored) or 24 (interval censored) is a footnote.

Terry Therneau



Re: [Rd] [EXTERNAL] Re: list of methods

2018-06-26 Thread Therneau, Terry M., Ph.D. via R-devel
In a Surv object length() returns the number of survival times contained in the object,
which is the logical thing to do.  The fact that a survival time needs 2 or 3 values to
represent it is simply a nuisance.  Thus length(unclass(x)), i.e. the amount of memory
consumed, is not going to map to sensible operations.

Yes, I'll have to go through the list by hand.  Some methods will only apply to transform
functions (makepredictcall.pspline), others to model fits (extractAIC.survreg), and others
to vectors (head.Surv).  I just need to slog through it.

Looking at Date is a good idea, but it wouldn't have revealed head or tail.  It did point
out the addition of c() and rep() methods.

Terry

On 06/26/2018 01:40 PM, Michael Lawrence wrote:
> While it's easy to conceive of a utility that found all generics for
> which there is no non-default method for a given class vector, it's
> not clear it would be useful, because it depends on the nature of the
> object. Surv objects are vector-like, so they need to implement the
> "vector API", which is not formally defined. You could look at the
> S4Vectors package or the date/time classes for reference. But Surv
> gets a lot less for free since length() returns twice their logical
> length, an unfortunate inconsistency.
>
> Michael
>
>
>
> On Tue, Jun 26, 2018 at 11:24 AM, Therneau, Terry M., Ph.D. via
> R-devel  wrote:
>> I recently got a request to add head() and tail() methods for Surv objects, which is
>> quite reasonable, but not unlike other requests for logLik, vcov, extractAIC, ...
>> What they all have in common is that they are methods added since the creation of the
>> survival package, and that I didn't know they existed.
>>
>> To try to get ahead of the curve, is there a way to list the names of all of the
>> default methods?  There are functions to get all the instances of a method by name,
>> e.g. methods("extractAIC"), or to find all the methods already implemented for a
>> class, but I don't see something that gives me a list of the ones that I haven't
>> created yet.
>>
>> Terry T.
>>
>>




[Rd] list of methods

2018-06-26 Thread Therneau, Terry M., Ph.D. via R-devel
I recently got a request to add head() and tail() methods for Surv objects, which is
quite reasonable, but not unlike other requests for logLik, vcov, extractAIC, ...  What
they all have in common is that they are methods added since the creation of the survival
package, and that I didn't know they existed.

To try to get ahead of the curve, is there a way to list the names of all of the default
methods?  There are functions to get all the instances of a method by name, e.g.
methods("extractAIC"), or to find all the methods already implemented for a class, but I
don't see something that gives me a list of the ones that I haven't created yet.
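
One possibility would be to build such a list by hand, along the lines of the sketch
below; note that it misses the internal generics (length, c, and friends) that do not
dispatch through UseMethod:

    library(survival)
    # find the UseMethod-based generics visible in the base packages
    s3generics <- function(pkg) {
        nms <- ls(pkg)
        keep <- vapply(nms, function(nm) {
            f <- get(nm, pkg)
            is.function(f) && utils::isS3stdGeneric(f)
        }, logical(1))
        nms[keep]
    }
    gen  <- unique(unlist(lapply(c("package:base", "package:stats",
                                   "package:utils", "package:graphics"),
                                 s3generics)))
    # ... and drop the generics that already have a Surv method
    have <- sub("\\.Surv$", "", as.character(methods(class = "Surv")))
    setdiff(gen, have)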

Terry T.




Re: [Rd] issue with model.frame()

2018-05-01 Thread Therneau, Terry M., Ph.D. via R-devel
I want to add that the priority for this is rather low, since we have a couple of
workarounds for the user/data set in question.  I have some ideas about changing the way
in which ridge() works, which might make the problem moot.  The important short-term
result was finding that it wasn't an error of mine in the survival package. :-)


Add it to your "think about it" list.

Terry


On 05/01/2018 03:15 PM, Martin Maechler wrote:

What version of R is that ?  In current versions it is

 varnames <- vapply(vars, deparse2, " ")[-1L]

and deparse2() is a slightly enhanced version of the above
function, again with  'width.cutoff = 500'

*BUT*  if you read  help(deparse)  you will learn that 500 is the
upper bound allowed currently.  (And yes, one could consider
increasing that, as it has been unchanged in R since the very
beginning; I have checked R version 0.49 from 1997.)

On the other hand, deparse2 (and your older code above) do paste
all the parts together  via  collapse = " "  so I don't see
quite yet ...

Martin




Re: [Rd] [EXTERNAL] Re: issue with model.frame()

2018-05-01 Thread Therneau, Terry M., Ph.D. via R-devel
Great catch.  I'm very reluctant to use my own model.frame, since that locks me into 
tracking all the base R changes, potentially breaking survival in a bad way if I miss one.


But this shows me clearly what the issue is and will allow me to think about it.

Another solution for the user is to use multiple ridge() calls to break it up; since 
he/she was using a fixed tuning parameter the result is the same.


Terry T.


On 05/01/2018 11:43 AM, Berry, Charles wrote:




On May 1, 2018, at 6:11 AM, Therneau, Terry M., Ph.D. via R-devel 
<r-devel@r-project.org> wrote:

A user sent me an example where coxph fails, and the root of the failure is a 
case where names(mf) is not equal to the term.labels attribute of the formula 
-- the latter has an extraneous newline. Here is an example that does not use 
the survival library.

# first create a data set with many long names
n <- 30  # number of rows for the dummy data set
vname <- vector("character", 26)
for (i in 1:26) vname[i] <- paste(rep(letters[1:i],2), collapse='')  # long variable names

tdata <- data.frame(y=1:n, matrix(runif(n*26), nrow=n))
names(tdata) <- c('y', vname)

# Use it in a formula
myform <- paste("y ~ cbind(", paste(vname, collapse=", "), ")")
mf <- model.frame(formula(myform), data=tdata)

match(attr(terms(mf), "term.labels"), names(mf))   # gives NA



In the user's case the function is ridge(x1, x2, ) rather than cbind, but the effect is
the same.
Any ideas for a workaround?


Maybe add a `yourclass' class to mf and dispatch to a model.frame.yourclass 
method where the width cutoff arg here (around lines 57-58 of 
model.frame.default) is made larger:

varnames <- sapply(vars, function(x) paste(deparse(x, width.cutoff = 500),
 collapse = " "))[-1L]

??



Aside: the ridge() function is very simple; it was added as an example to show how a user
can add their own penalization to coxph.  I never expected serious use of it.  For this
particular user the best answer is to use glmnet instead.  He/she is trying to apply an
L2 penalty to a large number of SNP * covariate interactions.

Terry T.




HTH,

Chuck





[Rd] issue with model.frame()

2018-05-01 Thread Therneau, Terry M., Ph.D. via R-devel
A user sent me an example where coxph fails, and the root of the failure is a case where 
names(mf) is not equal to the term.labels attribute of the formula -- the latter has an 
extraneous newline. Here is an example that does not use the survival library.


# first create a data set with many long names
n <- 30  # number of rows for the dummy data set
vname <- vector("character", 26)
for (i in 1:26) vname[i] <- paste(rep(letters[1:i],2), collapse='')  # long variable names

tdata <- data.frame(y=1:n, matrix(runif(n*26), nrow=n))
names(tdata) <- c('y', vname)

# Use it in a formula
myform <- paste("y ~ cbind(", paste(vname, collapse=", "), ")")
mf <- model.frame(formula(myform), data=tdata)

match(attr(terms(mf), "term.labels"), names(mf))   # gives NA



In the user's case the function is ridge(x1, x2, ) rather than cbind, but the effect is
the same.

Any ideas for a workaround?
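
One band-aid that seems to work on the example above (an illustration only, not a fix I
would rely on) is to normalize whitespace on both sides before matching, since the
mismatch is just the embedded newline and indentation that deparse adds:

    squash <- function(x) gsub("[[:space:]]+", " ", x)
    match(squash(attr(terms(mf), "term.labels")), squash(names(mf)))   # should now match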

Aside: the ridge() function is very simple; it was added as an example to show how a user
can add their own penalization to coxph.  I never expected serious use of it.  For this
particular user the best answer is to use glmnet instead.  He/she is trying to apply an
L2 penalty to a large number of SNP * covariate interactions.


Terry T.



[Rd] strange warning: data() error?

2018-04-16 Thread Therneau, Terry M., Ph.D. via R-devel

A user asked me about this and I can't figure it out.

tmt% R
R Under development (unstable) (2018-04-09 r74565) -- "Unsuffered Consequences"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(survival)
> data(cgd0)
Warning message:
In data(cgd0) : data set ‘cgd0’ not found



The data set is present and can be manipulated: data() is not required.  Other data sets 
in the survival package don't generate this message.


Terry T.



[Rd] survival updates and testing

2018-04-06 Thread Therneau, Terry M., Ph.D. via R-devel
I've run the latest version of survival through the test suites of 486 of the 565 packages
that depend on it (Depends, Imports, LinkingTo, Suggests), and have a couple of small
issues that I'm taking up with the other authors.
The exercise turned up a half dozen real errors in my package.  I plan to submit early
next week.
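
(For anyone who wants to run this sort of reverse-dependency check themselves, a minimal
sketch; the tarball name and the CPU count below are placeholders:)

    # put the candidate tarball in an empty directory and let tools() fetch and
    # check its CRAN reverse dependencies
    dir.create("revdep")
    file.copy("survival_2.42-0.tar.gz", "revdep")    # placeholder file name
    tools::check_packages_in_dir("revdep",
        reverse = list(repos = "https://cran.r-project.org",
                       which = c("Depends", "Imports", "LinkingTo", "Suggests")),
        Ncpus = 4)
    tools::summarize_check_packages_in_dir_results("revdep")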


This leads to three comments/questions.

1. Is there a submission site that would check all of them?  There are several that want
to use C libraries that I don't have on my small test machine, or are just too large
(rstan is an entire ecosystem).  It could save one submission cycle for the CRAN
maintainers.


2. The update.packages routine was not smart enough to update the packages that have C
code, although every one of those needed to be re-installed in the newest R-devel.  (See
the note after this list.)


3. This is more of an opinion.  One of the failures was due to the fact that one of the
elements in the coxph object's list shifted position.  I did address this, but my sympathy
level was low for a developer who counts on absolute order rather than a name.  Would you
bend, or tell them to fix their package?
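
A note on point 2: update.packages() by default compares only version numbers, so a
same-version package built under an older R is left alone.  The checkBuilt argument is
the switch that also flags those:

    # re-install packages whose installed copy was built under an older R,
    # even when the version number has not changed
    update.packages(checkBuilt = TRUE, ask = FALSE)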


Terry T.



Re: [Rd] [EXTERNAL] Re: Fwd: Re: Re: backquotes and term.labels

2018-03-08 Thread Therneau, Terry M., Ph.D. via R-devel

Ben,
I looked at the source code you pointed out, and the line
fr <- fr[attr(terms(fr),"varnames.fixed")]

sure looks to me as though the terms() function has returned an object with a 
varnames.fixed attribute.  Either that or your code has inside knowledge that a reader 
like me won't know.  Given what you said I will guess that terms(x) returns the terms 
attribute of x if that already exists, doing no processing, and you know that fr has 
already been thusly modified?


I have code that uses the variables, term.labels, and factors attributes of the terms()
structure, and all of those have the backticks.  I know the second two will currently
break the coxme and survival packages; I haven't chased down the effect on the first.



On 03/08/2018 08:42 AM, Ben Bolker wrote:

Meant to respond to this but forgot.

  I didn't write a new terms() function -- I added an attribute to the terms()
(a vector of the names of the constructed model matrix), thus preserving the
information at the point when it was available.
   I do agree that it would be preferable to have an upstream fix ...





[Rd] Fwd: Re: [EXTERNAL] Re: backquotes and term.labels

2018-03-08 Thread Therneau, Terry M., Ph.D. via R-devel

Ben,

Looking at your notes, it appears that your solution is to write your own terms()
function for lme.  It is easy to verify that the "varnames.fixed" attribute is not
returned by the usual terms function.

Then I also need to write my own terms function for the survival and coxme packages?
Because of the need to treat strata() terms in a special way, I manipulate the
formula/terms in nearly every routine.

Extrapolating: every R package that tries to examine formulas and partition them into
bits needs its own terms function?  This does not look like a good solution to me.

On 03/07/2018 07:39 AM, Ben Bolker wrote:

I knew I had seen this before but couldn't previously remember where.
https://github.com/lme4/lme4/issues/441 ... I initially fixed with
gsub(), but (pushed by Martin Maechler to do better) I eventually
fixed it by storing the original names of the model frame (without
backticks) as an attribute for later retrieval:
https://github.com/lme4/lme4/commit/56416fc8b3b5153df7df5547082835c5d5725e89.


On Wed, Mar 7, 2018 at 8:22 AM, Therneau, Terry M., Ph.D. via R-devel
<r-devel@r-project.org> wrote:

Thanks to Bill Dunlap for the clarification.  On follow-up it turns out that
this will be an issue for many if not most of the routines in the survival
package: a lot of them look at the terms structure and make use of the
dimnames of attr(terms, 'factors'), which also keeps the unneeded
backquotes.  Others use the term.labels attribute.  To dodge this I will
need to create a fixterms() routine which I call at the top of every single
routine in the library.

Is there a chance for a fix at a higher level?

Terry T.



On 03/05/2018 03:55 PM, William Dunlap wrote:

I believe this has to do with terms() making "term.labels" (hence the dimnames
of "factors") with deparse(), so that the backquotes are included for
non-syntactic names.  The backquotes are not in the column names of the input
data.frame (nor model frame) so you get a mismatch when subscripting the
data.frame or model.frame with elements of terms()$term.labels.

I think you can avoid the problem by adding right after
  ll <- attr(Terms, "term.labels")
the line
  ll <- gsub("^`|`$", "", ll)

E.g.,

  > d <- data.frame(check.names=FALSE, y=1/(1:5), `b$a$d`=sin(1:5)+2, `x y z`=cos(1:5)+2)
  > Terms <- terms( y ~ log(`b$a$d`) + `x y z` )
  > m <- model.frame(Terms, data=d)
  > colnames(m)
[1] "y""log(`b$a$d`)" "x y z"
  > attr(Terms, "term.labels")
[1] "log(`b$a$d`)" "`x y z`"
  >   ll <- attr(Terms, "term.labels")
  > gsub("^`|`$", "", ll)
[1] "log(`b$a$d`)" "x y z"

It is a bit of a mess.


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Mon, Mar 5, 2018 at 12:55 PM, Therneau, Terry M., Ph.D. via R-devel
<r-devel@r-project.org> wrote:

 A user reported a problem with the survdiff function and the use of variables
 that contain a space.  Here is a simple example.  The same issue occurs in
 survfit for the same reason.

 lung2 <- lung
 names(lung2)[1] <- "in st"   # old name is inst
 survdiff(Surv(time, status) ~ `in st`, data=lung2)
 Error in `[.data.frame`(m, ll) : undefined columns selected

 In the body of the code the program wants to send all of the right-hand side
 variables forward to the strata() function.  The code looks more or less like
 this, where m is the model frame

Terms <- terms(m)
index <- attr(Terms, "term.labels")
if (length(index) ==0)  X <- rep(1L, n)  # no covariates
else X <- strata(m[index])

 For the variable with a space in the name the term.label is "`in st`", and the
 subscript fails.

 Is this intended behaviour or a bug?  The issue is that the name of this column
 in the model frame does not have the backticks, while the terms structure does
 have them.

 Terry T.








Re: [Rd] backquotes and term.labels

2018-03-07 Thread Therneau, Terry M., Ph.D. via R-devel
Thanks to Bill Dunlap for the clarification.  On follow-up it turns out that this will be 
an issue for many if not most of the routines in the survival package: a lot of them look 
at the terms structure and make use of the dimnames of attr(terms, 'factors'), which also 
keeps the unneeded backquotes.  Others use the term.labels attribute.  To dodge this I 
will need to create a fixterms() routine which I call at the top of every single routine 
in the library.
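
Something along these lines is what I have in mind for fixterms(); a sketch only, built
on Bill's gsub suggestion quoted below:

    # strip the backquotes that terms() adds around non-syntactic names, both
    # from the term.labels attribute and from the dimnames of the factors matrix
    fixterms <- function(Terms) {
        unback <- function(x) gsub("^`|`$", "", x)
        attr(Terms, "term.labels") <- unback(attr(Terms, "term.labels"))
        fac <- attr(Terms, "factors")
        if (length(fac) > 0 && !is.null(dimnames(fac))) {
            dimnames(fac) <- lapply(dimnames(fac), unback)
            attr(Terms, "factors") <- fac
        }
        Terms
    }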


Is there a chance for a fix at a higher level?

Terry T.



On 03/05/2018 03:55 PM, William Dunlap wrote:

I believe this has to do with terms() making "term.labels" (hence the dimnames of
"factors") with deparse(), so that the backquotes are included for non-syntactic names.
The backquotes are not in the column names of the input data.frame (nor model frame) so
you get a mismatch when subscripting the data.frame or model.frame with elements of
terms()$term.labels.

I think you can avoid the problem by adding right after
     ll <- attr(Terms, "term.labels")
the line
     ll <- gsub("^`|`$", "", ll)

E.g.,

 > d <- data.frame(check.names=FALSE, y=1/(1:5), `b$a$d`=sin(1:5)+2, `x y z`=cos(1:5)+2)
 > Terms <- terms( y ~ log(`b$a$d`) + `x y z` )
 > m <- model.frame(Terms, data=d)
 > colnames(m)
[1] "y"            "log(`b$a$d`)" "x y z"
 > attr(Terms, "term.labels")
[1] "log(`b$a$d`)" "`x y z`"
 >   ll <- attr(Terms, "term.labels")
 > gsub("^`|`$", "", ll)
[1] "log(`b$a$d`)" "x y z"

It is a bit of a mess.


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Mon, Mar 5, 2018 at 12:55 PM, Therneau, Terry M., Ph.D. via R-devel 
<r-devel@r-project.org> wrote:


A user reported a problem with the survdiff function and the use of variables that
contain a space.  Here is a simple example.  The same issue occurs in survfit for the
same reason.

lung2 <- lung
names(lung2)[1] <- "in st"   # old name is inst
survdiff(Surv(time, status) ~ `in st`, data=lung2)
Error in `[.data.frame`(m, ll) : undefined columns selected

In the body of the code the program wants to send all of the right-hand side variables
forward to the strata() function.  The code looks more or less like this, where m is
the model frame

   Terms <- terms(m)
   index <- attr(Terms, "term.labels")
   if (length(index) ==0)  X <- rep(1L, n)  # no covariates
   else X <- strata(m[index])

For the variable with a space in the name the term.label is "`in st`", and the
subscript fails.

Is this intended behaviour or a bug?  The issue is that the name of this column in the
model frame does not have the backticks, while the terms structure does have them.

Terry T.







[Rd] backquotes and term.labels

2018-03-05 Thread Therneau, Terry M., Ph.D. via R-devel
A user reported a problem with the survdiff function and the use of variables that contain 
a space.  Here is a simple example.  The same issue occurs in survfit for the same reason.


lung2 <- lung
names(lung2)[1] <- "in st"   # old name is inst
survdiff(Surv(time, status) ~ `in st`, data=lung2)
Error in `[.data.frame`(m, ll) : undefined columns selected

In the body of the code the program wants to send all of the right-hand side variables
forward to the strata() function.  The code looks more or less like this, where m is the
model frame


  Terms <- terms(m)
  index <- attr(Terms, "term.labels")
  if (length(index) ==0)  X <- rep(1L, n)  # no covariates
  else X <- strata(m[index])

For the variable with a space in the name the term.label is "`in st`", and the subscript 
fails.


Is this intended behaviour or a bug?  The issue is that the name of this column in the 
model frame does not have the backticks, while the terms structure does have them.


Terry T.
