Re: [Rd] new behavior in model.response -- Solved

2018-06-28 Thread Therneau, Terry M., Ph.D. via R-devel
Thanks to multiple readers for comments and patience as I sorted this out.
I now have working length and names methods for Surv objects, which do not
seem to break anything. I just ran the test suites for 471 packages that
depend on survival, but I don't test against Bioconductor, so I cannot speak
to that corpus.

Essentially, if I wanted to think of Surv(1:6, rep(1:0, 3)) = 1  2+ 3  4+ 5  6+
as a vector of length 6, although stored in a longer representation, I
needed to
   a. add a length method
   b. add names and names<- methods
   c. But the names<- method cannot create a length=6 names attribute.
      Handling of a names attribute is baked into the low-level code, and
      that code treats the number of elements as the length, no matter what
      you say.
   d. The solution was to make the names methods for Surv store the results
      in the rownames.
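
As a rough illustration of steps a through d, here is a minimal sketch that uses the made-up Durv class from the original post in this thread rather than Surv itself. The method bodies are assumptions written for illustration; they are not the code in the survival package.

```r
# Durv is the stand-in class from the original post: a matrix with a class.
Durv <- function(...) {
    temp <- cbind(...)
    class(temp) <- "Durv"
    temp
}

# a. Report the number of rows (observations), not the number of cells.
length.Durv <- function(x) nrow(x)

# b./d. Read and write the names through the row names of the matrix, so
# that no names attribute of the "wrong" length is ever created.
names.Durv <- function(x) rownames(unclass(x))

`names<-.Durv` <- function(x, value) {
    cl <- oldClass(x)
    x <- unclass(x)          # avoid dispatching back to this method
    rownames(x) <- value
    class(x) <- cl
    x
}

d <- Durv(1:6, rep(1:0, 3))
length(d)                    # 6
names(d) <- letters[1:6]
names(d)                     # "a" "b" "c" "d" "e" "f"
```

The underlying storage is still a 6 x 2 matrix; only the methods make it present itself as a vector of length 6.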

Terry T.



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] new behavior in model.response

2018-06-27 Thread Berry, Charles



> On Jun 27, 2018, at 3:58 PM, Achim Zeileis  wrote:
> 
> On Thu, 28 Jun 2018, Therneau, Terry M., Ph.D. via R-devel wrote:
> 
>> I now understand the issue, which leads to a different and deeper issue 
>> which is "how to assign a proper length to Surv objects".
>> 
>> > Surv(c(1,2,3), c(1,0,1))
>> [1] 1  2+ 3
>> 
>> The above prints as 3 elements and is conceptually 3 elements. But if I give 
>> it a length method to return a 3 then I need a names method, and names<-  pays 
>> no attention to my defined length. How do we conceive of and manage a vector 
>> whose elements happen to require more than one storage slot for their 
>> representation?  An obvious example is the complex type, but it seems that 
>> had to be baked right down into the core.
> 
> I think you just have to implement all methods required to make it look like 
> a vector even if it is internally a matrix. Thus, you need methods for length 
> and for names and names<-. Internally, the names can be stored as row names.


I think a closer look at model.response() would help. 

IIUC, the reasoning therein is that comparing 

length(data[[1L]]) (aka `length(v)') 

to 

length(attr(data, "row.names")) (aka `nrows') 

is to decide whether names are sensibly assigned to `v'. I think for Surv 
objects they are not.

I suppose you could define

`names<-.Surv` <- function(...) NULL

but that seems so silly I think there must be a better way. 

I am low on coffee right now, but I wonder if having a non-exported version of 
model.response in the survival package would solve this without breaking 
anything else. 
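
The check described above can be sketched in a few lines. This is a paraphrase for illustration only, not the source of stats::model.response; the helper name attach_row_names is hypothetical, and a plain list with a row.names attribute stands in for a real model frame.

```r
# Durv is the made-up class from the original post in this thread.
Durv <- function(...) {
    temp <- cbind(...)
    class(temp) <- "Durv"
    temp
}

# Stand-in for a model frame: first element is the response.
data <- structure(list(resp = Durv(1:8, rep(0:1, 4)), age = 60:67),
                  row.names = letters[1:8])

# Hypothetical helper: assign names only when length(v) matches the
# number of row names.
attach_row_names <- function(data) {
    v <- data[[1L]]
    rows <- attr(data, "row.names")
    nrows <- length(rows)
    if (nrows > 0L && length(v) == nrows) names(v) <- rows
    v
}

# Without a length method, length(v) is 16, so no names are assigned.
before <- names(attach_row_names(data))

# With a length method, length(v) is 8 and the check passes, but the
# default names<- still sees 16 elements and pads the names with NA.
length.Durv <- function(x) nrow(x)
after <- names(attach_row_names(data))
```

Here before is NULL, while after has 16 entries, the last 8 of them NA, which matches the output reported in the original post.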

Chuck



Re: [Rd] new behavior in model.response

2018-06-27 Thread Achim Zeileis

On Thu, 28 Jun 2018, Therneau, Terry M., Ph.D. via R-devel wrote:

> I now understand the issue, which leads to a different and deeper issue
> which is "how to assign a proper length to Surv objects".
>
> > Surv(c(1,2,3), c(1,0,1))
> [1] 1  2+ 3
>
> The above prints as 3 elements and is conceptually 3 elements. But if I
> give it a length method to return a 3 then I need a names method, and
> names<-  pays no attention to my defined length. How do we conceive of
> and manage a vector whose elements happen to require more than one
> storage slot for their representation?  An obvious example is the
> complex type, but it seems that had to be baked right down into the
> core.


I think you just have to implement all methods required to make it look 
like a vector even if it is internally a matrix. Thus, you need methods 
for length and for names and names<-. Internally, the names can be stored 
as row names.


Further useful methods for "Surv" objects might include
- as.double/as.integer (presumably just extracting the "time"),
- c
- str

Possibly also a dedicated summary.

An example for such a class is our "paircomp" in "psychotools". But I'm 
sure there are other/better examples elsewhere.
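
To make two of the suggestions above concrete, here is a minimal sketch of as.double and c methods, again for the made-up Durv class from the original post rather than for Surv. It assumes the first column holds the time, which need not hold for every kind of Surv object, and the method bodies are illustrative only.

```r
# Durv is the made-up class from the original post in this thread.
Durv <- function(...) {
    temp <- cbind(...)
    class(temp) <- "Durv"
    temp
}

# Assumption: column 1 is the time. as.numeric() dispatches here as well.
as.double.Durv <- function(x, ...) as.double(unclass(x)[, 1L])

# Combine objects by stacking their rows and restoring the class.
c.Durv <- function(...) {
    parts <- lapply(list(...), unclass)
    out <- do.call(rbind, parts)
    class(out) <- "Durv"
    out
}

d <- Durv(c(1, 2, 3), c(1, 0, 1))
as.double(d)       # 1 2 3
dim(c(d, d))       # 6 2
```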


Best,
Achim


Re: [Rd] new behavior in model.response

2018-06-27 Thread Therneau, Terry M., Ph.D. via R-devel
I now understand the issue, which leads to a different and deeper issue,
which is "how to assign a proper length to Surv objects".

 > Surv(c(1,2,3), c(1,0,1))
[1] 1  2+ 3

The above prints as 3 elements and is conceptually 3 elements. But if I give
it a length method that returns 3, then I need a names method, and names<-
pays no attention to my defined length. How do we conceive of and manage a
vector whose elements happen to require more than one storage slot for their
representation?  An obvious example is the complex type, but it seems that
had to be baked right down into the core.





Re: [Rd] new behavior of model.response

2018-06-27 Thread Therneau, Terry M., Ph.D. via R-devel
Charles Berry pointed out an error in my reasoning. In the current survival
I forgot the S3method line for length in the NAMESPACE file, so the behavior
is really not new. Nonetheless it remains surprising and non-intuitive. Why
does model.response sometimes attach spurious names, when the Surv object
itself does not have them?

Terry


tmt% R --vanilla
R version 3.4.2 (2017-09-28) -- "Short Summer"

test <- data.frame(time=1:8, status=rep(0:1, 4), age=60:67)
row.names(test) <- letters[1:8]

library(survival)

mf2 <- model.frame(Surv(time, status) ~ age, data=test)
names(mf2[[1]])
# NULL
names(model.response(mf2))
# NULL

length.Surv <- survival:::length.Surv

names(model.response(mf2))
# [1] "a" "b" "c" "d" "e" "f" "g" "h" NA  NA  NA  NA  NA  NA  NA  NA




[Rd] new behavior of model.response

2018-06-27 Thread Therneau, Terry M., Ph.D. via R-devel
I am getting some unexplained changes in the latest version of survival, and
finally traced them down to this: model.response acts differently for Surv
objects. Here is a self-contained example using a made-up class Durv =
diagnose survival. I tracked it down by removing methods one by one from
Surv; I had just added some new ones, so they were my suspects.

test <- data.frame(time=1:8, status=rep(0:1, 4), age=60:67)
row.names(test) <- letters[1:8]

Durv <- function(...) {
     temp <- cbind(...)
     class(temp) <- "Durv"
     temp
}
mf1 <- model.frame(Durv(time, status) ~ age, data=test)
names(model.response(mf1))
#  NULL

length.Durv <- function(x) nrow(x)
names(model.response(mf1))
#  [1] "a" "b" "c" "d" "e" "f" "g" "h" NA  NA  NA  NA  NA  NA  NA  NA

The length method for Surv objects has been around for some while; this
behavior is new. It caused the 'time' component of survfit objects to
suddenly have names, and so was discovered in my test suite. I had planned
to submit an update today, but now need to delay it.

The length of the Surv (Durv) object above is 8, BTW; the fact that its
representation requires either 16 elements (right censored) or 24 (interval
censored) is a footnote.

Terry Therneau
