Martin,

   There are a couple of issues with [.terms that have bitten my survival code. At the useR conference I promised you a detailed (readable) explanation, and have been lax in getting it to you. The error was first pointed out in a bugzilla note from 2016, by the way. The current survival code works around these issues.

Consider the following formula:

<<testform>>=
library(survival)  # for the lung data set and the Surv() and strata() functions
test <- Surv(time, status) ~  age + offset(ph.ecog) + strata(inst)
tform <- terms(test, specials="strata")
mf <- model.frame(tform, data=lung)
mterm <- terms(mf)
@

The strata term is handled specially by coxph and must then be removed from the model formula before calling model.matrix.
To do this, the code uses essentially the following, which fails for the formula above:

<<strata>>=
strata <- attr(mterm, "specials")$strata - attr(mterm, "response")
X <- model.matrix(mterm[-strata], mf)
@
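A minimal sketch of why the subtraction goes wrong, using terms() alone so that no data (and no survival package) are needed; the names tt, k and tlab are illustrative only. The specials index counts rows of the factors matrix, and the offset occupies one of those rows without being a term label, so subtracting the response gives 3 while strata(inst) is only the 2nd term label.

<<index-mismatch>>=
# terms() does not evaluate Surv() or strata(), so no data are needed
tt <- terms(Surv(time, status) ~ age + offset(ph.ecog) + strata(inst),
            specials = "strata")
k <- attr(tt, "specials")$strata - attr(tt, "response")  # 4 - 1 = 3
tlab <- attr(tt, "term.labels")       # "age" "strata(inst)"
tlab[-k]                              # index 3 is out of range: nothing dropped
match("strata(inst)", tlab)           # the strata term is really label 2
@

A negative subscript beyond the end of a vector is silently ignored in R, which is why the mismatch produces no error at this step.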

The root problem is that the different parts of a terms object need different subscripts.
\begin{itemize}
   \item The formula itself has length 5, with `~' as the first element
   \item The variables and predvars attributes are call objects, each a list() with 4 elements: the response and all 3 predictors
   \item The term.labels attribute omits the response and the offset, so has length 2
   \item The factors attribute has 4 rows and 2 columns
   \item The order attribute, like term.labels, has length 2
   \item The dataClasses attribute is a character vector of length 4
\end{itemize}
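The sizes above can be checked directly. This sketch again uses terms() on the formula alone (tt and vars are illustrative names); predvars and dataClasses are added only by model.frame(), so they are not shown here.

<<inspect-attributes>>=
tt <- terms(Surv(time, status) ~ age + offset(ph.ecog) + strata(inst),
            specials = "strata")
vars <- attr(tt, "variables")
length(vars)              # 5: the symbol `list` followed by 4 variables
dim(attr(tt, "factors"))  # 4 rows (variables) by 2 columns (terms)
rownames(attr(tt, "factors"))
attr(tt, "offset")        # 3: offset(ph.ecog) is a variable but not a term
attr(tt, "order")         # one entry per term
@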

So the ideal result of mterm[remove the specials] would use these subscripts:
\begin{itemize}
   \item [-5] on the formula itself and on the variables and predvars attributes
   \item [-2] for the term.labels attribute
   \item [-4, -2, drop=FALSE] for the factors attribute
   \item [-2] for the order attribute
   \item [-4] for the dataClasses attribute
\end{itemize}
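One way to derive most of these subscripts from the specials attribute is sketched below. The helper special_subscripts() is hypothetical (it is not part of stats or survival), and it covers only the attributes, not the formula itself. It assumes that any term involving a special variable, including an interaction, should be dropped along with it.

<<special-subscripts>>=
# Hypothetical helper (not part of stats): for one special, compute the
# subscripts each component of a terms object would need to drop it
special_subscripts <- function(termobj, special) {
    fac  <- attr(termobj, "factors")
    rows <- attr(termobj, "specials")[[special]]  # rows of the factors matrix
    if (is.null(rows)) return(NULL)               # special not present
    # columns (terms) that involve any of those variables
    cols <- unname(which(colSums(fac[rows, , drop = FALSE]) > 0))
    list(variables   = -(rows + 1L),  # +1 skips the leading `list` symbol
         term.labels = -cols,
         order       = -cols,
         factors     = list(-rows, -cols),
         dataClasses = -rows)
}

tt <- terms(Surv(time, status) ~ age + offset(ph.ecog) + strata(inst),
            specials = "strata")
s <- special_subscripts(tt, "strata")
attr(tt, "term.labels")[s$term.labels]
attr(tt, "factors")[s$factors[[1]], s$factors[[2]], drop = FALSE]
@

The early return matters: subscripting with -integer(0) selects nothing rather than everything, so an absent special must not produce empty negative indices.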

That would recreate the formula that ``would have been'' had there been no strata term.
Now look at the first portion of the code in models.R:
<<>>=
`[.terms` <- function (termobj, i)
{
    resp <- if (attr(termobj, "response")) termobj[[2L]]
    newformula <- attr(termobj, "term.labels")[i]
    if (length(newformula) == 0L) newformula <- "1"
    newformula <- reformulate(newformula, resp, attr(termobj, "intercept"),
                              environment(termobj))
    result <- terms(newformula, specials = names(attr(termobj, "specials")))

    # Edit the optional attributes
}
@
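The reformulate() step can be run on its own to see what it produces; the names tt, resp, tlab and newf are illustrative. Because term.labels omits the offset, a formula rebuilt from the term labels silently loses offset(ph.ecog), even when the right term is dropped.

<<reformulate-trick>>=
tt <- terms(Surv(time, status) ~ age + offset(ph.ecog) + strata(inst),
            specials = "strata")
resp <- tt[[2L]]                          # the call Surv(time, status)
tlab <- attr(tt, "term.labels")[-2L]      # drop strata(inst): leaves "age"
newf <- reformulate(tlab, resp, attr(tt, "intercept"), environment(tt))
attr(terms(newf), "term.labels")          # "age"
attr(terms(newf), "offset")               # NULL: offset(ph.ecog) is gone
@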

The use of reformulate() is a nice trick. However, the index reported in the specials attribute is computed relative to the variables attribute (equivalently, the row names of the factors attribute), not relative to the term.labels attribute. For consistency, the second line should instead be:
<<>>=
newformula <- row.names(attr(termobj, "factors"))[i]
@
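A sketch of indexing by row names instead, with illustrative names rows, k, keep and newf. The specials index then points at the right entry, and rebuilding from the row names keeps the offset. Since the row names also contain the response, this sketch assumes the response row is dropped explicitly along with the special.

<<rownames-index>>=
tt <- terms(Surv(time, status) ~ age + offset(ph.ecog) + strata(inst),
            specials = "strata")
rows <- row.names(attr(tt, "factors"))
k <- attr(tt, "specials")$strata
rows[k]                                    # "strata(inst)": the indices agree
# The row names include the response, so drop that row as well as the special
keep <- rows[-c(attr(tt, "response"), k)]  # "age" "offset(ph.ecog)"
newf <- reformulate(keep, tt[[2L]], attr(tt, "intercept"), environment(tt))
attr(terms(newf), "offset")                # 3: the offset survives
@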

Of course, this will break code for anyone else who has used [.terms and, like me, has been adjusting for the ``response is counted in specials but not in term.labels'' feature. R core will have to discuss and decide what the right thing to do is, and I'll adapt.

The reformulate trick breaks in another way, one that only appeared on my radar this week, via a formula like the following:

<<form2>>=
Surv(time, status) ~ age + (sex=='male') + strata(inst)
@

In both the term.labels attribute and the row and column names of the factors attribute the parentheses disappear, so the result of the reformulate call is not the intended formula. Since + binds more tightly than ==, the pasted-together right-hand side is parsed as a single comparison, leading to an error message that will confuse most users. We can argue, and I probably would, that the user should have used I(sex=='male'). But they didn't, and without the I() it is a legal formula, or at least one that currently works.
Fixing this issue is a lot harder.
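A small illustration of the precedence problem, using a generic response y and illustrative names. It mimics the rebuild by pasting the labels together with "+", which is roughly what reformulate() does internally (an assumption about its implementation), and uses str2lang(), available since R 3.6.0.

<<paren-precedence>>=
# terms() drops the grouping parentheses when it builds the labels
tt <- terms(y ~ age + (sex == "male") + strata(inst))
lab <- attr(tt, "term.labels")
lab[2L]                                        # sex == "male"
# Pasting labels back together with "+" re-parses them:
# + binds more tightly than ==, so the top-level call is ==
rhs <- str2lang(paste(lab[-3L], collapse = " + "))
rhs[[1L]]                                      # `==`, not `+`
# Wrapped in I(), the comparison keeps its label and survives a round trip
lab_i <- attr(terms(y ~ age + I(sex == "male")), "term.labels")
good <- reformulate(lab_i, "y")
@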

An offset term causes issues in the `Edit the optional attributes' part of the routine as well. If you and/or R core tell me what you think the code should do, I'll create a patch. My vote would be to use row names as described above and ignore the () edge case.

The same basic code appears in drop.terms, by the way.

Terry T.



______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
  • [Rd] Error in [.terms Therneau, Terry M., Ph.D. via R-devel
