There is the issue of best design and the issue of dots, which I think are 
separate.

As to the dots, I don't think there is any way out but to handle it yourself. The formula parser has defined "." to mean everything in the frame that is not listed in the response. For good or ill it allows one to type y~ log(age) + . and get a model that has both log(age) and age --- perhaps that is what the user wanted.
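A small check of that behaviour, using a made-up data frame (the column names are only for illustration): terms() expands the "." against the columns of the data, leaving out only the response, so age should appear alongside log(age).

```r
# Toy data; the names y, age and sex are invented for this sketch.
df <- data.frame(y   = c(1.2, 2.3, 3.1, 4.8),
                 age = c(30, 42, 55, 61),
                 sex = factor(c("f", "m", "f", "m")))

# Expand "." against the data frame; only the response is excluded.
tt <- terms(y ~ log(age) + ., data = df)
labels_expanded <- attr(tt, "term.labels")
print(labels_expanded)
```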

Only you know that having strata(x) and x both on the right hand side does not 
make sense.
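One possible way to "handle it yourself", sketched under assumptions: expand the "." against a copy of the data from which the role variables have been removed. The helper name expand_dot and the column names are hypothetical, borrowed from the question quoted below; this is not code from any existing package.

```r
# Hypothetical helper: expand "." while hiding variables that play a
# special role (weights, id, strata) from the expansion.
expand_dot <- function(formula, data, roles) {
  # Columns the dot is allowed to see
  visible <- data[, setdiff(names(data), roles), drop = FALSE]
  # terms() fills in the dot; formula() turns the result back into a formula
  formula(terms(formula, data = visible))
}

# Made-up data matching the names used in the question
mydata <- data.frame(y = rnorm(4), X1 = rnorm(4), X2 = rnorm(4), X3 = rnorm(4),
                     w = rep(1, 4), ID = 1:4, site = c("a", "a", "b", "b"))

f <- expand_dot(y ~ ., mydata, roles = c("w", "ID", "site"))
print(f)
```

A variable the user names explicitly in the formula is left alone; only the automatic expansion is restricted.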

I have never been sympathetic to the use of ".", I suppose because it never applies to my own data sets. My data always contain identifier variables: subject id, address, enrollment date, etc., which would never be used in a fit. Use of "." simply never occurs outside of toy examples. My primary advice would be to stop worrying about it. (Or perhaps give me a context of why you do need to use it.)

Beyond that, a couple of design comments:
1. When an option refers to only a single variable, there is no need for a "~", and in fact things are easier without it. Look for example at the etastart option in glm(). I think we should use this more. If coxph were being rewritten today the cluster(id) term now used in a formula to signal grouping would instead be an id= option.

2. I like the idea of marking variables in the formula, like strata() does in coxph. The variable is part of the prediction but plays a different role. I also now prefer setting those up so that they are not global variables, i.e. tt() makes sense only within the coxph call. It took me a long time to see exactly how to do this; you will find the example code in coxph. If redoing things today, strata() would be local as well.

3. Make the formula and call easy for the user, even if you have to do more work. This was the approach taken in coxme, which tears the formula apart and reassembles it.
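A rough sketch of points 1 and 2 together, under stated assumptions: new_function, its id argument and the local strata() marker are all hypothetical names, and this is one way to get the locality, not necessarily the exact code in coxph. The marker function lives in a child of the formula's environment, so it is only visible while this formula is evaluated, and id is passed as a bare variable name and looked up in the data by model.frame(), the way lm() and glm() handle weights.

```r
# Hypothetical front end; not an existing API.
new_function <- function(formula, data, id) {
  # Define strata() in an environment that sits just below the formula's
  # own environment, so it is not a global function.
  local_env <- new.env(parent = environment(formula))
  assign("strata", function(x) factor(x), envir = local_env)
  environment(formula) <- local_env

  # specials= records which variables were wrapped in strata()
  tt <- terms(formula, specials = "strata", data = data)

  # Build the model.frame call from our own call, keeping data and id
  cl <- match.call()
  mf <- cl[c(1L, match(c("data", "id"), names(cl), nomatch = 0L))]
  mf[[1L]] <- quote(stats::model.frame)
  mf$formula <- tt
  mf <- eval(mf, parent.frame())

  list(frame       = mf,
       strata_cols = attr(tt, "specials")$strata,  # positions in the frame
       id          = model.extract(mf, "id"))
}

# Made-up data for illustration
mydata <- data.frame(y = rnorm(6), X1 = rnorm(6),
                     ID = rep(1:3, each = 2),
                     site = rep(c("a", "b"), 3))

res <- new_function(y ~ X1 + strata(site), data = mydata, id = ID)
print(names(res$frame))
```

The user types id = ID with no "~", and the marked variable can be picked out of the model frame by position rather than by string matching on the formula.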

If you intend to study coxph, then you should pull up the file "sourcecode.pdf", found in the "doc" directory of the installed survival library. It has a lot more comments about my design decisions. Certainly do this if you want to emulate the custom formula processing of coxme, though for that document you'll need to grab the source code and do "make all.pdf" in its noweb directory.


Terry Therneau

On 10/16/2014 05:00 AM, r-devel-requ...@r-project.org wrote:
I am working on a new package, one in which the user needs to specify the
role that different variables play in the analysis. Where I'm stumped is the
best way to have users specify those roles.

Approach #1: Separate formula for each special component

First I thought to have users specify each formula separately, like:

new.function(formula=y~X1+X2+X3,
              weights=~w,
              observationID=~ID,
              strata=~site,
              data=mydata)

This seems to be a common approach in other packages. However, one of my
testers noted that if he put formula=y~. then w, ID, and site showed up in
the model where they weren't supposed to be. I could add some code to try to
prevent that (string matching and editing the terms object, perhaps?), but
that seemed a little clumsy to me.
...  rest of note not copied

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel