Re: [Rd] R CMD check issue with R 3.0.2
On 10/28/2013 06:00 AM, r-devel-requ...@r-project.org wrote:
> On 13-10-26 9:49 PM, Simon Urbanek wrote:
>> On Oct 25, 2013, at 12:12 PM, Yihui Xie wrote:
>>> This has been asked so many times that I think it may be a good idea for R CMD check to just stop when the user passes a directory instead of a tarball to it, or automatically run R CMD build before moving on. In my opinion, sometimes an FAQ and a bug are not entirely different.
>> +1 -- and I'd do the same for R CMD INSTALL. If someone insists, there could be --force or something like that for those who really want to work on directories despite all the issues with that, but IMHO the default should be for both INSTALL and check to bail out if not presented with a file -- it would save a lot of people a lot of time spent chasing ghost issues.
> That seems like a reasonable suggestion. I wouldn't want to lose the ability to install or check a directory; for development of packages like rgl which have a lot of compiled code, installing from a tarball takes a lot longer than installing when all of the code has already been compiled.

I use R CMD check on directories often. The survival and coxme packages have large test suites, and before things are packaged up for R-forge there may be multiple iterations to get past all of them. (Add one new idea, break something old.) Creating the tarball is slow due to vignettes. Thus I would hate to see it outlawed. Of course I know enough to ignore many of the warnings during this testing stage, I do use the tarball for my final run, and I understand the noise level that this option incurs on R-devel.

Terry T.
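For reference, the two invocation styles under discussion look like this (package name and version illustrative):

    R CMD build mypkg                # builds mypkg_1.0.tar.gz, rebuilding vignettes
    R CMD check mypkg_1.0.tar.gz     # the recommended workflow: check the tarball
    R CMD check mypkg                # checking the source directory -- the usage at issue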
Re: [Rd] Multiple return values / bug in rpart?
I don't remember what rpartpl once did myself; as you point out, it is a routine that is no longer used and should be removed. I've cc'd Brian since he maintains the rpart code.

Long ago, return() with multiple arguments was a legal shorthand for returning a list. This feature was deprecated in S-plus, I think even before R rose to prominence. I vaguely remember a time when its usage generated a warning. The fact that I've never noticed this unused routine is somewhat embarrassing. Perhaps I need a "not documented, never called" addition to R CMD check to help me along.

Terry Therneau

> In the recommended package rpart (version 4.1-1), the file rpartpl.R contains the following line:
>     return(x = x[!erase], y = y[!erase])
> AFAIK, returning multiple values like this is not valid R. Is that correct? I can't seem to make it work in my own code. It doesn't appear that rpartpl.R is used anywhere, so this may have never caused an issue. But it's tripping up my R compiler.
> Thanks, Justin Talbot
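For reference, the deprecated multi-argument return() is written today as an explicit named list; a minimal sketch:

    # Modern replacement for the old S shorthand return(x = ..., y = ...):
    f <- function(x, y, erase) {
        list(x = x[!erase], y = y[!erase])
    }
    f(1:4, 5:8, c(TRUE, FALSE, FALSE, TRUE))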
[Rd] Problem with anova and the new abbreviation restrictions
An unwanted side effect of the new restrictions on abbreviated names. The anova.coxph command, in a slavish copy of anova.lm etc., returns a data frame with column labels of

    loglik  Chisq  Df  Pr(>|Chi|)

If one tries to extract the final column of the table, errors result, since it is not a standard R variable name.

    > afit <- anova(lm(conc ~ uptake, CO2))
    > afit$P
    [1] 2.905607e-06           NA
    Warning message:
    In `$.data.frame`(afit, P) : Name partially matched in data frame
    > afit$Pr(>F)
    Error: unexpected '>' in "afit$Pr(>"

The second is a result of allowing TAB completion of the name. Yes, experienced folks can sneak around it, but should this be upgraded to match an ease-of-use criterion? Add a [.anova subscript method that allows for shortened names?

I'm still in favor of an option() that controls this new behavior with values of "allow", "warn", and "error", so that I can at least turn it off for me. (I work on many data sets created by SAS programmers who are in love with excessively long names, and I use batch scripts, so name completion isn't available during script creation.) But this is more local to me, and does not address the primary question above in a general way.

Terry T
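For reference, a non-syntactic column name can be extracted exactly, with no partial matching, by quoting it; a minimal sketch:

    afit <- anova(lm(conc ~ uptake, CO2))
    afit$"Pr(>F)"       # quoted name after $
    afit[["Pr(>F)"]]    # [[ requires an exact match
    afit[, "Pr(>F)"]    # ordinary data frame column extraction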
Re: [Rd] Calling an array in a struct in C to R
Another solution is the one used for a long time in the rpart code. The R code called rpart1, which does the work, keeps a static pointer to the object, does NOT release its memory, and returns the size of the object. Then the R code allocates appropriate vectors and calls rpart2, which fills in the result and releases the dynamic memory. This was written before .Call was available. It works, but I agree with Bill Dunlap that you should use .Call.

Terry T.

On 06/20/2013 05:00 AM, r-devel-requ...@r-project.org wrote:
> Hi there,
> Although I'm a quite experienced R user and know my way in C, I stumbled upon a problem I don't know how to solve. Therefore, I hope someone can provide me with the information or pointers I need in order to understand the way in which the communication between R and C occurs. I have the following C code which basically reflects what I want:
>
>     typedef struct {
>         float *array;
>         size_t used;
>         size_t size;
>     } Array;
>
>     void main2R() {
>         Array a;
>         examplefunction(&a);   /* fills and dynamically grows a->array */
>     }
>
> Now I would want to return a.array or a->array to R. According to the R manuals, the compiled C code should not return anything except through its arguments. The problem here is, I have a dynamically growing array, and I have no idea what to pass on to C from R in order to let that work. The word 'should' indicates that the compiled code may return something in special circumstances, but I have no idea how to get a.array into R.
> So my question is simply this: how on earth do I get the information in the float array 'a.array' into R? Is it even possible, or should I rewrite my C code using .Call in R? Another, not preferred, option is to pre-allocate the array/vector in R at a fixed (large-enough) size. Or do I miss something here?
> Regards, Tee-Jay-Ardie
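The two-call pattern Terry describes looks roughly like this from the R side (a sketch only: the routine names and arguments are invented, not the actual rpart interface, and the corresponding C code must already be loaded with dyn.load):

    # First call does the work, keeps a static pointer to the grown array,
    # and returns only its length; the second call copies the result into
    # R-allocated memory and frees the C-side storage.
    n   <- .C("mywork1", nresult = integer(1))$nresult
    ans <- .C("mywork2", result = double(n))$result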
Re: [Rd] loading of unwanted namespace
Martin,
Your suggestion below did the trick. The issue was obvious once this pointed me to the correct bit of code. Thanks much.

Terry T.

--- begin included text ---
    trace(loadNamespace, quote(if (package == "survival") recover()))
will break into recover() when survival is being loaded.
Martin
Re: [Rd] Patch proposal for R style consistency (concerning deparse.c)
I'll be the "anybody" to argue that "} else {" is an ugly kludge which you will never find in my source code. Yes, it's necessary at the command line because the parser needs help in guessing when an expression is finished, but it is only needed in that case. Since I can hardly imagine typing an else clause at the command line (that many correct characters in a row exceeds my typing skill), it's not an issue for me. I most certainly would not inflict this construction on my pupils when teaching a class, nor insist that any break of a long line has to be after a "+" but never before, nor other crutches for the parser's sake. Let them know about the special case, of course, but don't sacrifice good coding style to the deficiency.

That said, I am completely ambivalent about the result of deparse. I'm just throwing up an objection to the purity argument: things were beginning to sound a bit too bombastic :-)

Terry T.

On 05/02/2013 05:00 AM, r-devel-requ...@r-project.org wrote:
> I want "} else {". Yihui wants "} else {". And I have not heard anybody say they prefer the other way, unless you interpret Duncan's comment "that's nonsense" as a blanket defense of the status quo. But I don't think he meant that. This is a matter of style consistency and avoidance of new R-user confusion and error.
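The parser point at issue, in a minimal illustration: at the top level R finishes the expression at the closing brace, so an else on its own line is a syntax error there, while inside a braced function body it is legal.

    # Typed at the top-level prompt, this fails with "unexpected 'else'",
    # because the expression is already complete at the closing brace:
    #   if (TRUE) {
    #       1
    #   }
    #   else {
    #       2
    #   }

    # Inside a braced body the parser keeps reading, so this style works:
    f <- function(x) {
        if (x > 0) {
            "positive"
        }
        else {
            "not positive"
        }
    }
    f(1)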
[Rd] loading of an unwanted namespace
I have a debugging environment for the survival package, perhaps unique to me, but I find it works very well. To wit: a separate directory with copies of the source code but none of the package accoutrements of DESCRIPTION, NAMESPACE, etc. This separate space does NOT contain a copy of src/init.c. Within this I use R --vanilla, attach my .RData file and survival.so file, and away we go.

That is, until my first usage of it today under R 3.0. My runs get into trouble with messages about conflicts with the survival namespace. My problem is that I can't figure out where or how the namespace is being searched out and attached. Any hints on where to look would be appreciated. This magical load is kind of spooky.

Here is a session that displays it. The setup.s file does the dyn.load and attaches some data sets.

    tmt-local3499% R --vanilla

    R version 3.0.0 (2013-04-03) -- "Masked Marvel"
    Copyright (C) 2013 The R Foundation for Statistical Computing
    Platform: i686-pc-linux-gnu (32-bit)

    R is free software and comes with ABSOLUTELY NO WARRANTY.
    You are welcome to redistribute it under certain conditions.
    Type 'license()' or 'licence()' for distribution details.

    Natural language support but running in an English locale

    R is a collaborative project with many contributors.
    Type 'contributors()' for more information and
    'citation()' on how to cite R or R packages in publications.

    Type 'demo()' for some demos, 'help()' for on-line help, or
    'help.start()' for an HTML browser interface to help.
    Type 'q()' to quit R.

    > source('setup.s')
    > options(warn=2)
    > sessionInfo()
    R version 3.0.0 (2013-04-03)
    Platform: i686-pc-linux-gnu (32-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=C                 LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] splines   stats     graphics  grDevices utils     datasets  methods
    [8] base

    > search()
     [1] ".GlobalEnv"          "file:../.RData"      "file:../data/.RData"
     [4] "package:splines"     "package:stats"       "package:graphics"
     [7] "package:grDevices"   "package:utils"       "package:datasets"
    [10] "package:methods"     "Autoloads"           "package:base"

    > coxph(Surv(time, status) ~ age, data=lung)
    Call:
    coxph(formula = Surv(time, status) ~ age, data = lung)

           coef exp(coef) se(coef)    z     p
    age  0.0187      1.02   0.0092 2.03 0.042

    Likelihood ratio test=4.24  on 1 df, p=0.0395  n= 228, number of events= 165

    # That worked fine, but the next fails
    > coxph(Surv(time, status) ~ pspline(age), data=lung)
    Error in FUN(X[[2L]], ...) : (converted from warning) failed to assign
      RegisteredNativeSymbol for "Cagfit5a" to "Cagfit5a" since "Cagfit5a"
      is already defined in the 'survival' namespace
[Rd] Loss of identified points
Minimizing and then restoring a window that contains identified points leads to an error on my linux box.

    > sessionInfo()
    R version 3.0.0 (2013-04-03)
    Platform: i686-pc-linux-gnu (32-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=C                 LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    > plot(1:5, 1:5)
    > identify(1:5, 1:5, letters[2:6])
    # Identify a point or two, then minimize the graph window, then restore it
    Error: first argument must be a string (of length 1) or native symbol reference

The graph returns, but the labels are lost.

Terry T.
[Rd] Timing of SET_VECTOR_ELT
Assume a C program invoked by .Call that returns a list. Near the top of the program we allocate space for all the list elements. (It is my habit to use xyz2 for the name of the R object and xyz for the pointer to its contents.)

    PROTECT(means2 = allocVector(REALSXP, nvar));
    means = REAL(means2);
    PROTECT(u2 = allocVector(REALSXP, nvar));
    u = REAL(u2);
    PROTECT(loglik2 = allocVector(REALSXP, 2));
    loglik = REAL(loglik2);
    PROTECT(rlist = mkNamed(VECSXP, outnames));

Can I assign the individual elements into rlist using SET_VECTOR_ELT at this point, or do I need to wait until the end of the program, after I've filled in means[i], u[i], etc.? It likely depends on whether I'm assigning a pointer or a copy.

Terry T.
Re: [Rd] Timing of SET_VECTOR_ELT
On 04/01/2013 12:44 PM, Simon Urbanek wrote:
> On Apr 1, 2013, at 1:10 PM, Terry Therneau wrote:
>> [the SET_VECTOR_ELT question from the previous message]
> You're assigning a pointer, so it doesn't matter. FWIW, you can avoid all the PROTECTion mess if you alloc+assign, e.g.
>
>     SEXP rlist = PROTECT(mkNamed(VECSXP, outnames));
>     SEXP means = SET_VECTOR_ELT(rlist, 0, allocVector(REALSXP, nvar));
>     ...
>
> since you only need to protect the parent object.
> Cheers, Simon

Neat trick. I take it that SET_VECTOR_ELT returns a pointer to the object that was just created? I haven't found a description of the function in any of the documents, only examples of its use, so this is a surprise to me. Lacking documentation, can I count on it in the future?

Terry T.
[Rd] ess completion
The thread is strange to me as well, since completion is logically impossible for my Sweave files:
- an emacs buffer is open, working on an .Rnw file
- there is no copy of R running anywhere on the machine
- the code chunk in the .Rnw file refers to an R data object saved somewhere else

Of course it cannot do name completion on a bit of code I'm writing, lacking omniscient powers. Emacs is good, but not that good :-)

The ESS manual section 12.1 states that for .R files it has completion of object names and file names. I'm puzzled by exactly what this means, since it is logically impossible (without a running R session) to do this in general. Linking an .Rnw file to an inferior R process doesn't make sense to me.

However, I think it's time to move this sub-thread off of R-devel. Respond to me privately about the answer or the proper list for this discussion.

Terry T

On 03/22/2013 06:00 AM, r-devel-requ...@r-project.org wrote:
> This thread is strange for me to read, as I've been getting completion of object names, function argument names, and whatnot in ESS buffers for as long as I have been using it. And I'm only on ESS 12.09. Perhaps you need to set `ess-use-R-completion` to non-nil. Or check the value of `completion-at-point-functions`. Mine is
>     '(ess-roxy-tag-completion ess-filename-completion ess-object-completion t)
> Peter
Re: [Rd] R-devel Digest, Vol 121, Issue 20
I am not in favor of the change, which is a choice of rigor over usability.

When I am developing code or functions I agree with this, and I view any warnings from R CMD check about shortened arguments as positive feedback. But 90% of my usage of R is day-to-day data analysis, interactive, at the keyboard. A lot of data sets that come to me have long variable names. What this change will mean to me is that I'll either spend a lot of time cursing at these annoying frickin warnings, or have to take the time to make new data sets (several times a week) that have abbreviated names. Of course when I come back to that project 8 weeks later I'll need to recreate the short-long mapping in my head...

Let's think about WHY abbreviated names were allowed in the first place. Usability is worth a lot more than purity to the actual users of a package. S had the rare advantage of serious users just down the hall from the developers, which I hypothesise is one of the true foundations of its success. (What user would have invented the hiding aspect of S4 classes --- let's put the results of the fit into a steel box so they can't see the parts --- which is actually touted as a virtue? I cringe every time a new method I need is sewed up this way.)

    "C is a remarkable language. ... The success of C is due to a number of
    factors, none of them key, but all of them important. Perhaps the most
    significant of all is that C was developed by real practitioners of
    programming and was designed for practical day-to-day use, not for show
    or for demonstration. Like any well-designed tool, it falls easily to
    the hand and feels good to use. Instead of providing constraints, checks
    and rigorous boundaries, it concentrates on providing you with power and
    on not getting in your way."
        Preface to The C Book, M. Banahan et al.

Terry Therneau

On 03/21/2013 06:00 AM, r-devel-requ...@r-project.org wrote:
> Allowing partial matching on $-extraction has always been a source of accidents. Recently, someone who shall remain nameless tried names(mydata) <- "d^2" followed by mydata$d^2. As variables in a data frame are generally considered similar to variables in, say, the global environment, it seems strange that foo$bar can give you the content of foo$bartender. In R-devel (i.e., *not* R-3.0.0 beta, but 3.1.0-to-be) partial matches now give a warning. Of course, it is inevitable that lazy programmers will have been using code like
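The accident mode described above, in a minimal illustration (older R did this silently; the R-devel change adds a warning, and [[ has always required an exact match by default):

    foo <- list(bartender = 42)
    foo$bar        # 42, found by partial matching against "bartender"
    foo[["bar"]]   # NULL: [[ uses exact matching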
[Rd] Re Deprecating partial matching in $.data.frame
As a follow-up to my previous, let me make a concrete suggestion: add this as one of the options

    df-partial-match = "allowed", "warn", or "fail"

Set the default to "warn" for the current R-devel, and migrate it to "fail" at a later date of your choosing. I expect that this is very little more work for the development team; it provides an extended grace period to those running old code that depends on partial matching (I sometimes have to recreate a 4-5 year old analysis); it will let those who strongly agree migrate to "fail" immediately; and it will save you from the flames of those who disagree.

Some things that look good at first are not: examples from the past are na.fail as the default for missing values (with no global option to override) and strings as factors (with no option to override, global or otherwise). There are other options that settle into their default and never change. "It's tough to make predictions, especially about the future" -- Yogi Berra.

On 03/21/2013 06:00 AM, r-devel-requ...@r-project.org wrote:
> As variables in a data frame are generally considered similar to variables in, say, the global environment, it seems strange that foo$bar can give you the content of foo$bartender. In R-devel (i.e., *not* R-3.0.0 beta, but 3.1.0-to-be) partial matches now give a warning. Of course, it is inevitable that lazy programmers will have been using code like
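A sketch of how the proposed option might be consulted; the option name and its values are the poster's suggestion, not an actual R option:

    # Hypothetical helper: act on a partial match according to a global option.
    partial_match_action <- function(name, matched) {
        switch(getOption("df.partial.match", "warn"),
               allowed = invisible(NULL),
               warn    = warning(sprintf("'%s' partially matched '%s'",
                                         name, matched)),
               fail    = stop(sprintf("no column named '%s'", name)))
    }
    options(df.partial.match = "warn")
    partial_match_action("bar", "bartender")   # warns under the default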
Re: [Rd] Depreciating partial matching
Note: my apologies for the Subject in the original post.

On 03/21/2013 08:59 AM, Milan Bouchet-Valat wrote:
> Le jeudi 21 mars 2013 à 08:51 -0500, Terry Therneau a écrit :
>> [Terry's "rigor over usability" message, quoted in full in the previous post]
> I would think that the ability to hit the Tab key to trigger name completion in your R GUI makes partial matching almost useless. The advantage of interactive completion in the GUI is that you immediately see the result of the partial matching. So you get the best of both worlds: no need to type long variable names in full, but no traps when a match is not what you would expect. Doesn't this suit your use case?

Good point. This works well at the command line. However, not when interacting between emacs and R in the way I do. For reproducibility I use an emacs file that is being corrected and massaged, with chunks submitted to R; at the end I have a clean record of how the result was obtained.

Regards,
Terry Therneau
Re: [Rd] Depreciating partial matching
On 03/21/2013 10:00 AM, Simon Urbanek wrote:
>>> I would think that the ability to hit the Tab key to trigger name completion in your R GUI makes partial matching almost useless. [...] Doesn't this suit your use case?
>> Good point. This works well at the command line. However, not when interacting between emacs and R in the way I do. For reproducibility I use an emacs file that is being corrected and massaged, with chunks submitted to R; at the end I have a clean record of how the result was obtained.
> If this is really true (that ESS doesn't complete in R files) then this seems more like a bug (or wish?) report for ESS - other editors correctly support code completion in R documents - after all, this is a feature of R, so they don't need to re-invent the wheel.
> Cheers, Simon

If you are running the R process inside ESS then there is matching -- it is R. Doing this, keeping a log file, and then post-hoc cleaning up all the cruft from that file is one way to keep documentation. But since for my analyses the number of models/plots/etc. that turn out to be detours or dead ends on the way to a solution is larger than the worthwhile part (typos alone are lots larger), I prefer to keep the file(s) as their own buffers and submit bits of them to an R process, either by cut-paste to a separate window or ess-submit to an inferior process. Emacs can't do name completion in that case. Nor could it do so in an Sweave file, unless you were to keep a live R process in hand to pre-test chunks as you wrote them. (One could reasonably argue that by the time one gets to the Sweave stage the names should be expanded.)

To summarize: my own interactive mix of emacs/R may be unusual. For purely interactive folks, completion does most of the work. I hadn't tried the newest ESS interactive-within-emacs till today; it's slick as well. The number of people howling will be less than my original thought, though not zero.

Still, this change could cause a lot of grief for saved R scripts. In our group the code + data directory is archived whenever a medical paper is submitted (close to 500/year), and it is very common to pull one back as-is 1-4 years later for further exploration. A very small subset of those are in a legal context where exact reproducibility is paramount.
Re: [Rd] numerics from a factor
Thanks, Martin. I had already changed the second argument to getOption. By the way, per an off-list comment from Brian R, the bug I was addressing won't affect anyone using R as shipped; the default decimal separator is "." whatever the region. It only bit those who set the OutDec option themselves.

Terry T.

On 03/19/2013 11:30 AM, Martin Maechler wrote:
>> Hi Terry,
>> you can use type.convert instead of as.numeric for numbers with decimals:
>>     type.convert(levels(factor(1:6/2)), dec = unlist(options("OutDec")))
>> Best, Ulrike
> A late and minor remark: if you use the above, a slightly more expressive preferred way is
>     type.convert(levels(factor(1:6/2)), dec = getOption("OutDec"))
> Martin
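For reference, type.convert's dec argument handles a non-default decimal mark directly; a minimal sketch:

    type.convert(c("0,5", "1", "1,5"), dec = ",", as.is = TRUE)
    # [1] 0.5 1.0 1.5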
[Rd] numerics from a factor
A problem has been pointed out by a French user of the survival package, and I'm looking for a pointer.

    > options(OutDec = ",")
    > fit <- survfit(Surv(1:6/2) ~ 1)
    > fit$time
    [1] NA  1 NA  2 NA  3

A year or two ago, some test cases that broke survfit were presented to me. The heart of the problem was numbers that were almost identical, where table(x) and unique(x) gave different counts of distinct values. The solution was to use ftime <- factor(time) at the top of the code, and do all the calculations using the integer levels of the factor as the unique time points. At the very end, the numeric component "time" of the result is created using as.numeric(levels(ftime)). It's this last line that breaks.

I could set the OutDec option within survfit and reset it when I leave using on.exit. Any other simple solutions? Any other ways I could get caught by this issue?

Terry Therneau
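The factor trick described above, in isolation (a minimal sketch): the integer codes drive the internal computations, and the numeric values are recovered only at the end.

    time  <- c(1, 1, 2, 2.5, 2.5, 3)
    ftime <- factor(time)
    as.integer(ftime)           # 1 1 2 3 3 4: codes used internally
    as.numeric(levels(ftime))   # 1.0 2.0 2.5 3.0: the unique time points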
Re: [Rd] R-devel Digest, Vol 121, Issue 13
I think it would be a good idea. Several versions of the survival package had a duplicate line among the S3method entries, and were missing a line that should have been there, due to a cut/paste error.

Terry T.

On 03/13/2013 06:00 AM, r-devel-requ...@r-project.org wrote:
> Circa 80 CRAN and core-R packages have duplicate export entries in their NAMESPACE files. E.g.,
>     bit 1.1.9 : c("as.bit", "as.bitwhich", "as.which", "physical", "virtual")
>     forecast 4.1 : "forecast.lm"
>     graphics 2.15.3 : "barplot"
>     mcmc 0.9.1 : "morph"
>     RCurl 1.95.3 : "curlOptions"
>     utils 2.15.3 : "RweaveLatexOptions"
> Would it be helpful for 'check' to alert package writers to this? I made the list using f():
>
>     f <- function() {
>         for (pkg in installed.packages()[, "Package"]) {
>             try({
>                 exports <- parseNamespaceFile(pkg, R.home("library"))$exports
>                 if (any(dup <- duplicated(exports))) {
>                     cat(pkg, format(packageVersion(pkg)), ":",
>                         deparse(exports[dup]), "\n")
>                 }
>             }, silent = TRUE)
>         }
>     }
>
> I suppose it should also check for duplicates in the S3method component, etc.
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
Re: [Rd] Registering native routines
That was the correct direction: I changed the earlier line to

    routines <- list(Ccoxfit5a, ...

and the later one to

    .C(routines[[1]], ...)

and now it works as desired.

Terry T.

On 02/23/2013 03:09 AM, Duncan Murdoch wrote:
> On 13-02-22 2:59 PM, Terry Therneau wrote:
>> [the registration question, reproduced in the next message]
> I imagine routines is a list in this case, so you should be using routines[[1]] to extract the element, rather than subsetting the list.
> Duncan Murdoch
[Rd] Registering native routines
I'm working on registering all the routines in the survival package, per a request from R-core. A few questions:

1. In the coxph routine I have this type of structure:

    if (survival has 2 columns) routines <- c("coxfit5_a", "coxfit5_b", "coxfit5_c")
    else                        routines <- c("agfit5_a", "agfit5_b", "agfit5_c")
    ...
    .C(routines[1], arg1, etc.)

I tried replacing routines with a vector of native symbol references, but it doesn't work:

    Error in .C(routines[1], as.integer(n), as.integer(nvar), as.double(y), :
      first argument must be a string (of length 1) or native symbol reference

I had fixed up all the other .C and .Call statements first (28 of them) and that worked, so the problem is not with finding the set of symbol references.

2. In the R-exts manual it mentions another argument style for C calls, to specify whether an argument is for input, output, or both. However, I can find no details on how to use it.

3. A few of my routines still had a COPY argument. I assume that is simply ignored?

Terry T.

R Under development (unstable) (2013-02-11 r61902)
Platform: i686-pc-linux-gnu (32-bit)
[Rd] Small suggestion for termplot
Brian,
I used termplot(..., plot=FALSE) recently in R-devel: works like a charm. Thanks much for the update.

Our in-house gamterms function, which this obviates, would also return the "constant" attribute from the underlying predict(..., type="terms") call. I have occasionally found this useful, and so it would be a worthwhile addition to termplot. Currently

    fit <- coxph(Surv(time, status) ~ pspline(age) + sex + ns(wt.loss), data=lung)
    zed <- termplot(fit, se=TRUE, plot=FALSE)

returns a list with components zed$age, zed$sex, and zed$wt.loss. The constant could be added as another element of the list or as an attribute; I don't have an opinion either way, so have not suggested a patch. You may have a reason for preferring one or the other. Clearly this is not critical for the version 3.0 release.

I sent this to you since you implemented the plot=FALSE change, cc to the list in case someone else is appropriate.

For those on the list, the recent change has three nice features:
a. Use of predict(..., type='terms') is a nuisance because the result is in data set order rather than x order, so a lines() call becomes a scribble. The new code reduces each term to the set of unique x values, in order, just what you need for a plot.
b. In the coxph example above I use plot(zed$age$x, exp(zed$age$y), log='y') to get a better y-axis label. For all the developers, this is a nice way to deflect requests for yet-another-plotting-option addition to termplot.
c. It is easy to overlay results from two separate fits.

Terry T.
Re: [Rd] stringsAsFactors
I think your idea to remove the warnings is excellent, and a good compromise. Characters already work fine in modeling functions except for the silly warning.

It is interesting how often the defaults for a program reflect the data sets in use at the time the defaults were chosen. There are some such in my own survival package whose proper value is no longer as obvious as it was when I chose them. Factors are very handy for variables which have only a few levels and will be used in modeling. Every character variable of every dataset in Statistical Models in S, which introduced factors, is of this type, so auto-transformation made a lot of sense. The solder data set there is one for which Helmert contrasts are proper, so guess what the default contrast option was? (I think there are only a few data sets in the world for which Helmert makes sense, however, and R eventually changed the default.) For character variables that should not be factors, such as a street address, stringsAsFactors can be a real PITA, and I expect that people's preference for the option depends almost entirely on how often these arise in their own work.

As long as there is an option that can be overridden, I'm okay. Yes, I'd prefer FALSE as the default, partly because the current value is a tripwire in the hallway that eventually catches every new user.

Terry Therneau

On 02/11/2013 05:00 AM, r-devel-requ...@r-project.org wrote:
> Both of these were discussed by R Core. I think it's unlikely the default for stringsAsFactors will be changed (some R Core members like the current behaviour), but it's fairly likely the show.signif.stars default will change. (That's if someone gets around to it: I personally don't care about that one. P-values are commonly used statistics, and the stars are just a simple graphical display of them. I find some p-values to be useful, and the display to be harmless.) I think it's really unlikely the more extreme changes (i.e. dropping show.signif.stars completely, or dropping p-values) will happen.
> Regarding stringsAsFactors: I'm not going to defend keeping it as is, I'll let the people who like it defend it. What I will likely do is make a few changes so that character vectors are automatically changed to factors in modelling functions, so that operating with stringsAsFactors=FALSE doesn't trigger silly warnings.
Re: [Rd] stringsAsFactors
The root of this problem is that the .getXlevels function does not return the levels for character variables. Future predictions depend on that information.

On 02/11/2013 11:50 AM, Duncan Murdoch wrote:
> On 11/02/2013 12:13 PM, William Dunlap wrote:
>> Note that changing this does not just mean getting rid of silly warnings. Currently, predict.lm() can give wrong answers when stringsAsFactors is FALSE.
>>
>>     d <- data.frame(x = 1:10,
>>                     f = rep(c("A", "B", "C"), c(4, 3, 3)),
>>                     y = c(1:4, 15:17, 28.1, 28.8, 30.1))
>>     fit_ab <- lm(y ~ x + f, data = d, subset = f != "B")
>>     Warning message:
>>     In model.matrix.default(mt, mf, contrasts) :
>>       variable 'f' converted to a factor
>>     predict(fit_ab, newdata = d)
>>      1  2  3  4  5  6  7  8  9 10
>>      1  2  3  4 25 26 27  8  9 10
>>     Warning messages:
>>     1: In model.matrix.default(Terms, m, contrasts.arg = object$contrasts) :
>>       variable 'f' converted to a factor
>>     2: In predict.lm(fit_ab, newdata = d) :
>>       prediction from a rank-deficient fit may be misleading
>>
>> fit_ab is not rank-deficient, and the predict should report
>>      1  2  3  4 NA NA NA 28 29 30
>> In R-devel, the two warnings about factor conversions are no longer given, but the predictions are the same and the warning about rank deficiency still shows up. If f is set to be a factor, an error is generated:
>>     Error in model.frame.default(Terms, newdata, na.action = na.action,
>>       xlev = object$xlevels) : factor f has new levels B
> I think both the warning and error are somewhat reasonable responses. The fit is rank deficient relative to the model that includes f == "B", because the column of the design matrix corresponding to f level B would be completely zero. In this particular model we could still do predictions for the other levels, but it also seems reasonable to quit, given that clearly something has gone wrong.
> I do think that it's unfortunate that we don't get the same result in both cases, and I'd like to have gotten the predictions you suggested, but I don't think that's going to happen. The reason for the difference is that the subsetting is done before the conversion to a factor, but I think that is unavoidable without really big changes.
> Duncan Murdoch
>> Bill Dunlap
>> Spotfire, TIBCO Software
>> wdunlap tibco.com
>> [Bill's message then quoted Terry Therneau's "stringsAsFactors" post of Feb 11, reproduced in full above.]
Re: [Rd] stringsAsFactors
Peter,
I had an earlier response to Duncan that I should have copied to the list. The subset issue can be fixed. When the model changes character to factor, it needs to remember the levels, just as it does with factors. We are simply seeing a reprise of the problems that occurred when models didn't remember factor levels -- I've been down this road before. Lots of ideas and workarounds were tried, none of which worked until that memory was added ($xlevels in lm, glm, coxph, etc. fits).

Can everything be fixed, in the sense that R always makes the right choices for my data? I seriously doubt it.

As to stringsAsFactors -- the right answer is the one that causes each person the least bother. For me that is stringsAsFactors = "some", which means that I turn it off and build the ones I need. The right global default, likely, is whichever one causes the members of R Core the least bother :-)

On 02/11/2013 04:46 PM, peter dalgaard wrote:
> It's logically impossible, I'd say. If you want to do conversion from character to factor on an as-needed basis, you _will_ have issues with subsetting operations affecting the set of levels. The logical way out is to define factors before subsetting. As far as possible, create them up front. Doing it automagically in read.table is far from infallible, but at least has some chance of getting it roughly right. In my view, this is actually a pretty strong argument for keeping stringsAsFactors==TRUE.
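The "memory" referred to above is visible on any lm fit; a minimal sketch:

    # A fitted model stores the factor levels seen at fit time, which is
    # what makes prediction on new data well defined:
    fit <- lm(Sepal.Length ~ Species, data = iris)
    fit$xlevels
    # $Species
    # [1] "setosa"     "versicolor" "virginica"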
Re: [Rd] regression stars
There are only a few things in R where we override the global defaults on a departmental level -- we really don't like to do so. But show.signif.stars is one of the 3. The other 2, if you are curious: set stringsAsFactors=FALSE, and make NA included by default in the output of table. We've been overriding both of these for 10+ years.

Terry Therneau
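A sketch of how such site-wide overrides might look in an Rprofile. The first two are real options (stringsAsFactors was honored by R before 4.0.0); table() has no such option, so the wrapper shown is one hypothetical mechanism, not necessarily what Terry's group used:

    options(show.signif.stars = FALSE)
    options(stringsAsFactors = FALSE)    # honored by R < 4.0.0
    # table() has no global option for NA handling; a masking wrapper works:
    table <- function(..., useNA = "ifany") base::table(..., useNA = useNA)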
[Rd] Why can't I unclass an array?
In a real example I was trying to remove the class from the result of table, just because it was to be used as a building block for other things, and a simple integer vector seemed likely to be most efficient. I'm puzzled as to why unclass doesn't work.

    > zed <- table(1:5)
    > class(zed)
    [1] "table"
    > class(unclass(zed))
    [1] "array"
    > class(unclass(unclass(unclass(zed))))
    [1] "array"
    > class(as.vector(zed))
    [1] "integer"

    > sessionInfo()
    R Under development (unstable) (2012-11-28 r61176)
    Platform: i686-pc-linux-gnu (32-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=C                 LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base
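What is happening, in a minimal illustration: "array" here is the implicit class computed from the dim attribute, not a class attribute, so unclass() has nothing left to remove; it is dropping the dim that changes the answer.

    zed <- table(1:5)
    attributes(unclass(zed))   # dim and dimnames remain; no class attribute
    x <- unclass(zed)
    dim(x) <- NULL             # drop the dim ...
    class(x)                   # ... and the implicit class becomes "integer"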
[Rd] CMD check and .Rbuildignore
I often run R CMD check on my package source. I've noted of late that this process gives warning messages about files listed in my .Rbuildignore file, whereas it once ignored these.

Terry T.
[Rd] as.matrix.Surv -- R core question/opinions
1. A Surv object is a matrix with some extra attributes. The as.matrix.Surv function removes the extras but otherwise leaves it as is.

2. The last several versions of the survival library were accidentally missing the S3method('as.matrix', 'Surv') line from their NAMESPACE file. (Instead, its position is held by a duplicate of the line just above it in the NAMESPACE file, suggesting a copy/paste error.) As a consequence, the as.matrix.Surv function was effectively ignored, and the default method was being used. The as.matrix.default function leaves anything with a dim attribute alone.

3. In my current about-to-submit-to-CRAN version of survival the missing NAMESPACE line was restored. This breaks one function in one package (rms) which calls as.matrix(y) on a Surv object but then later looks at the "type" attribute of y.

So now to the design question: should the as.matrix.Surv function sanitize the result by removing the extra attributes, or should it leave them alone? The first seems cleaner; my accidental multi-year test of leaving them in, however, clearly shows that it does no harm.

Terry T.
[Rd] Baffled with as.matrix
I'm puzzled by as.matrix. It appears to work differently for Surv objects. Here is a session from my computer:

    tmt% R --vanilla
    > library(survival)
    Loading required package: splines
    > ytest <- Surv(1:3, c(1,0,1))
    > is.matrix(ytest)
    [1] TRUE
    > attr(ytest, 'type')
    [1] "right"
    > attr(as.matrix(ytest), 'type')
    [1] "right"

    > y2 <- ytest
    > class(y2) <- "charlie"
    > as.matrix.charlie <- survival:::as.matrix.Surv
    > attr(y2, 'type')
    [1] "right"
    > attr(as.matrix(y2), 'type')
    NULL

    > survival:::as.matrix.Surv
    function (x)
    {
        y <- unclass(x)
        attr(y, "type") <- NULL
        y
    }
    <bytecode: 0x91c1610>
    <environment: namespace:survival>

It appears that Surv objects are being processed by as.matrix.default, but charlie objects by the actual method. One more verification:

    > attr(survival:::as.matrix.Surv(ytest), 'type')
    NULL
    > attr(as.matrix.default(y2), 'type')
    [1] "right"

Context: in testing the next survival release (2.37), it has lost this special behavior. One package that depends on survival expects this behavior and thus fails. I'm at a loss to figure out how my package got this attribute in the first place, or how it lost it. Can anyone shed light?

Terry Therneau

PS I'm on vacation for the next few days so will be intermittent with email. (Off to see my first grandchild!)

-----
    > sessionInfo()
    R version 2.15.2 (2012-10-26)
    Platform: i686-pc-linux-gnu (32-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=C                 LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] splines   stats     graphics  grDevices utils     datasets  methods
    [8] base

    other attached packages:
    [1] survival_2.36-14
[Rd] Puzzling behavior while testing
I'm testing a new release of survival, executing the following piece of code:

    for (testpkg in survdep) {
        z <- testInstalledPackage(testpkg, outDir = "tests")
        cat(testpkg, c("Ok", "Failed")[z + 1], "\n", file = progress, append = TRUE)
    }

The vector survdep contains the names of all 156 packages listed as reverse depends on the CRAN page for survival. All works well except for the packages coin and compareGroups. If we make the survdep list contain only those 2, then the last line above (the cat call) fails with "testpkg" not found. Running ls() at that point shows a set of variables I didn't define, and no evidence of any of testpkg, survdep, or anything else I'd defined. My prompt has been changed to "R" as well. Any idea what is happening?

Terry Therneau

Here is the text just before starting the loop:

    > sessionInfo()
    R version 2.15.2 (2012-10-26)
    Platform: i686-pc-linux-gnu (32-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=C                 LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] tools     stats     graphics  grDevices utils     datasets  methods
    [8] base
[Rd] Addition of plot=F argument to termplot
I have a suggested addition to termplot. We have a local mod that is used whenever none of the termplot options is quite right. It is used here almost daily for Cox models, in order to put the y-axis on a risk scale:

    fit <- coxph(Surv(time, status) ~ ph.ecog + pspline(age), data=lung)
    zz <- termplot(fit, se=TRUE, plot=FALSE)
    yy <- zz$age$y + outer(zz$age$se, c(0, -2, 2), '*')
    matplot(zz$age$x, exp(yy), log='y', type='l', lty=c(1,2,2),
            xlab="Age", ylab="Relative risk")

The return value is a list, each element of which is a data frame. Another use is to overlay the fits from two different models.

The changes to termplot.R and .Rd are small. Since we're not supposed to add attachments to these emails, where do I send the updated files?

Terry T
Re: [Rd] Fastest non-overlapping binning mean function out there?
Look at rowsum. It's pretty fast C code.

Terry T

On 10/03/2012 05:00 AM, r-devel-requ...@r-project.org wrote:
> Hi, I'm looking for a super-duper fast mean/sum binning implementation available in R, and before implementing z = binnedMeans(x, y) in native code myself, does anyone know of an existing function/package for this? I'm sure it already exists.
> So, given data (x, y) and B bins bx[1] < bx[2] < ... < bx[B] < bx[B+1], I'd like to calculate the binned means (or sums) 'z' such that
>     z[1] = mean(x[bx[1] <= x & x < bx[2]]),
>     z[2] = mean(x[bx[2] <= x & x < bx[3]]),
>     ..., z[B].
> Let's assume there are no missing values, and 'x' and 'bx' are already ordered. The length of 'x' is on the order of 10,000 to millions. The number of elements in each bin varies.
> Thanks, Henrik
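A sketch of the rowsum() approach, here binning y by x with half-open bins as in the question (it assumes every bin is non-empty, since rowsum drops empty groups):

    binned_means <- function(x, y, bx) {
        bin <- cut(x, breaks = bx, right = FALSE)   # bins [bx[i], bx[i+1])
        rowsum(y, bin) / as.vector(table(bin))      # sum per bin / count per bin
    }
    binned_means(x = 1:10, y = (1:10)^2, bx = c(1, 4, 7, 11))
    #             [,1]
    # [1,4)   4.666667
    # [4,7)  25.666667
    # [7,11) 73.500000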
[Rd] Rbuildignore question
I'm touching up changes to rpart, and have a question with .Rbuildignore. Here is my file:

    tmt1014% more .Rbuildignore
    test.local
    \.hg
    src/print_tree.c

The source code includes a module print_tree.c, used for debugging. Commented-out calls to it can be found here and there. I want to leave it in the source tree even though no submitted copy of rpart will use it. Even with the ignore line above, R CMD check still compiles it, and gives a bad-boy NOTE about use of printf. Can I / should I / how do I get rid of this? (R 2.15.1)
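As an aside, .Rbuildignore entries are Perl-style regular expressions matched against file paths, so anchored entries avoid accidental matches; a sketch of the same file written that way (Duncan's reply below addresses why check on a directory still sees the file):

    ^test\.local$
    ^\.hg$
    ^src/print_tree\.c$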
Re: [Rd] Rbuildignore question
Thanks, that answers my question.

1. I was running on the source tree. And by "leave in the source tree" I mean just that, with local source code control management. It's in a place I can find it when needed. It would appear on R-forge but, per your comment, not in the CRAN source. Ok.

2. I'll check the tarball soon, but I'm guessing you are right about the error going away.

On 09/20/2012 12:57 PM, Duncan Murdoch wrote:
> On 20/09/2012 1:43 PM, Terry Therneau wrote:
>> [the .Rbuildignore question above]
> What do you mean, leave it in the source tree? Since you're telling build to ignore it, I assume that's just for your own use, not for users of your package. And what did you run check on, the tarball or the directory? If you ran it on the tarball, then there's something wrong with your tarball, because it shouldn't be there (you said to ignore it). If you're running check on the directory, then ignore the NOTE, because it shouldn't appear when you run it on the tarball.
> Duncan Murdoch
Re: [Rd] R-devel Digest, Vol 115, Issue 18
> In general, as a package user, I don't want people to be able to suppress checks on CRAN. I want things fixed. So I am pretty sure there won't ever be a reliable CRAN-detector put into R. It would devalue the brand.
> Duncan Murdoch

My problem is that CRAN demands that I suppress a large fraction of my checks in order to fit within time constraints. This leaves me with 3 choices.

1. Add lines to my code that try to guess whether CRAN is the invoker. A cat and mouse game, per your desire above.

2. Remove large portions of my test suite. I consider the survival package to be one of the pre-eminent current code sets in the world precisely because of the extensive validations; this action would change it to a second-class citizen.

3. Add a magic environment variable to my local world, only do the full tests if it is present, and make the dumbed-down version the default. Others who want to run the full set are then SOL, which I very much don't like.

I agree that CRAN avoidance, other than the time constraint, should be verboten. But I don't think that security through obscurity is the answer. And note that under scenario 3, which is essentially what is currently being forced on us, I can do such mischief as easily as under number 1.

Terry Therneau
Re: [Rd] R-devel Digest, Vol 115, Issue 18
On 09/19/2012 09:22 AM, Duncan Murdoch wrote:
> I understand the issue with time constraints on checks, and I think there are discussions in progress about that. My suggestion has been to put in place a way for a tester to say that checks need to be run within a tight time limit, and CRAN as tester will do that in cases where it cares about the timing.

Sounds good.

> But even with the current (or past, it may have changed already) behaviour of tight limits for CRAN testing, you can put in code in your package that allows for longer tests in certain conditions. You'll run them, and if you advertise them, others will run them too.
> Duncan Murdoch

For me this applies to certain vignettes. Thanks to a pointer from K. Hansen, I'm told I can handle this by putting the long one in inst/doc and the short ones in vignettes. I'll be trying this out soon.
[Rd] CRAN package checks
Some questions motivated by this discussion. From the CRAN policy page:

    "Checking the package should take as little CPU time as possible, as the
    CRAN check farm is a very limited resource and there are thousands of
    packages. Long-running tests and vignette code can be made optional for
    checking, but do ensure that the checks that are left do exercise all
    the features of the package."

Is there a further document that elucidates more on this?

"As little CPU time as possible": what counts as little? Is there a documented variable that I can check and then reduce the test set for CRAN? Duncan mentioned one in the discussion, but I'll end up forgetting the details. A static reference would help.

"Test all the features": the tests directory for survival has 75 files and still doesn't test everything; I'm about to add a new one today in response to a bug report. How do I adjudicate between "little time" and "test all"?

Footnote: the manual page for R CMD check (?check) has the line "Many of the checks in R CMD check can be turned off or on by environment variables: see Chapter 6 of the 'R Internals' manual." The reference should be chapter 7. That chapter does document the use of _R_CHECK_TIMINGS_=10. But it seems dangerous to use this as an indirect test, i.e. "if timings is 10 then this must be CRAN running."

My biggest time offender is in the vignettes. After multiple readings of the docs I still can't quite figure out how to specify
- pdf files that should be in the vignettes index, but are not built from .Rnw source
- how to tell R which ones to redo, and which to just accept as pdf
- per the policy line, all source for a pdf should be available, so non-inclusion of the Rnw file isn't an answer.

Again, is there another document I'm missing?

Terry T.
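One common pattern for the "optional for checking" advice is to gate long-running tests on an environment variable that only the maintainer sets; a minimal sketch (the variable name is illustrative, not a standard one):

    # In a file under tests/: run the expensive checks only when asked.
    if (identical(Sys.getenv("SURVIVAL_FULL_TESTS"), "true")) {
        # long-running validation suite goes here
    }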
Re: [Rd] if(--as-cran)?
On 09/04/2012 05:00 AM, r-devel-requ...@r-project.org wrote:
> The issue is not just about CRAN vs off-CRAN. It is good to think about a more general scheme of light testing vs normal testing vs extensive testing, e.g., for the situation where the package implements (simulation/bootstrap/...) based inference, and the developer (but not only) should be able to run the extensive tests.
> Martin

I agree with Martin. A mechanism to specify testing level would be the best. Then CRAN can choose to set that variable to 3, say, with level 1 for extensive and 2 for usual. I'm quite willing to put up with the nuisance of print() enclosures; I prefer it to having yet another way to subvert the evaluation model.

I'm a believer in testing everything possible in my packages, and wear it as a badge of honor that the survival package has 4 lines of R code in the tests directory for every 3 in the R directory. But CRAN only needs to run a small subset of this.

Terry T
Re: [Rd] if(--as-cran)?
On 09/04/2012 01:57 PM, Duncan Murdoch wrote:

   On 04/09/2012 2:36 PM, Warnes, Gregory wrote:

      On 9/4/12 8:38 AM, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:

         We have a mechanism to specify testing level: the --as-cran flag.
         We could presumably make it more elaborate by adding other flags,
         or option levels, or whatever. What I think we shouldn't do is try
         to create an R-level test that says

            if (testingLevel() > 3) { doSomething }

         because tests can be turned on and off, individually. If
         testingLevel 3 specified tests (A, B, C), then is our testingLevel
         higher if we are running tests (A, B, D, E, F, G)? Why not just
         test for the presence of whichever test is most relevant to that
         particular code block, e.g.

            if ("D" %in% tests()) { doSomething }

      I would prefer the testingLevel() approach over the "D %in% tests()"
      approach, since testingLevel() provides a natural way to add
      successively greater test details without having to dig into the
      code to determine the set of tests.

   I don't see how you could possibly calculate a single number in a
   reasonable way. What is the number that should be returned for
   (A, B, D, E, F, G)?

   Duncan Murdoch

Duncan is leapfrogging ahead to another level that I hadn't thought of. An example would be to divide my survival package as cox, parametric, survfit, all, some of which overlap. Interesting idea, but beyond what I'd use. When I'm focused on a detail I run the test of interest directly, not through CMD check. For me low, med, high intensity suffices, with as-cran invoking the first level and high including those that take an exceptionally long time. If you went to an A, B, C, ... approach, what would be the as-cran default?

Of course there is then the furthest level, which I recently instituted for survival due to the large number of dependencies: before posting a change, download all the other dependent packages and run their tests too. It's an overnighter, and in that case I'd want level=high. Forewarned is forearmed.

Terry T.

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Numerics behind splineDesign
On 08/02/2012 05:00 AM, r-devel-requ...@r-project.org wrote:

   Now I just have to grovel over the R code in ns() and bs() to figure
   out how exactly they pick knots and handle boundary conditions, plus
   there is some code that I don't understand in ns() that uses qr() to
   postprocess the output from spline.des. I assume this is involved
   somehow in imposing the boundary conditions... Thanks again everyone
   for your help, -- Nathaniel

The ns and bs functions post-process the spline bases to get an orthogonal basis matrix; this is the use of qr. I think this causes much more grief than it is worth, for the sake of a small increase in numeric stability. For instance, when you plot the spline bases, they don't look anything like the basis functions one would expect. (Perhaps my background in numerical analysis was a hindrance here, since I know something about splines and thus have an expectation.)

Terry Therneau

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Numerics behind splineDesign
Clearly, I got it wrong. Thanks to others for the clearer and correct message.

Terry T

On 08/02/2012 11:59 AM, William Dunlap wrote:

   If R's bs() and ns() are like S+'s (they do give very similar results*
   and S+'s were written by Doug Bates), then bs() does not do any linear
   algebra (like qr()) on splineDesign's output. bs() needs to come up
   with a default set of knots (if the user didn't supply them), combine
   the Boundary.knots (repeated ord times) and knots into one vector to
   pass to splineDesign, and, if there are any x's outside of
   range(Boundary.knots), treat them specially.

   ns() needs to project splineDesign's basis functions onto the subspace
   of functions whose 2nd derivatives are zero at the endpoints. The
   projection can be done with qr() and friends (S+ uses qr()). (If you
   want basis functions for a periodic spline you could use a similar
   procedure to project to the subspace of functions whose first and
   second derivatives match at the upper and lower Boundary.knots.)

   * The only difference between the R and S+ versions of bs() that I've
   noticed is in how they deal with the x's that are outside of
   range(Boundary.knots). S+ extrapolates with cubics both above and
   below that range, while R extrapolates with a cubic below the range
   and with a quadratic above the range. I don't know what the rationale
   for this is.

   Bill Dunlap
   Spotfire, TIBCO Software
   wdunlap tibco.com

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
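For readers who want to see the difference being discussed, a small sketch comparing the raw B-spline basis from splineDesign() with the post-processed ns() basis (knot placement here is illustrative):

    library(splines)
    x <- seq(0, 1, length.out = 201)
    ## raw cubic B-spline basis: boundary knots repeated ord = 4 times
    kn  <- c(rep(0, 4), 0.5, rep(1, 4))
    raw <- splineDesign(kn, x)
    ## natural-spline basis: same function space, projected (via qr) onto
    ## functions with zero second derivative at the boundaries
    nat <- ns(x, knots = 0.5, Boundary.knots = c(0, 1))
    op <- par(mfrow = c(1, 2))
    matplot(x, raw, type = "l", main = "splineDesign() basis")
    matplot(x, nat, type = "l", main = "ns() basis")
    par(op)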
Re: [Rd] Fast Kendall's tau
Note that the survConcordance function, which is equivalent to Kendall's tau, is also O(n log n), and it does compute a variance. The variance is about 4/5 of the work. Using R 2.15.0 on an older Linux box:

   > require(survival)
   > require(pcaPP)
   > tfun <- function(n) {
   +     x <- 1:n + runif(n)*n
   +     y <- 1:n
   +     t1 <- system.time(cor.test(x, y, method = "kendall"))
   +     t2 <- system.time(cor.fk(x, y))
   +     t3 <- system.time(survConcordance(Surv(y) ~ x))
   +     rbind(cor.test = t1, cor.fk = t2, survConcordance = t3)
   + }
   > tfun(1e2)
                   user.self sys.self elapsed user.child sys.child
   cor.test            0.000    0.000   0.004          0         0
   cor.fk              0.000    0.000   0.001          0         0
   survConcordance     0.004    0.000   0.006          0         0
   > tfun(1e3)
                   user.self sys.self elapsed user.child sys.child
   cor.test            0.024    0.000   0.026          0         0
   cor.fk              0.000    0.000   0.000          0         0
   survConcordance     0.004    0.000   0.004          0         0
   > tfun(1e4)
                   user.self sys.self elapsed user.child sys.child
   cor.test            2.224    0.004   2.227          0         0
   cor.fk              0.004    0.000   0.003          0         0
   survConcordance     0.028    0.000   0.028          0         0
   > tfun(5e4)
                   user.self sys.self elapsed user.child sys.child
   cor.test           55.551    0.008  55.574          0         0
   cor.fk              0.016    0.000   0.018          0         0
   survConcordance     0.204    0.016   0.221          0         0

I agree with Brian, especially since the Spearman and Kendall results rarely (ever?) disagree on their main message for n > 50. At the very most, one might add a footnote to the help page for cor.test pointing to the faster codes.

Terry T.

Brian R wrote:

   On 12-06-25 2:48 PM, Adler, Avraham wrote:

      Hello. Has any further action been taken regarding implementing
      David Simcha's fast Kendall tau code (now found in the package pcaPP
      as cor.fk) into R-base? It is literally hundreds of times faster,
      although I am uncertain as to whether he wrote code for testing the
      significance of the parameter. The last mention I have seen of this
      was in 2010:
      https://stat.ethz.ch/pipermail/r-devel/2010-February/056745.html

   You could check the NEWS file, but I don't remember anything being done
   along these lines. If the code is in a CRAN package, there doesn't seem
   to be any need to move it to base R. In addition, this is something
   very specialized, and the code in R is fast enough for all but the most
   unusual instances of that specialized task. example(cor.fk) shows the R
   implementation takes well under a second for 2000 cases (a far higher
   value than is usual).

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] model.frame and predvars
I was looking at how the model.frame method for lm works and comparing it to my own for coxph. The big difference is that I try to retain xlevels and predvars information for a new model frame, and lm does not. I use a call to model.frame in predict.coxph, which is why I went that route, but never noted the difference till now (preparing for my course in Nashville). Could someone shed light on the rationale for non-preservation?

Terry T.

Simple example:

   library(survival)
   lfit <- lm(time ~ factor(ph.ecog) + ns(age, 3), data=lung)
   ltemp <- model.frame(lfit, data=lung[1:2,])
   ltemp
     time factor(ph.ecog) ns(age, 3).1 ns(age, 3).2 ns(age, 3).3
   1  306               1   -0.1428571    0.4285714    0.7142857
   2  455               0    0.0000000    0.0000000    0.0000000
   lfit$model[1:2,]
     time factor(ph.ecog) ns(age, 3).1 ns(age, 3).2 ns(age, 3).3
   1  306               1    0.4443546    0.3286161    0.1900511
   2  455               0    0.5697239    0.3618440   -0.1297479
   levels(ltemp[[2]])
   [1] "0" "1"
   levels(lfit$model[[2]])
   [1] "0" "1" "2" "3"

   cfit <- coxph(Surv(time, status) ~ factor(ph.ecog) + ns(age,3), lung)
   model.frame(cfit, data= lung[1:2,])
     Surv(time, status) factor(ph.ecog) ns(age, 3).1 ns(age, 3).2 ns(age, 3).3
   1                306               1    0.4443546    0.3286161    0.1900511
   2                455               0    0.5697239    0.3618440   -0.1297479

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
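The machinery at issue can be inspected directly: after a fit, the terms object carries a "predvars" attribute recording, e.g., the knots that ns() chose, and lm keeps the factor levels in $xlevels. A short illustration (output elided):

    library(survival); library(splines)
    lfit <- lm(time ~ factor(ph.ecog) + ns(age, 3), data = lung)
    attr(terms(lfit), "predvars")
    ## e.g. list(time, factor(ph.ecog), ns(age, knots = c(...), ...))
    lfit$xlevels
    ## $`factor(ph.ecog)`: "0" "1" "2" "3"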
Re: [Rd] model.frame and predvars
On 06/05/2012 11:32 AM, Prof Brian Ripley wrote:

   On 05/06/2012 16:17, Terry Therneau wrote:

      I was looking at how the model.frame method for lm works and
      comparing it to my own for coxph. The big difference is that I try
      to retain xlevels and predvars information for a new model frame,
      and lm does not. I use a call to model.frame in predict.coxph, which
      is why I went that route, but never noted the difference till now
      (preparing for my course in Nashville). Could someone shed light on
      the rationale for non-preservation?

   T'other way round ... it would have needed a conscious decision to
   preserve them: these all predate xlevels and predvars.

That's true, but lots of other things have been changed, e.g. model.frame=T. I was wondering if there was a positive reason not to do it; this answers the question.

   model.matrix.lm does make use of xlevels, and I think that explains the
   difference, as most lm() auxiliaries use the model matrix. But it does
   not keep predvars, which is interesting.

So it fixes one issue but leaves another undone.

   And I don't see predvars used in survival:::model.frame.coxph.

Actually it does, as is proven by the example I showed (the ns reconstruction of the first two lines was the same). Having commented source code helps to see it, of course: I replace formula in the call with the terms object from the prior fit before invoking model.frame. This causes inheritance of both the specials and predvars.

So, should the survival library change to the old behavior? I prefer my current one, in that the only time I would do model.frame(prior.fit, data=new) would be if I want the same treatment of special variables. If I wanted a new, de novo decision on contrasts or codings I would have just fit a new model. Is there a plausible situation where the old behavior is preferable? Discounting backwards compatibility of course, which is always a thorny issue.

Terry T.

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
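A minimal sketch of the trick described above -- not the verbatim survival code, and the class name is invented:

    ## reuse the fit's terms object in place of the raw formula, so that
    ## specials and predvars are inherited by the new model frame
    model.frame.mycox <- function(formula, data, ...) {
        fit  <- formula                 # S3 dispatch: 'formula' is the fit
        Call <- fit$call
        Call[[1L]] <- quote(stats::model.frame)
        Call$formula <- terms(fit)      # carries specials and predvars forward
        if (!missing(data)) Call$data <- data
        eval(Call, parent.frame())
    }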
Re: [Rd] Evaluation without the parent frame
Duncan,

I agree completely with "don't use attach"; if I could get all the users of the survival package to agree as well, the problem in question would go away :-) I'm thinking about ways to add more effective error surveillance. Your suggestion was not horribly complex and I'll look into it further. My first foray failed because what I want (I think) is the environment that they had when they first came up. I tried baseenv in a spot, but then my code couldn't find the model.frame function.

Terry T.

On 05/17/2012 05:00 AM, Duncan wrote:

   On 12-05-16 4:59 PM, Terry Therneau wrote:

      I've been tracking down a survival problem from R-help today. A
      short version of the primary issue is reconstructed by the following
      simple example:

         library(survival)
         attach(lung)
         fit <- coxph(Surv(time, status) ~ log(age))
         predict(fit, newdata=data.frame(abe=45))

      Note the typo in the last line of "abe" instead of "age". Instead of
      an error message, this returns predictions for all the subjects,
      since model.frame matches age by searching more widely. I'd prefer
      the error. I suspect this is hard -- I'd like it to not see the
      attached lung data set, but still be able to find the log function.
      Is there a not-horribly-complex solution?

   The best solution is to not use attach(): use data=lung in the fit. I
   think if you want to use attach but limit the search, you need
   something like

      predict(fit, newdata=list2env(data.frame(abe=45), parent=baseenv()))

   but I don't think that meets your "not horribly complex" criterion.

   Duncan Murdoch

I also tried to change the primary function to lm instead of coxph. It has the same problem, but does print a warning that the newdata and results have different lengths (which I will incorporate).

Terry T.

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
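A sketch of the restricted-lookup idea; one way to build such an environment is via new.env(), since the parent must be set when the environment is created:

    ## an environment that can see base functions such as log(),
    ## but not the attached data set
    nd  <- data.frame(abe = 45)
    env <- list2env(as.list(nd), envir = new.env(parent = baseenv()))
    exists("log", envir = env)    # TRUE,  found in baseenv()
    exists("age", envir = env)    # FALSE, the attached 'lung' is not visible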
[Rd] Evaluation without using the parent frame
I've been tracking down a survival problem from R-help today. A short version of the primary issue is reconstructed by the following simple example:

   library(survival)
   attach(lung)
   fit <- coxph(Surv(time, status) ~ log(age))
   predict(fit, newdata=data.frame(abe=45))

Note the typo in the last line of "abe" instead of "age". Instead of an error message, this returns predictions for all the subjects, since model.frame matches age by searching more widely. I'd prefer the error. I suspect this is hard -- I'd like it to not see the attached lung data set, but still be able to find the log function. Is there a not-horribly-complex solution?

I also tried to change the primary function to lm instead of coxph. It has the same problem, but does print a warning that the newdata and results have different lengths (which I will incorporate).

Terry T.

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Vignette problem
Duncan,

Thanks for the ideas. I checked them out and none seem to be the culprit. My original message was wrong in one detail, BTW, as I was already using graphicx, not graphics.

I switched over to the central server machine (CentOS) to finish up the coxme submission until I could figure this out. Ran CMD check, tidied up the NAMESPACE, DESCRIPTION, NEWS files, fixed a missing $ sign in an Rnw, etc. Submitted the final to CRAN, then updated all the files on the Linux box from which the error below came -- both boxes point to the same Mercurial master -- and then tried again. Now it works! And I have no idea what could have prompted the change. It will remain a mystery, since I'm not going to try to bring the error back. :-)

Terry T.

On 05/14/2012 01:19 PM, Duncan Murdoch wrote:

   Do you have an explicit \usepackage{Sweave} in your file? If not,
   Sweave will add one, and that might explain the difference between your
   two tests. Another possibility is that you have a copy of Sweave.sty
   that is not the same as the one being used in one run or the other. The
   checks will try to tell pdflatex to use the one that comes with your R
   version, but other local ones can sometimes have higher precedence in
   pdflatex. And one idea what the problem might be: Sweave.sty uses the
   graphicx package, and it may conflict with the graphics package.

   Duncan Murdoch

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Vignette problem
I'm having a problem rebuilding a package, new to me in R 2.15.0 (Linux). It hits all that contain the line

   \usepackage[pdftex]{graphics}

and leads to the following when running R CMD check on the directory. (I do this often; a final run on the tar.gz file will happen before submission.) Since I float and resize my figures, removing the line is fatal in other ways.

   * checking re-building of vignette PDFs ... NOTE
   Error in re-building vignettes:
   ...
   /usr/share/texmf-texlive/tex/latex/pdftex-def/pdftex.def:414: Package
   pdftex.def Error: PDF mode expected, but DVI mode detected!
   (pdftex.def)    If you are using `latex', then call `pdflatex'.
   (pdftex.def)    Otherwise check and correct the driver options.
   (pdftex.def)    Error recovery by switching to PDF mode.
   See the pdftex.def package documentation for explanation.
   Type H <return> for immediate help.
   ...
   l.414 }\@ehc
   ?
   /usr/share/texmf-texlive/tex/latex/pdftex-def/pdftex.def:414: Emergency stop.
   ...
   l.414 }\@ehc
   No pages of output.
   Transcript written on lmekin.log.
   /usr/bin/texi2dvi: latex exited with bad status, quitting.
   make: *** [lmekin.pdf] Error 1
   Error in buildVignettes(dir = "/home/therneau/research/surv/Hg/coxme.Rcheck/vign_test/coxme") :
     running 'make' failed
   Execution halted

The resulting .tex file works just fine with pdflatex, however. I haven't found any reference to this elsewhere, but my guess is that it is something simple that I've missed.

Terry T.

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Method=df for coxph in survival package
In that particular example the value of 4 was pulled out of the air; there is no particular justification. There is a strong relationship between the effective degrees of freedom and the variance of the random effect, and I often find the df scale easier to interpret. See the Hodges and Sargent paper in Biometrika (2001) for a nice explanation of the connection in linear models.

Terry T.

=== begin included message ===

   I've been following the example in the R help page:
   http://stat.ethz.ch/R-manual/R-devel/library/survival/html/frailty.html

      library(survival)
      coxph(Surv(time, status) ~ age + frailty(inst, df=4), lung)

   Here, in this particular example they fixed the degrees of freedom for
   the random institution effects to be 4. But how did they decide?

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
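For reference, the two ways of pinning down the same penalty; the numeric values here are illustrative only:

    library(survival)
    ## fix the random effect's variance directly ...
    fit1 <- coxph(Surv(time, status) ~ age + frailty(inst, theta = 0.1), lung)
    ## ... or ask for a given effective degrees of freedom, letting the
    ## code solve for the variance that produces it
    fit2 <- coxph(Surv(time, status) ~ age + frailty(inst, df = 4), lung)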
Re: [Rd] Vignette questions
On 04/12/2012 02:15 AM, Uwe Ligges wrote:

   On 12.04.2012 01:16, Paul Gilbert wrote:

      On 12-04-11 04:41 PM, Terry Therneau wrote:

         Context: R 2.15.0 on Ubuntu. 1. I get a WARNING from CMD check
         for "Package vignette(s) without corresponding PDF". In this case
         the vignettes directory had both the pdf and Rnw; do I need to
         move the pdf to inst/doc?

      Yes, you need to put the pdf in the inst/doc directory if it cannot
      be built by R-forge and CRAN check machines, but leave the Rnw in
      the vignettes directory.

   No, this is all done automatically by R CMD build, hence you do not
   need to worry.

Suddenly it becomes clear: the warning will disappear on its own when I apply CMD check to the tarball. I was running it on the directory, as is my habit when I've just added a new feature and want to make sure I didn't break anything old, and had wandered down the "CRAN doesn't like warnings so I better fix this somehow" path.

Terry Therneau

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Vignette questions
Context: R 2.15.0 on Ubuntu.

1. I get a WARNING from CMD check for "Package vignette(s) without corresponding PDF". In this case the vignettes directory had both the pdf and Rnw; do I need to move the pdf to inst/doc? I'm reluctant to add the pdf to the svn source on Rforge, per the usual rule that a code management system should not have both a primary source and an object derived from it under version control. However, if this is the suggested norm I could do so.

2. Close reading of the paragraph about vignette sources shows the following -- I think? If I have a vignette that should not be rebuilt by check or BUILD, I should put the .Rnw source and pdf in /inst/doc, and have the others that should be rebuilt in /vignettes. This would include any that use private R packages, screen snapshots, ..., or in my case one that takes just a little short of forever to run.

3. Do these unprocessed vignettes also contribute to the index via \VignetteIndexEntry lines, or will I need to create a custom index?

Terry Therneau

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
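My reading of point 2 as a package layout, treating inst/doc as ship-as-is and vignettes/ as rebuild (the file names are invented):

    mypkg/
        vignettes/          # .Rnw sources that check/BUILD may rebuild
            howto.Rnw
        inst/doc/           # shipped as-is: pre-built pdf plus its source
            slowpaper.Rnw
            slowpaper.pdf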
Re: [Rd] CRAN policies
On 03/29/2012 05:00 AM, r-devel-requ...@r-project.org wrote:

   The 'No visible binding for global variable' is a good example. This
   found some bugs in my 'survey' package, which I removed. There is still
   one note of this type, which arises when I have to handle two different
   versions of the hexbin package with different internal structures. The
   note is a false positive because the use is guarded by an if(), but CMD
   check can't tell this. So, it's a good idea to remove all Notes that
   can be removed without introducing other code problems, which is nearly
   all of them, but occasionally there may be a good reason for code that
   produces a Note.

The survival package has a similar special case: the routines for expected population survival are set up to accept multiple types of date format, so they have lines like

   if (class(x) == 'chron') {
       y <- as.numeric(x - chron("01/01/1960"))
   }

This leaves me with two extraneous "no visible binding" messages. There used to be half a dozen, but I've tried to remove as many as possible, for all the good reasons already articulated by the maintainers. It still remains that 99/100 of the "no visible binding" messages I've seen over the years were misspelled variable names, and the message is a very welcome check.

Terry Therneau

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Vignettes and CRAN checks
I'd like to chime in on the subject of vignette checks. I have one vignette in the coxme library that would be better described as a white paper. It discusses the adequacy of the Laplace approximation under various scenarios. It contains some substantial computations, so I'd like to mark it as "never ever run this" for both CRAN and my local builds; my next update of it will turn it into a 30+ minute computation. Almost all users will need only the pdf; however, my master file for creating it is in .Rnw form and I'd like to make it available to those who might want to dig deeper.

I (and every other program I know of) had always assumed that the Laplace approximation was excellent for a mixed effects Cox model; I'm finding out that there are a few cases where this isn't so. Luckily only a few, but this is important documentation.

I also have more conventional vignettes of the "how to use this package" variety; the standard CRAN check process for those is both welcome and appropriate.

Terry T.

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Problem with table
On 03/27/2012 02:05 AM, Prof Brian Ripley wrote:

   On 19/03/2012 17:01, Terry Therneau wrote:

      R version 2.14.0, started with --vanilla:

         > table(c(1,2,3,4,NA), exclude=2, useNA='ifany')
            1    3    4 <NA>
            1    1    1    2

      This came from a local user who wanted to remove one particular
      response from some tables, but also wants to have NA always reported
      for data checking purposes. I don't think the above is what anyone
      would want.

   You have not told us what you want!

Want: that the resulting table exclude values of 2 from the printout, while still reporting NA. This is what the local user expected, the one who came to me with their query. There are lots of ways to get the program to do the right thing; the simplest is

   table(c(1,2,3,4,NA), exclude=2)   # keeping the default for useNA

You show another below.

   Try

      > table(as.factor(c(1,2,3,4,NA)), exclude=2, useNA='ifany')
         1    3    4 <NA>
         1    1    1    1

   Note carefully how 'exclude' is defined:

      exclude: levels to remove from all factors in '...'. If set to
      'NULL', it implies 'useNA = "always"'.

   As you did not specify a factor, 'exclude' was used in forming the
   'levels'.

That is almost a "legal loophole" reading of the manual: I would never have seen through to that level of subtlety. A primary reason is that a simple test shows that exclude works on non-factors.

I'm not sure what the best course of action is. What I've reported is a case where use of the options in a fairly obvious way gives an unexpected answer. On the other hand, I have never before seen or considered the case where someone wanted to exclude an actual data level from table: I myself would always have removed a column from the result. If fixing this causes other problems, then perhaps we just give up on this rare case.

As to our local choices, we figured out a way to make display of NA the default without causing the above problem. As is often the case, a fairly simple solution became obvious to us about 30 minutes after submitting a question to the list.

Terry T.

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
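The "remove a column from the result" route, for comparison; note the logical index has to keep the NA cell, whose name really is NA:

    tab <- table(c(1, 2, 3, 4, NA), useNA = "ifany")
    tab[is.na(names(tab)) | names(tab) != "2"]
    ##    1    3    4 <NA>
    ##    1    1    1    1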
[Rd] PROTECT help
I received the following note this AM. The problem is, I'm not quite sure how to fix it. Can one nest the calls as PROTECT(coxlist=eval(PROTECT(...))), or do I create an intermediate variable, or otherwise? I'm willing to update the code if someone will give me a pointer to the right documentation. This particular chunk was written when there was a lot of change going on in the callback mechanism, so there might be a safer and/or simpler and/or more standard approach by now.

The routine in question has to do with penalized Cox models; the C code needs to get the value of the penalty, and the penalty is an arbitrary S expression passed down from top level.

Terry T

   In survival_2.36-12 (and earlier), in the function cox_callback() at
   cox_Rcallback.c:40:

      PROTECT(coxlist=eval(lang2(fexpr,data),rho));

   the return value of the call to lang2() is vulnerable if allocations
   within eval() give rise to garbage collection. (Discovered during CXXR
   development.)

   Andrew

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] PROTECT help
Brian, Duncan:

Thanks. This was exactly what I needed to know.

Terry

On 03/27/2012 08:41 AM, Prof Brian Ripley wrote:

   You can, but I find it easiest to follow if you create an intermediate
   variable. Look for example at unique.c:

      SEXP call, r;
      PROTECT(call = lang2(install("as.character"), s));
      PROTECT(r = eval(call, env));
      UNPROTECT(2);
      return r;

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] R-devel Digest, Vol 109, Issue 22
   strongly disagree. I'm appalled to see that sentence here. Come on! The
   overhead is significant for any large vector and it is in particular
   unnecessary since in .C you have to allocate *and copy* space even for
   results (twice!). Also it is very error-prone, because you have no
   information about the length of vectors so it's easy to run out of
   bounds and there is no way to check. IMHO .C should not be used for
   any code written in this century (the only exception may be if you are
   passing no data, e.g. if all you do is to pass a flag and expect no
   result, you can get away with it even if it is more dangerous). It is
   a legacy interface that dates way back and is essentially just a
   re-named .Fortran interface. Again, I would strongly recommend the use
   of .Call in any recent code because it is safer and more efficient (if
   you don't care about either attribute, well, feel free ;)).

      So aleph will not support the .C interface? ;-)

   It will look at the timestamp of the source file and delete the
   package if it is not before 1980 ;). Otherwise it will send a request
   for punch cards with ".C is deprecated, please upgrade to .Call"
   stamped out :P At that point I'll be flaming about using the native
   Aleph interface and not the R compatibility layer ;)

   Cheers, S

I'll dissent -- I don't think .C is inherently any more dangerous than .Call, and I prefer its simplicity in many cases. Calling C at all is what is inherently dangerous -- I can reference beyond the end of a vector, write over objects that should be read only, and branch to random places using either interface. If you are dealing with large objects and worry about memory efficiency then .Call puts more tools at your disposal and is worth the effort. However, I did not find the .Call interface at all easy to use at first, and we should keep that in mind before getting too pompous in our lectures to the sinners of .C. (Mostly because the things I needed to know are scattered about in multiple places.)

I might have to ask for an exemption on that timestamp -- the first bits of the survival package only reach back to 1986. And I've had to change source code systems multiple times, which plays hob with the file times, though I did try to preserve the changelog history to forestall some future litigious soul who claims they wrote it first (sccs -> rcs -> cvs -> svn -> mercurial). :-)

Terry T

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] R-devel Digest, Vol 109, Issue 22
On 03/22/2012 09:38 AM, Simon Urbanek wrote:

   On Mar 22, 2012, at 9:45 AM, Terry Therneau <thern...@mayo.edu> wrote:

      I can reference beyond the end of a vector, write over objects that
      should be read only, and branch to random places using either
      interface.

   You can always do so deliberately, but with .C you have no way of
   preventing it, since you don't even know what the length is! That is
   certainly far more dangerous than .Call, where you can simply loop over
   the length, check that the lengths are compatible, etc. Also, for types
   like strings .C is a minefield that is hard not to blow up, whereas
   with .Call it is even more safe than scalar arrays. You can do none of
   that with .C, which relies entirely on conventions with no recorded
   semantics.

I've overrun arrays in both .C and .Call routines, and I assure you that it was never deliberate. Very effective at crashing R with strange error messages, though. I will have .C("somefun", as.integer(length(x)), x); the .Call version will skip the second argument and add a line in the C code; no real difference from my point of view. (Though the spelling is harder to remember in .Call. Does R core use dice to decide which things are upper, lower, and mixed case: LENGTH, asInteger, ncols?) R strings are a PITA in C and I mostly avoid them, so I have no arguments about C vs Call there. Much of the survival library is .C for historical reasons of course, but I think it shows that you can be safe in C; though you can't be sloppy.

      If you are dealing with large objects and worry about memory
      efficiency then .Call puts more tools at your disposal and is worth
      the effort. However, I did not find the .Call interface at all easy
      to use at first

   I guess this depends on the developer and is certainly a factor.
   Personally, I find the subset of the R API needed for .Call fairly
   small and intuitive (in particular when you are just writing a safer
   replacement for .C), but I'm obviously biased. Maybe in a separate
   thread we could discuss this - I'd be happy to write a ref card or
   cheat sheet if I find out what people find challenging on .Call.
   Nonetheless, my point is that it is more than worth investing the
   effort, both in safety and performance.

I'm giving a short course at UseR on the design of the survival packages which promises a document containing all the details, currently being written. It has examples of .Call with discussion of what each action is doing. The final result will certainly be added to the survival package; hopefully it will be useful enough to earn a place on the CRAN documentation page as well.

      I might have to ask for an exemption on that timestamp -- the first
      bits of the survival package only reach back to 1986.

   ;) Maybe the rule should be based on the date of the first appearance
   of the package, fair enough :)

   Cheers, Simon

I think (but am not sure) that the first version to get outside our local group was around 1989.

Terry
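From the R side, the difference under discussion looks roughly like this; "somefun" and "somefun2" are stand-ins for registered routines, not real entry points:

    n <- length(x)
    ## .C: lengths and result space must be passed explicitly
    fit1 <- .C("somefun", as.integer(n), as.double(x), ans = double(n))
    fit1$ans                 # results come back via the named argument
    ## .Call: the C code receives the SEXP itself, so it can query
    ## LENGTH(x) and allocate its own result
    fit2 <- .Call("somefun2", x)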
Re: [Rd] .Call ref card
On 03/22/2012 11:03 AM, peter dalgaard wrote:

   Don't know how useful it is any more, but back in the days, I gave this
   talk in Vienna:
   http://www.ci.tuwien.ac.at/Conferences/useR-2004/Keynotes/Dalgaard.pdf
   Looking at it now, perhaps it moves a little too quickly into the hairy
   stuff. On the other hand, those were the things that I had found
   important to figure out at the time. At a quick glance, I didn't spot
   anything obviously outdated.

Peter, I just looked at this, and I'd say that it moved into the hairy stuff way too quickly. Much of what it covered I would never expect to use. Some I didn't understand. Part of this, of course, is that slides for a talk are rarely very useful without the talker. Something simpler for the layman would be good.

Terry T.

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] .C clarification
Does .C duplicate unnecessary arguments? For instance

   fit <- .C("xxx", as.integer(n), x, y, z=double(15))

The first and fourth arguments would have NAMED = 0. Is my guess correct that .C won't make yet one more (unnecessary) copy? (Just trying to understand.)

Terry T

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Sweave driver extension
Three things -

My original question to R-help was "who do I talk to". That was answered by Brian R, and the discussion of how to change Sweave moved offline. FYI, I have a recode in hand that allows arbitrary reordering of chunks; but changes to code used by hundreds need to be approached cautiously. Like the witch says in the Wizard of Oz: "... But that's not what's worrying me, it's how to do it. These things must be done delicately, or you hurt the spell."

A few emails have made me aware of others who use noweb. Most of them, as I have, use the original Unix utility. But since survival is so interwoven with R, I am trying to implement that functionality entirely in R to make the code self contained. Just working out how best to do so.

Yihui: with respect to the note below, I don't see why you want to add new syntax. Why add run_chunk('a') when it is a synonym for <<a>>?

Terry T.

On Mon, 2012-01-30 at 20:41 -0600, Yihui Xie wrote:

   OK, I did not realize the overhead problem is so overwhelming in your
   situation. Therefore I re-implemented the chunk reference in the knitr
   package in another way. In Sweave we use

      <<a>>=
      # code in chunk a
      @
      <<b>>=
      # use code in a
      <<a>>
      @

   And in knitr, we can use real R code:

      <<a>>=
      # code in chunk a
      @
      <<b>>=
      # use code in a
      run_chunk('a')
      @

   This also allows arbitrary levels of recursion, e.g. I add another
   chunk called 'c':

      <<c>>=
      run_chunk('b')
      @

   Because b uses a, when c calls b it will consequently call a as well.
   The function run_chunk() will not bring overhead problems, because it
   simply extracts the code from other chunks and evaluates it here. It
   is not a function call. This feature is still in the development
   version (well, I did it this afternoon): https://github.com/yihui/knitr.

   Talking about Knuth's original idea, I do not know as much as you, but
   under knitr's design you can arrange code freely, since the code is
   stored in a named list after the input document is parsed. You can
   define code before using it, or use it before defining it (later); it
   is indexed by the chunk label. Top-down or bottom-up, in whatever
   order you want. And you are right; it requires a major rewrite, and
   that is exactly what I tried to do. I appreciate your feedback because
   I know you have very rich experience in reproducible research.

   Regards, Yihui
   --
   Yihui Xie xieyi...@gmail.com
   Phone: 515-294-2465 Web: http://yihui.name
   Department of Statistics, Iowa State University
   2215 Snedecor Hall, Ames, IA

   On Mon, Jan 30, 2012 at 12:07 PM, Kevin R. Coombes
   kevin.r.coom...@gmail.com wrote:

      I prefer the code chunks myself. Function calls have overhead. In a
      bioinformatics world with large datasets and an R default that uses
      call-by-value rather than call-by-reference, the function calls may
      have a _lot_ of overhead. Writing the functions to make sure they
      use call-by-reference for the large objects instead has a different
      kind of overhead, in the stress it puts on the writers and
      maintainers of code.

      But then, I'm old enough to have looked at some of Knuth's source
      code for TeX and read his book on Literate Programming, where the
      ideas of weave and tangle were created for exactly the kind of
      application that Terry asked about. Knuth's fundamental idea here
      is that the documentation (mainly the stuff processed through
      weave) is created for humans, while the executable code (in Knuth's
      view, the stuff created by tangle) is intended for computers.
      If you want people to understand the code, then you often want to
      use a top-down approach that outlines the structure -- code chunks
      with forward references work perfectly for this purpose.

      One of the difficulties in mapping Knuth's idea over to R and
      Sweave is that the operations of weave and tangle have gotten,
      well, tangled. Sweave does not just prepare the documentation; it
      also executes the code in order to put the results of the
      computation into the documentation. In order to get the forward
      references to work with Sweave, you would have to make two passes
      through the file: one to make sure you know where each named chunk
      is and build a cross-reference table, and one to actually execute
      the code in the correct order. That would presumably also require a
      major rewrite of Sweave.

      The solution I use is to cheat: hide the chunks initially and
      reveal them later to get the output that I want. This comes down to
      combining eval, echo, keep.source, and expand in the right
      combinations. Something like:

         % set up a prologue that contains the code chunks.
         % Do not evaluate or display them.
         <<coxme-check-arguments,echo=FALSE,eval=FALSE>>=
         # do something sensible. If multiple steps, define them above
         # here using the same idea.
         @
         % also define the other code chunks here

         \section{Start the First Section}
         The \texttt{coxme} function is defined as follows:
[Rd] Sweave driver extension
Almost all of the coxme package and an increasing amount of the survival package are now written in noweb, i.e., .Rnw files. It would be nice to process these using the Sweave function + a special driver, which I can do using a modified version of Sweave. The primary change is to allow the following type of construction:

   <<coxme>>=
   coxme <- function(formula, data, subset, blah blah ) {
       <<coxme-check-arguments>>
       <<coxme-build>>
       <<coxme-compute>>
       <<coxme-finish>>
   }
   @

where the parts referred to come later, and will themselves be made up of other parts. Since the point of this file is to document source code, the order in which chunks are defined is driven by "create a textbook" thoughts and won't match the final code order for R. The standard noweb driver only allows one level of recursion, and no references to things defined further down in the file.

The primary change to the function simply breaks the main loop into two parts: first read through all the lines and create a list of code chunks (some with names), then go through the list of chunks and call driver routines. There are a couple of other minor details, e.g. a precheck for infinite recursions, but no change to what is passed to the driver routines, nor to anything but the Sweave function itself.

Primary question: who on the core team should I be holding this conversation with?

Secondary: Testing level? I have a few vignettes but not many. I'll need a noweb package anyway to contain the drivers -- should we just duplicate the modified Sweave under another name? Call the package noweb, Rnoweb, ...?

And before someone asks: Roxygen is a completely different animal and doesn't address what I need. I have latex equations just above the code that implements them, an annotated graph of the call tree next to the section parsing a formula, etc. This is stuff that doesn't fit in comment lines. The text/code ratio is > 1. On the other hand, I've thought very little about integration of manual pages and description files with the code, issues which Roxygen addresses.

Terry Therneau

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
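A sketch of the two-pass idea; the function and variable names here are mine, not the eventual package's:

    ## pass 1 stores each chunk's lines in a named list; pass 2 expands
    ## references recursively, with a guard against infinite recursion
    expand_chunk <- function(name, chunks, seen = character()) {
        if (name %in% seen)
            stop("infinite recursion at chunk '", name, "'")
        lines <- chunks[[name]]
        if (is.null(lines)) stop("undefined chunk '", name, "'")
        out <- lapply(lines, function(line) {
            if (grepl("^\\s*<<(.*)>>\\s*$", line)) {
                ref <- sub("^\\s*<<(.*)>>\\s*$", "\\1", line)
                expand_chunk(ref, chunks, c(seen, name))
            } else line
        })
        unlist(out)
    }

    ## usage: two chunks, one referencing the other
    chunks <- list(coxme = c("coxme <- function(...) {", "<<coxme-build>>", "}"),
                   `coxme-build` = "    fit <- 42  # real work here")
    cat(expand_chunk("coxme", chunks), sep = "\n")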
[Rd] Rd error message
I get the following error from one of my Rd files in R CMD check (R 2.14.0):

   * checking Rd files ... WARNING
   Error in switch(attr(block, "Rd_tag"), TEXT = if (!grepl("^[[:space:]]*$",  :
     EXPR must be a length 1 vector
   problem found in ‘backsolve.Rd’

This is likely something that will be glaringly obvious once it's pointed out, but without a line number I can't seem to find it. I've been counting braces but don't see a mismatch. FYI, the file is below. (It is modeled on chol.Rd from the Matrix package.)

Terry Therneau

--

\name{backsolve}
\alias{backsolve-methods}
\title{Solve an Upper or Lower Triangular System}
\alias{backsolve}
\alias{backsolve,gchol-method}
\alias{backsolve,gchol.bdsmatrix-method}
\description{
  Solves a system of linear equations where the coefficient matrix is
  upper (or \sQuote{right}, \sQuote{R}) or lower (\sQuote{left},
  \sQuote{L}) triangular.\cr
  \code{x <- backsolve(R, b)} solves \eqn{R x = b}.
}
\usage{
backsolve(r, \dots)
\S4method{backsolve}{gchol}(r, x, k=ncol(r), upper.tri=TRUE, \dots)
\S4method{backsolve}{gchol.bdsmatrix}(r, x, k=ncol(r), upper.tri=TRUE, \dots)
}
\arguments{
  \item{r}{a matrix or matrix-like object}
  \item{x}{a vector or a matrix whose columns give the right-hand sides
    for the equations.}
  \item{k}{The number of columns of \code{r} and rows of \code{x} to use.}
  \item{upper.tri}{logical; if \code{TRUE} (default), the \emph{upper}
    \emph{tri}angular part of \code{r} is used.  Otherwise, the lower one.}
  \item{\dots}{further arguments passed to other methods}
}
\section{Methods}{
  Use \code{\link{showMethods}(backsolve)} to see all the defined methods;
  the two created by the bdsmatrix library are described here:
  \describe{
    \item{bdsmatrix}{\code{signature=(r= "gchol")} for a generalized
      cholesky decomposition}
    \item{bdsmatrix}{\code{signature=(r= "gchol.bdsmatrix")} for the
      generalized cholesky decomposition of a bdsmatrix object}
  }
}
\value{
  The solution of the triangular system.  The result will be a vector if
  \code{x} is a vector and a matrix if \code{x} is a matrix.

  Note that \code{forwardsolve(L, b)} is just a wrapper for
  \code{backsolve(L, b, upper.tri=FALSE)}.
}
\description{
  The generalized Cholesky decomposition of a symmetric matrix A is
  \eqn{A = LDL'}{A= LD t(L)} where D is diagonal, L is lower triangular,
  and \eqn{L'}{t(L)} is the transpose of L.  These functions solve either
  \eqn{L\sqrt{D} x =b}{L sqrt(D) x=b} (when \code{upper.tri=FALSE}) or
  \eqn{\sqrt{D}L' x=b}{sqrt(D) t(L) x=b}.
}
\note{
  The \code{bdsmatrix} package promotes the base R \code{backsolve}
  function to a generic.  To see the full documentation for the default
  method, view \code{backsolve} from the \code{base} package.
}
\seealso{
  \code{\link{forwardsolve}}, \code{\link{gchol}}
}
\keyword{array}
\keyword{algebra}

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] last message -- I've answered my own question
Yes, it was glaring and obvious: I had the label \description a second time, when I really meant \details. Still, I had to delete sections of the file one by one until it slapped me in the face. Sorry for any bother.

Terry T.

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] class extension and documentation
I've added a backsolve method to the bdsmatrix library. Per the Extending manual, section 7.1, I've also added the following 3 lines along with my setMethod definitions for 2 classes:

   backsolve <- function(r, ...) UseMethod("backsolve")
   backsolve.default <- base:::backsolve
   formals(backsolve.default) <- c(formals(backsolve.default), alist(... = ))

I've also added a backsolve-methods.Rd page, though since my arguments are identical to the default it doesn't say much. And, after a test failed, added the new backsolve.default routine to my export list.

Now R CMD check claims that I need Rd pages for backsolve and backsolve.default. I don't think I should rewrite those. How do I sidestep this and/or what other manuals should I read? Perhaps do

   setMethod("backsolve", signature(r="ANY"), base:::backsolve(r, ...))

instead?

Terry Therneau

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] PS to prior note
Sorry to forget this:

   > sessionInfo()
   R version 2.14.0 (2011-10-31)
   Platform: x86_64-unknown-linux-gnu (64-bit)

   locale:
    [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
    [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C
    [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
    [7] LC_PAPER=C                 LC_NAME=C
    [9] LC_ADDRESS=C               LC_TELEPHONE=C
   [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

   attached base packages:
   [1] stats     graphics  grDevices utils     datasets  methods   base

Further note: I started out by just having a new setMethod, and let R do all the behind-the-scenes bookkeeping as it usually does. But then backsolve(cmat, x) fails, where cmat is an ordinary cholesky. It doesn't find the default value for ncol.

Terry T.

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] class extension and documentation
Duncan's reply to my query:

      Now R CMD check claims that I need Rd pages for backsolve and
      backsolve.default. I don't think I should rewrite those. How do I
      sidestep this and/or what other manuals should I read?

   Even though your change is subtle, I'd say it's still a change
   (backsolve is now a generic, not a simple closure; it now has a
   different argument list), so I'd like to see a new man page added. It
   would be quite reasonable to list the new interface and then refer to
   base::backsolve for the details of what the default method does.

Fair enough. Let's push this a little harder.

1. Additions to the manual, section 7.1:
   a. Warn that foo.default must now be exported. (I don't seem to need to
      export foo; exportMethods(foo) seems to be enough?)
   b. Warn that package creation will demand a manual page for foo and
      foo.default.
   c. Give hints on how to do b.

2. More information on the setMethod page about the fragility of "just add
   one line to make a generic". I can't make this work if any of the
   arguments have defaults.

Trying to implement this idea is turning into a quagmire. Here is my current code:

   backsolve <- function(r, x, k=ncol(r), upper.tri=TRUE, ...)
       UseMethod("backsolve")
   backsolve.default <- base:::backsolve
   formals(backsolve.default) <- c(formals(backsolve.default), alist(... = ))

   setMethod("backsolve",
             signature(r="gchol", x="ANY", k="ANY", upper.tri="ANY"),
             function(r, x, k=ncol(r), upper.tri=TRUE, ...) {
                 if (any(diag(r) < 0))
                     stop("Argument has a negative diagonal, cannot backsolve")
                 if (!is.numeric(x)) stop("Invalid data type for x")
                 x <- as.matrix(x)
                 ...

First, if I was going to have a new backsolve manual page, it should mention the arguments. This meant adding x, k, and upper.tri to the generic above. And they needed defaults to make the Rd page be both correct and pass muster. Now, however, the default method doesn't work. backsolve of an ordinary matrix leads to

   Error in k != floor(k) : 'k' is missing
   Calls: backsolve -> backsolve -> .local

What's the trick?

The gchol method for backsolve was documented via promptMethods. How do I refer to this object in a \link from backsolve? Accessing that documentation is certainly a challenge. I doubt I'll ever find a user to guess or remember

   methods ? backsolve("gchol", "ANY", "ANY", "ANY")

Terry T.

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] 32 vs 64 bit difference?
Thank you both for the nice explanation. I added digits=4 to my print statements to shorten the display. Mixed effects Cox models can have difficult numerical issues, as it turns out; I've added this to my collection of things to watch for.

Terry Therneau

On Sat, 2011-11-26 at 11:37 +0000, Prof Brian Ripley wrote:

   On 26/11/2011 09:23, peter dalgaard wrote:

      On Nov 26, 2011, at 05:20, Terry Therneau wrote:

         I've spent the last few hours baffled by a test suite
         inconsistency. The exact same library code gives slightly
         different answers on the home and work machines - found in my R
         CMD check run. I've recopied the entire directory to make sure
         it's really identical code. The data set and fit in question has
         a pretty flat top to the likelihood. I put print statements into
         the f() function called by optim, and the two parameters and the
         likelihood track perfectly for 48 iterations, then start to drift
         ever so slightly:

            theta= -3.254176 -6.201119 ilik= -16.64806
            theta= -3.254176 -6.201118 ilik= -16.64806

         And at the end of the iteration:

            theta= -3.207488 -8.583329 ilik= -16.70139
            theta= -3.207488 -8.58 ilik= -16.70139

         As you can see, they get to the same max, but with just a
         slightly different path. The work machine is running 64 bit Unix
         (CentOS) and the home one 32 bit Ubuntu. Could this be enough to
         cause the difference? Most of my tests are based on all.equal,
         but I also print out 1 or 2 full solutions; perhaps I'll have to
         modify that?

      We do see quite a lot of that, yes; even running 32 and 64 bit
      builds on the same machine, and sometimes to the extent that an
      algorithm converges on one architecture and diverges on the other
      (just peek over on R-sig-ME). The comparisons by make check on R
      itself also give off quite a bit of last-decimal chatter when the
      architecture is switched. For some reason, OS X builds seem more
      consistent than Windows and Linux, although I have only anecdotal
      evidence of that.

      However, the basic point is that compilers don't define the sequence
      of FPU operations down to the last detail: an internal
      extended-precision register may or may not be used, the order of
      terms in a sum may be changed, etc. Since 64 bit code has different
      performance characteristics from 32 bit code (since you shift more
      data around for pointers), the FPU instructions may be differently
      optimized too.

   However, the main difference is that all x86_64 chips have SSE2
   registers, and so gcc makes use of them. Not all i686 chips do, so
   32-bit builds on Linux and Windows only use the FPU registers. This
   matters at ABI level: arguments get passed and values returned in SSE
   registers, so we can't decide to only support later i686 cpus and make
   use of SSE2 without re-compiling all the system libraries (but a Linux
   distributor could). And the FPU registers are 80-bit and use extended
   precision (the way we set up Windows and on every Linux system I have
   seen); the SSE* registers are 2x64-bit. I believe that all Intel Macs
   are 'Core' or later and so do have SSE2, although I don't know how much
   Apple relies on that.

   (The reason I know that this is the 'main difference' is that you can
   often turn off the use of SSE2 on x86_64 and reproduce the i686
   results. But because of the ABI differences, you may get crashes: in R
   this matters most often for complex numbers, which are 128-bit C99
   double complex and passed around in an SSE register.)

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] How to deal with package conflicts
The ridge() function was put into the survival package as a simple example of what a user could do with penalized functions. It's not a serious function, and I'd be open to any suggestions for change. Actually, for any L2 penalty + Cox model one is now better off using coxme, as the maximization process is much better thought out there. I'd be happy to remove ridge from survival -- except that there are bound to be lots of folks using the function, and any such changes (even good ones) to the survival package are fraught with peril.

Duncan: this raises a larger point. I've often wished that I could have namespace-like rules apply to formulas. Using survival again, when I implemented gam-like smooths I had to create pspline() rather than use the more natural s() notation. In survival, it would be good to do this for ridge, cluster, pspline, and frailty, all of which depend deeply on a coxph context. It would also solve a frailty() problem of long standing: when used in survreg only a subset of the frailty options make sense; this is documented in the help file but catches users again and again.

Terry Therneau

On Fri, 2011-11-25 at 12:00 +0100, r-devel-requ...@r-project.org wrote:

In my genridge package, I define a function ridge() for ridge regression, creating objects of class 'ridge' that I intend to enhance. In a documentation example, I want to use some functions from the car package. However, that package requires survival, which also includes a ridge() function, for coxph models. So, once I require(car) my ridge() function is masked, which means I have to use the awkward form below in my .Rd files.

    ridgemod <- genridge::ridge(...)

I tried to detach survival, but that doesn't work:

    detach(package:survival)
    Error: package 'survival' is required by 'car' so will not be detached

I don't see any solution to this, other than (a) renaming my ridge() to something else -- don't want to do this, or (b) using \dontrun{} for the examples that use car. Or, is there some other way?

Not really. I'd say the renaming is the preferred way to go, but you might also be able to convince Terry Therneau (survival author) to make ridge() a generic, so your method is called for your objects, and his is called for others.

Duncan Murdoch

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] How to deal with package conflicts
On Fri, 2011-11-25 at 09:50 -0500, Duncan Murdoch wrote: I think the general idea in formulas is that it is up to the user to define the meaning of functions used in them. Normally the user has attached the package that is working on the formula, so the package author can provide useful things like s(), but if a user wanted to redefine s() to their own function, that should be possible. Formulas do have environments attached, so both variables and functions should be looked up there.

I don't agree that this is the best way. A function like coxph could easily have in its documentation a list of the formula specials that it defines internally. If the user wants something of their own they can easily use a different word; in fact, I would strongly recommend that they don't use one of these key names. For things that work across multiple packages like ns(), what user in his right mind would redefine it?

So I re-raise the question. Is there a reasonably simple way to make the survival ridge() function specific to survival formulas? It sets up structures that have no meaning anywhere else, and its global definition stands in the way of other sensible uses. Having it be not exported + obey namespace-type semantics would be a plus all around.

Philosophical aside: I have discovered to my dismay that formulas do have environments attached, and that variables/functions are looked up there. This made sensible semantics for predict() within a function impossible for some of the survival functions, unless I were to change all the routines to a model=TRUE default. (And a change of that magnitude to survival, with its long list of dependencies, is not fun to contemplate. A very quick survey reveals several dependent packages that would break.) So I don't agree nearly so fully with the "should" part of your last sentence. The out-of-context evaluations allowed by environments are, I find, always tricky and often lead to intricate special cases. Thus, moving back and forth between how it seems that a formula should work and how it actually does work sometimes leaves my head spinning.

Terry T.
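A small demonstration of the lookup rule under discussion, with made-up names; whatever function is stored in the formula's environment is what model.frame() will find:

    make_form <- function() {
        s <- function(x) x^2    # a user-local definition of s()
        y ~ s(x)                # the formula captures this local environment
    }
    f <- make_form()
    environment(f)$s            # the local s() is reachable via the formula
    d <- data.frame(y = 1:3, x = 1:3)
    model.frame(f, data = d)    # uses the local s(), not any package's s()

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel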
Re: [Rd] How to deal with package conflicts
On Fri, 2011-11-25 at 10:42 -0500, Michael Friendly wrote: Duncan provided one suggestion: make ridge() an S3 generic, and rename ridge() to ridge.coxph(), but this won't work, since you use ridge() inside coxph() and survreg() to add a penalty term in the model formula. Another idea might be simply to not export ridge(), but I have the feeling this will break your R CMD checks.

The S3 generic idea won't work. The argument inside ridge(x) is an ordinary variable, and it's the argument inside that a generic uses for dispatch. I want to dispatch based on the context, which is what the namespace mechanism does for a call to, for instance, coxpenal.fit, a non-exported survival function.

I suspect that not exporting ridge would work for

    coxph(Surv(time, status) ~ ph.ecog + ridge(age), data=lung)

but not for

    myform <- Surv(time, status) ~ ph.ecog + ridge(age)
    coxph(myform, data=lung)

(I haven't tested this.) This is because formulas are treated rather like functions, with bindings coming into play when they are first defined, not when they are first used.

[Michael Friendly continues:] Alternatively, my particular problem (wanting to use car::vif in my package documentation) would be solved if John Fox considered making survival a Suggests: package rather than a Depends: one. This might work, since survival is only referenced in car by providing Anova() methods for coxph models. I think all of this raises a general issue of unintended consequences of package bloat, where (a) Depends: packages are forced to load by require()/library(), whether they are really needed or not; (b) there is nothing like require(car, depends=FALSE) to circumvent this; (c) once a require()'d package is loaded, it cannot be unloaded; and (d) AFAIK, there is no way for a package author to override the masking of functions or data provided by other packages, except by using mypackage::myfun() calls. To me this seems to be a flaw in the namespace mechanism.

I will say that the long list of reverse depends on the survival package does give me pause when making changes.

Terry T.

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] How to deal with package conflicts
I like the idea of making the functions local, and will pursue it. This issue has bothered me for a long time -- I had real misgivings when I introduced cluster() to the package, but did not at that time see any way other than making it global. I might make this change soon in the ridge function, since it's a good test case -- less likely to cause downstream troubles.

Here is another possible approach: inside coxph, just before calling eval with the formula, create a new environment tempenv which consists of my handful of special functions (ridge, frailty, cluster, pspline) which have meaning only inside a coxph call, with the parent environment of tempenv being the current environment of the formula. Then set the environment of the formula to tempenv, then eval. Would this work?

Two small further questions:
1. Any special rules for the documentation? We need a page for cluster, but want to mark it almost like a method, in the sense of applying in only one context.
2. Does one scheme or another work best for downstream functions like predict or model.matrix? Duncan's idea of direct modification might have an advantage (?) in that the terms object would be permanently changed.

Terry T.
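A minimal sketch of the tempenv idea, using a dummy ridge() as a stand-in for the real internal functions; not the actual coxph code:

    add_specials <- function(formula, specials) {
        # 'specials' is a named list of functions, e.g. list(ridge=..., cluster=...)
        tempenv <- new.env(parent = environment(formula))
        for (nm in names(specials))
            assign(nm, specials[[nm]], envir = tempenv)
        environment(formula) <- tempenv   # lookups now see the specials first
        formula
    }
    # usage, with a trivial ridge() standing in for the real one:
    f <- add_specials(y ~ x + ridge(z), list(ridge = function(x, theta = 1) x))
    model.frame(f, data = data.frame(y = 1:3, x = 1:3, z = 1:3))

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel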
[Rd] 32 vs 64 bit difference?
I've spent the last few hours baffled by a test suite inconsistency. The exact same library code gives slightly different answers on the home and work machines - found in my R CMD check run. I've recopied the entire directory to make sure it's really identical code. The data set and fit in question has a pretty flat top to the likelihood. I put print statements into the f() function called by optim, and the two parameters and the likelihood track perfectly for 48 iterations, then start to drift ever so slightly:

    theta= -3.254176 -6.201119  ilik= -16.64806
    theta= -3.254176 -6.201118  ilik= -16.64806

And at the end of the iteration:

    theta= -3.207488 -8.583329  ilik= -16.70139
    theta= -3.207488 -8.58      ilik= -16.70139

As you can see, they get to the same max, but with just a slightly different path. The work machine is running 64 bit Unix (CentOS) and the home one 32 bit Ubuntu. Could this be enough to cause the difference? Most of my tests are based on all.equal, but I also print out 1 or 2 full solutions; perhaps I'll have to modify that?

Terry Therneau

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] R-devel Digest, Vol 105, Issue 22
On Tue, 2011-11-22 at 12:00 +0100, r-devel-requ...@r-project.org wrote: How are the package authors supposed to develop their own NAMESPACEd packages efficiently? And what are the directions R is taking in order to facilitate the development cycle?

This is my strategy. I have a separate directory test.local in my tree, not exported to Rforge. It has a Makefile which loads all the sources from ../R, copies the C files from ../src, and makes a local loadable S.so. I then do all my development there, without a namespace. I can overwrite functions, trace, add browser() calls --- all the things you want to do --- with standard semantics. I obviously don't load my package. I think this is the simplest route. Namespaces are a great idea, but during testing I don't want all those protections. It took me a half hour to set up the Makefile; I've recouped that effort many times over.

Terry Therneau
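Roughly the same setup can be driven from within R itself; a sketch, assuming the usual package layout and that the shared object has already been built (e.g. with R CMD SHLIB -o S.so ../src/*.c):

    # source every R file from the package directory, then load the compiled code
    for (f in list.files("../R", pattern = "\\.R$", full.names = TRUE))
        source(f)
    dyn.load("S.so")   # all functions and C entry points now live in the workspace

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel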
[Rd] Small nit in Sweave
Two small Sweave issues.

1. I had the following line in my code:

    echo=FALSE, results="hide"

resulting in the message

    Error in match.arg(options$results, c("verbatim", "tex", "hide")) :
      'arg' should be one of “verbatim”, “tex”, “hide”

I puzzled on this a bit, since my argument exactly matched the message, until I thought of trying it without the quotes.

2. Where should I have reported this? Sweave is not on CRAN, so I couldn't find the maintainer email there. Perhaps I'm missing something obvious though.

Terry T.

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] multiple defines of diag
The current coxme code has functions that depend on bdsmatrix and others that depend on Matrix; both those packages define S4 methods for diag. When loaded, the message appears:

    replacing previous import ‘diag’ when loading ‘Matrix’

Questions:
1. Do I need to worry about this? If so, what can I do about it? I suppose I could add an importFrom directive, but it will be a pain unless there is an allbut(diag) option I'm not aware of. (I assume that methods and classes can be listed in importFrom, i.e., that there are no importMethodsFrom or importClassesFrom functions.)
2. If I don't need to worry, is there a way to turn the message off? I as a developer need to see it, but users don't, and it may worry them unnecessarily. Updating all 17 of my test/*.Rout.save files is a nuisance as well, but only a nuisance.

I'd like to upload this to CRAN soon, as I have users asking for the updated lmekin function (which uses Matrix). In the long term all the bdsmatrix functions will be replaced by Matrix, but that requires major changes to C code, so "long" is the operative word.

Thanks, Terry T.
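On the parenthetical above: importMethodsFrom() and importClassesFrom() directives do in fact exist in NAMESPACE files, so methods and classes can be imported selectively, along these lines:

    ## in the package NAMESPACE file:
    importClassesFrom(Matrix, dsCMatrix)
    importMethodsFrom(Matrix, diag)

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel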
Re: [Rd] multiple defines of diag
On Thu, 2011-10-06 at 10:00 -0400, Kasper Daniel Hansen wrote: if you're using two packages that both define a diag function/method you absolutely _have_ to resolve this using your NAMESPACE. [Update: I see both are methods. I actually don't know what happens when you have the same generic in both packages]

Your response made me look further, with some surprising results.

1. Sequential loading:

    tmt226% R --vanilla
    R version 2.13.0 (2011-04-13)

    library(bdsmatrix)
    tmat <- bdsmatrix(c(3,2,2,4),
                      c(22,1,2,21,3,20,19,4,18,17,5,16,15,6,7,
                        8,14,9,10,13,11,12),
                      matrix(c(1,0,1,1,0,0,1,1,0,1,0,10,0,
                               0,1,1,0,1,1,0,1,1,0,1,0,10), ncol=2))
    tmat[1:7,1:7]
         [,1] [,2] [,3] [,4] [,5] [,6] [,7]
    [1,]   22    1    2    0    0    0    0
    [2,]    1   21    3    0    0    0    0
    [3,]    2    3   20    0    0    0    0
    [4,]    0    0    0   19    4    0    0
    [5,]    0    0    0    4   18    0    0
    [6,]    0    0    0    0    0   17    5
    [7,]    0    0    0    0    0    5   16
    diag(tmat)
     [1] 22 21 20 19 18 17 16 15 14 13 12 10 10
    library(Matrix)
    Loading required package: lattice
    Attaching package: 'Matrix'
    The following object(s) are masked from 'package:base': det
    diag(tmat)
     [1] 22 21 20 19 18 17 16 15 14 13 12 10 10

Things to note: in this case I did not get a message about overwriting the diag method, and it works. This was not my experience with ranef(), an S3 generic that coxme, nlme, and lme4 all define; there, whichever library loaded last did not discover existing methods. That is, if one loaded nlme after coxme, then ranef(a coxme object) would not dispatch to ranef.coxme. Our solution (Doug Bates and I) was to have both coxme and lme4 import ranef and fixef from the nlme library. However, per above it appears to work with S4 generics. Can I count on it, though?

Case 2:

    tmt229% R --vanilla
    R version 2.13.0 (2011-04-13)

    library(coxme)
    Loading required package: survival
    Loading required package: splines
    Loading required package: bdsmatrix
    Loading required package: nlme
    Loading required package: Matrix
    Loading required package: lattice
    Attaching package: 'Matrix'
    The following object(s) are masked from 'package:base': det
    Warning message:
    replacing previous import ‘diag’ when loading ‘Matrix’

    tmat <- bdsmatrix(c(3,2,2,4),
                      c(22,1,2,21,3,20,19,4,18,17,5,16,15,6,7,
                        8,14,9,10,13,11,12),
                      matrix(c(1,0,1,1,0,0,1,1,0,1,0,10,0,
                               0,1,1,0,1,1,0,1,1,0,1,0,10), ncol=2))
    diag(tmat)
     [1] 22 21 20 19 18 17 16 15 14 13 12 10 10

Things to note: I now get a warning message about diag. Why only here? Per the earlier comment I'm not importing all of nlme, just the two generics. It still works. (This example isn't reproducible for others: the coxme library on CRAN does not yet have a Matrix dependency.)

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] number of copies - summary
My thanks to Bill Dunlap and Simon Urbanek for clarifying many of the details. This gives me what I need to go forward. Yes, I will likely convert more and more things to .Call over time; this clearly gives the most control over excess memory copies. I am getting more comments from people using survival on huge data sets, so memory usage is an issue I'll be spending more thought on.

I'm not nearly as negative about .C as Simon. Part of this is long experience with C standalone code: one just gets used to the idea that mismatched args to a subroutine are deadly. A test of all args to .C (via insertion of a browser call) is part of initial code development. Another is that constructing the return argument from .Call (a list with names) is a bit of a pain. So I will sometimes use dup=FALSE. However, the opinion of R core about .C is clear, so it behooves me to move away from it.

Thanks again for the useful comments.

Terry Therneau

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] number of copies
I'm looking at memory efficiency for some of the survival code. The following fragment appears in coxph.fit:

    coxfit <- .C("coxfit2", iter=as.integer(maxiter),
                 as.integer(n),
                 as.integer(nvar),
                 stime,
                 sstat,
                 x= x[sorted,], ...

Does this make a second copy of x to pass to the routine (my expectation), or will I end up with 3: x and x[sorted,] in the local frame of reference, and another due to dup=TRUE?

Terry T.

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] number of copies
On Mon, 2011-10-03 at 12:31 -0400, Simon Urbanek wrote:

Thanks. I was hoping that x[sorted,] would act like double(n) does in a .C call, and have no extra copies made since it has no local assignment.

Yes it does act the same way, you get an extra copy with double(n) as well - there is no difference.

That is surprising. This is not true of Splus, and since Chambers mentions it as a specific case as well (Programming with Data, p421) I assumed that R would be at least as intelligent. He also used the unset() function to declare that something could be treated like double(n), i.e., needing no further copies. But that always looked like a dangerous assertion to me, and unset has disappeared, perhaps deservedly, from R.

Terry T.
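One way to watch the copies directly is tracemem(), provided R was built with --enable-memory-profiling; a small sketch with throwaway data:

    x <- matrix(rnorm(20), nrow = 5)
    tracemem(x)                  # reports each duplication of x from here on
    y <- x[c(2, 1, 3, 5, 4), ]   # the subscript allocates a brand-new matrix for y
    x[1, 1] <- 0                 # this duplicates x first if x is shared elsewhere

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel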
Re: [Rd] Confusing inheritance problem
I've packaged the test library up as a tar file at ftp.mayo.edu, directory therneau, file ktest.tar; login username: mayoftp, password: KPlLiFoz. This will disappear in 3 days (Mayo is very fussy about outside access).

In response to Uwe's comments:
1. "2.13.0 is not recent": it's not the latest, but it is recent. This is for machines at work, where upgrades happen more infrequently.
2. "Matrix not loaded": the sessionInfo was only to show what version we have. Forgetting to load Matrix isn't the problem -- when I do that the error is quick and obvious.

Thanks in advance for any pointers.

Terry T.

On Sat, 2011-07-16 at 19:27 +0200, Uwe Ligges wrote: On 15.07.2011 23:23, Terry Therneau wrote:

I have a library in development with a function that works when called from the top level, but fails under R CMD check. The particular line of failure is

    rsum <- rowSums(kmat0)

where kmat0 is a dsCMatrix object. I'm currently stumped and looking for some ideas. I've created a stripped-down library ktest that has only 3 functions: pedigree.R to create a pedigree or pedigreeList object, kinship.R with kinship methods for the two objects, and one small compute function called by the others, along with the minimal amount of other information such that a call to

    R --vanilla CMD check ktest

gives no errors until the fatal one. There are two test cases. A 3 line one that creates a dsCMatrix and calls rowSums at the top level works fine, but the same call inside the kmat.pedigreeList function gives an error

    'x' must be an array of at least two dimensions

Adding a print statement above the rowSums call shows that the argument is a 14 by 14 dsCMatrix. I'm happy to send the library to anyone else to try and duplicate.

Terry Therneau

    tmt% R --vanilla
    sessionInfo()
    R version 2.13.0 (2011-04-13)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C
     [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

Terry,
1. Your R is not recent.
2. You do this without having Matrix loaded (according to sessionInfo())? This may already be the cause of your problems.
3. You may want to make your package available on some website. I am sure there are people who will take a look (including me, but not today).

Best wishes, Uwe

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Confusing inheritance problem
I have a library in development with a function that works when called from the top level, but fails under R CMD check. The particular line of failure is

    rsum <- rowSums(kmat0)

where kmat0 is a dsCMatrix object. I'm currently stumped and looking for some ideas. I've created a stripped-down library ktest that has only 3 functions: pedigree.R to create a pedigree or pedigreeList object, kinship.R with kinship methods for the two objects, and one small compute function called by the others, along with the minimal amount of other information such that a call to

    R --vanilla CMD check ktest

gives no errors until the fatal one. There are two test cases. A 3 line one that creates a dsCMatrix and calls rowSums at the top level works fine, but the same call inside the kmat.pedigreeList function gives an error

    'x' must be an array of at least two dimensions

Adding a print statement above the rowSums call shows that the argument is a 14 by 14 dsCMatrix. I'm happy to send the library to anyone else to try and duplicate.

Terry Therneau

    tmt% R --vanilla
    sessionInfo()
    R version 2.13.0 (2011-04-13)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C
     [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base
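For reference, the top-level case works because Matrix's S4 method for rowSums is visible once the package is attached; a likely culprit inside a package is that base R's internal rowSums runs instead when the Matrix method table is not imported by the package namespace (that internal version produces exactly the "must be an array" error). The working top-level case in miniature:

    library(Matrix)
    m <- Matrix(c(2, 1, 1, 2), 2, 2, sparse = TRUE)  # symmetric, so a dsCMatrix
    class(m)        # "dsCMatrix"
    rowSums(m)      # dispatches to Matrix's method: 3 3

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel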
[Rd] R 2.13 segfault with range()
Running on a shared CentOS server:

    tmt711% R --vanilla
    R version 2.13.0 (2011-04-13)
    Copyright (C) 2011 The R Foundation for Statistical Computing
    ISBN 3-900051-07-0
    Platform: x86_64-unknown-linux-gnu (64-bit)

    sessionInfo()
    R version 2.13.0 (2011-04-13)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C
     [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    load('test.rda')
    y <- matrix(ifelse(tdata$dataset <= 0, NA, tdata$dataset),
                ncol=ncol(tdata$dataset))
    dim(y)
    [1] 2228335
    range(y)

    *** caught segfault ***
    address 0x2b490421f000, cause 'memory not mapped'

    Traceback:
     1: range(y)

    Possible actions:
    1: abort (with core dump, if enabled)
    2: normal R exit
    3: exit R without saving workspace
    4: exit R saving workspace
    Selection: 3

    tmt712% ls -s test.rda
    2664 test.rda

The data set is too large to attach, but I can send the test.rda file off list. The data is not confidential.

Terry Therneau

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] standard format for newdata objects
On Wed, 2011-04-27 at 12:00 +0200, Peter Dalgaard wrote: Er... No, I don't think Paul is being particularly rude here (and he has been doing us some substantial favors in the past, notably his useful Rtips page). I know the kind of functionality he is looking for; e.g., SAS JMP has some rather nice interactive displays of regression effects, for which you'll need to fill in something for the other variables. However, that being said, I agree with Duncan that we probably do not want to canonicalize any particular method of filling in average values for data frame variables. Whatever you do will be statistically dubious (in particular, using the mode of a factor variable gives me the creeps: do a subgroup analysis and your average person switches from male to female?), so I think it is one of those cases where it is best to provide mechanism, not policy.

I agree with Peter. There are two tasks in newdata: deciding what the default reference levels should be, and building the data frame with those levels. It's the first part that is hard. For survival curves from a Cox model the historical default has been to use the mean of each covariate, which can be awful (sex coded as 0/1 leads to prediction for a hermaphrodite?). Nevertheless, I've not been able to think of a strategy that would give sensible answers for most of the data I use, and coxph retains the flawed default for lack of a better idea. When teaching a class on this, I tell listeners to bite the bullet and build the newdata that makes clinical sense, because package defaults are always unwise for some of the variables. How can a package possibly know that it should use bilirubin=1.0 (upper limit of normal) and AST = 45 when the data set is one of my liver transplant studies? Frank Harrell would argue that his sometimes misguided default in cph is better than the almost always wrong one in coxph, though, and there is certainly some strength in that position.

Terry Therneau
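In practice that means constructing the newdata by hand; a sketch with invented covariate names and values, for an already-fitted coxph model:

    # explicit reference levels chosen for clinical sense, not package defaults
    nd <- data.frame(age = c(50, 65),
                     sex = factor(c("male", "female")),
                     bilirubin = 1.0,    # upper limit of normal
                     AST = 45)
    # then, e.g., survfit(fit, newdata = nd) to get curves at those values

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel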
[Rd] help with eval()
I've narrowed my scope problems with predict.coxph further. Here is a condensed example:

    fcall3 <- as.formula(time ~ age)
    dfun3 <- function(dcall) {
        fit <- lm(dcall, data=lung, model=FALSE)
        model.frame(fit)
    }
    dfun3(fcall3)

The final call fails: it can't find 'dcall'. The relevant code in model.frame.lm is:

    env <- environment(formula$terms)
    if (is.null(env)) env <- parent.frame()
    eval(fcall, env, parent.frame())

If the environment of the formula is .GlobalEnv, as it is here, the contents of parent.frame() are ignored. Adding a print(ls(parent.frame())) statement just above the final call shows that it isn't a scope issue: the variables we want are there.

I don't understand the logic behind looking for variables in the place the formula was first typed (this is not a complaint). The inability to look elsewhere, however, has stymied my efforts to fix the scoping problem in predict.coxph, unless I drop the env(formula) argument altogether. But I assume there must be good reasons for its inclusion and am reluctant to do so.

Terry Therneau

    sessionInfo()
    R version 2.13.0 RC (2011-04-12 r55424)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C
     [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

PS. This also fails:

    dfun3 <- function(dcall) {
        fit <- lm(dcall, data=lung)
        model.frame(fit, subset=1:10)
    }

You just need to force model.frame.lm to recreate data.
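One workaround, for what it's worth: keep the model frame with the fit, in which case model.frame.lm returns the stored copy directly and never needs to re-evaluate dcall at all. A sketch, assuming the survival package's lung data is available as above:

    dfun3b <- function(dcall) {
        fit <- lm(dcall, data = lung, model = TRUE)  # the default; stores the frame
        model.frame(fit)      # returned as-is, no eval() of the original call
    }
    dfun3b(fcall3)            # works

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel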
Re: [Rd] predict()
On Fri, 2011-04-15 at 09:10 +0200, peter dalgaard wrote: I couldn't reproduce it from Terry's description either, but there _is_ an issue which parallels the ...

I'll start with an apology: as a first guess to understand the problem with predict.coxph I tried something parallel to Ivo's example using lm, during a 5 minute slice of time between meetings. I got a similar error message, posted a query based on that, and ran off. The real root of my error message, however, was a simple typographical mistake. My apologies for the premature note, whose error was gently pointed out by all three of you. I should have waited till I had more time.

Now for the real problem, which is an oversight in my design. When updating the predict.coxph, residuals.coxph, survfit.coxph and survexp.coxph functions to more modern processing of the newdata argument (proper handling of factors, etc.), I made a decision to have all of them call model.frame.coxph and model.matrix.coxph. Model.matrix in particular has some special steps, and it would be better to have only one point of modification. However, this introduces one more level of indirection

    predict -> model.frame.coxph -> model.frame

and the parent.frame() reference in model.frame.coxph now points to predict, when we actually want it to point to the parent of predict. In predict.lm the parent.frame() argument lives in the predict.lm code itself.

I see three paths to correction:
1. Make model.frame.coxph smarter. Peter's example seems to show that this is hard.
2. Change the line in predict.coxph (and the others)

    mf <- model.frame(object)

to some form of eval() that causes the parent.frame() operation to reach back properly. I'm not yet sure what variant of eval or do.call or ... this is; the effect I'm looking for is similar to what NextMethod() does.
3. Change my call to

    mf <- model.frame(object$terms, ...)

to mimic the call in lm. This may be the simplest, since model.frame.coxph does not do anything special.

Last, I need to check whether any of the above issues affect model.matrix calls. Any comments on which is the better path would be welcomed.

Terry T.
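A sketch of what option 2 might look like, along the lines of the usual idiom in modelling code; hypothetical helper, not tested against the real predict.coxph:

    # rebuild the model.frame call from the original fit and evaluate it in
    # the caller's frame, one level further up than model.frame.coxph itself
    rebuild_frame <- function(object, env = parent.frame()) {
        Call <- object$call
        indx <- match(c("formula", "data", "subset", "weights", "na.action"),
                      names(Call), nomatch = 0)
        temp <- Call[c(1, indx)]           # keep only the model.frame arguments
        temp[[1]] <- as.name("model.frame")
        eval(temp, env)
    }

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel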
[Rd] Problem with dyn.load in R 2.13.0
I have a test directory for the survival suite, and dyn.load has ceased to work in it. Below shows the log:

    tmt1075% R --vanilla
    R version 2.12.2 (2011-02-25)
    Copyright (C) 2011 The R Foundation for Statistical Computing
    ISBN 3-900051-07-0
    Platform: x86_64-unknown-linux-gnu (64-bit)

    dyn.load('survival.so')
    q()

    tmt1076% R13 --vanilla
    R version 2.13.0 RC (2011-04-11 r55409)
    Copyright (C) 2011 The R Foundation for Statistical Computing
    ISBN 3-900051-07-0
    Platform: x86_64-unknown-linux-gnu (64-bit)

    dyn.load('survival.so')
    Error in dyn.load("survival.so") :
      unable to load shared object '/people/biostat2/therneau/research/surv/Rtest/survival.so':
      libR.so: cannot open shared object file: No such file or directory
    q()

Is the issue that the .so file must have been created with the R 2.13 script? That's not what the error message says, however. It almost looks like it is ignoring my first argument and looking instead for libR.

Terry Therneau

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Problem with dyn.load in R 2.13.0
On Wed, 2011-04-13 at 16:45 -0400, Simon Urbanek wrote: We have no details, but my wild guess would be that you did not re-build the package for 2.13.0 and you have static libR in 2.13.0 yet dynamic in 2.12.2. Cheers, Simon

Per my prior note, my guess at the root of the issue is use of user libraries, which is common here. Here are further details if that helps. R 2.13 was downloaded and built this AM in my ~/temp directory, using the standard

    ./configure
    make

Then a copy of the shell script was copied to ~therneau/bin/R13 for my convenience. I paid very little attention to the output of configure. This is running on a shared server using CentOS release 5.5 (98 users at the moment). R 2.12.2 is centrally available.

    sessionInfo()
    R version 2.13.0 RC (2011-04-11 r55409)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C
     [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

I can't find a libR.so anywhere -- perhaps it's static.

    tmt1119% pwd
    /people/biostat2/therneau/temp/R-rc
    tmt1120% find . -name 'libR*'
    ./src/extra/blas/libRblas.so
    ./src/main/libR.a
    ./src/modules/lapack/libRlapack.so
    ./src/nmath/standalone/libRmath.pc.in
    ./src/unix/libR.pc.in
    ./lib/libRblas.so
    ./lib/libRlapack.so

Thanks for everyone's help.

Terry
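Two quick checks from within R, assuming a Unix-alike with ldd on the path, that can help pin down this kind of linkage mismatch:

    system("ldd survival.so")  # 'libR.so => not found' means the .so wants a shared libR
    R.home("lib")              # where libR.so would live if R were built with --enable-R-shlib

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel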
[Rd] drop.unused.levels
A user sent me a query on survreg, which when boiled down is a request to drop unused levels when setting up the X matrix. I don't have a strong opinion one way or the other, but am loath to make the change: I expect that code somewhere will break, perhaps a lot, when the length of the coefficients changes. On the other hand, at some point this change was made to lm(). Does R core have an opinion on which is better, or a pointer to the discussion that underlay the change to lm()? It could be enough to push me over the edge. It would affect survreg, coxph, survfit, survdiff, coxme, ...

Terry Therneau
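The behaviour in question, in miniature: model.frame() keeps unused factor levels by default, and lm() sets drop.unused.levels = TRUE in its internal model.frame call.

    d <- data.frame(y = rnorm(3),
                    f = factor(c("a", "a", "b"), levels = c("a", "b", "c")))
    levels(model.frame(y ~ f, d)$f)                              # "a" "b" "c"
    levels(model.frame(y ~ f, d, drop.unused.levels = TRUE)$f)   # "a" "b"

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel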
[Rd] rowsum
with the entirely different rowSums, but it has been around for a long time.) A lot longer than rowSums ... Bill Dunlap, Spotfire, TIBCO Software

---

This made me smile. The rowsum function was originally an internal part of the survival package, used for fast computation of certain sums when there is a cluster() statement. It was Statistical Sciences (S-Plus) who moved it to global status. That is, they used it in enough other places that they decided to speed it up, took over the maintenance and ownership of the function (with my blessing), and ceased to label it as part of survival in the manual. This metabug can't be laid at R's feet.

Terry Therneau

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] round, unique and factor
Survfit had a bug in some prior releases due to the use of both unique(times) and table(times); I fixed it by rounding to 15 digits per the manual page for as.character. Yes, I should ferret out all the usages instead, but this was fast and it cured the user's problem.

The bug is back! A data set from a local colleague triggers it. I can send the rda file to anyone who wishes. The current code has

    digits <- floor((.Machine$double.digits) *
                    logb(.Machine$double.base, 10))   # base 10 digits
    Y[,1] <- signif(Y[,1], digits)

which gives 15 digits; should I subtract one more? Should the documentation change? In the meantime I'm looking at the more permanent fix of turning time into a factor, then back at the very end. Because it is a bigger change, the potential for breakage is higher, however.

Terry T.

    tmt45% R --vanilla
    R version 2.12.2 (2011-02-25)
    Copyright (C) 2011 The R Foundation for Statistical Computing
    ISBN 3-900051-07-0
    Platform: x86_64-unknown-linux-gnu (64-bit)

    load('test.rda')
    ls()
    [1] "temp2"
    temp2 <- round(temp2, 15)
    length(unique(temp2))
    [1] 954
    length(table(temp2))
    [1] 942
    .Machine$double.eps
    [1] 2.220446e-16
    range(temp2)
    [1]  0.0000 26.0397

Terry T.
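The mismatch in miniature: unique() compares the doubles exactly, while table() goes through as.character(), which uses up to 15 significant digits.

    x <- c(0.3, 0.1 + 0.2)
    x[1] == x[2]          # FALSE: the two doubles differ in the last bit
    length(unique(x))     # 2
    length(table(x))      # 1: both values become the string "0.3"
    length(unique(signif(x, 15)))   # 1 here, though 15 digits evidently
                                    # did not suffice for the data set above

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel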
[Rd] coxph and drop1
A recent question in r-help made me realize that I should add a drop1 method for coxph and survreg; the default does not handle strata() or cluster() properly. However, for coxph the right options for the test argument would be likelihood-ratio, score, and Wald, not "chisq" and "F"; all of them reference a chi-square distribution. My thought is to use these arguments, and add an error message of "read the help file for drop1.coxph" when the defaults appear. Any better suggestions?

Terry Therneau

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] coxph and drop1
On Mon, 2011-03-14 at 12:52 -0400, John Fox wrote: Dear Terry, Possibly I'm missing something, but since the generic drop1() doesn't have a test argument, why is there a problem?

    args(drop1)
    function (object, scope, ...)

If you use match.arg() against test, then the error message should be informative if one of the prescribed values isn't supplied. Best, John

I stand corrected. I was looking at the help page and focused my eyes on the default and lm versions, both of which have the test argument with options "none" and "chisq". I should have looked higher on the page.

Terry T.
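The match.arg() behaviour John refers to, in isolation, with made-up option names:

    f <- function(test = c("LRT", "score", "Wald")) {
        test <- match.arg(test)   # unknown values give an informative error
        test
    }
    f("score")   # "score"
    f("F")       # Error: 'arg' should be one of "LRT", "score", "Wald"

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel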
[Rd] unique.matrix issue [Was: Anomaly with unique and match]
Simon pointed out that the issue I observed was due to internal behaviour of unique.matrix. I had looked carefully at the manual pages before posting the question and this was not mentioned. Perhaps an addition could be made? Terry T. __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Anomaly with unique and match
I stumbled onto this working on an update to coxph. The last 6 lines below are the question; the rest create a test data set.

    tmt585% R
    R version 2.12.2 (2011-02-25)
    Copyright (C) 2011 The R Foundation for Statistical Computing
    ISBN 3-900051-07-0
    Platform: x86_64-unknown-linux-gnu (64-bit)

    # Lines of code from survival/tests/singtest.R
    library(survival)
    Loading required package: splines

    test1 <- data.frame(time=  c(4, 3,1,1,2,2,3),
                        status=c(1,NA,1,0,1,1,0),
                        x=     c(0, 2,1,1,1,0,0))
    temp <- rep(0:3, rep(7,4))
    stest <- data.frame(start  = 10*temp,
                        stop   = 10*temp + test1$time,
                        status = rep(test1$status,4),
                        x      = c(test1$x + 1:7, rep(test1$x,3)),
                        epoch  = rep(1:4, rep(7,4)))
    fit1 <- coxph(Surv(start, stop, status) ~ x * factor(epoch), stest)

    ## New lines
    temp1 <- fit1$linear.predictor
    temp2 <- as.matrix(temp1)
    match(temp1, unique(temp1))
     [1] 1 2 3 4 4 5 6 7 7 7 6 6 6 8 8 8 6 6 6 9 9 9 6 6
    match(temp2, unique(temp2))
     [1] 1 2 3 4 4 5 6 7 7 7 6 6 6 NA NA NA 6 6 6 8 8 8 6 6

I've solved it for my code by not calling match on a one-column matrix. In general, however, should I be using some other paradigm for this map-to-unique operation? For example match(as.character(x), unique(as.character(x)))?

Terry T.
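For the map-to-unique operation itself, dropping the matrix class first avoids unique.matrix's row-wise (character-based) semantics; a sketch using the objects above:

    ux <- unique(as.vector(temp2))   # unique on the bare numeric vector
    match(as.vector(temp2), ux)      # no NAs: both sides compare as doubles

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel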
Re: [Rd] print(...,digits=2) behavior
This affects _many_ *.Rout.save checks in packages. I assume this is in the R-devel branch.

I've got an addition to survival nearly ready to go (faster concordance calculation). At what point should I switch over to the newer version, fix up my .out files, etc., to best mesh with the automatic checks on CRAN? It's a nuisance for me to update, but only a nuisance. I've been through it twice before (the Splus version 4 -> 5 change, and the Splus -> R switch). I might as well time it so as to make life as easy as possible for you folks.

Thanks for all the hard work.

Terry T.

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] function local to a fit
I've added a time-transform ability to coxph; here is an example:

    fit <- coxph(Surv(time, status) ~ age + tt(age) + sex, data=lung,
                 tt=function(x, t, ...) x*log(t))

The only role for tt() in the formula is to be noticed as a special by terms(). I currently have tt defined as a function

    tt <- function(x) x

It has to be exported in the namespace, documented, etc. Is there a way to make tt() local to coxph, but still be found by model.frame, so that it does not have to be global? It would seem to be neater to do a marker transform like s() in gam, cluster() in coxph, etc. in this way, where the meaning is local to the enclosing function.

Terry Therneau

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Anomaly in [.terms
This arose when working on an addition to coxph, which has the features that the X matrix never has an intercept column, and we remove strata() terms before computing an X matrix. The surprise: when a terms object is subset, the intercept attribute is turned back on. My lines 2 and 3 below were being executed just before a call to model.frame. The simple solution was of course to do them in the opposite order, so I am not waiting on a fix. Not to mention that I am not sure a fix is required, though I was surprised.

Terry T.

    tmt1131% R
    R version 2.12.0 (2010-10-15)
    Copyright (C) 2010 The R Foundation for Statistical Computing
    ISBN 3-900051-07-0
    Platform: x86_64-unknown-linux-gnu (64-bit)

    test <- terms(Surv(time, status) ~ age + strata(ph.ecog),
                  specials='strata')
    attr(test, 'intercept') <- 0    # turn off intercept
    test <- test[-2]                # remove strata
    test
    Surv(time, status) ~ age
    attr(,"variables")
    list(Surv(time, status), age)
    attr(,"factors")
                       age
    Surv(time, status)   0
    age                  1
    attr(,"term.labels")
    [1] "age"
    attr(,"specials")
    attr(,"specials")$strata
    NULL
    attr(,"order")
    [1] 1
    attr(,"intercept")
    [1] 1
    attr(,"response")
    [1] 1
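The reordering that works, for the record:

    test <- terms(Surv(time, status) ~ age + strata(ph.ecog), specials = "strata")
    test <- test[-2]                 # remove strata first ...
    attr(test, "intercept") <- 0     # ... then turn the intercept off
    attr(test, "intercept")          # stays 0

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel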
[Rd] Rd2pdf error in R12.0
On the local machine the command

    R11 CMD Rd2pdf survfit.Rd

works fine, while

    R12 CMD Rd2pdf survfit.Rd

fails with the message below.

    Converting Rd files to LaTeX ...
    survfit.Rd
    Creating pdf output from LaTeX ...
    Error in texi2dvi("Rd2.tex", pdf = (out_ext == "pdf"), quiet = FALSE, :
      Running 'texi2dvi' on 'Rd2.tex' failed.
    Messages:
    sh: texi2dvi: command not found
    Output:
    Error in running tools::texi2dvi

Here is the header when invoking the newer version:

    R version 2.12.0 (2010-10-15)
    Copyright (C) 2010 The R Foundation for Statistical Computing
    ISBN 3-900051-07-0
    Platform: x86_64-unknown-linux-gnu (64-bit)

The R11 call is 2.11.0, same machine, same login session. I've looked in the manuals and didn't see anything specific about version 2.12. Our system administrator will want some hints about what to do to fix this, ere I complain. I discovered it running CMD check on a package update. Any pointers?

Terry Therneau
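A quick way to gather those hints from within the offending R version:

    Sys.which("texi2dvi")          # "" means the program is not on R's PATH
    Sys.getenv("R_TEXI2DVICMD")    # the command tools::texi2dvi will try to run

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel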
[Rd] news.Rd format
I'm converting the Changelog files that I have used in the survival package (since the 1980s) to the inst/NEWS.Rd format, and a couple of things are not clear from the help page.

1. What should I use for the name: NEWS or survival?

2. My section headers look like

    \section{Changes in version 2.36-3}{
      \itemize{

etc., and I get "cannot extract version info from the following section titles" for all of them.

I must be missing something simple. Perhaps these two points could be clarified further in the manual page.

Terry Therneau
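For what it's worth, a skeleton along these lines is my understanding of the expected conventions (not gospel, and it may not explain the version-extraction message):

    \name{NEWS}
    \title{News for package 'survival'}
    \section{Changes in version 2.36-3}{
      \itemize{
        \item First change ...
        \item Second change ...
      }
    }

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel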