On Tue, 6 Nov 2007, [EMAIL PROTECTED] wrote:

> Unfortunately I think it would break too much existing code. tapply()
> is an old function and many people have gotten used to the way it works
> now.

It is also not necessarily desirable: FUN(numeric(0)) might be an error.
For example:

> Z <- data.frame(x=rnorm(10), f=rep(c("a", "b"), each=5))[1:5, ]
> tapply(Z$x, Z$f, sd)

but sd(numeric(0)) is an error. (Similar things involving var are 'in the
wild' and so would be broken.)

> This is not to suggest there could not be another argument added at the
> end to indicate that you want the new behaviour, though. e.g.
>
> tapply <- function (X, INDEX, FUN=NULL, ..., simplify=TRUE,
>                     handle.empty.levels = FALSE)
>
> but this raises the question of what sort of time penalty the
> modification might entail. Probably not much for most situations, I
> suppose. (I know this argument name looks long, but you do need a
> fairly specific argument name, or it will start to impinge on the ...
> argument.)
>
> Just some thoughts.
>
> Bill Venables.
>
> Bill Venables
> CSIRO Laboratories
> PO Box 120, Cleveland, 4163
> AUSTRALIA
> Office Phone (email preferred): +61 7 3826 7251
> Fax (if absolutely necessary): +61 7 3826 7304
> Mobile: +61 4 8819 4402
> Home Phone: +61 7 3286 7700
> mailto:[EMAIL PROTECTED]
> http://www.cmis.csiro.au/bill.venables/
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Andrew Robinson
> Sent: Tuesday, 6 November 2007 3:10 PM
> To: R-Devel
> Subject: [Rd] A suggestion for an amendment to tapply
>
> Dear R-developers,
>
> when tapply() is invoked on factors that have empty levels, it returns
> NA. This behaviour is in accord with the tapply documentation, and is
> reasonable in many cases. However, when FUN is sum, it would also
> seem reasonable to return 0 instead of NA, because "the sum of an
> empty set is zero, by definition."
>
> I'd like to raise a discussion of the possibility of an amendment to
> tapply.
>
> The attached patch changes the function so that it checks if there are
> any empty levels, and if there are, replaces the corresponding NA
> values with the result of applying FUN to the empty set. Eg in the
> case of sum, it replaces the NA with 0, whereas with mean, it replaces
> the NA with NA, and issues a warning.
>
> This change has the following advantage: tapply and sum work better
> together. Arguably, tapply and any other function that has a non-NA
> response to the empty set will also work better together.
> Furthermore, tapply shows a warning if FUN would normally show a
> warning upon being evaluated on an empty set. That deviates from
> current behaviour, which might be bad, but also provides information
> that might be useful to the user, so that would be good.
>
> The attached script provides the new function in full, and
> demonstrates its application in some simple test cases.
>
> Best wishes,
>
> Andrew
>

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
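
For readers following the thread, the behaviour under discussion can be reproduced with a small sketch. The data and the helper fill_empty() below are hypothetical and are not the attached patch; the helper only illustrates the idea of replacing the NA for an empty level with FUN(numeric(0)), and it assumes a single grouping factor. The factor is built explicitly with factor(), because data.frame() no longer converts strings to factors by default since R 4.0.0. The tryCatch() fallback is one possible answer to the objection that FUN(numeric(0)) might be an error.

```r
# Hypothetical data: level "b" is declared but has no observations.
x <- c(1, 2, 3, 4, 5)
f <- factor(rep("a", 5), levels = c("a", "b"))

# Current behaviour: the cell for the empty level is NA;
# FUN is never called for it.
res <- tapply(x, f, sum)

# What sum itself returns for an empty vector.
empty_sum <- sum(numeric(0))   # 0

# Hypothetical helper (not part of R, not the attached patch):
# fill cells of empty levels with FUN(numeric(0)), keeping NA
# if FUN signals an error on empty input.
fill_empty <- function(res, f, FUN, ...) {
  empty <- as.vector(table(f) == 0)   # same order as levels(f)
  if (any(empty)) {
    val <- tryCatch(FUN(numeric(0), ...), error = function(e) NA)
    res[empty] <- val
  }
  res
}

filled <- fill_empty(res, f, sum)
print(res)
print(filled)
```

Doing the fill as a separate step after tapply() leaves existing code untouched, which is the compatibility concern raised above.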