>>>>> Martin Maechler <maech...@stat.math.ethz.ch>
>>>>>     on Fri, 29 Apr 2011 16:25:09 +0200 writes:
>>>>> Paul Johnson <pauljoh...@gmail.com>
>>>>>     on Thu, 28 Apr 2011 00:20:27 -0500 writes:

>> On Wed, Apr 27, 2011 at 12:44 PM, Patrick Burns
>> <pbu...@pburns.seanet.com> wrote:
>>> Here are some data frames:
>>>
>>> df3.2 <- data.frame(1:3, 7:9)
>>> df4.2 <- data.frame(1:4, 7:10)
>>> df3.3 <- data.frame(1:3, 7:9, 10:12)
>>> df4.3 <- data.frame(1:4, 7:10, 10:13)
>>> df3.4 <- data.frame(1:3, 7:9, 10:12, 15:17)
>>> df4.4 <- data.frame(1:4, 7:10, 10:13, 15:18)
>>>
>>> Now here are some commands and their answers:
>>>> median(df4.4)
>>> [1]  8.5 11.5
>>>> median(df3.2[c(1,2,3),])
>>> [1] 2 8
>>>> median(df3.2[c(1,3,2),])
>>> [1]  2 NA
>>> Warning message:
>>> In mean.default(X[[2L]], ...) :
>>>   argument is not numeric or logical: returning NA
>>>
>>> The sessionInfo is below, but it looks to me like the
>>> present behavior started in 2.10.0.
>>>
>>> Sometimes it gets the right answer.  I'd be grateful to
>>> hear how it does that -- I can't figure it out.
>>>

>> Hello, Pat.

>> Nice poetry there!  I think I have an actual answer, as
>> opposed to the usual crap I spew.

>> I would agree if you said median.data.frame ought to be
>> written to work columnwise, similar to mean.data.frame.

>> apply and sapply always give the correct answer

>>> apply(df3.3, 2, median)
>>   X1.3   X7.9 X10.12
>>      2      8     11

> [...........]

> exactly

>> mean.data.frame is now implemented as

>> mean.data.frame <- function(x, ...) sapply(x, mean, ...)

> exactly.

> My personal opinion is that mean.data.frame() should
> never have been written.  People should know, or learn, to
> use apply functions for such a task.

> The unfortunate fact that mean.data.frame() exists makes
> people think that median.data.frame() should too, and then

>    var.data.frame()
>     sd.data.frame()
>    mad.data.frame()
>    min.data.frame()
>    max.data.frame()
>    ...
>    ...

> all just in order *not* to have to know sapply() ????

> No, rather not.

> My vote is for deprecating mean.data.frame().

> Martin

This has now happened -- for R 2.14.0 and later.
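
For code that relied on the data frame methods discussed above, the
column-wise idiom recommended in this thread can be sketched roughly as
follows.  This is only an illustration, not the code in R itself: it
reuses two of the data frames from Pat Burns' post, and the variable
names (col_means, col_medians, col_sds, col_medians_strict) are made up
for the example.

```r
# Data frames taken from the original post in this thread.
df3.3 <- data.frame(1:3, 7:9, 10:12)
df4.4 <- data.frame(1:4, 7:10, 10:13, 15:18)

# A data frame is a list of columns, so sapply() calls the summary
# function once per column and returns a named vector.
col_means   <- sapply(df3.3, mean)
col_medians <- sapply(df4.4, median)
col_sds     <- sapply(df3.3, sd)

# vapply() is a stricter variant: it checks that every column yields
# exactly one numeric value.  (Illustrative choice, not from the thread.)
col_medians_strict <- vapply(df4.4, median, numeric(1))

print(col_means)
print(col_medians)
print(col_sds)
```

Note that this assumes all columns are numeric; with factor or
character columns the summary function is still called on each column
and will warn or fail there, so such columns would have to be dropped
first.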
As raised in this thread in April, there's a similar "extra helpful"
behavior within the sd() function, and we've also deprecated that.

In addition -- getting back to Pat Burns' original post -- I'm
proposing to change median(<data.frame>) such that it produces an
error instead of the current "sometimes correct" (but mostly not!)
results.

Martin

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel