>>>>> Gustavo Zapata Wainberg >>>>> on Mon, 3 May 2021 20:48:49 +0200 writes:
    > Hi!

    > I'm writing this post because there is an inconsistency
    > when median() is calculated for vectors of even or odd
    > length. For odd-length vectors, attributes (such as labels
    > added with Hmisc) are kept after running median(), but this
    > is not the case if the length is even; in that case the
    > attributes are lost.

    > I know that this is due to median() using mean() to obtain
    > the result when the length is even, and mean() always
    > takes attributes off vectors.

Yes, and this has been the design of median() forever:

If n := length(x) is odd, the median is "the middle" observation
and should be equal to x[j] for j = (n+1)/2; hence it is, e.g.,
well defined for an ordered factor.

When n is even, however, median() must be the mean of "the two
middle" observations, which is, e.g., not even *defined* for an
ordered factor.

We *could* talk of the so-called lo-median or hi-median (terms
probably coined by John W. Tukey) because (IIRC) these are equal
to each other and to the median for odd n, but are equal to x[j]
and x[j+1], j = n/2, for even n *and* are still "of the same
kind" as x[] itself.

Interestingly, for mad() {= the median absolute deviation from
the median} we *do* allow specifying logical 'low' and 'high',
but that applies to the "outer" median in MAD's definition, not
the inner one.

## From <Rsrc>/src/library/stats/R/mad.R :

    mad <- function(x, center = median(x), constant = 1.4826,
                    na.rm = FALSE, low = FALSE, high = FALSE)
    {
        if(na.rm)
            x <- x[!is.na(x)]
        n <- length(x)
        constant *
            if((low || high) && n%%2 == 0) {
                if(low && high) stop("'low' and 'high' cannot be both TRUE")
                n2 <- n %/% 2 + as.integer(high)
                sort(abs(x - center), partial = n2)[n2]
            }
            else median(abs(x - center))
    }

    > Don't you think that attributes should be kept in both
    > cases?

Well, not all attributes can be kept.
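To illustrate which results *can* keep attributes, here is a minimal sketch of the lo-/hi-median idea. The function name lo_hi_median() and its 'high' argument are hypothetical (nothing like this exists in R's median() today); the sketch only assumes that order(), is.na() and `[` work for the class of x.

```r
## Hypothetical helper, NOT part of R: returns the lo-median (default) or
## the hi-median of x.  The result is always x[i] for some index i, so
## whatever the `[` method of x preserves (names, factor levels, ...) is kept.
lo_hi_median <- function(x, high = FALSE, na.rm = FALSE)
{
    if(na.rm) x <- x[!is.na(x)]
    else if(anyNA(x)) return(x[NA_integer_])
    n <- length(x)
    if(n == 0L) return(x[NA_integer_])
    ## position in the sorted data:
    ##   odd  n: (n+1)/2 in both cases
    ##   even n: n/2 for the lo-median, n/2 + 1 for the hi-median
    j <- if(high) n %/% 2L + 1L else (n + 1L) %/% 2L
    x[order(x)[j]]
}

x <- c(a = 3, b = 1, c = 4, d = 2)      # even length, named
median(x)                               # 2.5, the mean of the two middle values
lo_hi_median(x)                         # element "d", value 2
lo_hi_median(x, high = TRUE)            # element "a", value 3

f <- factor(c("lo", "hi", "mid", "hi"),
            levels = c("lo", "mid", "hi"), ordered = TRUE)
lo_hi_median(f)                         # mid  (still an ordered factor)
lo_hi_median(f, high = TRUE)            # hi
```

Indexing via x[order(x)[j]] rather than sort(x)[j] is deliberate in this sketch: the result then depends only on the `[` method of x, not on what sort() happens to keep.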
Note that for *named* vectors x, x[j] can (and does) keep the name,
but there's definitely no sensible name to give to (x[j] + x[j+1])/2.

I'm willing to collaborate with someone on extending
median.default() to make the hi-median and lo-median available
to the user. Both of these will always return x[j] for some j
and hence keep all (sensible!) attributes (well, if the
`[`-method for the corresponding class has been defined
correctly; I've encountered quite a few cases where people
created vector-like classes but did not provide a "correct"
subsetting method; typically you should make sure both a `[[`
and a `[` method work!).

Best regards,
Martin

Martin Maechler
ETH Zurich  and  R Core team

    > And, going further, shouldn't mean() keep
    > attributes as well? I have looked in R's Bugzilla and I
    > didn't find an entry related to this issue.

    > Please let me know if you consider that this issue should
    > be posted in R's Bugzilla.

    > Here is an example with code.

    > rndvar <- rnorm(n = 100)
    > Hmisc::label(rndvar) <- "A label for RNDVAR"
    > str(median(rndvar[-c(1,2)]))
    > Returns: "num 0.0368"
    > str(median(rndvar[-1]))
    > Returns: 'labelled' num 0.0322
    >  - attr(*, "label")= chr "A label for RNDVAR"

    > Thanks in advance!

    > Gustavo Zapata-Wainberg

    > ______________________________________________
    > R-devel@r-project.org mailing list
    > https://stat.ethz.ch/mailman/listinfo/r-devel