On Fri, 6 Mar 2009, Greg Snow wrote:

I like the idea of median and friends working on ordered factors. Just a couple of thoughts on possible implementations.

Adding extra checks and functionality will slow down the function. For a single evaluation on a given dataset this slowdown will not be noticeable, but inside a simulation, bootstrap, or other high-iteration technique, it could matter. I would suggest creating a core function that does just the calculations (median, quantile, IQR), assuming that the data passed in are correct, without doing any checks or anything fancy. The user-callable function (median et al.) would then do the checks, dispatch to other functions for anything fancy, and finally call the core function with the clean data. The common user would not really notice a difference, but someone programming a high-iteration technique could clean the data themselves and then call the core function directly, bypassing the checks and branches.
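
A minimal sketch of that split might look as follows. The names median_core and median_checked are hypothetical, chosen only for illustration; nothing like them exists in R itself, and the checks shown are just one plausible set.

```r
# Hypothetical core: no checks at all. Assumes a non-empty numeric
# vector with no NAs, as a caller in a tight loop would guarantee.
median_core <- function(x) {
  n <- length(x)
  half <- (n + 1L) %/% 2L
  if (n %% 2L == 1L) {
    sort(x, partial = half)[half]
  } else {
    mean(sort(x, partial = half + 0:1)[half + 0:1])
  }
}

# Hypothetical user-facing wrapper: validate and clean, then delegate.
median_checked <- function(x, na.rm = FALSE) {
  if (is.factor(x) || is.data.frame(x)) stop("need numeric data")
  if (na.rm) {
    x <- x[!is.na(x)]
  } else if (anyNA(x)) {
    return(NA_real_)
  }
  if (length(x) == 0L) return(NA_real_)
  median_core(x)
}

median_checked(c(4, 1, NA, 3, 2), na.rm = TRUE)  # 2.5
median_core(c(3, 1, 2))                          # 2
```

Inside a bootstrap loop one would call median_core() directly on data already known to be clean.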

Since median and quantile are already generic, adding an 'ordered' method would be zero cost to other uses. And the factor check at the head of median.default could be replaced by median.factor if someone could show a convincing performance difference.
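
As an illustration only, an S3 method for class "ordered" could be sketched like this. The method below is not part of R; its policy of signalling an error when the two middle values differ is an assumption made for the sketch, not something proposed in this thread.

```r
# Hypothetical S3 method: median() dispatches here for ordered factors,
# so numeric inputs never reach this code and pay nothing for it.
median.ordered <- function(x, na.rm = FALSE, ...) {
  if (na.rm) x <- x[!is.na(x)]
  if (anyNA(x) || length(x) == 0L) {
    return(factor(NA, levels = levels(x), ordered = TRUE))
  }
  s <- sort(x)
  n <- length(s)
  lo <- s[(n + 1L) %/% 2L]   # lower middle value
  hi <- s[n %/% 2L + 1L]     # upper middle value (same as lo for odd n)
  # Assumed policy: refuse to choose when the middle values differ.
  if (lo != hi) stop("median is not unique for these data")
  lo
}

x <- factor(c(1, 0, 1, 0, 0, 2, 0, 1, 2, 0, 0), ordered = TRUE)
median(x)          # level "0"
median(c(1, 2, 3)) # numeric input still goes to median.default
```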

Just out of curiosity (from someone who only learned from English (Americanized at that) and not Italian texts), what would the median of [Low, Low, Medium, High] be?

I don't think it is 'the' median but 'a' median. (Even English Wikipedia says the median is not unique for even numbers of inputs.)
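
To make that concrete, here is a small sketch computing the two candidate medians for that example. The variable names are illustrative, and the level order Low < Medium < High is assumed.

```r
x <- factor(c("Low", "Low", "Medium", "High"),
            levels = c("Low", "Medium", "High"), ordered = TRUE)

s <- sort(x)   # Low Low Medium High
n <- length(s)

low_median  <- s[(n + 1L) %/% 2L]  # 2nd of 4 sorted values: Low
high_median <- s[n %/% 2L + 1L]    # 3rd of 4 sorted values: Medium

as.character(low_median)   # "Low"
as.character(high_median)  # "Medium"
```

Either value satisfies the usual definition of a median here, since at least half the observations are at or below it and at least half are at or above it.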


--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


-----Original Message-----
From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] On Behalf Of Simone Giannerini
Sent: Thursday, March 05, 2009 4:49 PM
To: R-devel
Subject: [Rd] quantile(), IQR() and median() for factors

Dear all,

from the help page of quantile:

"x     numeric vectors whose sample quantiles are wanted. Missing
values are ignored."

from the help page of IQR:

"x     a numeric vector."

As a matter of fact, it seems that neither quantile() nor IQR()
checks for numeric input.
See the following:

set.seed(11)
x <- rbinom(n=11,size=2,prob=.5)
x <- factor(x,ordered=TRUE)
x
 [1] 1 0 1 0 0 2 0 1 2 0 0
Levels: 0 < 1 < 2

quantile(x)
  0%  25%  50%  75% 100%
   0 <NA>    0 <NA>    2
Levels: 0 < 1 < 2
Warning messages:
1: In Ops.ordered((1 - h), qs[i]) :
  '*' is not meaningful for ordered factors
2: In Ops.ordered(h, x[hi[i]]) : '*' is not meaningful for ordered
factors

IQR(x)
[1] 1

whereas median has the check:

median(x)
Error in median.default(x) : need numeric data

I also take the opportunity to ask your comments on the following
related subject:

In my opinion it would be convenient for median() and the like
(quantile(), IQR()) to be implemented for ordered factors, for which
they can in fact be well defined. For instance, functions like
apply(x,FUN=median,...) could then be used, without further processing,
on data frames that contain both numeric variables and ordered factors.
To my limited knowledge, English introductory statistics textbooks
mention only marginally that the median is well defined for ordered
categorical variables, whereas the Italian statistics literature often
discusses it in detail. This could mislead students and practitioners,
who might expect median() to work for ordered factors.

In this message

https://stat.ethz.ch/pipermail/r-help/2003-November/042684.html

Martin Maechler considers the possibility of doing this by
allowing extra arguments "low" and "high", as is done for mad().
I am willing to contribute if requested, and comments are
welcome.
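
For reference, this is how the existing "low" and "high" arguments of mad() behave on an even-length numeric sample (the data are made up for illustration; constant = 1 is used so that the raw median of absolute deviations is shown):

```r
x <- c(1, 2, 4, 8)   # median(x) is 3; absolute deviations are 2, 1, 1, 5

mad(x, constant = 1)               # 1.5, mean of the two middle deviations
mad(x, constant = 1, low = TRUE)   # 1, the lower middle deviation
mad(x, constant = 1, high = TRUE)  # 2, the upper middle deviation
```

With low or high set, the result is always one of the observed deviations, which is the property that would carry over to ordered factors.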

Thank you for your attention,

kind regards,

Simone

R.version
               _
platform       i386-pc-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status
major          2
minor          8.1
year           2008
month          12
day            22
svn rev        47281
language       R
version.string R version 2.8.1 (2008-12-22)

LC_COLLATE=Italian_Italy.1252;LC_CTYPE=Italian_Italy.1252;LC_MONETARY=Italian_Italy.1252;LC_NUMERIC=C;LC_TIME=Italian_Italy.1252

--
______________________________________________________

Simone Giannerini
Dipartimento di Scienze Statistiche "Paolo Fortunati"
Universita' di Bologna
Via delle belle arti 41 - 40126  Bologna,  ITALY
Tel: +39 051 2098262  Fax: +39 051 232153
http://www2.stat.unibo.it/giannerini/

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595