Yes, I have discussed the right-continuous, left-continuous, etc. definitions of the median for numeric data. I was just curious what the discussion is in texts that cover quantiles/medians of ordered categorical data in detail.
I do not expect Low.5 as computer output for the median (but Low.Medium does make sense in a way). Back in my theory classes, when we actually needed a firm definition, I remember mainly using the left-continuous version (Low for the example), but I don't remember why we chose that over the right-continuous version; it was probably just the teacher's/book's preference (I do remember it made things simpler than using the average of the middle 2 when n was even).

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

> -----Original Message-----
> From: Simone Giannerini [mailto:sgianner...@gmail.com]
> Sent: Friday, March 06, 2009 2:08 PM
> To: Prof Brian Ripley
> Cc: Greg Snow; R-devel
> Subject: Re: [Rd] quantile(), IQR() and median() for factors
>
> Dear Greg,
>
> thank you for your comments,
> as Prof. Ripley pointed out, in the case of even sample size the
> median is not unique and is formed by the two central observations or
> a function of them, if that makes sense.
>
> Dear Prof. Ripley,
>
> thank you for your concern,
> may I notice that (in case of non-negative data) one can get the
> median from mad() with center=0,constant=1
>
> > mad(1:10,center=0,constant=1)
> [1] 5.5
> > mad(1:10,center=0,constant=1,high=TRUE)
> [1] 6
> > mad(1:10,center=0,constant=1,low=TRUE)
> [1] 5
>
> so that it seems that part of the code of mad() might be a starting
> point, at least for median().
> I confirm my availability to work on the matter if requested.
>
> Kind regards,
>
> Simone
>
> On Fri, Mar 6, 2009 at 6:36 PM, Prof Brian Ripley
> <rip...@stats.ox.ac.uk> wrote:
> > On Fri, 6 Mar 2009, Greg Snow wrote:
> >
> >> I like the idea of median and friends working on ordered factors.
> >> Just a couple of thoughts on possible implementations.
> >>
> >> Adding extra checks and functionality will slow down the function.
> >> For a single evaluation on a given dataset this slowdown will not be
> >> noticeable, but inside of a simulation, bootstrap, or other high
> >> iteration technique, it could matter. I would suggest creating a core
> >> function that does just the calculations (median, quantile, iqr)
> >> assuming that the data passed in is correct without doing any checks
> >> or anything fancy. Then the user callable function (median et. al.)
> >> would do the checks dispatch to other functions for anything fancy,
> >> etc. then call the core function with the clean data. The common user
> >> would not really notice a difference, but someone programming a high
> >> iteration technique could clean the data themselves, then call the
> >> core function directly bypassing the checks/branches.
> >
> > Since median and quantile are already generic, adding a 'ordered'
> > method would be zero cost to other uses. And the factor check at the
> > head of median.default could be replaced by median.factor if someone
> > could show a convincing performance difference.
> >
> >> Just out of curiosity (from someone who only learned from English
> >> (Americanized at that) and not Italian texts), what would the median
> >> of [Low, Low, Medium, High] be?
> >
> > I don't think it is 'the' median but 'a' median. (Even English
> > Wikipedia says the median is not unique for even numbers of inputs.)
> >
> >>
> >> --
> >> Gregory (Greg) L. Snow Ph.D.
> >> Statistical Data Center
> >> Intermountain Healthcare
> >> greg.s...@imail.org
> >> 801.408.8111
> >>
> >>
> >>> -----Original Message-----
> >>> From: r-devel-boun...@r-project.org
> >>> [mailto:r-devel-boun...@r-project.org] On Behalf Of Simone Giannerini
> >>> Sent: Thursday, March 05, 2009 4:49 PM
> >>> To: R-devel
> >>> Subject: [Rd] quantile(), IQR() and median() for factors
> >>>
> >>> Dear all,
> >>>
> >>> from the help page of quantile:
> >>>
> >>> "x numeric vectors whose sample quantiles are wanted.
> >>> Missing values are ignored."
> >>>
> >>> from the help page of IQR:
> >>>
> >>> "x a numeric vector."
> >>>
> >>> as a matter of facts it seems that both quantile() and IQR() do not
> >>> check for the presence of a numeric input.
> >>> See the following:
> >>>
> >>> set.seed(11)
> >>> x <- rbinom(n=11,size=2,prob=.5)
> >>> x <- factor(x,ordered=TRUE)
> >>> x
> >>> [1] 1 0 1 0 0 2 0 1 2 0 0
> >>> Levels: 0 < 1 < 2
> >>>
> >>>> quantile(x)
> >>>
> >>>   0%  25%  50%  75% 100%
> >>>    0 <NA>    0 <NA>    2
> >>> Levels: 0 < 1 < 2
> >>> Warning messages:
> >>> 1: In Ops.ordered((1 - h), qs[i]) :
> >>>   '*' is not meaningful for ordered factors
> >>> 2: In Ops.ordered(h, x[hi[i]]) :
> >>>   '*' is not meaningful for ordered factors
> >>>
> >>>> IQR(x)
> >>>
> >>> [1] 1
> >>>
> >>> whereas median has the check:
> >>>
> >>>> median(x)
> >>>
> >>> Error in median.default(x) : need numeric data
> >>>
> >>> I also take the opportunity to ask your comments on the following
> >>> related subject:
> >>>
> >>> In my opinion it would be convenient that median() and the like
> >>> (quantile(), IQR()) be implemented for ordered factors for which in
> >>> fact they can be well defined. For instance, in this way functions
> >>> like apply(x,FUN=median,...) could be used without the need of
> >>> further processing for data frames that contain both numeric
> >>> variables and ordered factors.
> >>> If on the one hand, to my limited knowledge, in English introductory
> >>> statistics textbooks the fact that the median is well defined for
> >>> ordered categorical variables is only mentioned marginally,
> >>> on the other hand, in the Italian Statistics literature this is often
> >>> discussed in detail and this could mislead students and practitioners
> >>> that might expect median() to work for ordered factors.
> >>>
> >>> In this message
> >>>
> >>> https://stat.ethz.ch/pipermail/r-help/2003-November/042684.html
> >>>
> >>> Martin Maechler considers the possibility of doing such a job by
> >>> allowing for extra arguments "low" and "high" as it is done for mad().
> >>> I am willing to give a contribution if requested, and comments are
> >>> welcome.
> >>>
> >>> Thank you for the attention,
> >>>
> >>> kind regards,
> >>>
> >>> Simone
> >>>
> >>>> R.version
> >>>
> >>>                _
> >>> platform       i386-pc-mingw32
> >>> arch           i386
> >>> os             mingw32
> >>> system         i386, mingw32
> >>> status
> >>> major          2
> >>> minor          8.1
> >>> year           2008
> >>> month          12
> >>> day            22
> >>> svn rev        47281
> >>> language       R
> >>> version.string R version 2.8.1 (2008-12-22)
> >>>
> >>> LC_COLLATE=Italian_Italy.1252;LC_CTYPE=Italian_Italy.1252;LC_MONETARY=Italian_Italy.1252;LC_NUMERIC=C;LC_TIME=Italian_Italy.1252
> >>>
> >>> --
> >>> ______________________________________________________
> >>>
> >>> Simone Giannerini
> >>> Dipartimento di Scienze Statistiche "Paolo Fortunati"
> >>> Universita' di Bologna
> >>> Via delle belle arti 41 - 40126 Bologna, ITALY
> >>> Tel: +39 051 2098262 Fax: +39 051 232153
> >>> http://www2.stat.unibo.it/giannerini/
> >>>
> >>> ______________________________________________
> >>> R-devel@r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>
> >> ______________________________________________
> >> R-devel@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>
> >
> > --
> > Brian D. Ripley,                  rip...@stats.ox.ac.uk
> > Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> > University of Oxford,             Tel:  +44 1865 272861 (self)
> > 1 South Parks Road,                     +44 1865 272866 (PA)
> > Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>
> --
> ______________________________________________________
>
> Simone Giannerini
> Dipartimento di Scienze Statistiche "Paolo Fortunati"
> Universita' di Bologna
> Via delle belle arti 41 - 40126 Bologna, ITALY
> Tel: +39 051 2098262 Fax: +39 051 232153
> http://www2.stat.unibo.it/giannerini/
> ______________________________________________________

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
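
For what it is worth, the low/high idea discussed in this thread (with argument names borrowed from mad()) could be sketched roughly as follows. This is only an illustration under stated assumptions: the name median_ordered is hypothetical and not part of base R, the input is assumed to be an ordered factor, and for even n the function simply picks the lower or the upper of the two central observations.

```r
# Hypothetical helper (not part of base R): a "low" or "high" median
# for an ordered factor, in the spirit of the low/high arguments of mad().
median_ordered <- function(x, high = FALSE) {
  stopifnot(is.ordered(x))
  x <- x[!is.na(x)]                    # drop missing values
  n <- length(x)
  if (n == 0L) return(x[NA_integer_])  # NA that keeps the levels
  codes <- sort(as.integer(x))         # integer codes follow the level order
  # Position of the lower or upper central observation; both coincide
  # when n is odd.
  k <- if (high) n %/% 2L + 1L else (n + 1L) %/% 2L
  factor(levels(x)[codes[k]], levels = levels(x), ordered = TRUE)
}

# Greg's example: [Low, Low, Medium, High]
x <- factor(c("Low", "Low", "Medium", "High"),
            levels = c("Low", "Medium", "High"), ordered = TRUE)
median_ordered(x)               # low median: Low
median_ordered(x, high = TRUE)  # high median: Medium
```

With an odd sample size the two choices agree, so for the 11 observations in the original post both calls would return the level 0.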