Re: [Rd] array subsetting of S4 object that inherits from array
>>>>> "BB" == Bradley Buchsbaum <bbuchsb...@berkeley.edu>
>>>>>     on Thu, 5 Mar 2009 21:16:40 -0500 writes:

    BB> Hi,
    BB> I have an S4 class that inherits from "array" but does not add generic
    BB> implementations of the "[" method.

    BB> A simplified example is:

    BB> setClass("fooarray", contains="array")

    BB> If I create a fooarray object and subset it with a one-dimensional
    BB> index vector, the return value is of class fooarray. Other variants
    BB> (see below), however, return primitive values consistent with
    BB> ordinary array subsetting.

    BB> x <- new("fooarray", array(0,c(10,10,10)))
    BB> class(x[1,1,1])   # prints "numeric"
    BB> class(x[1,,])     # prints "matrix"
    BB> class(x[1])       # prints "fooarray"
    BB> class(x[1:10])    # prints "fooarray"

    BB> This behavior seems to have been introduced in R 2.8.1 as I have not
    BB> encountered it before. I tested it on R 2.7.0 and confirmed that
    BB> class(x[1]) returned "numeric".

    BB> In my case, the desired behavior is for array subsetting in all cases
    BB> to return primitive data structures, so if there is a way to override
    BB> the new behavior I would opt for that.

Yes, the new behavior was introduced (into R 2.8.0) by me, and ... coincidence ?! ... two days ago, in e-talking with John Chambers, I was convinced that the new feature really has been a mis-feature.

Consequently, yesterday (!) I committed changes to both R-patched (2.8.1 patched) and R-devel which revert the mis-feature.

So, the override is to use 2.8.1 patched (or newer).

I'm sorry for my thinko that may also affect other R-S4-programmers [of course I hope not, but then there's Murphy's law].

Regards,
Martin Maechler, ETH Zurich

    BB> Regards,
    BB> Brad Buchsbaum

    BB> R version 2.8.1 (2008-12-22)
    BB> i386-pc-mingw32

    BB> locale:
    BB> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
    BB> States.1252;LC_MONETARY=English_United
    BB> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

    BB> attached base packages:
    BB> [1] stats graphics grDevices utils datasets methods base

    BB> --
    BB> Bradley R. Buchsbaum
    BB> Rotman Research Institute
    BB> 3560 Bathurst St.
    BB> Toronto, ON Canada M6A 2E1
    BB> email: bbuchsb...@rotman-baycrest.on.ca

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
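The subsetting cases discussed in this thread can be reproduced with a few lines. This is a minimal sketch, assuming a current R with the methods package; the class name fooarray comes from the original post, while the use of x@.Data as a workaround is an assumption of this example and was not proposed in the thread. The class of x[1] is only printed, since that is exactly the result that differed between R versions.

```r
library(methods)

# Class from the thread: an S4 class that extends "array" and defines
# no "[" method of its own.
setClass("fooarray", contains = "array")

x <- new("fooarray", array(0, c(10, 10, 10)))

# Multi-index subsetting, as reported in the original post.
print(class(x[1, 1, 1]))
print(is.matrix(x[1, , ]))

# Single-index subsetting: the case whose result class changed between
# R versions, so it is only printed here.
print(class(x[1]))

# Assumption (not from the thread): subsetting the data part directly
# avoids the question, because the data part carries no S4 class.
plain <- x@.Data[1:10]
print(class(plain))
```

Going through the data part is version-independent, at the cost of making the intent (drop the S4 class) explicit at every call site.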
[Rd] [SoC09-Info] Application starts next week.
Hi everybody,

next week is the week when mentoring organizations can apply for the Google Summer of Code. As I already wrote in my first mail, the idea is to submit our ideas by March 10. Currently three ideas are on the list [1]:

* Development of crantastic.org by Hadley Wickham
* Movement Ecology add-ons for adehabitat package by Damiano G. Preatoni
* Party On! New Recursive Partytioning Tools by Torsten Hothorn and Achim Zeileis

Don't hesitate to chip in other ideas; the more ideas are on the list, the better it is for the application.

BTW: Do mentors whose projects weren't realized last summer (see [2]) want to re-submit their projects?

Best,
Manuel.

[1] http://www.r-project.org/soc09
[2] http://www.r-project.org/soc08/ideas.html
Re: [Rd] bug (PR#13570)
Prof Brian Ripley wrote:
> On Thu, 5 Mar 2009, Benjamin Tyner wrote:
>
> [...]
>
>> I submitted a bug fix to Eric Grosse, the maintainer of the netlib
>> routines; the fixed lines of fortran are identified in the comments at
>> (just search for my email address):
>>
>> http://www.netlib.org/a/loess
>>
>> These fixes would be relatively simple to incorporate into R's version
>> of loessf.f
>
> The fixes from dloess even more simply, since R's code is based on
> dloess.
>
> Thank you for the suggestion. Given how tricky this is to reproduce, I
> went back to my example under valgrind. If I use the latest dloess
> code, it crashes, but by selectively importing some of the differences
> I can get it to work. So it looks as if we are on the road to a
> solution, but something in the current version (not necessarily in
> these changes) is incompatible with the current R code and I need to
> dig further (not for a few days).

What a nice war story this is! Good that it now seems fixable; even though degree=0 is not of much practical use, it is the sort of thing people like to have available when explaining how the method works.

-- 
Peter Dalgaard
Dept. of Biostatistics, University of Copenhagen
Øster Farimagsgade 5, Entr.B, PO Box 2099, 1014 Cph. K, Denmark
Ph: (+45) 35327918  FAX: (+45) 35327907
(p.dalga...@biostat.ku.dk)
[Rd] bug in summary.aovlist() with split= and (PR#13579)
summary.aovlist() with split= and expand.split=TRUE gives two different types of nonsensical results for a:b in the Within stratum in the two different expansions of tmp3.aov. S-Plus gives appropriate results and I attach them for comparison.

There are three attached files.

split.r    source
split.rt   R transcript showing nonsense results
split.st   S-Plus transcript showing appropriate results

Rich

--please do not edit the information below--

Version:
 platform = i386-pc-mingw32
 arch = i386
 os = mingw32
 system = i386, mingw32
 status =
 major = 2
 minor = 8.1
 year = 2008
 month = 12
 day = 22
 svn rev = 47281
 language = R
 version.string = R version 2.8.1 (2008-12-22)

Windows XP (build 2600) Service Pack 3

Locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

Search Path:
 .GlobalEnv, package:RcmdrPlugin.HH, package:Rcmdr, package:car, package:tcltk, package:fortunes, package:VGAM, package:stats4, package:splines, package:HH, package:leaps, package:multcomp, package:mvtnorm, package:grid, package:lattice, package:stats, package:graphics, package:datasets, package:grDevices, package:rcom, package:rscproxy, package:utils, package:methods, RExcelEnv, RcmdrEnv, Autoloads, package:base

Attachment: split.r

tmp <- data.frame(y=rnorm(48),
                  a=rep(letters[1:3], 16),
                  b=rep(rep(LETTERS[1:4], each=3),4),
                  block=rep(LETTERS[5:6], each=24)
                  )

tmp.aov <- aov(y ~ a*b, data=tmp)
summary(tmp.aov,
        split=list(a=list(t=1,u=2), b=list(v=1,w=2,x=3)),
        expand.split=TRUE)

tmp2.aov <- aov(y ~ Error(block) + a*b, data=tmp)
summary(tmp2.aov,
        split=list(a=list(t=1,u=2), b=list(v=1,w=2,x=3)),
        expand.split=TRUE)
summary(tmp2.aov,
        split=list(a=list(t=1,u=2)),
        expand.split=TRUE)

tmp3.aov <- aov(y ~ Error(block/a) + a*b, data=tmp)
summary(tmp3.aov,
        split=list(a=list(t=1,u=2), b=list(v=1,w=2,x=3)),
        expand.split=TRUE)
summary(tmp3.aov,
        split=list(a=list(t=1,u=2)),
        expand.split=TRUE)

Attachment: split.rt

> tmp <- data.frame(y=rnorm(48),
+                   a=rep(letters[1:3], 16),
+                   b=rep(rep(LETTERS[1:4], each=3),4),
+                   block=rep(LETTERS[5:6], each=24)
+                   )
>
> tmp.aov <- aov(y ~ a*b, data=tmp)
> summary(tmp.aov,
+         split=list(a=list(t=1,u=2), b=list(v=1,w=2,x=3)),
+         expand.split=TRUE)
            Df Sum Sq Mean Sq F value Pr(>F)
a            2  2.060   1.030  1.0528 0.3595
  a: t       1  1.411   1.411  1.4416 0.2377
  a: u       1  0.650   0.650  0.6639 0.4205
b            3  0.839   0.280  0.2859 0.8353
  b: v       1  0.264   0.264  0.2702 0.6063
  b: w       1  0.001   0.001  0.0013 0.9711
  b: x       1  0.573   0.573  0.5860 0.4489
a:b          6  2.300   0.383  0.3918 0.8794
  a:b: t.v   1  0.556   0.556  0.5685 0.4558
  a:b: u.v   1  0.998   0.998  1.0203 0.3192
  a:b: t.w   1  0.171   0.171  0.1747 0.6785
  a:b: u.w   1  0.092   0.092  0.0942 0.7607
  a:b: t.x   1  0.361   0.361  0.3685 0.5476
  a:b: u.x   1  0.122   0.122  0.1246 0.7261
Residuals   36 35.226   0.978
>
> tmp2.aov <- aov(y ~ Error(block) + a*b, data=tmp)
> summary(tmp2.aov,
+         split=list(a=list(t=1,u=2), b=list(v=1,w=2,x=3)),
+         expand.split=TRUE)

Error: block
          Df  Sum Sq Mean Sq F value Pr(>F)
Residuals  1 0.57849 0.57849

Error: Within
            Df Sum Sq Mean Sq F value Pr(>F)
a            2  2.060   1.030  1.0406 0.3639
  a: t       1  1.411   1.411  1.4250 0.2406
  a: u       1  0.650   0.650  0.6563 0.4234
b            3  0.839   0.280  0.2826 0.8376
  b: v       1  0.264   0.264  0.2671 0.6085
  b: w       1  0.001   0.001  0.0013 0.9713
  b: x       1  0.573   0.573  0.5793 0.4517
a:b          6  2.300   0.383  0.3873 0.8822
  a:b: t.v   1  0.556   0.556  0.5619 0.4585
  a:b: u.v   1  0.998   0.998  1.0085 0.3222
  a:b: t.w   1  0.171   0.171  0.1727 0.6803
  a:b: u.w
Re: [Rd] quantile(), IQR() and median() for factors
I like the idea of median and friends working on ordered factors. Just a couple of thoughts on possible implementations.

Adding extra checks and functionality will slow down the function. For a single evaluation on a given dataset this slowdown will not be noticeable, but inside a simulation, bootstrap, or other high-iteration technique, it could matter. I would suggest creating a core function that does just the calculations (median, quantile, IQR), assuming that the data passed in are correct, without doing any checks or anything fancy. Then the user-callable function (median et al.) would do the checks, dispatch to other functions for anything fancy, etc., and then call the core function with the clean data. The common user would not really notice a difference, but someone programming a high-iteration technique could clean the data themselves, then call the core function directly, bypassing the checks/branches.

Just out of curiosity (from someone who only learned from English (Americanized at that) and not Italian texts), what would the median of [Low, Low, Medium, High] be?

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

> -----Original Message-----
> From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] On Behalf Of Simone Giannerini
> Sent: Thursday, March 05, 2009 4:49 PM
> To: R-devel
> Subject: [Rd] quantile(), IQR() and median() for factors
>
> Dear all,
>
> from the help page of quantile:
>
> x    numeric vectors whose sample quantiles are wanted. Missing values are ignored.
>
> from the help page of IQR:
>
> x    a numeric vector.
>
> As a matter of fact, it seems that neither quantile() nor IQR() checks for the presence of a numeric input. See the following:
>
> > set.seed(11)
> > x <- rbinom(n=11,size=2,prob=.5)
> > x <- factor(x,ordered=TRUE)
> > x
>  [1] 1 0 1 0 0 2 0 1 2 0 0
> Levels: 0 < 1 < 2
> > quantile(x)
>   0%  25%  50%  75% 100%
>    0 <NA>    0 <NA>    2
> Levels: 0 < 1 < 2
> Warning messages:
> 1: In Ops.ordered((1 - h), qs[i]) :
>   '*' is not meaningful for ordered factors
> 2: In Ops.ordered(h, x[hi[i]]) :
>   '*' is not meaningful for ordered factors
> > IQR(x)
> [1] 1
>
> whereas median has the check:
>
> > median(x)
> Error in median.default(x) : need numeric data
>
> I also take the opportunity to ask for your comments on the following related subject:
>
> In my opinion it would be convenient for median() and the like (quantile(), IQR()) to be implemented for ordered factors, for which they can in fact be well defined. For instance, in this way functions like apply(x,FUN=median,...) could be used without the need for further processing on data frames that contain both numeric variables and ordered factors.
>
> To my limited knowledge, English introductory statistics textbooks mention only marginally that the median is well defined for ordered categorical variables, whereas in the Italian statistics literature this is often discussed in detail. This could mislead students and practitioners, who might expect median() to work for ordered factors.
>
> In this message
>
> https://stat.ethz.ch/pipermail/r-help/2003-November/042684.html
>
> Martin Maechler considers the possibility of doing such a job by allowing for extra arguments low and high, as is done for mad(). I am willing to contribute if requested, and comments are welcome.
>
> Thank you for the attention,
>
> kind regards,
>
> Simone
>
> > R.version
>                _
> platform       i386-pc-mingw32
> arch           i386
> os             mingw32
> system         i386, mingw32
> status
> major          2
> minor          8.1
> year           2008
> month          12
> day            22
> svn rev        47281
> language       R
> version.string R version 2.8.1 (2008-12-22)
>
> LC_COLLATE=Italian_Italy.1252;LC_CTYPE=Italian_Italy.1252;LC_MONETARY=Italian_Italy.1252;LC_NUMERIC=C;LC_TIME=Italian_Italy.1252
>
> --
> ______________________________________________________
> Simone Giannerini
> Dipartimento di Scienze Statistiche "Paolo Fortunati"
> Universita' di Bologna
> Via delle belle arti 41 - 40126 Bologna, ITALY
> Tel: +39 051 2098262 Fax: +39 051 232153
> http://www2.stat.unibo.it/giannerini/
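The core/wrapper split suggested at the top of this message can be sketched as follows. This is only an illustration under stated assumptions: the names median_core and median_checked are hypothetical and do not exist in R, and the core deliberately assumes clean numeric input (non-empty, no NAs), which is the whole point of the proposal.

```r
# Core: no checks at all; assumes x is numeric, non-empty and NA-free.
median_core <- function(x) {
  n <- length(x)
  half <- (n + 1L) %/% 2L
  if (n %% 2L == 1L) {
    # Odd length: the single central order statistic.
    sort(x, partial = half)[half]
  } else {
    # Even length: average of the two central order statistics.
    mean(sort(x, partial = half + 0:1)[half + 0:1])
  }
}

# Wrapper: validates once, then hands clean data to the core.
median_checked <- function(x, na.rm = FALSE) {
  if (!is.numeric(x)) stop("need numeric data")
  if (na.rm) {
    x <- x[!is.na(x)]
  } else if (any(is.na(x))) {
    return(NA_real_)
  }
  if (length(x) == 0L) return(NA_real_)
  median_core(x)
}

# A high-iteration caller can clean the data once and use the core directly.
set.seed(1)
clean <- rnorm(50)
boot <- replicate(200, median_core(sample(clean, replace = TRUE)))
print(length(boot))
```

Whether the saved checks are measurable in practice would have to be benchmarked; the sketch only shows the structure.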
Re: [Rd] quantile(), IQR() and median() for factors
On Fri, 6 Mar 2009, Greg Snow wrote:

> I like the idea of median and friends working on ordered factors.
> Just a couple of thoughts on possible implementations.
>
> Adding extra checks and functionality will slow down the function.
> For a single evaluation on a given dataset this slowdown will not be
> noticeable, but inside of a simulation, bootstrap, or other high
> iteration technique, it could matter. I would suggest creating a core
> function that does just the calculations (median, quantile, iqr)
> assuming that the data passed in is correct without doing any checks
> or anything fancy. Then the user callable function (median et. al.)
> would do the checks dispatch to other functions for anything fancy,
> etc. then call the core function with the clean data. The common user
> would not really notice a difference, but someone programming a high
> iteration technique could clean the data themselves, then call the
> core function directly bypassing the checks/branches.

Since median and quantile are already generic, adding an 'ordered' method would be zero cost to other uses. And the factor check at the head of median.default could be replaced by median.factor if someone could show a convincing performance difference.

> Just out of curiosity (from someone who only learned from English
> (Americanized at that) and not Italian texts), what would the median
> of [Low, Low, Medium, High] be?

I don't think it is 'the' median but 'a' median. (Even English Wikipedia says the median is not unique for even numbers of inputs.)

> [...]

-- 
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University
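To make the "a median, not the median" point concrete, here is a minimal sketch of what a method for ordered factors might compute. The function name median_ordered and its high argument are hypothetical (the argument is modelled loosely on the low/high arguments of mad() mentioned in the thread); this is not code from R itself. It returns the lower of the two central values by default and the upper one when high = TRUE.

```r
# Hypothetical helper: a median for ordered factors that always returns
# one of the observed levels instead of averaging.
median_ordered <- function(x, high = FALSE, na.rm = FALSE) {
  stopifnot(is.ordered(x))
  if (na.rm) x <- x[!is.na(x)]
  n <- length(x)
  if (n == 0L || any(is.na(x))) {
    return(factor(NA, levels = levels(x), ordered = TRUE))
  }
  codes <- sort(as.integer(x))
  # Lower central position by default, upper central position if high = TRUE;
  # the two coincide when n is odd.
  k <- if (high) n %/% 2L + 1L else (n + 1L) %/% 2L
  factor(levels(x)[codes[k]], levels = levels(x), ordered = TRUE)
}

lev <- c("Low", "Medium", "High")
x <- factor(c("Low", "Low", "Medium", "High"), levels = lev, ordered = TRUE)
print(median_ordered(x))
print(median_ordered(x, high = TRUE))
```

For the four-element example from the thread the two choices differ, which is exactly the non-uniqueness noted above; for odd lengths they agree.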
Re: [Rd] quantile(), IQR() and median() for factors
Dear Greg,

thank you for your comments. As Prof. Ripley pointed out, when the sample size is even the median is not unique: it is formed by the two central observations, or by a function of them, if that makes sense.

Dear Prof. Ripley,

thank you for your concern. May I note that (in the case of non-negative data) one can get the median from mad() with center=0, constant=1:

> mad(1:10,center=0,constant=1)
[1] 5.5
> mad(1:10,center=0,constant=1,high=TRUE)
[1] 6
> mad(1:10,center=0,constant=1,low=TRUE)
[1] 5

so it seems that part of the code of mad() might be a starting point, at least for median(). I confirm my availability to work on the matter if requested.

Kind regards,

Simone

On Fri, Mar 6, 2009 at 6:36 PM, Prof Brian Ripley <rip...@stats.ox.ac.uk> wrote:
> [...]
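The mad() trick above, and the reason for the non-negativity caveat, can be checked directly. This is a small sketch using only stats::mad() and stats::median(); the vector y is an arbitrary illustration chosen here, not data from the thread. With center = 0 and constant = 1, mad() computes the median of abs(x), which equals the median of x only when no value is negative.

```r
x <- 1:10
m_mid  <- mad(x, center = 0, constant = 1)
m_low  <- mad(x, center = 0, constant = 1, low = TRUE)
m_high <- mad(x, center = 0, constant = 1, high = TRUE)
print(c(mid = m_mid, low = m_low, high = m_high))

# With negative values the identity breaks, because mad() works on abs(y).
y <- c(-3, -1, 2)
print(median(y))                         # median of y itself
print(mad(y, center = 0, constant = 1))  # median of abs(y)
```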
Re: [Rd] quantile(), IQR() and median() for factors
Yes, I have discussed right continuous, left continuous, etc. definitions for the median in numeric data. I was just curious what the discussion was in texts that cover quantiles/medians of ordered categorical data in detail. I do not expect Low.5 as computer output for the median (but Low.Medium does make sense in a way).

Back in my theory classes, when we actually needed a firm definition, I remember mainly using the left continuous version (Low for the example), but I don't remember why we chose that over the right continuous version; probably it was just the teacher's or book's preference (I do remember it made things simpler than using the average of the middle 2 when n was even).

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

> -----Original Message-----
> From: Simone Giannerini [mailto:sgianner...@gmail.com]
> Sent: Friday, March 06, 2009 2:08 PM
> To: Prof Brian Ripley
> Cc: Greg Snow; R-devel
> Subject: Re: [Rd] quantile(), IQR() and median() for factors
>
> [...]
[Rd] S4 objects for S3 methods
Some modifications have been committed for the r-devel version today that modify (essentially, correct a bug in) the communication of objects to an S3 method from an S4 class that extends the S3 class.

This is one of a sequence of changes designed to make S4 classes work more generally and consistently with S3 methods and classes.

In 2.8.0, support was provided for S4 classes that extend S3 classes, partly by making S3 method dispatch recognize the inheritance. The catch was that the S3 method would get the S4 object. Two problems with that:

1. The S3 method would fail if it tried to use the S3 class information directly, since the class attribute was the S4 class.

2. More seriously, if the method used the object, modified it and returned the result, it had a good chance of returning an invalid object seeming to come from the S4 class.

The modification to deal with this now delivers to the S3 method the inherited S3 object. (This turned out to be somewhat harder than the original change, since it impacts several pieces of internal code.) A revision of the function asS4() deals with similar concerns--see the documentation.

The change does not affect default methods. It would be tempting to convert S4 objects for those, but some S3 generics attempt to deal with S4 objects, e.g., str(). A change to the primitives that dispatch methods is more plausible, but for the moment all that was added was more explicit error messages if a non-vector S4 object is supplied.

For more information see the section on inheriting from non-S4 classes in the documentation ?Classes.

It would be helpful if package maintainers would check this and previous changes by running their code against the r-devel version of R, before that becomes 2.9.0. Please report any new errors (provided, of course, that the same code works with 2.8.1).

John
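The situation described above can be tried out with a small S4 class that extends the S3 class "data.frame". This is a hedged sketch for a current R: the class myFrame, its slot note, and the S3 generic describe are all hypothetical names invented for illustration, and S3Part() is taken from today's methods package rather than from this announcement. The class seen inside the S3 method is printed rather than asserted, since that is the behavior the change is about.

```r
library(methods)

# Hypothetical S4 class extending the S3 class "data.frame".
setClass("myFrame", contains = "data.frame",
         representation(note = "character"))

obj <- new("myFrame", data.frame(a = 1:3), note = "demo")

# Hypothetical S3 generic with a method for the inherited S3 class only.
describe <- function(x) UseMethod("describe")
describe.data.frame <- function(x) class(x)

# S3 dispatch reaches the data.frame method through S4 inheritance;
# the printed value shows which class the method actually sees.
print(describe(obj))

# The S3 part can also be extracted explicitly.
s3 <- S3Part(obj, strictS3 = TRUE)
print(class(s3))
```

Extracting the S3 part explicitly is a way to get the same object an S3 method would work with, without relying on how dispatch converts it.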
[Rd] Fix for foreign package segfault on Solaris 10 Intel
Like a couple of other posters in the past year, I was seeing R 2.8.1 segfault in the foreign package on my Solaris 10 Intel system:

> library(foreign)

 *** caught segfault ***
address fe1d5c70, cause 'invalid permissions'

Traceback:
 1: .C("spss_init", PACKAGE = "foreign")
 2: fun(...)

This happened whether I built with gcc3, gcc4, or SunStudio 12.

Using pstack I found that the code was crashing in avl_create(). Using truss I found that identically named functions in the Solaris /lib/libavl.so.1 library were being used instead of the AVL functions provided in avl.c in the foreign package.

To verify, I replaced all of the avl_ and AVL_ patterns in foreign/src/*.[ch] with ravl_ and RAVL_ respectively. Once I made this change, loading the foreign package caused no further problems.

An alternative workaround was a hack involving symlinks and LD_LIBRARY_PATH, but that was not satisfactory. Since the foreign avl functions are incompatible with the ones provided by the standard Sun library, this approach has other potential gotchas.

FYI.

Jeff
Re: [Rd] Fix for foreign package segfault on Solaris 10 Intel
Can you show us the output you get from building foreign, and explain how it comes to be linked against libavl?

I get (SunStudio 12)

cc -xc99 -G -L/opt/csw/lib -o foreign.so R_systat.o Rdbfread.o Rdbfwrite.o SASxport.o avl.o dbfopen.o file-handle.o format.o init.o minitab.o pfm-read.o sfm-read.o spss.o stataread.o

and

ldd library/foreign/libs/foreign.so

reveals no dependencies (and the R binary is not linked against libavl either).

I can see that linking against libavl could cause problems, but have no idea why that might be happening.

On Fri, 6 Mar 2009, Jeff Long wrote:

> [...]

-- 
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
Re: [Rd] question
[ivo welch wrote:]
The syntax for returning multiple arguments does not strike me as particularly appealing. would it not be possible to allow syntax like:

  f= function() { return( rnorm(10), rnorm(20) ) }
  (a,d$b) = f()

FWIW, my own solution is to define a multi-assign operator:

  '%<-%' <- function( a, b){
    # a must be of the form '{thing1;thing2;...}'
    a <- as.list( substitute( a))[-1]
    e <- sys.parent()
    stopifnot( length( b) == length( a))
    for( i in seq_along( a))
      eval( call( '<-', a[[ i]], b[[i]]), envir=e)
    NULL
  }

Then I can write

  {a;d$b} %<-% f()

Actually it should probably return b invisibly, so that it can be chained a la

  {c$e$f;g$h} %<-% {a;d$b} %<-% f()

I haven't checked it exhaustively but it has done the job OK for me. The name '%<-%' does already feature in one R package, can't remember which but it's to do with graph theory, so you might be better off calling it something else. I use the synonym %:=% which is closer to what I think R should have called its assignment operator in the first place ;)

HTH
Mark Bravington
CSIRO
Hobart
Australia

From: r-devel-boun...@r-project.org [r-devel-boun...@r-project.org] On Behalf Of Gabor Grothendieck [ggrothendi...@gmail.com]
Sent: 06 March 2009 09:25
To: ivo welch
Cc: r-devel@r-project.org
Subject: Re: [Rd] question

I posted this a few years ago (but found I never really had a need for it):

http://tolstoy.newcastle.edu.au/R/help/04/06/1430.html

On Thu, Mar 5, 2009 at 9:22 AM, ivo welch <ivo...@gmail.com> wrote:

dear R developers: it is of course easy for a third party to make suggestions if this third party is both clueless and does not put in any work. with these caveats, let me suggest something.

The syntax for returning multiple arguments does not strike me as particularly appealing. would it not be possible to allow syntax like:

  f= function() { return( rnorm(10), rnorm(20) ) }
  (a,d$b) = f()

this would just hide the list conversion and unconversion. yes, I know how to accomplish this with lists, but it does not seem pretty or natural.

regards,
/ivo
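For anyone who wants to try the multi-assign idea from the message above, here is a minimal self-contained sketch. It is a variant rather than the code as posted: it assumes parent.frame() as the way to find the caller's environment (the post uses sys.parent()), and it returns the values invisibly as the post suggests. The names f, a, d, x and y are hypothetical and exist only for the illustration.

```r
# Sketch of a multi-assign operator (variant of the one posted above).
# Assumptions: parent.frame() locates the caller's environment, and the
# right-hand side is a list (or vector) with one element per target.
`%<-%` <- function(a, b) {
  # 'a' must be written as '{thing1; thing2; ...}'; it is never evaluated
  targets <- as.list(substitute(a))[-1]   # drop the leading `{`
  e <- parent.frame()
  stopifnot(length(b) == length(targets))
  for (i in seq_along(targets)) {
    # Build the call 'target <- value' and evaluate it in the caller.
    # Caveat: an element of b that is itself a symbol or call would be
    # evaluated again here; plain data values are safe.
    eval(call("<-", targets[[i]], b[[i]]), envir = e)
  }
  invisible(b)   # return the values invisibly
}

# Hypothetical function returning two results as a list
f <- function() list(1:3, c("u", "v"))

d <- list()
{a; d$b} %<-% f()            # a is now 1:3, d$b is c("u", "v")

# %op% operators group left to right, so the inner assignment is
# parenthesised when two assignments are combined
{x; y} %<-% ({a; d$b} %<-% f())
```

The left-hand side is captured with substitute(), so anything that can appear on the left of an ordinary assignment (such as d$b) can be used as a target.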
Re: [Rd] Fix for foreign package segfault on Solaris 10 Intel
I built it several times with a variety of flags and compilers. Here's what was used for the gcc3 build:

  /opt/csw/gcc3/bin/gcc -std=gnu99 -G -L/opt/sfw/lib -L/opt/csw/lib -L/opt/local/lib -L/usr/apps/cdat32/NetCDF/lib -o foreign.so avl.o dbfopen.o file-handle.o format.o init.o minitab.o pfm-read.o Rdbfread.o Rdbfwrite.o R_systat.o SASxport.o sfm-read.o spss.o stataread.o -L/admin/users/jwlong/R/src/R-2.8.1/lib -lR

On my system, the main R shared library shows a dependency on libavl, and hence so do foreign.so and the R binary:

  % ldd R
  libR.so => /opt/local/lib/R/lib/libR.so
  libRblas.so => /opt/local/lib/R/lib/libRblas.so
  libc.so.1 => /lib/libc.so.1
  libg2c.so.0 => /opt/csw/lib/libg2c.so.0
  libm.so.2 => /lib/libm.so.2
  libintl.so.8 => /opt/csw/lib/libintl.so.8
  libreadline.so.4 => /opt/csw/lib/libreadline.so.4
  libncurses.so.5 => /opt/csw/lib/libncurses.so.5
  libnsl.so.1 => /lib/libnsl.so.1
  libsocket.so.1 => /lib/libsocket.so.1
  libdl.so.1 => /lib/libdl.so.1
  libiconv.so.2 => /opt/csw/lib/libiconv.so.2
  libm.so.1 => /lib/libm.so.1
  libgcc_s.so.1 => /opt/csw/lib/libgcc_s.so.1
  libsec.so.1 => /lib/libsec.so.1
  libmp.so.2 => /lib/libmp.so.2
  libmd.so.1 => /lib/libmd.so.1
  libscf.so.1 => /lib/libscf.so.1
  libavl.so.1 => /lib/libavl.so.1
  libdoor.so.1 => /lib/libdoor.so.1
  libuutil.so.1 => /lib/libuutil.so.1
  libgen.so.1 => /lib/libgen.so.1

Looking through these various shared libs, it looks like /lib/libsec.so.1 is the one that pulls in libavl. And libintl is what pulls in libsec. And R itself pulls in libintl.

Jeff
Re: [Rd] Fix for foreign package segfault on Solaris 10 Intel
Interesting, thanks. So

1) This is a shared R library build (not the default, and AFAIR no one reporting this has mentioned that -- not you, for example) and

2) You have a third-party libintl.

One solution would seem to be to ask R to use the libintl in the sources by (I think) --with-included-gettext.

I'll look into modifying foreign, but the combination of 1) and 2) could apply to almost any library on the system and hence any symbol name in any package.

It's odd that you have -L/usr/apps/cdat32/NetCDF/lib in there, another potential cause for problems.

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
__ R-devel@r-project.org