Re: [Rd] array subsetting of S4 object that inherits from array

2009-03-06 Thread Martin Maechler
 BB == Bradley Buchsbaum bbuchsb...@berkeley.edu
 on Thu, 5 Mar 2009 21:16:40 -0500 writes:

BB Hi,
BB I have an S4 class that inherits from array but does not add generic
BB implementations of the [ method.

BB A simplified example is:

BB setClass("fooarray", contains="array")

BB If I create a fooarray object and subset it with a one-dimensional
BB index vector, the return value is of class fooarray. Other variants
BB (see below), however, return primitive values consistent with
BB ordinary array subsetting.

BB x <- new("fooarray", array(0,c(10,10,10)))

BB class(x[1,1,1])  # prints "numeric"
BB class(x[1,,])    # prints "matrix"
BB class(x[1])      # prints "fooarray"
BB class(x[1:10])   # prints "fooarray"


BB This behavior seems to have been introduced in R 2.8.1 as I have not
BB encountered it before. I tested it on R 2.7.0 and confirmed that
BB class(x[1]) returned numeric.

BB In my case, the desired behavior is for array subsetting in all cases
BB to return primitive data structures, so if there is a way to override
BB the new behavior I would opt for that.

Yes, the new behavior was introduced (into R 2.8.0) by me,
and ... coincidence ?! ... two days ago, in e-talking with John
Chambers, I was convinced that the new feature really has
been a mis-feature.  Consequently, yesterday (!) I committed
changes to both R-patched (2.8.1 patched) and R-devel which
revert the mis-feature.

So, the override is to use  2.8.1 patched (or newer).
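
For code that also has to run on an unpatched 2.8.0 or 2.8.1, one possible
stopgap is to subset the underlying array instead of the S4 object. This is
only a minimal sketch: it reuses the fooarray class from the report and
assumes that the .Data part of a class containing "array" is the plain array
with its dim attribute intact.

```r
library(methods)

# Class from the report: an S4 class extending "array", with no "[" method.
setClass("fooarray", contains = "array")
x <- new("fooarray", array(0, c(10, 10, 10)))

# Assumption: x@.Data is the ordinary array underneath the S4 object,
# so subsetting it follows the usual array rules.
a <- x@.Data

is.numeric(a[1])     # one-dimensional index: plain numeric vector
is.matrix(a[1, , ])  # fixing the first index leaves a matrix
dim(a[1, , ])        # dimensions of that matrix
```

Going through the data part keeps the class definition itself unchanged, so
the same code should behave identically once the patched version is installed.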

I'm sorry for my thinko that may also affect other
R-S4-programmers [of course I hope not, but then there's
Murphy's law].

Regards,
Martin Maechler, ETH Zurich




BB Regards,

BB Brad Buchsbaum

BB R version 2.8.1 (2008-12-22)
BB i386-pc-mingw32

BB locale:
BB LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
BB States.1252;LC_MONETARY=English_United
BB States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

BB attached base packages:
BB [1] stats graphics  grDevices utils datasets  methods   base



BB -- 
BB Bradley R. Buchsbaum
BB Rotman Research Institute
BB 3560 Bathurst St.
BB Toronto, ON Canada M6A 2E1
BB email: bbuchsb...@rotman-baycrest.on.ca

BB __
BB R-devel@r-project.org mailing list
BB https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] [SoC09-Info] Application starts next week.

2009-03-06 Thread Manuel J. A. Eugster

Hi everybody,

next week is the week when mentoring organizations can
apply for the Google Summer of Code. As I already wrote
in my first mail, the idea is to submit our ideas by
March 10.

Currently three ideas are on the list[1]:

   * Development of crantastic.org
 by Hadley Wickham

   * Movement Ecology add-ons for adehabitat package
 by Damiano G. Preatoni

   * Party On! New Recursive Partytioning Tools
 by Torsten Hothorn and Achim Zeileis

Don't hesitate to chip in other ideas; the more ideas
are on the list the better it is for the application.

BTW: Do mentors whose projects weren't realized last
summer (see [2]) want to re-submit their projects?



Best,

Manuel.


[1] http://www.r-project.org/soc09
[2] http://www.r-project.org/soc08/ideas.html



Re: [Rd] bug (PR#13570)

2009-03-06 Thread Peter Dalgaard
Prof Brian Ripley wrote:
> On Thu, 5 Mar 2009, Benjamin Tyner wrote:
[...]
>
>> I submitted a bug fix to Eric Grosse, the maintainer of the netlib
>> routines; the fixed lines of fortran are identified in the comments at
>> (just search for my email address):
>>
>> http://www.netlib.org/a/loess
>>
>> These fixes would be relatively simple to incorporate into R's version
>> of loessf.f
>
> The fixes from dloess even more simply, since R's code is based on
> dloess.  Thank you for the suggestion.
>
> Given how tricky this is to reproduce, I went back to my example under
> valgrind.  If I use the latest dloess code, it crashes, but by
> selectively importing some of the differences I can get it to work.
>
> So it looks as if we are on the road to a solution, but something in the
> current version (not necessarily in these changes) is incompatible with
> the current R code and I need to dig further (not for a few days).

What a nice war story this is!

Good that it now seems fixable; even though degree=0 is not of much
practical use, it is the sort of thing people like to have available
when explaining how the method works.
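
For readers who have not used it, degree=0 asks loess() for a local-constant
fit, i.e. a locally weighted average. The sketch below is illustrative only:
it uses the built-in cars data rather than the data from the bug report, and
it requests surface = "direct" as an assumption, to keep the example away
from the interpolation code discussed in this thread.

```r
# Local-constant (degree = 0) versus local-linear (degree = 1) loess fits.
ctrl <- loess.control(surface = "direct")  # exact fit at each point, no interpolation
fit0 <- loess(dist ~ speed, data = cars, degree = 0, control = ctrl)
fit1 <- loess(dist ~ speed, data = cars, degree = 1, control = ctrl)

# Compare the first few fitted values of the two fits side by side.
head(cbind(speed = cars$speed,
           local_constant = fitted(fit0),
           local_linear = fitted(fit1)))
```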


-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - (p.dalga...@biostat.ku.dk)  FAX: (+45) 35327907



[Rd] bug in summary.aovlist() with split= and (PR#13579)

2009-03-06 Thread rmh



summary.aovlist() with split= and expand.split=TRUE
gives two different types of nonsensical results for a:b
in the Within stratum in the two different expansions of tmp3.aov.

S-Plus gives appropriate results and I attach them for comparison.
There are three attached files.
 split.r   source
 split.rt  R transcript showing nonsense results
 split.st  S-Plus transcript showing appropriate results

Rich


--please do not edit the information below--

Version:
 platform = i386-pc-mingw32
 arch = i386
 os = mingw32
 system = i386, mingw32
 status = 
 major = 2
 minor = 8.1
 year = 2008
 month = 12
 day = 22
 svn rev = 47281
 language = R
 version.string = R version 2.8.1 (2008-12-22)

Windows XP (build 2600) Service Pack 3

Locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United 
States.1252;LC_MONETARY=English_United 
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

Search Path:
 .GlobalEnv, package:RcmdrPlugin.HH, package:Rcmdr, package:car, package:tcltk, 
package:fortunes, 
package:VGAM, package:stats4, package:splines, package:HH, package:leaps, 
package:multcomp, 
package:mvtnorm, package:grid, package:lattice, package:stats, 
package:graphics, package:datasets, 
package:grDevices, package:rcom, package:rscproxy, package:utils, 
package:methods, RExcelEnv, 
RcmdrEnv, Autoloads, package:base

split.r:

tmp <- data.frame(y=rnorm(48),
                  a=rep(letters[1:3], 16),
                  b=rep(rep(LETTERS[1:4], each=3),4),
                  block=rep(LETTERS[5:6], each=24)
                  )

tmp.aov <- aov(y ~ a*b, data=tmp)
summary(tmp.aov,
        split=list(a=list(t=1,u=2), b=list(v=1,w=2,x=3)),
        expand.split=TRUE)

tmp2.aov <- aov(y ~ Error(block) + a*b, data=tmp)
summary(tmp2.aov,
        split=list(a=list(t=1,u=2), b=list(v=1,w=2,x=3)),
        expand.split=TRUE)
summary(tmp2.aov,
        split=list(a=list(t=1,u=2)),
        expand.split=TRUE)

tmp3.aov <- aov(y ~ Error(block/a) + a*b, data=tmp)
summary(tmp3.aov,
        split=list(a=list(t=1,u=2), b=list(v=1,w=2,x=3)),
        expand.split=TRUE)
summary(tmp3.aov,
        split=list(a=list(t=1,u=2)),
        expand.split=TRUE)

split.rt:

> tmp <- data.frame(y=rnorm(48),
+                   a=rep(letters[1:3], 16),
+                   b=rep(rep(LETTERS[1:4], each=3),4),
+                   block=rep(LETTERS[5:6], each=24)
+                   )
>
> tmp.aov <- aov(y ~ a*b, data=tmp)
> summary(tmp.aov,
+         split=list(a=list(t=1,u=2), b=list(v=1,w=2,x=3)),
+         expand.split=TRUE)
            Df Sum Sq Mean Sq F value Pr(>F)
a            2  2.060   1.030  1.0528 0.3595
  a: t       1  1.411   1.411  1.4416 0.2377
  a: u       1  0.650   0.650  0.6639 0.4205
b            3  0.839   0.280  0.2859 0.8353
  b: v       1  0.264   0.264  0.2702 0.6063
  b: w       1  0.001   0.001  0.0013 0.9711
  b: x       1  0.573   0.573  0.5860 0.4489
a:b          6  2.300   0.383  0.3918 0.8794
  a:b: t.v   1  0.556   0.556  0.5685 0.4558
  a:b: u.v   1  0.998   0.998  1.0203 0.3192
  a:b: t.w   1  0.171   0.171  0.1747 0.6785
  a:b: u.w   1  0.092   0.092  0.0942 0.7607
  a:b: t.x   1  0.361   0.361  0.3685 0.5476
  a:b: u.x   1  0.122   0.122  0.1246 0.7261
Residuals   36 35.226   0.978
>
> tmp2.aov <- aov(y ~ Error(block) + a*b, data=tmp)
> summary(tmp2.aov,
+         split=list(a=list(t=1,u=2), b=list(v=1,w=2,x=3)),
+         expand.split=TRUE)

Error: block
          Df  Sum Sq Mean Sq F value Pr(>F)
Residuals  1 0.57849 0.57849

Error: Within
            Df Sum Sq Mean Sq F value Pr(>F)
a            2  2.060   1.030  1.0406 0.3639
  a: t       1  1.411   1.411  1.4250 0.2406
  a: u       1  0.650   0.650  0.6563 0.4234
b            3  0.839   0.280  0.2826 0.8376
  b: v       1  0.264   0.264  0.2671 0.6085
  b: w       1  0.001   0.001  0.0013 0.9713
  b: x       1  0.573   0.573  0.5793 0.4517
a:b          6  2.300   0.383  0.3873 0.8822
  a:b: t.v   1  0.556   0.556  0.5619 0.4585
  a:b: u.v   1  0.998   0.998  1.0085 0.3222
  a:b: t.w   1  0.171   0.171  0.1727 0.6803
  a:b: u.w

Re: [Rd] quantile(), IQR() and median() for factors

2009-03-06 Thread Greg Snow
I like the idea of median and friends working on ordered factors.  Just a 
couple of thoughts on possible implementations.

Adding extra checks and functionality will slow down the function.  For a 
single evaluation on a given dataset this slowdown will not be noticeable, but 
inside of a simulation, bootstrap, or other high iteration technique, it could 
matter.  I would suggest creating a core function that does just the 
calculations (median, quantile, iqr) assuming that the data passed in is 
correct without doing any checks or anything fancy.  Then the user-callable 
function (median et al.) would do the checks, dispatch to other functions for 
anything fancy, etc., then call the core function with the clean data.  The 
common user would not really notice a difference, but someone programming a 
high iteration technique could clean the data themselves, then call the core 
function directly bypassing the checks/branches.
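
The split described above might look roughly like the following. This is a
sketch only: median_core() and median_checked() are hypothetical names, not
functions in R, and the core simply assumes a non-empty numeric vector
without missing values.

```r
# Hypothetical core: no checks at all, assumes clean numeric input.
median_core <- function(x) {
    n <- length(x)
    half <- (n + 1L) %/% 2L
    if (n %% 2L == 1L) {
        sort(x, partial = half)[half]                     # odd n: middle value
    } else {
        mean(sort(x, partial = half + 0:1)[half + 0:1])   # even n: mean of the middle two
    }
}

# Hypothetical user-facing wrapper: does the checks, then calls the core.
median_checked <- function(x, na.rm = FALSE) {
    if (!is.numeric(x)) stop("need numeric data")
    if (na.rm) x <- x[!is.na(x)] else if (anyNA(x)) return(NA_real_)
    if (length(x) == 0L) return(NA_real_)
    median_core(x)
}

# A high-iteration user cleans the data once and calls the core directly.
x <- rnorm(101)
boot <- replicate(200, median_core(sample(x, replace = TRUE)))
```

Whether skipping the checks gives a measurable speed-up would have to be
timed for the case at hand.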

Just out of curiosity (from someone who only learned from English (Americanized 
at that) and not Italian texts), what would the median of [Low, Low, Medium, 
High] be?

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-
 project.org] On Behalf Of Simone Giannerini
 Sent: Thursday, March 05, 2009 4:49 PM
 To: R-devel
 Subject: [Rd] quantile(), IQR() and median() for factors
 
 Dear all,
 
 from the help page of quantile:
 
 x     numeric vectors whose sample quantiles are wanted. Missing
 values are ignored.
 
 from the help page of IQR:
 
 x     a numeric vector.
 
 as a matter of facts it seems that both quantile() and IQR() do not
 check for the presence of a numeric input.
 See the following:
 
 set.seed(11)
 x <- rbinom(n=11,size=2,prob=.5)
 x <- factor(x,ordered=TRUE)
 x
  [1] 1 0 1 0 0 2 0 1 2 0 0
 Levels: 0 < 1 < 2
 
 > quantile(x)
   0%  25%  50%  75% 100%
    0 <NA>    0 <NA>    2
 Levels: 0 < 1 < 2
 Warning messages:
 1: In Ops.ordered((1 - h), qs[i]) :
   '*' is not meaningful for ordered factors
 2: In Ops.ordered(h, x[hi[i]]) : '*' is not meaningful for ordered
 factors
 
 > IQR(x)
 [1] 1
 
 whereas median has the check:
 
 > median(x)
 Error in median.default(x) : need numeric data
 
 I also take the opportunity to ask your comments on the following
 related subject:
 
 In my opinion it would be convenient that median() and the like
 (quantile(), IQR()) be implemented for ordered factors for which in
 fact
 they can be well defined. For instance, in this way functions like
 apply(x,FUN=median,...) could be used without the need of further
 processing for
 data frames that contain both numeric variables and ordered factors.
 If on the one hand, to my limited knowledge, in English introductory
 statistics
 textbooks the fact that the median is well defined for ordered
 categorical variables is only mentioned marginally,
 on the other hand, in the Italian Statistics literature this is often
 discussed in detail and this could mislead students and practitioners
 that might
 expect median() to work for ordered factors.
 
 In this message
 
 https://stat.ethz.ch/pipermail/r-help/2003-November/042684.html
 
 Martin Maechler considers the possibility of doing such a job by
 allowing for extra arguments low and high as it is done for mad().
 I am willing to give a contribution if requested, and comments are
 welcome.
 
 Thank you for the attention,
 
 kind regards,
 
 Simone
 
  R.version
    _
 platform   i386-pc-mingw32
 arch   i386
 os mingw32
 system i386, mingw32
 status
 major  2
 minor  8.1
 year   2008
 month  12
 day    22
 svn rev    47281
 language   R
 version.string R version 2.8.1 (2008-12-22)
 
  LC_COLLATE=Italian_Italy.1252;LC_CTYPE=Italian_Italy.1252;LC_MONETARY=
 Italian_Italy.1252;LC_NUMERIC=C;LC_TIME=Italian_Italy.1252
 
 --
 __
 
 Simone Giannerini
 Dipartimento di Scienze Statistiche Paolo Fortunati
 Universita' di Bologna
 Via delle belle arti 41 - 40126  Bologna,  ITALY
 Tel: +39 051 2098262  Fax: +39 051 232153
 http://www2.stat.unibo.it/giannerini/
 


Re: [Rd] quantile(), IQR() and median() for factors

2009-03-06 Thread Prof Brian Ripley

On Fri, 6 Mar 2009, Greg Snow wrote:

> I like the idea of median and friends working on ordered factors.
> Just a couple of thoughts on possible implementations.
>
> Adding extra checks and functionality will slow down the function.
> For a single evaluation on a given dataset this slowdown will not be
> noticeable, but inside of a simulation, bootstrap, or other high
> iteration technique, it could matter.  I would suggest creating a
> core function that does just the calculations (median, quantile,
> iqr) assuming that the data passed in is correct without doing any
> checks or anything fancy.  Then the user-callable function (median
> et al.) would do the checks, dispatch to other functions for
> anything fancy, etc., then call the core function with the clean
> data.  The common user would not really notice a difference, but
> someone programming a high iteration technique could clean the data
> themselves, then call the core function directly bypassing the
> checks/branches.


Since median and quantile are already generic, adding an 'ordered' 
method would be zero cost to other uses.  And the factor check at the 
head of median.default could be replaced by median.factor if someone 
could show a convincing performance difference.
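
A method along these lines could be sketched as follows. It is an
illustration, not code from R: median.ordered() did not exist at the time of
this thread, and the high argument (borrowed from mad()) is an assumption
about how the non-unique even-length case might be exposed. By default the
sketch returns the lower of the two middle values.

```r
# Hypothetical S3 method: a median for ordered factors.
median.ordered <- function(x, na.rm = FALSE, high = FALSE, ...) {
    if (na.rm) x <- x[!is.na(x)] else if (anyNA(x)) return(x[NA_integer_][1L])
    n <- length(x)
    if (n == 0L) return(x[NA_integer_][1L])
    codes <- sort(as.integer(x))   # integer codes follow the level order
    # Position of the lower or upper middle value (the same when n is odd).
    k <- if (high) n %/% 2L + 1L else (n + 1L) %/% 2L
    factor(levels(x)[codes[k]], levels = levels(x), ordered = TRUE)
}

sizes <- factor(c("Low", "Low", "Medium", "High"),
                levels = c("Low", "Medium", "High"), ordered = TRUE)
median(sizes)               # lower middle value: Low
median(sizes, high = TRUE)  # upper middle value: Medium
```

Because the generic dispatches on class "ordered" before reaching
median.default, numeric input never touches the new code.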


> Just out of curiosity (from someone who only learned from English
> (Americanized at that) and not Italian texts), what would the median
> of [Low, Low, Medium, High] be?


I don't think it is 'the' median but 'a' median.  (Even English 
Wikipedia says the median is not unique for even numbers of inputs.)




[...]



--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University 

Re: [Rd] quantile(), IQR() and median() for factors

2009-03-06 Thread Simone Giannerini
Dear Greg,

thank you for your comments,
as Prof. Ripley pointed out, in the case of even sample size the
median is not unique and is formed by the two central observations or
a function of them, if that makes sense.



Dear Prof. Ripley,

thank you for your concern,

may I notice that (in case of non-negative data) one can get the
median from mad() with center=0,constant=1


> mad(1:10,center=0,constant=1)
[1] 5.5
> mad(1:10,center=0,constant=1,high=TRUE)
[1] 6
> mad(1:10,center=0,constant=1,low=TRUE)
[1] 5

so that it seems that part of the code of mad() might be a starting
point, at least for median().
I confirm my availability to work on the matter if requested.

Kind regards,

Simone


On Fri, Mar 6, 2009 at 6:36 PM, Prof Brian Ripley rip...@stats.ox.ac.uk wrote:
[...]

Re: [Rd] quantile(), IQR() and median() for factors

2009-03-06 Thread Greg Snow
Yes, I have discussed right continuous, left continuous, etc. definitions for the 
median in numeric data.  I was just curious what the discussion was in texts 
that cover quantiles/medians of ordered categorical data in detail.

I do not expect Low.5 as computer output for the median (but Low.Medium does 
make sense in a way).  Back in my theory classes when we actually needed a firm 
definition I remember using the left continuous mainly (Low for the example), 
but I don't remember why we chose that over the right continuous version, 
probably just the teacher's/book's preference (I do remember it made things 
simpler than using the average of the middle 2 when n was even).

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: Simone Giannerini [mailto:sgianner...@gmail.com]
 Sent: Friday, March 06, 2009 2:08 PM
 To: Prof Brian Ripley
 Cc: Greg Snow; R-devel
 Subject: Re: [Rd] quantile(), IQR() and median() for factors
 
[...]

[Rd] S4 objects for S3 methods

2009-03-06 Thread John Chambers
Some modifications have been committed for the r-devel version today 
that modify (essentially, correct a bug in) the communication of objects 
to an S3 method from an S4 class that extends the S3 class.


This is one of a sequence of changes designed to make S4 classes work 
more generally and consistently with S3 methods and classes.


In 2.8.0, support was provided for S4 classes that extend S3 classes, 
partly by making S3 method dispatch recognize the inheritance.


The catch was that the S3 method would get the S4 object.  Two problems 
with that:


1. The S3 method would fail if it tried to use the S3 class information 
directly, since the class attribute was the S4 class.


2. More seriously, if the method used the object, modified it and 
returned the result, it had a good chance of returning an invalid object 
seeming to come from the S4 class.


The modification to deal with this now delivers to the S3 method the 
inherited S3 object.  (This turned out to be somewhat harder than the 
original change, since it impacts several pieces of internal code.)  A 
revision of the function asS4() deals with similar concerns--see the 
documentation.


The change does not affect default methods.  It would be tempting to 
convert S4 objects for those, but some S3 generics attempt to deal with 
S4 objects, e.g., str().  A change to the primitives that dispatch 
methods is more plausible, but for the moment all that was added was 
more explicit error messages if a non-vector S4 object is supplied.


For more information see the section on inheriting from non-S4 classes 
in the documentation ?Classes.
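
As a rough illustration of the situation described here, the sketch below
defines an S4 class that extends the S3 class "data.frame" and calls an S3
generic on it. The class myFrame, its note slot, and the generic describe()
are hypothetical names chosen for the example; S3Part() is used to extract
the inherited S3 object explicitly, independently of what the dispatch code
hands to the method in a given R version.

```r
library(methods)

# Hypothetical S4 class extending the S3 class "data.frame".
setClass("myFrame", contains = "data.frame", slots = c(note = "character"))

# Hypothetical S3 generic with a method for the S3 class only.
describe <- function(x, ...) UseMethod("describe")
describe.data.frame <- function(x, ...) "describe.data.frame"

obj <- new("myFrame", data.frame(a = 1:3), note = "example")

describe(obj)                       # S3 dispatch follows the S4 inheritance
s3 <- S3Part(obj, strictS3 = TRUE)  # the inherited S3 object, extracted explicitly
is.data.frame(s3)
```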


It would be helpful if package maintainers would check this and previous 
changes by running their code against the r-devel version of R, before 
that becomes 2.9.0.  Please report any new errors (provided, of course, 
that the same code works with 2.8.1).


John



[Rd] Fix for foreign package segfault on Solaris 10 Intel

2009-03-06 Thread Jeff Long
Like a couple of other posters in the past year, I was seeing R 2.8.1 
segfault in the foreign package on my Solaris 10 Intel system:


  > library(foreign)

  *** caught segfault ***
  address fe1d5c70, cause 'invalid permissions'

  Traceback:
   1: .C("spss_init", PACKAGE = "foreign")
   2: fun(...)

This happened whether I built with gcc3, gcc4, or SunStudio 12.

Using pstack I found that the code was crashing in avl_create(). 
Using truss I found that identically named functions in the Solaris 
/lib/libavl.so.1 library were being used instead of the AVL functions 
provided in avl.c in the foreign package. To verify, I replaced all 
of the avl_ and AVL_ patterns in foreign/src/*.[ch] with ravl_ 
and RAVL_ respectively. Once I made this change, loading the 
foreign package caused no further problems.


An alternative workaround was a hack involving symlinks and 
LD_LIBRARY_PATH, but that was not satisfactory. Since the foreign avl 
functions are incompatible with the ones provided by the standard Sun 
library, this approach has other potential gotchas.


FYI.

Jeff



Re: [Rd] Fix for foreign package segfault on Solaris 10 Intel

2009-03-06 Thread Prof Brian Ripley
Can you show us the output you get from building foreign, and explain 
how it comes to be linked against libavl?  I get (SunStudio 12)


cc -xc99 -G -L/opt/csw/lib -o foreign.so R_systat.o Rdbfread.o 
Rdbfwrite.o SASxport.o avl.o dbfopen.o file-handle.o format.o init.o 
minitab.o pfm-read.o sfm-read.o spss.o stataread.o


and ldd library/foreign/libs/foreign.so reveals no dependencies (and 
the R binary is not linked against libavl either).


I can see that linking against libavl could cause problems, but have 
no idea why that might be happening.





--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



Re: [Rd] question

2009-03-06 Thread Mark.Bravington
[ivo welch wrote:]

> The syntax for returning multiple arguments does not strike me as
> particularly appealing.  would it not be possible to allow syntax like:
>
>   f= function() { return( rnorm(10), rnorm(20) ) }
>   (a,d$b) = f()



FWIW, my own solution is to define a multi-assign operator:

'%<-%' <- function( a, b){
  # a must be of the form '{thing1;thing2;...}'
  a <- as.list( substitute( a))[-1]
  e <- sys.parent()
  stopifnot( length( b) == length( a))
  for( i in seq_along( a))
    eval( call( '<-', a[[ i]], b[[i]]), envir=e)
  NULL
}

Then I can write 

{a;d$b} %<-% f()

Actually it should probably return b invisibly, so that it can be chained a la 
{c$e$f;g$h} %<-% {a;d$b} %<-% f()

I haven't checked it exhaustively but it has done the job OK for me.

The name '%<-%' does already feature in one R package, can't remember which but 
it's to do with graph theory, so you might be better off calling it something 
else. I use the synonym %:=% which is closer to what I think R should have 
called its assignment operator in the first place ;)
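
A sketch of the variant that returns its right-hand side invisibly, under a few assumptions: it uses the %:=% name mentioned above, parent.frame() in place of sys.parent(), and the argument names targets and values are made up for illustration. One caveat: %op% operators associate left to right in R, so a chained call needs parentheses around the right-hand part.

```r
# Hypothetical variant of the multi-assign operator; not the original code.
`%:=%` <- function(targets, values) {
  # targets must be written as '{thing1; thing2; ...}'; drop the leading `{`
  targets <- as.list(substitute(targets))[-1]
  env <- parent.frame()
  stopifnot(length(values) == length(targets))
  for (i in seq_along(targets)) {
    # build and run 'thing_i <- values[[i]]' in the caller's environment
    eval(call("<-", targets[[i]], values[[i]]), envir = env)
  }
  invisible(values)
}

f <- function() list(1:3, "x")
d <- list()

{a; d$b} %:=% f()                  # a is 1:3, d$b is "x"
{u; v} %:=% ({a; d$b} %:=% f())    # chained: parentheses are required
```

Returning the values invisibly keeps the console quiet for the plain form while still letting the result feed an outer assignment.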

HTH

Mark Bravington
CSIRO
Hobart
Australia




From: r-devel-boun...@r-project.org [r-devel-boun...@r-project.org] On Behalf 
Of Gabor Grothendieck [ggrothendi...@gmail.com]
Sent: 06 March 2009 09:25
To: ivo welch
Cc: r-devel@r-project.org
Subject: Re: [Rd] question

I posted this a few years ago (but found I never really had a
need for it):

http://tolstoy.newcastle.edu.au/R/help/04/06/1430.html



On Thu, Mar 5, 2009 at 9:22 AM, ivo welch ivo...@gmail.com wrote:
 dear R developers:  it is of course easy for a third party to make
 suggestions if this third party is both clueless and does not put in
 any work.  with these caveats, let me suggest something.

 The syntax for returning multiple arguments does not strike me as
 particularly appealing.  would it not be possible to allow syntax like:

   f= function() { return( rnorm(10), rnorm(20) ) }
   (a,d$b) = f()

 this would just hide the list conversion and unconversion.  yes, I
 know how to accomplish this with lists, but it does not seem pretty or
 natural.

 regards,

 /ivo



Re: [Rd] Fix for foreign package segfault on Solaris 10 Intel

2009-03-06 Thread Jeff Long
I built it several times with a variety of flags and compilers. 
Here's what was used for the gcc3 build:


/opt/csw/gcc3/bin/gcc -std=gnu99 -G -L/opt/sfw/lib -L/opt/csw/lib 
-L/opt/local/lib -L/usr/apps/cdat32/NetCDF/lib -o foreign.so avl.o 
dbfopen.o file-handle.o format.o init.o minitab.o pfm-read.o 
Rdbfread.o Rdbfwrite.o R_systat.o SASxport.o sfm-read.o spss.o 
stataread.o   -L/admin/users/jwlong/R/src/R-2.8.1/lib -lR


On my system, the main R shared library shows a dependency on libavl, 
and hence so does foreign.so and the R binary:


 % ldd R
    libR.so =>          /opt/local/lib/R/lib/libR.so
    libRblas.so =>      /opt/local/lib/R/lib/libRblas.so
    libc.so.1 =>        /lib/libc.so.1
    libg2c.so.0 =>      /opt/csw/lib/libg2c.so.0
    libm.so.2 =>        /lib/libm.so.2
    libintl.so.8 =>     /opt/csw/lib/libintl.so.8
    libreadline.so.4 => /opt/csw/lib/libreadline.so.4
    libncurses.so.5 =>  /opt/csw/lib/libncurses.so.5
    libnsl.so.1 =>      /lib/libnsl.so.1
    libsocket.so.1 =>   /lib/libsocket.so.1
    libdl.so.1 =>       /lib/libdl.so.1
    libiconv.so.2 =>    /opt/csw/lib/libiconv.so.2
    libm.so.1 =>        /lib/libm.so.1
    libgcc_s.so.1 =>    /opt/csw/lib/libgcc_s.so.1
    libsec.so.1 =>      /lib/libsec.so.1
    libmp.so.2 =>       /lib/libmp.so.2
    libmd.so.1 =>       /lib/libmd.so.1
    libscf.so.1 =>      /lib/libscf.so.1
    libavl.so.1 =>      /lib/libavl.so.1
    libdoor.so.1 =>     /lib/libdoor.so.1
    libuutil.so.1 =>    /lib/libuutil.so.1
    libgen.so.1 =>      /lib/libgen.so.1


Looking through these various shared libs, it looks like 
/lib/libsec.so.1 is the one that pulls in libavl. And libintl is what 
pulls in libsec. And R itself pulls in libintl.
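
This kind of tracing can be done mechanically. Below is a rough sketch in R, assuming ldd prints one "name => path" pair per line; parse_ldd is a hypothetical helper, not part of R or of foreign, and the sample lines are copied from the listing above.

```r
# Hypothetical helper: turn ldd output lines into a named vector of paths.
parse_ldd <- function(lines) {
  pattern <- "^[[:space:]]*([^[:space:]]+)[[:space:]]*=>[[:space:]]*([^[:space:]]+)"
  hits <- regmatches(lines, regexec(pattern, lines))
  hits <- Filter(function(m) length(m) == 3, hits)  # drop lines that do not match
  paths <- vapply(hits, function(m) m[3], character(1))
  names(paths) <- vapply(hits, function(m) m[2], character(1))
  paths
}

sample_lines <- c(
  "    libintl.so.8 =>  /opt/csw/lib/libintl.so.8",
  "    libsec.so.1 =>   /lib/libsec.so.1",
  "    libavl.so.1 =>   /lib/libavl.so.1"
)
deps <- parse_ldd(sample_lines)
deps[grepl("^libavl", names(deps))]   # the entry that reveals the libavl link
```

On a live system the input could come from something like system2("ldd", "/lib/libsec.so.1", stdout = TRUE), repeated for each library in the chain.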


Jeff




Re: [Rd] Fix for foreign package segfault on Solaris 10 Intel

2009-03-06 Thread Prof Brian Ripley

Interesting, thanks.  So

1) This is a shared R library build (not the default, and AFAIR no one 
reporting this has mentioned that -- not you, for example) and


2) You have a third-party libintl.

One solution would seem to be to ask R to use the libintl in the 
sources by (I think) --with-included-gettext .


I'll look into modifying foreign, but the combination of 1) and 2) 
could apply to almost any library on the system and hence any symbol 
name in any package.


It's odd that you have -L/usr/apps/cdat32/NetCDF/lib in there, another 
potential cause for problems.



--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

__
R-devel@r-project.org