Re: [Rd] mean

2020-01-09 Thread Stephen Ellison
Note that in 

> > quantile(c("1","2","3"),p=.5)
> Error in (1 - h) * qs[i] : 
>  non-numeric argument to binary operator
the default quantile type (7) does not work for non-numerics.

Quantile types 1 and 3 work as expected:

> quantile(c("1","2","3"),p=.5, type=1)
50% 
"2" 
> quantile(c("1","2","3"),p=.5, type=3)
50% 
"2"


Steve E



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] mean

2020-01-09 Thread Marc Schwartz via R-devel
Peter,

Thanks for the reply.

If that were the case, then should not the following be allowed to work with 
ordered factors?

> median(factor(c("1", "2", "3"), ordered = TRUE))
Error in median.default(factor(c("1", "2", "3"), ordered = TRUE)) : 
  need numeric data

At least on the surface, if you can lexically order a character vector:

> median(c("red", "blue", "green"))
[1] "green"

you can also order a factor, or ordered factor, and if the number of elements 
is odd, return a median value.
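
For illustration, here is a minimal sketch of what such a method could look
like (hypothetical code, not an existing stats method; median() would
dispatch to it for ordered factors):

median.ordered <- function(x, na.rm = FALSE, ...) {
  if (na.rm) x <- x[!is.na(x)]
  n <- length(x)
  if (n %% 2L == 0L)
    stop("no unique middle element for an even number of observations")
  sort(x)[(n + 1L) %/% 2L]   # the middle order statistic, still a factor
}

median(factor(c("1", "2", "3"), ordered = TRUE))
## [1] 2
## Levels: 1 < 2 < 3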

Regards,

Marc


> On Jan 9, 2020, at 10:46 AM, peter dalgaard  wrote:
> 
> I think median() behaves as designed: As long as the argument can be ordered, 
> the "middle observation" makes sense, except when the middle falls between 
> > two categories, and you can't define an average of the two candidates for a 
> median.
> 
> The "sick man" would seem to be var(). Notice that it is also inconsistent 
> with cov():
> 
>> cov(c("1","2","3","4"),c("1","2","3","4") )
> Error in cov(c("1", "2", "3", "4"), c("1", "2", "3", "4")) : 
>  is.numeric(x) || is.logical(x) is not TRUE
>> var(c("1","2","3","4"),c("1","2","3","4") )
> [1] 1.67
> 
> -pd
> 
> 
>> On 9 Jan 2020, at 14:49 , Marc Schwartz via R-devel  
>> wrote:
>> 
>> Jean-Luc,
>> 
>> Please keep the communications on the list, for the benefit of others, now 
>> and in the future, via the list archive. I am adding r-devel back here.
>> 
>> I can't speak to the rationale in some of these cases. As I noted, it may be 
>> (is likely) due to differing authors over time, and there may have been 
>> relevant use cases at the time that the code was written, resulting in the 
>> various checks. Presumably, the additional checks were not incorporated into 
>> the other functions to enforce a level of consistency.
>> 
>> We will need to wait for someone from R Core to comment.
>> 
>> Regards,
>> 
>> Marc
>> 
>>> On Jan 9, 2020, at 8:34 AM, Lipatz Jean-Luc  
>>> wrote:
>>> 
>>> Ok, inconsistencies.
>>> 
>>> The last test you wrote is a bit strange. I agree that it is useful to warn 
>>> about a computation that makes no sense in the case of factors. But why 
>>> test data frames? If you go that way with arbitrary structures, you can 
>>> also try:
>>> 
>>>> median(list(1,2),list(3,4),list(4,5))
>>> Error in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) 
>>> return(x[FALSE][NA]) : 
>>> argument is not interpretable as logical
>>> In addition: Warning message:
>>> In if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) 
>>> return(x[FALSE][NA]) :
>>> the condition has length > 1 and only the first element will be used
>>> 
>>> giving a message which, despite its length, doesn't really explain the 
>>> reason for the error.
>>> 
>>> Why not test the arguments with something like:
>>> if (!is.numeric(x)) 
>>>stop("need numeric data")
>>> 
>>> 
>>> -Original Message-
>>> From: Marc Schwartz  
>>> Sent: Thursday, 9 January 2020 14:19
>>> To: Lipatz Jean-Luc 
>>> Cc: R-Devel 
>>> Subject: Re: [Rd] mean
>>> 
>>> 
>>>> On Jan 9, 2020, at 7:40 AM, Lipatz Jean-Luc  
>>>> wrote:
>>>> 
>>>> Hello,
>>>> 
>>>> Is there a reason for the following behaviour?
>>>>> mean(c("1","2","3"))
>>>> [1] NA
>>>> Warning message:
>>>> In mean.default(c("1", "2", "3")) :
>>>> argument is not numeric or logical: returning NA
>>>> 
>>>> But:
>>>>> var(c("1","2","3"))
>>>> [1] 1
>>>> 
>>>> And also:
>>>>> median(c("1","2","3"))
>>>> [1] "2"
>>>> 
>>>> But:
>>>>> quantile(c("1","2","3"),p=.5)
>>>> Error in (1 - h) * qs[i] : 
>>>> non-numeric argument to binary operator
>>>> 
>>>> It sounds like a lack of symmetry.

Re: [Rd] mean

2020-01-09 Thread peter dalgaard
I think median() behaves as designed: As long as the argument can be ordered, 
the "middle observation" makes sense, except when the middle falls between two 
categories, and you can't define an average of the two candidates for a median.

The "sick man" would seem to be var(). Notice that it is also inconsistent with 
cov():

> cov(c("1","2","3","4"),c("1","2","3","4") )
Error in cov(c("1", "2", "3", "4"), c("1", "2", "3", "4")) : 
  is.numeric(x) || is.logical(x) is not TRUE
> var(c("1","2","3","4"),c("1","2","3","4") )
[1] 1.67

-pd


> On 9 Jan 2020, at 14:49 , Marc Schwartz via R-devel  
> wrote:
> 
> Jean-Luc,
> 
> Please keep the communications on the list, for the benefit of others, now 
> and in the future, via the list archive. I am adding r-devel back here.
> 
> I can't speak to the rationale in some of these cases. As I noted, it may be 
> (is likely) due to differing authors over time, and there may have been 
> relevant use cases at the time that the code was written, resulting in the 
> various checks. Presumably, the additional checks were not incorporated into 
> the other functions to enforce a level of consistency.
> 
> We will need to wait for someone from R Core to comment.
> 
> Regards,
> 
> Marc
> 
>> On Jan 9, 2020, at 8:34 AM, Lipatz Jean-Luc  wrote:
>> 
>> Ok, inconsistencies.
>> 
>> The last test you wrote is a bit strange. I agree that it is useful to warn 
>> about a computation that makes no sense in the case of factors. But why 
>> test data frames? If you go that way with arbitrary structures, you can 
>> also try:
>> 
>>> median(list(1,2),list(3,4),list(4,5))
>> Error in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) 
>> return(x[FALSE][NA]) : 
>> argument is not interpretable as logical
>> In addition: Warning message:
>> In if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]) 
>> :
>> the condition has length > 1 and only the first element will be used
>> 
>> giving a message which, despite its length, doesn't really explain the 
>> reason for the error.
>> 
>> Why not test the arguments with something like:
>> if (!is.numeric(x)) 
>> stop("need numeric data")
>> 
>> 
>> -Original Message-
>> From: Marc Schwartz  
>> Sent: Thursday, 9 January 2020 14:19
>> To: Lipatz Jean-Luc 
>> Cc: R-Devel 
>> Subject: Re: [Rd] mean
>> 
>> 
>>> On Jan 9, 2020, at 7:40 AM, Lipatz Jean-Luc  
>>> wrote:
>>> 
>>> Hello,
>>> 
>>> Is there a reason for the following behaviour?
>>>> mean(c("1","2","3"))
>>> [1] NA
>>> Warning message:
>>> In mean.default(c("1", "2", "3")) :
>>> argument is not numeric or logical: returning NA
>>> 
>>> But:
>>>> var(c("1","2","3"))
>>> [1] 1
>>> 
>>> And also:
>>>> median(c("1","2","3"))
>>> [1] "2"
>>> 
>>> But:
>>>> quantile(c("1","2","3"),p=.5)
>>> Error in (1 - h) * qs[i] : 
>>> non-numeric argument to binary operator
>>> 
>>> It sounds like a lack of symmetry. 
>>> Best regards.
>>> 
>>> 
>>> Jean-Luc LIPATZ
>>> Insee - Direction générale
>>> Head of coordination for the development of R and the roll-out of 
>>> alternatives to SAS
>> 
>> 
>> Hi,
>> 
>> It would appear, whether by design or just inconsistent implementations, 
>> perhaps by different authors over time, that the checks for whether or not 
>> the input vector is numeric differ across the functions.
>> 
>> A further inconsistency is for median(), where:
>> 
>>> median(c("1", "2", "3", "4"))
>> [1] NA
>> Warning message:
>> In mean.default(sort(x, partial = half + 0L:1L)[half + 0L:1L]) :
>> argument is not numeric or logical: returning NA
>> 
>> as a result of there being 4 elements rather than 3; in the internal code, 
>> when the input vector has an even number of elements, mean() is used:
>> 
>>   if (n%%2L == 1L) 
>>   sort(x, partial = half)[half]
>>   else mean(sort(x, partial = half + 0L:1L)[half + 0L:1L])
>> 
>> 
>> Similarly:
>> 
>>> median(factor(c("1", "2", "3")))
>> Error in median.default(factor(c("1", "2", "3"))) : need numeric data
>> 
>> because the input vector is a factor, rather than character, and the initial 
>> check has:
>> 
>> if (is.factor(x) || is.data.frame(x)) 
>> stop("need numeric data")
>> 
>> 
>> Regards,
>> 
>> Marc Schwartz
>> 
>> 
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] mean

2020-01-09 Thread Marc Schwartz via R-devel
Jean-Luc,

Please keep the communications on the list, for the benefit of others, now and 
in the future, via the list archive. I am adding r-devel back here.

I can't speak to the rationale in some of these cases. As I noted, it may be 
(is likely) due to differing authors over time, and there may have been 
relevant use cases at the time that the code was written, resulting in the 
various checks. Presumably, the additional checks were not incorporated into 
the other functions to enforce a level of consistency.

We will need to wait for someone from R Core to comment.

Regards,

Marc

> On Jan 9, 2020, at 8:34 AM, Lipatz Jean-Luc  wrote:
> 
> Ok, inconsistencies.
> 
> The last test you wrote is a bit strange. I agree that it is useful to warn 
> about a computation that makes no sense in the case of factors. But why 
> test data frames? If you go that way with arbitrary structures, you can also 
> try:
> 
>> median(list(1,2),list(3,4),list(4,5))
> Error in if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) 
> return(x[FALSE][NA]) : 
>  argument is not interpretable as logical
> In addition: Warning message:
> In if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]) :
>  the condition has length > 1 and only the first element will be used
> 
> giving a message which, despite its length, doesn't really explain the 
> reason for the error.
> 
> Why not test the arguments with something like:
>  if (!is.numeric(x)) 
>  stop("need numeric data")
> 
> 
> -Original Message-
> From: Marc Schwartz  
> Sent: Thursday, 9 January 2020 14:19
> To: Lipatz Jean-Luc 
> Cc: R-Devel 
> Subject: Re: [Rd] mean
> 
> 
>> On Jan 9, 2020, at 7:40 AM, Lipatz Jean-Luc  wrote:
>> 
>> Hello,
>> 
>> Is there a reason for the following behaviour?
>>> mean(c("1","2","3"))
>> [1] NA
>> Warning message:
>> In mean.default(c("1", "2", "3")) :
>> argument is not numeric or logical: returning NA
>> 
>> But:
>>> var(c("1","2","3"))
>> [1] 1
>> 
>> And also:
>>> median(c("1","2","3"))
>> [1] "2"
>> 
>> But:
>>> quantile(c("1","2","3"),p=.5)
>> Error in (1 - h) * qs[i] : 
>> non-numeric argument to binary operator
>> 
>> It sounds like a lack of symmetry. 
>> Best regards.
>> 
>> 
>> Jean-Luc LIPATZ
>> Insee - Direction générale
>> Head of coordination for the development of R and the roll-out of 
>> alternatives to SAS
> 
> 
> Hi,
> 
> It would appear, whether by design or just inconsistent implementations, 
> perhaps by different authors over time, that the checks for whether or not 
> the input vector is numeric differ across the functions.
> 
> A further inconsistency is for median(), where:
> 
>> median(c("1", "2", "3", "4"))
> [1] NA
> Warning message:
> In mean.default(sort(x, partial = half + 0L:1L)[half + 0L:1L]) :
>  argument is not numeric or logical: returning NA
> 
> as a result of there being 4 elements rather than 3; in the internal code, 
> when the input vector has an even number of elements, mean() is used:
> 
>if (n%%2L == 1L) 
>sort(x, partial = half)[half]
>else mean(sort(x, partial = half + 0L:1L)[half + 0L:1L])
> 
> 
> Similarly:
> 
>> median(factor(c("1", "2", "3")))
> Error in median.default(factor(c("1", "2", "3"))) : need numeric data
> 
> because the input vector is a factor, rather than character, and the initial 
> check has:
> 
>  if (is.factor(x) || is.data.frame(x)) 
>  stop("need numeric data")
> 
> 
> Regards,
> 
> Marc Schwartz
> 
> 

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] mean

2020-01-09 Thread Marc Schwartz via R-devel


> On Jan 9, 2020, at 7:40 AM, Lipatz Jean-Luc  wrote:
> 
> Hello,
> 
> Is there a reason for the following behaviour?
>> mean(c("1","2","3"))
> [1] NA
> Warning message:
> In mean.default(c("1", "2", "3")) :
>  argument is not numeric or logical: returning NA
> 
> But:
>> var(c("1","2","3"))
> [1] 1
> 
> And also:
>> median(c("1","2","3"))
> [1] "2"
> 
> But:
>> quantile(c("1","2","3"),p=.5)
> Error in (1 - h) * qs[i] : 
>  non-numeric argument to binary operator
> 
> It sounds like a lack of symmetry. 
> Best regards.
> 
> 
> Jean-Luc LIPATZ
> Insee - Direction générale
> Head of coordination for the development of R and the roll-out of 
> alternatives to SAS


Hi,

It would appear, whether by design or just inconsistent implementations, 
perhaps by different authors over time, that the checks for whether or not the 
input vector is numeric differ across the functions.

A further inconsistency is for median(), where:

> median(c("1", "2", "3", "4"))
[1] NA
Warning message:
In mean.default(sort(x, partial = half + 0L:1L)[half + 0L:1L]) :
  argument is not numeric or logical: returning NA

as a result of there being 4 elements rather than 3; in the internal code, 
when the input vector has an even number of elements, mean() is used:

if (n%%2L == 1L) 
sort(x, partial = half)[half]
else mean(sort(x, partial = half + 0L:1L)[half + 0L:1L])
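
For example, a quick way to confirm that this even-length branch is where
the warning comes from (half computed as in median.default):

x <- c("1", "2", "3", "4")
half <- (length(x) + 1L) %/% 2L
mean(sort(x)[half + 0L:1L])
## [1] NA, with the "argument is not numeric or logical" warning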


Similarly:

> median(factor(c("1", "2", "3")))
Error in median.default(factor(c("1", "2", "3"))) : need numeric data

because the input vector is a factor, rather than character, and the initial 
check has:

  if (is.factor(x) || is.data.frame(x)) 
  stop("need numeric data")


Regards,

Marc Schwartz

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] mean

2020-01-09 Thread Lipatz Jean-Luc
Hello,

Is there a reason for the following behaviour?
> mean(c("1","2","3"))
[1] NA
Warning message:
In mean.default(c("1", "2", "3")) :
  argument is not numeric or logical: returning NA

But:
> var(c("1","2","3"))
[1] 1

And also:
> median(c("1","2","3"))
[1] "2"

But:
> quantile(c("1","2","3"),p=.5)
Error in (1 - h) * qs[i] : 
  non-numeric argument to binary operator

It sounds like a lack of symmetry. 
Best regards.


Jean-Luc LIPATZ
Insee - Direction générale
Head of coordination for the development of R and the roll-out of 
alternatives to SAS

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] mean(x) for ALTREP

2018-04-26 Thread Gabe Becker
Serguei,

The R 3.5.0 release includes the fundamental ALTREP framework but does not
include many 'hooks' within R's source code to make use of methods on the
ALTREP custom vector classes. I have implemented a fair number, including
for mean() to use the custom Sum method when available, in the ALTREP
branch but unfortunately we did not have time to test and port them to the
trunk in time for this release. The current plan, as I understand it, is
that we will continue to develop and test these, and other hooks, and then
when ready they will be ported into trunk/R-devel over the course this
current development cycle for inclusion in the next release of R.

My hope is that the end-user benefits of ALTREP will really show through
much more in future releases, but for now, things like mean will
behave as they always have from a user perspective.

Best,
~G


On Thu, Apr 26, 2018 at 2:31 AM, Serguei Sokol 
wrote:

> Hi,
>
> By looking at a doc about ALTREP
> https://svn.r-project.org/R/branches/ALTREP/ALTREP.html (by the way
> congratulations for that and for
> R-3.5.0 in general), I was a little bit surprised by the following example:
>
> > x <- 1:1e10
> > system.time(print(mean(x)))
> [1] 5e+09
>user  system elapsed
>  38.520   0.008  38.531
>
> Taking 38.520 s to calculate the mean of an arithmetic sequence seemed
> a lot to me. It probably means that the calculation runs an explicit
> loop, whereas for an arithmetic sequence the mean can simply be
> calculated as (b+e)/2, where b and e are the first and last values
> respectively. Is it planned to take advantage of ALTREP in functions like
> mean(), sum(), min(), max() and others to avoid looping wherever
> possible? It seems natural to me, but perhaps some implementation
> detail preventing this escapes me.
>
> Best,
> Serguei.
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>


-- 
Gabriel Becker, Ph.D
Scientist
Bioinformatics and Computational Biology
Genentech Research


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] mean(x) for ALTREP

2018-04-26 Thread Serguei Sokol

Hi,

By looking at a doc about ALTREP 
https://svn.r-project.org/R/branches/ALTREP/ALTREP.html (by the way 
congratulations for that and for R-3.5.0 in general), I was a little bit 
surprised by the following example:


> x <- 1:1e10
> system.time(print(mean(x)))
[1] 5e+09
   user  system elapsed
 38.520   0.008  38.531

Taking 38.520 s to calculate the mean of an arithmetic sequence seemed 
a lot to me. It probably means that the calculation runs an explicit 
loop, whereas for an arithmetic sequence the mean can simply be 
calculated as (b+e)/2, where b and e are the first and last values 
respectively. Is it planned to take advantage of ALTREP in functions 
like mean(), sum(), min(), max() and others to avoid looping wherever 
possible? It seems natural to me, but perhaps some implementation 
detail preventing this escapes me.
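
For what it's worth, here is a sketch of the closed form for a compact
sequence (illustrative only; the ALTREP hooks would presumably do the
equivalent in C without ever materialising the vector):

## mean of from:to with step 1, computed without a loop
mean_seq <- function(from, to) (from + to) / 2
mean_seq(1, 1e10)
## [1] 5e+09
system.time(mean_seq(1, 1e10))   # effectively instantaneous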


Best,
Serguei.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] mean(x) != mean(rev(x)) different with x <- c(NA, NaN) for some builds

2017-04-01 Thread Hervé Pagès

On 03/31/2017 10:14 PM, Prof Brian Ripley wrote:

From ?NA

 Numerical computations using ‘NA’ will normally result in ‘NA’: a
 possible exception is where ‘NaN’ is also involved, in which case
 either might result.

and ?NaN

 Computations involving ‘NaN’ will return ‘NaN’ or perhaps ‘NA’:
 which of those two is not guaranteed and may depend on the R
 platform (since compilers may re-order computations).

fortunes::fortune(14) applies (yet again).


The problem is that TFM often contradicts itself e.g. in ?prod:

 If ‘na.rm’ is ‘FALSE’ an ‘NA’ value in any of the arguments will
 cause a value of ‘NA’ to be returned, otherwise ‘NA’ values are
 ignored.

which is clearly not the case (at least for me):

  > x <- c(NaN, NA)
  > prod(x)
  [1] NaN

H.



On 01/04/2017 04:50, Henrik Bengtsson wrote:

In R 3.3.3, I observe the following on Ubuntu 16.04 (when building
from source as well as for the sudo apt r-base build):


x <- c(NA, NaN)
mean(x)

[1] NA

mean(rev(x))

[1] NaN


rowMeans(matrix(x, nrow = 1, ncol = 2))

[1] NA

rowMeans(matrix(rev(x), nrow = 1, ncol = 2))

[1] NaN


.rowMeans(x, m = 1, n = 2)

[1] NA

.rowMeans(rev(x), m = 1, n = 2)

[1] NaN


.rowSums(x, m = 1, n = 2)

[1] NA

.rowSums(rev(x), m = 1, n = 2)

[1] NaN


rowSums(matrix(x, nrow = 1, ncol = 2))

[1] NA

rowSums(matrix(rev(x), nrow = 1, ncol = 2))

[1] NaN

I'd expect NA to trump NaN in all cases (with na.rm = FALSE).  sum()
does not have this problem and returns NA in both cases (*).

For the same R version build from source on RHEL 6.6 system
(completely different architecture), I get the expected result (= NA)
for all of the above cases, e.g.


x <- c(NA, NaN)
mean(x)

[1] NA

mean(rev(x))

[1] NA
[...]

Before going insane trying to troubleshoot this, I have a vague memory
that this, or something related to this, has been discussed
previously, but I cannot locate it.

Is the above a bug in R, a FAQ, a build error, overzealous compiler
optimization, and / or ...?

Thanks,

Henrik





--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] mean(x) != mean(rev(x)) different with x <- c(NA, NaN) for some builds

2017-04-01 Thread Henrik Bengtsson
Although help("is.nan") says:

   "Computations involving NaN will return NaN or perhaps NA: ..."

it might not be obvious that this is also why one may get:

> mean(c(-Inf, +Inf, NA))
[1] NaN

> mean(c(-Inf, NA, +Inf))
[1] NA

This is because internally the intermediate sum +Inf + -Inf is NaN in
the first case.
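
The order dependence is entirely in that intermediate sum, not in the
final division:

-Inf + Inf      ## NaN (IEEE arithmetic: Inf - Inf is undefined)
NA_real_ + Inf  ## normally NA (see the ?NA passage quoted below)

so mean(c(-Inf, +Inf, NA)) accumulates (-Inf + Inf) + NA = NaN + NA, and
which of NaN/NA survives that last addition is exactly the part the
documentation leaves unspecified.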

May I propose the following patch to that help paragraph:

Index: src/library/base/man/is.finite.Rd
===
--- src/library/base/man/is.finite.Rd (revision 72462)
+++ src/library/base/man/is.finite.Rd (working copy)
@@ -78,6 +78,8 @@
   Computations involving \code{NaN} will return \code{NaN} or perhaps
   \code{\link{NA}}: which of those two is not guaranteed and may depend
   on the \R platform (since compilers may re-order computations).
+  This may also apply to computations involving both \code{-Inf} and
+  \code{+Inf} in cases where they produce an intermediate \code{NaN}.
 }
 \value{
   A logical vector of the same length as \code{x}: \code{dim},

/Henrik


On Fri, Mar 31, 2017 at 10:51 PM, Henrik Bengtsson
 wrote:
> On Fri, Mar 31, 2017 at 10:14 PM, Prof Brian Ripley
>  wrote:
>> From ?NA
>>
>>  Numerical computations using ‘NA’ will normally result in ‘NA’: a
>>  possible exception is where ‘NaN’ is also involved, in which case
>>  either might result.
>>
>> and ?NaN
>>
>>  Computations involving ‘NaN’ will return ‘NaN’ or perhaps ‘NA’:
>>  which of those two is not guaranteed and may depend on the R
>>  platform (since compilers may re-order computations).
>>
>> fortunes::fortune(14) applies (yet again).
>
> Thanks; I'm often happy to have contributed to some of the fortune
> counters, but not so sure about this one.   What's even worse is that
> one of my own matrixStats NEWS has an entry go a few years back which
> mentions "... incorrectly assumed that the value of prod(c(NaN, NA))
> is uniquely defined.  However, as documented in help("is.nan"), it may
> be NA or NaN depending on R system/platform."  I guess the joke is on
> me - it's April 1st after all.
>
> But, technically one could test for ISNA(x) for each element before
> calculating the intermediate sum, but since that is a quite expensive
> test it is not done and sum += x is performed "as is" on NA and NaN
> (and -Inf and +Inf).  Is that correct?
>
> /Henrik
>
>>
>>
>> On 01/04/2017 04:50, Henrik Bengtsson wrote:
>>>
>>> In R 3.3.3, I observe the following on Ubuntu 16.04 (when building
>>> from source as well as for the sudo apt r-base build):
>>>
 x <- c(NA, NaN)
 mean(x)
>>>
>>> [1] NA

 mean(rev(x))
>>>
>>> [1] NaN
>>>
 rowMeans(matrix(x, nrow = 1, ncol = 2))
>>>
>>> [1] NA

 rowMeans(matrix(rev(x), nrow = 1, ncol = 2))
>>>
>>> [1] NaN
>>>
 .rowMeans(x, m = 1, n = 2)
>>>
>>> [1] NA

 .rowMeans(rev(x), m = 1, n = 2)
>>>
>>> [1] NaN
>>>
 .rowSums(x, m = 1, n = 2)
>>>
>>> [1] NA

 .rowSums(rev(x), m = 1, n = 2)
>>>
>>> [1] NaN
>>>
 rowSums(matrix(x, nrow = 1, ncol = 2))
>>>
>>> [1] NA

 rowSums(matrix(rev(x), nrow = 1, ncol = 2))
>>>
>>> [1] NaN
>>>
>>> I'd expect NA to trump NaN in all cases (with na.rm = FALSE).  sum()
>>> does not have this problem and returns NA in both cases (*).
>>>
>>> For the same R version build from source on RHEL 6.6 system
>>> (completely different architecture), I get the expected result (= NA)
>>> for all of the above cases, e.g.
>>>
 x <- c(NA, NaN)
 mean(x)
>>>
>>> [1] NA

 mean(rev(x))
>>>
>>> [1] NA
>>> [...]
>>>
>>> Before going insane trying to troubleshoot this, I have a vague memory
>>> that this, or something related to this, has been discussed
>>> previously, but I cannot locate it.
>>>
>>> Is the above a bug in R, a FAQ, a build error, overzealous compiler
>>> optimization, and / or ...?
>>>
>>> Thanks,
>>>
>>> Henrik
>>
>>
>>
>> --
>> Brian D. Ripley,  rip...@stats.ox.ac.uk
>> Emeritus Professor of Applied Statistics, University of Oxford
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] mean(x) != mean(rev(x)) different with x <- c(NA, NaN) for some builds

2017-03-31 Thread Henrik Bengtsson
On Fri, Mar 31, 2017 at 10:14 PM, Prof Brian Ripley
 wrote:
> From ?NA
>
>  Numerical computations using ‘NA’ will normally result in ‘NA’: a
>  possible exception is where ‘NaN’ is also involved, in which case
>  either might result.
>
> and ?NaN
>
>  Computations involving ‘NaN’ will return ‘NaN’ or perhaps ‘NA’:
>  which of those two is not guaranteed and may depend on the R
>  platform (since compilers may re-order computations).
>
> fortunes::fortune(14) applies (yet again).

Thanks; I'm often happy to have contributed to some of the fortune
counters, but not so sure about this one.   What's even worse is that
one of my own matrixStats NEWS has an entry from a few years back which
mentions "... incorrectly assumed that the value of prod(c(NaN, NA))
is uniquely defined.  However, as documented in help("is.nan"), it may
be NA or NaN depending on R system/platform."  I guess the joke is on
me - it's April 1st after all.

But, technically one could test for ISNA(x) for each element before
calculating the intermediate sum, but since that is a quite expensive
test it is not done and sum += x is performed "as is" on NA and NaN
(and -Inf and +Inf).  Is that correct?
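
At the R level the idea would amount to something like this sketch
(a hypothetical helper, not how the C code is written; the real summation
is done in C with long doubles):

mean_na_first <- function(x) {
  ## a "true" NA is an NA that is not NaN
  if (any(is.na(x) & !is.nan(x))) return(NA_real_)
  sum(x) / length(x)
}
mean_na_first(c(NA, NaN))   ## NA
mean_na_first(c(NaN, NA))   ## NA, regardless of order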

/Henrik

>
>
> On 01/04/2017 04:50, Henrik Bengtsson wrote:
>>
>> In R 3.3.3, I observe the following on Ubuntu 16.04 (when building
>> from source as well as for the sudo apt r-base build):
>>
>>> x <- c(NA, NaN)
>>> mean(x)
>>
>> [1] NA
>>>
>>> mean(rev(x))
>>
>> [1] NaN
>>
>>> rowMeans(matrix(x, nrow = 1, ncol = 2))
>>
>> [1] NA
>>>
>>> rowMeans(matrix(rev(x), nrow = 1, ncol = 2))
>>
>> [1] NaN
>>
>>> .rowMeans(x, m = 1, n = 2)
>>
>> [1] NA
>>>
>>> .rowMeans(rev(x), m = 1, n = 2)
>>
>> [1] NaN
>>
>>> .rowSums(x, m = 1, n = 2)
>>
>> [1] NA
>>>
>>> .rowSums(rev(x), m = 1, n = 2)
>>
>> [1] NaN
>>
>>> rowSums(matrix(x, nrow = 1, ncol = 2))
>>
>> [1] NA
>>>
>>> rowSums(matrix(rev(x), nrow = 1, ncol = 2))
>>
>> [1] NaN
>>
>> I'd expect NA to trump NaN in all cases (with na.rm = FALSE).  sum()
>> does not have this problem and returns NA in both cases (*).
>>
>> For the same R version build from source on RHEL 6.6 system
>> (completely different architecture), I get the expected result (= NA)
>> for all of the above cases, e.g.
>>
>>> x <- c(NA, NaN)
>>> mean(x)
>>
>> [1] NA
>>>
>>> mean(rev(x))
>>
>> [1] NA
>> [...]
>>
>> Before going insane trying to troubleshoot this, I have a vague memory
>> that this, or something related to this, has been discussed
>> previously, but I cannot locate it.
>>
>> Is the above a bug in R, a FAQ, a build error, overzealous compiler
>> optimization, and / or ...?
>>
>> Thanks,
>>
>> Henrik
>
>
>
> --
> Brian D. Ripley,  rip...@stats.ox.ac.uk
> Emeritus Professor of Applied Statistics, University of Oxford
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] mean(x) != mean(rev(x)) different with x <- c(NA, NaN) for some builds

2017-03-31 Thread Prof Brian Ripley

From ?NA

 Numerical computations using ‘NA’ will normally result in ‘NA’: a
 possible exception is where ‘NaN’ is also involved, in which case
 either might result.

and ?NaN

 Computations involving ‘NaN’ will return ‘NaN’ or perhaps ‘NA’:
 which of those two is not guaranteed and may depend on the R
 platform (since compilers may re-order computations).

fortunes::fortune(14) applies (yet again).

On 01/04/2017 04:50, Henrik Bengtsson wrote:

In R 3.3.3, I observe the following on Ubuntu 16.04 (when building
from source as well as for the sudo apt r-base build):


x <- c(NA, NaN)
mean(x)

[1] NA

mean(rev(x))

[1] NaN


rowMeans(matrix(x, nrow = 1, ncol = 2))

[1] NA

rowMeans(matrix(rev(x), nrow = 1, ncol = 2))

[1] NaN


.rowMeans(x, m = 1, n = 2)

[1] NA

.rowMeans(rev(x), m = 1, n = 2)

[1] NaN


.rowSums(x, m = 1, n = 2)

[1] NA

.rowSums(rev(x), m = 1, n = 2)

[1] NaN


rowSums(matrix(x, nrow = 1, ncol = 2))

[1] NA

rowSums(matrix(rev(x), nrow = 1, ncol = 2))

[1] NaN

I'd expect NA to trump NaN in all cases (with na.rm = FALSE).  sum()
does not have this problem and returns NA in both cases (*).

For the same R version build from source on RHEL 6.6 system
(completely different architecture), I get the expected result (= NA)
for all of the above cases, e.g.


x <- c(NA, NaN)
mean(x)

[1] NA

mean(rev(x))

[1] NA
[...]

Before going insane trying to troubleshoot this, I have a vague memory
that this, or something related to this, has been discussed
previously, but I cannot locate it.

Is the above a bug in R, a FAQ, a build error, overzealous compiler
optimization, and / or ...?

Thanks,

Henrik



--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] mean(x) != mean(rev(x)) different with x <- c(NA, NaN) for some builds

2017-03-31 Thread Henrik Bengtsson
In R 3.3.3, I observe the following on Ubuntu 16.04 (when building
from source as well as for the sudo apt r-base build):

> x <- c(NA, NaN)
> mean(x)
[1] NA
> mean(rev(x))
[1] NaN

> rowMeans(matrix(x, nrow = 1, ncol = 2))
[1] NA
> rowMeans(matrix(rev(x), nrow = 1, ncol = 2))
[1] NaN

> .rowMeans(x, m = 1, n = 2)
[1] NA
> .rowMeans(rev(x), m = 1, n = 2)
[1] NaN

> .rowSums(x, m = 1, n = 2)
[1] NA
> .rowSums(rev(x), m = 1, n = 2)
[1] NaN

> rowSums(matrix(x, nrow = 1, ncol = 2))
[1] NA
> rowSums(matrix(rev(x), nrow = 1, ncol = 2))
[1] NaN

I'd expect NA to trump NaN in all cases (with na.rm = FALSE).  sum()
does not have this problem and returns NA in both cases (*).

For the same R version build from source on RHEL 6.6 system
(completely different architecture), I get the expected result (= NA)
for all of the above cases, e.g.

> x <- c(NA, NaN)
> mean(x)
[1] NA
> mean(rev(x))
[1] NA
[...]

Before going insane trying to troubleshoot this, I have a vague memory
that this, or something related to this, has been discussed
previously, but I cannot locate it.

Is the above a bug in R, a FAQ, a build error, overzealous compiler
optimization, and / or ...?

Thanks,

Henrik

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] mean(trim=, c(NA,...), na.rm=FALSE) does not return NA

2010-03-18 Thread Prof Brian Ripley

On Tue, 16 Mar 2010, William Dunlap wrote:


Both of the following should return NA,
but do not in R version 2.11.0 Under
development (unstable) (2010-03-07 r51225)
on 32-bit Windows:


Nor in any version of R in the last several years (e.g. 2.1.0)


  mean(c(1,10,100,NA), trim=.1)
 Error in sort.int(x, partial = unique(c(lo, hi))) :
   index 4 outside bounds
  mean(c(1,10,100,NA), trim=.26)
 [1] 55

With na.rm=TRUE they give the correct results.


But the fix is easy and I've done so in R-devel, thank you.


(mean() would be so much simpler if we didn't
have to worry about the seldom-used trim=
argument.)


Only a little.  I think the drawback is more conceptual: a trimmed 
mean needs order-able data whereas 'mean' in its usual sense does not.



Bill Dunlap
Spotfire, TIBCO Software



--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] mean(trim=, c(NA,...), na.rm=FALSE) does not return NA

2010-03-16 Thread William Dunlap
Both of the following should return NA,
but do not in R version 2.11.0 Under
development (unstable) (2010-03-07 r51225)
on 32-bit Windows:

   mean(c(1,10,100,NA), trim=.1)
  Error in sort.int(x, partial = unique(c(lo, hi))) : 
index 4 outside bounds
   mean(c(1,10,100,NA), trim=.26)
  [1] 55

With na.rm=TRUE they give the correct results.
(mean() would be so much simpler if we didn't
have to worry about the seldom-used trim=
argument.)
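
For what it's worth, a sketch of the guard that would give the expected
behaviour (an illustrative wrapper, not the actual fix committed to
R-devel):

trimmed_mean <- function(x, trim = 0, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  else if (any(is.na(x))) return(NA_real_)   # NA propagates, as for trim = 0
  mean(x, trim = trim)
}
trimmed_mean(c(1, 10, 100, NA), trim = .26)                ## NA
trimmed_mean(c(1, 10, 100, NA), trim = .26, na.rm = TRUE)  ## 37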

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] 'mean' is not reverted in median() as NEWS says (PR#13731)

2009-05-29 Thread zhengxin
Full_Name: 
Version: 2.9.0
OS: windows, linux
Submission from: (NULL) (128.231.21.125)


In NEWS, it says median.default() was altered in 2.8.1 to use sum() rather
than mean(), although it was still documented to use mean().
This caused problems for POSIXt objects, for which mean() but
not sum() makes sense, so the change has been reverted.

But it's not reverted yet.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] 'mean' is not reverted in median() as NEWS says (PR#13731)

2009-05-29 Thread Peter Dalgaard

zheng...@mail.nih.gov wrote:
Full_Name: 
Version: 2.9.0

OS: windows, linux
Submission from: (NULL) (128.231.21.125)


In NEWS, it says median.default() was altered in 2.8.1 to use sum() rather
than mean(), although it was still documented to use mean().
This caused problems for POSIXt objects, for which mean() but
not sum() makes sense, so the change has been reverted.

But it's not reverted yet.


That text is not in the NEWS file for 2.9.0. And the NEWS file that it 
is in is not for 2.9.0, and does not list that change under CHANGES IN 
R VERSION 2.9.0.


--
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - (p.dalga...@biostat.ku.dk)  FAX: (+45) 35327907

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] mean (PR#10864)

2008-02-28 Thread paulponcet
Full_Name: Paul PONCET
Version: 2.6.0
OS: Windows 2000
Submission from: (NULL) (83.137.240.218)


Function 'mean.default' calls function 'stats::median' if 'trim = 0.5'. In that
case the call should be 'stats::median(x, na.rm = na.rm)' instead of
'stats::median(x, na.rm = FALSE)'.
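
A minimal sketch of the intended forwarding (a hypothetical wrapper, not
the actual mean.default source):

trim_half_mean <- function(x, na.rm = FALSE)
  stats::median(x, na.rm = na.rm)   # pass na.rm through instead of FALSE

trim_half_mean(c(1, 2, NA), na.rm = TRUE)
## [1] 1.5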

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] mean mailing list moderator ..

2007-11-07 Thread Hin-Tak Leung
There is a simple solution to this kind of problem - for my
non-day-job-related software stuff, I usually subscribe under
my sourceforge address. Sourceforge's is a simple redirection service,
so I actually cannot post from it; but I like incoming e-mails to go
through sourceforge for a double spam filter. So e-mails come in through
sourceforge but are replied to from my real address if I reply, and such
replies occasionally got held in the past depending on the mailing list policies.

The solution I found is this: subscribe both addresses, but disable 
delivery to the real one (this can be done by the user, no admin 
required). This way I can post from the real one, but receive 
twice-filtered mailing-list e-mails through an alias.

(For R-devel, I am receiving and posting from my day-job address,
if you are wondering...)

Martin Maechler wrote:
 Hi Jari,
 
 (and interested readers)
 
 JO == Jari Oksanen [EMAIL PROTECTED]
 on Wed, 07 Nov 2007 12:21:10 +0200 writes:
 
   [..]
   [...some very good stuff...]
   [..]
 
 JO Cheers, Jari Oksanen
 
 JO PS. Please Mr Moderator, don't treat me so mean (*): I've subscribed 
 to
 JO this group although you regularly reject my mail as coming from a
 JO non-member. 
 
 More than a year ago, I had changed R-devel policy to  
 
 1) subscribers can post freely
 2) everything else is on hold for moderator approval +)
 3) ``spam-suspicious e-mails'' are also put on hold.
 
 Now your problem is that you are subscribed under a different
 e-mail address than the one you are currently sending mail from
 (and also use in your sig. below).
 To the mailing list software (mailman) this is equivalent to a
 non-subscriber.
 
 +) the moderator can   **manually**  add non-subscriber
addresses to a list which is treated as allowed to post
and I could do this next time ...
but my general attitude is that r-devel subscribers should
make these things work...
 
 Best regards,
 Martin
 
 JO (*) an extract from a classic song Mr R jumped the rabbit.
 JO -- 
 JO Jari Oksanen [EMAIL PROTECTED]
 
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] mean mailing list moderator ..

2007-11-07 Thread Martin Maechler
Hi Jari,

(and interested readers)

 JO == Jari Oksanen [EMAIL PROTECTED]
 on Wed, 07 Nov 2007 12:21:10 +0200 writes:

  [..]
  [...some very good stuff...]
  [..]

JO Cheers, Jari Oksanen

JO PS. Please Mr Moderator, don't treat me so mean (*): I've subscribed to
JO this group although you regularly reject my mail as coming from a
JO non-member. 

More than a year ago, I had changed R-devel policy to  

1) subscribers can post freely
2) everything else is on hold for moderator approval +)
3) ``spam-suspicious e-mails'' are also put on hold.

Now your problem is that you are subscribed under a different
e-mail address than the one you are currently sending mail from
(and also use in your sig. below).
To the mailing list software (mailman) this is equivalent to a
non-subscriber.

+) the moderator can   **manually**  add non-subscriber
   addresses to a list which is treated as allowed to post
   and I could do this next time ...
   but my general attitude is that r-devel subscribers should
   make these things work...

Best regards,
Martin

JO (*) an extract from a classic song Mr R jumped the rabbit.
JO -- 
JO Jari Oksanen [EMAIL PROTECTED]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] ?mean

2007-01-26 Thread Berwin A Turlach
G'day Gabor,

On Thu, 25 Jan 2007 09:53:49 -0500
Gabor Grothendieck [EMAIL PROTECTED] wrote:

 The help page for mean does not say what happens when one
 applies mean to a matrix.

Well, not directly.  :-)

But the help page of mean says that one of the arguments is:

   x: An R object.  Currently there are methods for numeric data
  frames, numeric vectors and dates.  A complex vector is
  allowed for 'trim = 0', only.

And the `Value' section states:
 
 For a data frame, a named vector with the appropriate method being
 applied column by column.

 If 'trim' is zero (the default), the arithmetic mean of the values
 in 'x' is computed, as a numeric or complex vector of length one. 
 If any argument is not logical (coerced to numeric), integer,
 numeric or complex, 'NA' is returned, with a warning.

Since a matrix is a vector with a dimension attribute, and not a data
frame, one can deduce that the second paragraph describes the return
value for `mean(x)' when x is a matrix.

As I always tell my students, reading R help pages is a bit of an
art. :)
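
A small example of the behaviour being discussed (the column-wise sd()
behaviour is shown via apply(), since that is what sd(matrix) amounted to
per the sd help text quoted below):

m <- matrix(1:4, nrow = 2)
mean(m)           ## 2.5 -- a single number, the grand mean
apply(m, 2, sd)   ## 0.7071068 0.7071068 -- per column
colMeans(m)       ## 1.5 3.5 -- the column-wise analogue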

 mean and sd work in an inconsistent way on a matrix so that should at
 least be documented. 

Agreed.  But it is documented in the help page of sd, which clearly
states:

 [] If 'x' is a matrix or a data frame, a vector
 of the standard deviation of the columns is returned.

I guess you also want to have it documented in the mean help page?  

But then, should `var' also be mentioned in the mean help page?  This
command also works in a different and inconsistent manner to mean on
matrices.

And, of course, there are other subtle inconsistencies in the language
used in these help pages.  Note that the mean help page talks about
numeric data frames while the help pages of `var' and `sd' talk about
data frames only, though all components of the data frame have to be
numeric, of course.

 Also there should be a See Also to colMeans since that provides the
 missing column-wise analog to sd.

That's probably a good idea.  What would you suggest should be
mentioned to provide the column-wise analog of `var'?

Cheers,

Berwin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] ?mean

2007-01-25 Thread Gabor Grothendieck
The help page for mean does not say what happens when one
applies mean to a matrix.

mean and sd work in an inconsistent way on a matrix
so that should at least be documented.

Also there should be a See Also to colMeans since that
provides the missing column-wise analog to sd.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] ?mean

2007-01-25 Thread Martin Maechler
 Gabor == Gabor Grothendieck [EMAIL PROTECTED]
 on Thu, 25 Jan 2007 09:53:49 -0500 writes:

Gabor The help page for mean does not say what happens when one
Gabor applies mean to a matrix.

Gabor mean and sd work in an inconsistent way on a matrix
Gabor so that should at least be documented.

You are right (though I think this *was* documented at some
point in time).

As a matter of fact, I hate the inconsistencies you've
been mentioning, and I think it is very wrong from an S-pedagogical
point of view both 
that  sd(mat)   :== apply(mat, 2, sd)
and   mean(dfr) :== apply(dfr, 2, mean)
  
and it leads just to wrong ``analogy conclusions'' by useRs.

I'd vote for deprecating these ``builtin conveniences''
in order to gain consistency and clarity...

Though I haven't checked how many CRAN + Bioconductor packages
would break if we were to deactivate these two mis-features ...

Martin

Gabor Also there should be a See Also to colMeans since that
Gabor provides the missing column-wise analog to sd.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] ?mean

2007-01-25 Thread Gabor Grothendieck
Good point.  Perhaps what is needed is a Note clarifying all this in ?mean
(unless the software itself is reworked as Martin has discussed).

Regarding var(x), one could use sd(x)^2.

On 1/25/07, Berwin A Turlach [EMAIL PROTECTED] wrote:
 G'day Gabor,

 On Thu, 25 Jan 2007 09:53:49 -0500
 Gabor Grothendieck [EMAIL PROTECTED] wrote:

  The help page for mean does not say what happens when one
  applies mean to a matrix.

 Well, not directly.  :-)

 But the help page of mean says that one of the arguments is:

   x: An R object.  Currently there are methods for numeric data
  frames, numeric vectors and dates.  A complex vector is
  allowed for 'trim = 0', only.

 And the `Value' section states:

 For a data frame, a named vector with the appropriate method being
 applied column by column.

 If 'trim' is zero (the default), the arithmetic mean of the values
 in 'x' is computed, as a numeric or complex vector of length one.
 If any argument is not logical (coerced to numeric), integer,
 numeric or complex, 'NA' is returned, with a warning.

 Since a matrix is a vector with a dimension attribute, and not a data
 frame, one can deduce that the second paragraph describes the return
 value for `mean(x)' when x is a matrix.

 As I always tell my students, reading R help pages is a bit of an
 art. :)

  mean and sd work in an inconsistent way on a matrix so that should at
  least be documented.

 Agreed.  But it is documented in the help page of sd, which clearly
 states:

 [] If 'x' is a matrix or a data frame, a vector
 of the standard deviation of the columns is returned.

 I guess you also want to have it documented in the mean help page?

 But then, should `var' also be mentioned in the mean help page?  This
  command also works in a different and inconsistent manner to mean on
 matrices.

 And, of course, there are other subtle inconsistencies in the language
 used in these help pages.  Note that the mean help page talks about
  numeric data frames while the help pages of `var' and `sd' talk about
 data frames only, though all components of the data frame have to be
 numeric, of course.

  Also there should be a See Also to colMeans since that provides the
  missing column-wise analog to sd.

 That's probably a good idea.  What would you suggest should be
 mentioned to provide the column-wise analog of `var'?

 Cheers,

Berwin


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] mean relative differences from all.equal() (PR#9276)

2006-10-04 Thread bchristo
Full_Name: Brad Christoffersen
Version: 2.3.1
OS: Windows XP
Submission from: (NULL) (128.196.193.132)


Why is the difference between two numbers so different from the mean relative
difference output from the all.equal() function?  Is this an artifact of the
way R stores numerics?  I could not find this problem as I searched through the
submitted bugs. But I am brand new to R so I apologize if there is something
obvious I'm missing here.

rm(list=ls(all=TRUE))  ## Remove all objects that could hinder w/ consistent
output
a <- 204
b <- 203.9792
all.equal(a,b)
[1] "Mean relative difference: 0.0001019608"
a - b
[1] 0.0208

-- version -
platform   i386-pc-mingw32   
arch   i386  
os mingw32   
system i386, mingw32 
status   
major  2 
minor  3.1   
year   2006  
month  06
day01
svn rev38247 
language   R 
version.string Version 2.3.1 (2006-06-01)

Thanks,
Brad

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] mean relative differences from all.equal() (PR#9276)

2006-10-04 Thread MSchwartz
On Thu, 2006-10-05 at 03:10 +0200, [EMAIL PROTECTED] wrote:
 Full_Name: Brad Christoffersen
 Version: 2.3.1
 OS: Windows XP
 Submission from: (NULL) (128.196.193.132)
 
 
 Why is the difference between two numbers so different from the mean relative
 difference output from the all.equal() function?  Is this an artifact of the
 way R stores numerics?  I could not find this problem as I searched through 
 the
 submitted bugs. But I am brand new to R so I apologize if there is something
 obvious I'm missing here.
 
 rm(list=ls(all=TRUE))  ## Remove all objects that could hinder w/ consistent
 output
 a <- 204
 b <- 203.9792
 all.equal(a,b)
 [1] "Mean relative difference: 0.0001019608"
 a - b
 [1] 0.0208

Read the Details section of ?all.equal, which states:

Numerical comparisons for scale = NULL (the default) are done by first
computing the mean absolute difference of the two numerical vectors. If
this is smaller than tolerance or not finite, absolute differences are
used, otherwise relative differences scaled by the mean absolute
difference.

If scale is positive, absolute comparisons are made after scaling
(dividing) by scale


Thus on R version 2.4.0 (2006-10-03):

 all.equal(a, b, scale = 1)
[1] "Mean scaled difference: 0.0208"


Please do not report doubts about behavior as bugs.  Simply post a query
on r-help first. If it is a bug, somebody will confirm it and you can
then report it as such.

BTW, time to upgrade...Go Wildcats!

HTH,

Marc Schwartz

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] mean relative differences from all.equal() (PR#9276)

2006-10-04 Thread MSchwartz
On Wed, 2006-10-04 at 20:22 -0500, Marc Schwartz wrote:
 On Thu, 2006-10-05 at 03:10 +0200, [EMAIL PROTECTED] wrote:
  Full_Name: Brad Christoffersen
  Version: 2.3.1
  OS: Windows XP
  Submission from: (NULL) (128.196.193.132)
  
  
  Why is the difference between two numbers so different from the mean 
  relative
  difference output from the all.equal() function?  Is this an artifact of 
  the
  way R stores numerics?  I could not find this problem as I searched through 
  the
  submitted bugs. But I am brand new to R so I apologize if there is something
  obvious I'm missing here.
  
  rm(list=ls(all=TRUE))  ## Remove all objects that could hinder w/ consistent
  output
  a <- 204
  b <- 203.9792
  all.equal(a,b)
  [1] "Mean relative difference: 0.0001019608"
  a - b
  [1] 0.0208
 
 Read the Details section of ?all.equal, which states:
 
 Numerical comparisons for scale = NULL (the default) are done by first
 computing the mean absolute difference of the two numerical vectors. If
 this is smaller than tolerance or not finite, absolute differences are
 used, otherwise relative differences scaled by the mean absolute
 difference.
 
 If scale is positive, absolute comparisons are made after scaling
 (dividing) by scale
 
 
 Thus on R version 2.4.0 (2006-10-03):
 
  all.equal(a, b, scale = 1)
 [1] "Mean scaled difference: 0.0208"
 
 
 Please do not report doubts about behavior as bugs.  Simply post a query
 on r-help first. If it is a bug, somebody will confirm it and you can
 then report it as such.
 
 BTW, time to upgrade...Go Wildcats!
 
 HTH,
 
 Marc Schwartz

[OFFLIST and PRIVATE]

Brad,

A couple of comments.

First, welcome to R. I hope that you enjoy it and find it of value.

If you are not used to open source software and communities (ie. Linux,
etc.), you will find that this community, unlike commercial paid support
forums, tends to be direct with respect to comments. Don't take it
personally.

Be aware that nobody is getting paid to support R. It is developed and
supported on a voluntary basis by a large body of folks, mainly those
known as R Core. Some of them have quite literally risked their
academic careers and livelihood to facilitate R's existence.

You will, over time, get a flavor for the nature of the community and
the interchange that takes place. As a result of the voluntary nature of
the community, there is an a priori expectation that you will have put
forth reasonable efforts to avail yourself of the various support
resources before posting. Especially in the case of a bug report, as a
member of R Core has to manually manage the handling and resolution of
bug reports.

A good place to start is to review the R Posting Guide:

http://www.r-project.org/posting-guide.html

which covers many of these issues and how to go about getting support
via the various sources provided.

That all being said, you will find that R's support mechanisms and
resources are second to none and I would challenge any commercial
software vendor to provide a comparable level of support and expertise.

With respect to your specific question above and how the result is
obtained:

 (a - b) / a
[1] 0.0001019608

Here, 'a' is used as the scaling factor, since you only passed single
values. If these were 'vectors' of values, the scaling factor would be
impacted accordingly.
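
Concretely, for these scalar inputs the reported number can be reproduced
as follows (a sketch of the default scale = NULL computation, consistent
with the '(a - b) / a' shown above):

a <- 204; b <- 203.9792
mean(abs(a - b)) / mean(abs(a))   ## relative difference, scaled by the target
## [1] 0.0001019608
a - b                             ## the absolute difference Brad expected
## [1] 0.0208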

As a result of R's open source nature, you have access to all of the
source code that is R. You can download the source tarball (archive)
from one of the CRAN mirrors, if you so desire.

In this case, the actual function that is used is called
all.equal.numeric(). This is a consequence of how R uses 'dispatch
methods' after a call to a 'generic' function, such as all.equal(). If
you are not familiar with these terms, the available R documentation is
a good place to start, if you should decide to pursue moving into that
level of detail. If you have experience in other programming languages,
this may be second nature already.

In many cases, R's functions are written in R itself. Others are written
in FORTRAN and/or C that is compiled and linked to R via various calling
mechanisms. Since R is an interpreted language, you can have easy access
to many of the functions within the R console.

Thus, at the R command prompt, you can type:

 all.equal.numeric

[Note without the parens]

which will then display a representation of the function's source code,
enabling you to review how the function works. If you desire to become a
better R user/programmer, this approach provides a reasonable way to see
how functions are coded and to investigate algorithms and techniques.

I hope that the above is helpful.

Best regards,

Marc

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] mean relative differences from all.equal() (PR#9276)

2006-10-04 Thread Marc Schwartz
On Wed, 2006-10-04 at 21:57 -0500, Marc Schwartz wrote:
 On Wed, 2006-10-04 at 20:22 -0500, Marc Schwartz wrote:
  On Thu, 2006-10-05 at 03:10 +0200, [EMAIL PROTECTED] wrote:
   Full_Name: Brad Christoffersen
   Version: 2.3.1
   OS: Windows XP
   Submission from: (NULL) (128.196.193.132)
   
   
   Why is the difference between two numbers so different from the mean 
   relative
   difference output from the all.equal() function?  Is this an artifact of 
   the
   way R stores numerics?  I could not find this problem as I searched 
   through the
   submitted bugs. But I am brand new to R so I apologize if there is 
   something
   obvious I'm missing here.
   
   rm(list=ls(all=TRUE))  ## Remove all objects that could hinder w/ 
   consistent
   output
   a <- 204
   b <- 203.9792
   all.equal(a,b)
   [1] "Mean relative difference: 0.0001019608"
   a - b
   [1] 0.0208
  
  Read the Details section of ?all.equal, which states:
  
  Numerical comparisons for scale = NULL (the default) are done by first
  computing the mean absolute difference of the two numerical vectors. If
  this is smaller than tolerance or not finite, absolute differences are
  used, otherwise relative differences scaled by the mean absolute
  difference.
  
  If scale is positive, absolute comparisons are made after scaling
  (dividing) by scale
  
  
  Thus on R version 2.4.0 (2006-10-03):
  
   all.equal(a, b, scale = 1)
  [1] "Mean scaled difference: 0.0208"
  
  
  Please do not report doubts about behavior as bugs.  Simply post a query
  on r-help first. If it is a bug, somebody will confirm it and you can
  then report it as such.
  
  BTW, time to upgrade...Go Wildcats!
  
  HTH,
  
  Marc Schwartz
 
 [OFFLIST and PRIVATE]
 
 Brad,
 
 A couple of comments.
 
 First, welcome to R. I hope that you enjoy it and find it of value.
 
 If you are not used to open source software and communities (ie. Linux,
 etc.), you will find that this community, unlike commercial paid support
 forums, tends to be direct with respect to comments. Don't take it
 personally.
 
 Be aware that nobody is getting paid to support R. It is developed and
 supported on a voluntary basis by a large body of folks, mainly those
 known as R Core. Some of them have quite literally risked their
 academic careers and livelihood to facilitate R's existence.
 
 You will, over time, get a flavor for the nature of the community and
 the interchange that takes place. As a result of the voluntary nature of
 the community, there is an a priori expectation that you will have put
 forth reasonable efforts to avail yourself of the various support
 resources before posting. Especially in the case of a bug report, as a
 member of R Core has to manually manage the handling and resolution of
 bug reports.
 
 A good place to start is to review the R Posting Guide:
 
 http://www.r-project.org/posting-guide.html
 
 which covers many of these issues and how to go about getting support
 via the various sources provided.
 
 That all being said, you will find that R's support mechanisms and
 resources are second to none and I would challenge any commercial
 software vendor to provide a comparable level of support and expertise.
 
 With respect to your specific question above and how the result is
 obtained:
 
  (a - b) / a
 [1] 0.0001019608
 
 Here, 'a' is used as the scaling factor, since you only passed single
 values. If these were 'vectors' of values, the scaling factor would be
 impacted accordingly.
 
 As a result of R's open source nature, you have access to all of the
 source code that is R. You can download the source tarball (archive)
 from one of the CRAN mirrors, if you so desire.
 
 In this case, the actual function that is used is called
 all.equal.numeric(). This is a consequence of how R uses 'dispatch
 methods' after a call to a 'generic' function, such as all.equal(). If
 you are not familiar with these terms, the available R documentation is
 a good place to start, if you should decide to pursue moving into that
 level of detail. If you have experience in other programming languages,
 this may be second nature already.
 
 In many cases, R's functions are written in R itself. Others are written
 in FORTRAN and/or C that is compiled and linked to R via various calling
 mechanisms. Since R is an interpreted language, you can have easy access
 to many of the functions within the R console.
 
 Thus, at the R command prompt, you can type:
 
  all.equal.numeric
 
 [Note without the parens]
 
 which will then display a representation of the function's source code,
 enabling you to review how the function works. If you desire to become a
 better R user/programmer, this approach provides a reasonable way to see
 how functions are coded and to investigate algorithms and techniques.
 
 I hope that the above is helpful.
 
 Best regards,
 
 Marc

My most sincere and public apologies to Brad. The reply message above
was mistakenly copied to the list.

Brad, I am sorry.

Marc Schwartz


[Rd] mean(NA) returns -(1+.Machine$integer.max) (PR#9097)

2006-07-25 Thread btyner
Full_Name: Benjamin Tyner
Version: 2.3.0
OS: linux-gnu (debian)
Submission from: (NULL) (71.98.75.54)


 mean(NA)
returns -2147483648 on my system, which is -(1+.Machine$integer.max) 

 sessionInfo()
Version 2.3.0 (2006-04-24)
i686-pc-linux-gnu

attached base packages:
[1] methods   stats graphics  grDevices utils datasets
[7] base
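
For context, a note on where that particular number probably comes from
(an assumption about the likely cause, not a verified diagnosis): R stores
NA_integer_ internally as INT_MIN, which is exactly this value, so the
result looks like an integer NA whose missingness flag was lost somewhere
along the way:

-(.Machine$integer.max + 1)
## [1] -2147483648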

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] mean of complex vector (PR#8842)

2006-05-08 Thread john . peters
Full_Name: John Peters
Version: 2.3.0
OS: Windows 2000, xp
Submission from: (NULL) (220.233.20.203)


In R2.3.0 on Windows 2000 and xp

 mean(c(1i))
[1] 0+2i
 mean(c(1i,1i))
[1] 0+3i
 mean(c(1i,1i,1i))
[1] 0+4i

OK in R2.2.1

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel