Re: [Rd] Precision of function mean,bug?
Sorry, posting back to the list. Thank you all. Morgan On Thu, 21 May 2020, 16:33 Henrik Bengtsson, wrote: > Hi. > > Good point and a good example. Feel free to post to the list. The purpose > of my reply wasn't to take away Peter's point but to emphasize that > base::mean() does a two-pass scan over the elements too lower the impact of > addition of values with widely different values (classical problem in > numerical analysis). But I can see how it may look like that. > > Cheers, > > Henrik > > > On Thu, May 21, 2020, 03:21 Morgan Morgan > wrote: > >> Thank you Henrik for the feedback. >> Note that for idx=4 and refine = TRUE, your equality b==c is FALSE. I >> think that as Peter said == can't be trusted with FP. >> His example is good. Here is an even more shocking one. >> a=0.786546798 >> b=a+ 1e6 -1e6 >> a==b >> # [1] FALSE >> >> Best regards >> Morgan Jacob >> >> On Wed, 20 May 2020, 20:18 Henrik Bengtsson, >> wrote: >> >>> On Wed, May 20, 2020 at 11:10 AM brodie gaslam via R-devel >>> wrote: >>> > >>> > > On Wednesday, May 20, 2020, 7:00:09 AM EDT, peter dalgaard < >>> pda...@gmail.com> wrote: >>> > > >>> > > Expected, see FAQ 7.31. >>> > > >>> > > You just can't trust == on FP operations. Notice also >>> > >>> > Additionally, since you're implementing a "mean" function you are >>> testing >>> > against R's mean, you might want to consider that R uses a two-pass >>> > calculation[1] to reduce floating point precision error. >>> >>> This one is important. >>> >>> FWIW, matrixStats::mean2() provides argument refine=TRUE/FALSE to >>> calculate mean with and without this two-pass calculation; >>> >>> > a <- c(x[idx],y[idx],z[idx]) / 3 >>> > b <- mean(c(x[idx],y[idx],z[idx])) >>> > b == a >>> [1] FALSE >>> > b - a >>> [1] 2.220446e-16 >>> >>> > c <- matrixStats::mean2(c(x[idx],y[idx],z[idx])) ## default to >>> refine=TRUE >>> > b == c >>> [1] TRUE >>> > b - c >>> [1] 0 >>> >>> > d <- matrixStats::mean2(c(x[idx],y[idx],z[idx]), refine=FALSE) >>> > a == d >>> [1] TRUE >>> > a - d >>> [1] 0 >>> > c == d >>> [1] FALSE >>> > c - d >>> [1] 2.220446e-16 >>> >>> Not surprisingly, the two-pass higher-precision version (refine=TRUE) >>> takes roughly twice as long as the one-pass quick version >>> (refine=FALSE). >>> >>> /Henrik >>> >>> > >>> > Best, >>> > >>> > Brodie. >>> > >>> > [1] >>> https://github.com/wch/r-source/blob/tags/R-4-0-0/src/main/summary.c#L482 >>> > >>> > > > a2=(z[idx]+x[idx]+y[idx])/3 >>> > > > a2==a >>> > > [1] FALSE >>> > > > a2==b >>> > > [1] TRUE >>> > > >>> > > -pd >>> > > >>> > > > On 20 May 2020, at 12:40 , Morgan Morgan < >>> morgan.email...@gmail.com> wrote: >>> > > > >>> > > > Hello R-dev, >>> > > > >>> > > > Yesterday, while I was testing the newly implemented function >>> pmean in >>> > > > package kit, I noticed a mismatch in the output of the below R >>> expressions. >>> > > > >>> > > > set.seed(123) >>> > > > n=1e3L >>> > > > idx=5 >>> > > > x=rnorm(n) >>> > > > y=rnorm(n) >>> > > > z=rnorm(n) >>> > > > a=(x[idx]+y[idx]+z[idx])/3 >>> > > > b=mean(c(x[idx],y[idx],z[idx])) >>> > > > a==b >>> > > > # [1] FALSE >>> > > > >>> > > > For idx= 1, 2, 3, 4 the last line is equal to TRUE. For 5, 6 and >>> many >>> > > > others the difference is small but still. >>> > > > Is that expected or is it a bug? >>> > >>> > __ >>> > R-devel@r-project.org mailing list >>> > https://stat.ethz.ch/mailman/listinfo/r-devel >>> >> [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Precision of function mean,bug?
On Wed, May 20, 2020 at 11:10 AM brodie gaslam via R-devel wrote: > > > On Wednesday, May 20, 2020, 7:00:09 AM EDT, peter dalgaard > wrote: > > > > Expected, see FAQ 7.31. > > > > You just can't trust == on FP operations. Notice also > > Additionally, since you're implementing a "mean" function you are testing > against R's mean, you might want to consider that R uses a two-pass > calculation[1] to reduce floating point precision error. This one is important. FWIW, matrixStats::mean2() provides argument refine=TRUE/FALSE to calculate mean with and without this two-pass calculation; > a <- c(x[idx],y[idx],z[idx]) / 3 > b <- mean(c(x[idx],y[idx],z[idx])) > b == a [1] FALSE > b - a [1] 2.220446e-16 > c <- matrixStats::mean2(c(x[idx],y[idx],z[idx])) ## default to refine=TRUE > b == c [1] TRUE > b - c [1] 0 > d <- matrixStats::mean2(c(x[idx],y[idx],z[idx]), refine=FALSE) > a == d [1] TRUE > a - d [1] 0 > c == d [1] FALSE > c - d [1] 2.220446e-16 Not surprisingly, the two-pass higher-precision version (refine=TRUE) takes roughly twice as long as the one-pass quick version (refine=FALSE). /Henrik > > Best, > > Brodie. > > [1] https://github.com/wch/r-source/blob/tags/R-4-0-0/src/main/summary.c#L482 > > > > a2=(z[idx]+x[idx]+y[idx])/3 > > > a2==a > > [1] FALSE > > > a2==b > > [1] TRUE > > > > -pd > > > > > On 20 May 2020, at 12:40 , Morgan Morgan > > > wrote: > > > > > > Hello R-dev, > > > > > > Yesterday, while I was testing the newly implemented function pmean in > > > package kit, I noticed a mismatch in the output of the below R > > > expressions. > > > > > > set.seed(123) > > > n=1e3L > > > idx=5 > > > x=rnorm(n) > > > y=rnorm(n) > > > z=rnorm(n) > > > a=(x[idx]+y[idx]+z[idx])/3 > > > b=mean(c(x[idx],y[idx],z[idx])) > > > a==b > > > # [1] FALSE > > > > > > For idx= 1, 2, 3, 4 the last line is equal to TRUE. For 5, 6 and many > > > others the difference is small but still. > > > Is that expected or is it a bug? > > __ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Precision of function mean,bug?
> On Wednesday, May 20, 2020, 7:00:09 AM EDT, peter dalgaard > wrote: > > Expected, see FAQ 7.31. > > You just can't trust == on FP operations. Notice also Additionally, since you're implementing a "mean" function you are testing against R's mean, you might want to consider that R uses a two-pass calculation[1] to reduce floating point precision error. Best, Brodie. [1] https://github.com/wch/r-source/blob/tags/R-4-0-0/src/main/summary.c#L482 > > a2=(z[idx]+x[idx]+y[idx])/3 > > a2==a > [1] FALSE > > a2==b > [1] TRUE > > -pd > > > On 20 May 2020, at 12:40 , Morgan Morgan wrote: > > > > Hello R-dev, > > > > Yesterday, while I was testing the newly implemented function pmean in > > package kit, I noticed a mismatch in the output of the below R expressions. > > > > set.seed(123) > > n=1e3L > > idx=5 > > x=rnorm(n) > > y=rnorm(n) > > z=rnorm(n) > > a=(x[idx]+y[idx]+z[idx])/3 > > b=mean(c(x[idx],y[idx],z[idx])) > > a==b > > # [1] FALSE > > > > For idx= 1, 2, 3, 4 the last line is equal to TRUE. For 5, 6 and many > > others the difference is small but still. > > Is that expected or is it a bug? __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Precision of function mean,bug?
Expected, see FAQ 7.31. You just can't trust == on FP operations. Notice also > a2=(z[idx]+x[idx]+y[idx])/3 > a2==a [1] FALSE > a2==b [1] TRUE -pd > On 20 May 2020, at 12:40 , Morgan Morgan wrote: > > Hello R-dev, > > Yesterday, while I was testing the newly implemented function pmean in > package kit, I noticed a mismatch in the output of the below R expressions. > > set.seed(123) > n=1e3L > idx=5 > x=rnorm(n) > y=rnorm(n) > z=rnorm(n) > a=(x[idx]+y[idx]+z[idx])/3 > b=mean(c(x[idx],y[idx],z[idx])) > a==b > # [1] FALSE > > For idx= 1, 2, 3, 4 the last line is equal to TRUE. For 5, 6 and many > others the difference is small but still. > Is that expected or is it a bug? > > Thank you > Best Regards > Morgan Jacob > > [[alternative HTML version deleted]] > > __ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Office: A 4.23 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Precision of function mean,bug?
Hello R-dev, Yesterday, while I was testing the newly implemented function pmean in package kit, I noticed a mismatch in the output of the below R expressions. set.seed(123) n=1e3L idx=5 x=rnorm(n) y=rnorm(n) z=rnorm(n) a=(x[idx]+y[idx]+z[idx])/3 b=mean(c(x[idx],y[idx],z[idx])) a==b # [1] FALSE For idx= 1, 2, 3, 4 the last line is equal to TRUE. For 5, 6 and many others the difference is small but still. Is that expected or is it a bug? Thank you Best Regards Morgan Jacob [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel