>>>>> "DM" == Duncan Murdoch <murd...@stats.uwo.ca>
>>>>>     on Fri, 07 Aug 2009 12:55:50 -0400 writes:

DM> On 8/7/2009 11:41 AM, Martin Maechler wrote:
>>>>>>> "DM" == Duncan Murdoch <murd...@stats.uwo.ca>
>>>>>>>     on Fri, 07 Aug 2009 11:25:11 -0400 writes:
>>
DM> On 8/7/2009 10:46 AM, Martin Maechler wrote:
>> >>>>>>> "TH" == Ted Harding <ted.hard...@manchester.ac.uk>
>> >>>>>>>     on Fri, 07 Aug 2009 14:49:54 +0100 (BST) writes:
>> >>
TH> On 07-Aug-09 11:07:08, Duncan Murdoch wrote:
>> >> >> Martin Maechler wrote:
>> >> >>>>>>>> William Dunlap <wdun...@tibco.com>
>> >> >>>>>>>>     on Thu, 6 Aug 2009 15:06:08 -0700 writes:
>> >> >>> >> -----Original Message-----
>> >> >>> >> From: r-help-boun...@r-project.org
>> >> >>> >> [mailto:r-help-boun...@r-project.org] On Behalf Of Giovanni Petris
>> >> >>> >> Sent: Thursday, August 06, 2009 3:00 PM
>> >> >>> >> To: milton.ru...@gmail.com
>> >> >>> >> Cc: r-h...@r-project.org; daniel.gerl...@geodecapital.com
>> >> >>> >> Subject: Re: [R] Why is 0 not an integer?
>> >> >>> >>
>> >> >>> >> I ran an instant experiment...
>> >> >>> >>
>> >> >>> >> > typeof(0)
>> >> >>> >> [1] "double"
>> >> >>> >> > typeof(-0)
>> >> >>> >> [1] "double"
>> >> >>> >> > identical(0, -0)
>> >> >>> >> [1] TRUE
>> >> >>> >>
>> >> >>> >> Best, Giovanni
>> >> >>> >>
>> >> >>> > But 0.0 and -0.0 have different reciprocals
>> >> >>> >
>> >> >>> > > 1.0/0.0
>> >> >>> > [1] Inf
>> >> >>> > > 1.0/-0.0
>> >> >>> > [1] -Inf
>> >> >>> >
>> >> >>> > Bill Dunlap
>> >> >>> > TIBCO Software Inc - Spotfire Division
>> >> >>> > wdunlap tibco.com
>> >> >>>
>> >> >>> yes. {finally something interesting in this boring thread !}
>> >> >>> ---> diverting to R-devel
>> >> >>>
>> >> >>> In April, I've had a private e-mail communication with John
>> >> >>> Chambers [father of S, notably S4, which also brought identical()]
>> >> >>> and Bill about the topic,
>> >> >>> where I had started suggesting that R should be changed such
>> >> >>> that
>> >> >>>     identical(-0. , +0.)
>> >> >>> would return FALSE.
>> >> >>> Bill did mention that it does so for (newish versions of) S+
>> >> >>> and that he'd prefer that, too,
>> >> >>> and John said
>> >> >>>
>> >> >>> >> I agree on having a preference for a bitwise comparison for
>> >> >>> >> identical()---that's what the name means after all. But since
>> >> >>> >> someone implemented the numerical case as the C == it's probably
>> >> >>> >> going to be more hassle than it's worth to change it. But we
>> >> >>> >> should make the implementation clear in the documentation.
>> >> >>>
>> >> >>> so in principle, we all agreed that R's identical() should be
>> >> >>> changed here, namely by using something like memcmp() instead
>> >> >>> of simple '==' , however we haven't bothered to actually
>> >> >>> *implement* this change.
>> >> >>>
>> >> >>> I am currently testing a patch which would lead to
>> >> >>> identical(0, -0) returning FALSE.
>> >> >>>
>> >> >> I don't think that would be a good idea. Other expressions besides
>> >> >> "-0" calculate the zero with the negative sign bit, e.g. the
>> >> >> following sequence:
>> >> >>
>> >> >> pos <- 1
>> >> >> neg <- -1
>> >> >> zero <- 0
>> >> >> y <- zero*pos
>> >> >> z <- zero*neg
>> >> >> identical(y, z)
>> >> >>
>> >> >> I think most R users would expect the last expression there to be
>> >> >> TRUE based on the previous two lines, given that pos and neg both
>> >> >> have finite values. In a simple case like this y == z would be a
>> >> >> better test to use, but if those were components of a larger
>> >> >> structure, identical() is all we've got, and people would waste a
>> >> >> lot of time tracking down why structures differing only in the
>> >> >> sign of zero were not identical, even though every element tested
>> >> >> equal.
>> >>
>> >> identical() *is* not the same as '==' even if you think of a
>> >> generalized '==',
>> >> and your example is not convincing to me.
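
To make the quoted example concrete, here is a minimal sketch in R of the case under discussion. It assumes IEEE 754 double arithmetic (so that 0 * -1 yields a negative zero), and the list names `a` and `b` are only illustrative. It uses the element-wise alternatives named in this thread, `==` and all.equal() with a zero tolerance, which do not look at the sign of zero.

```r
pos  <- 1
neg  <- -1
zero <- 0
y <- zero * pos   # +0
z <- zero * neg   # -0, assuming IEEE 754 arithmetic

y == z            # TRUE: '==' ignores the sign of zero
1 / y             # Inf
1 / z             # -Inf: the sign bit is still observable

# The same values buried in a larger structure
# ('a' and 'b' are illustrative names only)
a <- list(pos, neg, zero, y)
b <- list(pos, neg, zero, z)

# Element-wise numerical comparison
all(mapply(`==`, a, b))                    # TRUE

# Structural comparison with no numerical tolerance
isTRUE(all.equal(a, b, tolerance = 0))     # TRUE
```

Both comparisons treat y and z as equal, although 1/y and 1/z differ. Whether identical() should side with the numerical view or the bitwise view is exactly the point in dispute here.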
>>
DM> Fair enough, but after your change, how would one do what
DM> identical(list(pos, neg, zero, y), list(pos, neg, zero, z)) does now?
DM> That seems to me to be a more useful comparison than one that declares
DM> those to be unequal because the signs of y and z differ.
>>
>> Maybe something like
>>
>> all(mapply(`==`, list(pos, neg, zero, y), list(pos, neg, zero, z)))
>>
>> ## or even
>>
>> isTRUE(all.equal( list(pos, neg, zero, y), list(pos, neg, zero, z),
>>                   tol = 0))

DM> I think I didn't make my point clearly. I'm not particularly worried
DM> about lists of numbers, I'm worried about signed zeros buried in complex
DM> structures. identical(struc1, struc2) works nicely now for that sort of
DM> comparison; indeed the man page for it says:

and so does isTRUE(all.equal(..)) as given above.
For me, all your arguments point to all.equal(..., tol=0)

DM> indeed the man page for it says:

DM> A call to 'identical' is the way to test exact equality in 'if'
DM> and 'while' statements, as well as in logical expressions that use
DM> '&&' or '||'. In all these applications you need to be assured of
DM> getting a single logical value.

Yes, note the word "exact" .. but see below

DM> The description you quote below does contradict this, and it also
DM> contradicts the statement

DM> 'identical' sees 'NaN' as different from 'NA_real_', but all
DM> 'NaN's are equal (and all 'NA' of the same type are equal).

which makes sense as I think they cannot be distinguished by R,
but even here, I could think of cases where I'd like identical()
to be less lenient....
Maybe we should think of a 3rd optional argument,
along the lines Ted suggested (but with a different default than his..).

DM> I think the solution is to fix the man page, not the
DM> function.

NO !!!!!

As I said very early: identical() was introduced with S4,
ca. 1998, by John Chambers.
The DESCRIPTION above is really what it should do !
In Splus 5.1 { 1999 }, one of the earliest publicly available
versions of S4, identical(0. , -0.) already gives FALSE.

identical() was introduced into R for 1.4.0, spring 2002,
and given the above, it just never did what it should have,
and of course, that bug / problem *is* very rare and typically
not very consequential and so we all have lived with that buglet
for 7 years...

Can you give a *real* {not contrived} example where the old
use was important?
Do you know of cases where users used identical() in cases they
should have used all.equal(*, tol=0)?

Maybe we should introduce a function that's basically
isTRUE(all.equal(..., tol=0)) {but faster},
or do you want a 3rd argument to identical, say 'method'
with default c("oneNaN", "use.==", "strict")

oneNaN: my proposal of using memcmp() on doubles as it's used for
        other types already (and hence distinguishing +0 and -0);
        otherwise keeping the feature that there's just one NaN
        which differs from 'NA' (and there's just one 'NA').

use.==: the previous R behaviour, using '==' on doubles
        (and the "oneNaN" behavior)

strict: be even stricter than oneNaN: Use memcmp()
        unconditionally for doubles.
        This would be the fastest version of all three.

DM> For
DM> example, the "_exactly_" seems to be what is upsetting you; I'd suggest
DM> instead

DM> "The safe and reliable way to test two objects for being equal in
DM> structure and content. It returns 'TRUE' in this case, 'FALSE' in every
DM> other case."

I don't think so, not at all.
That would rather be a description of
isTRUE(all.equal(..., tol=0))

DM> Duncan Murdoch

>>
>> the latter of which is more flexible and adaptable to what the user
>> is really wanting to test.
>>
>> >> Further note that help(identical) has always said
>> >>
>> >> > Description:
>> >>
>> >> > The safe and reliable way to test two objects for being _exactly_
>> >> > equal. It returns 'TRUE' in this case, 'FALSE' in every other case.
>> >>
>> >> which really should distinguish -0 and +0
>> >>
>> >> >> Duncan Murdoch
>> >>
>> >> >>> Martin Maechler, ETH Zurich
>> >>
TH> My own view of this is that there may in certain circumstances be an
TH> interest in distinguishing between 0 and (-0), yet normally most
TH> users will simply want to compare the numerical values.
>> >>
TH> Therefore I am in favour of revising identical() so that it can so
TH> distinguish; but also of taking the opportunity to give it a parameter
TH> say
>> >>
TH> identical(x,y,sign.bit=FALSE)
>> >>
TH> so that the default behaviour would be to see 0 and (-0) as identical,
TH> but with sign.bit=TRUE it would see the difference.
>> >>
TH> However, I put this forward in ignorance of
TH> a) Any difficulties that this may present in re-coding identical();
TH> b) Any complications that may arise when applying this new form
TH>    to complex objects.
>> >>
>> >> Your proposal would actually need to special case this one case,
>> >> rather than my patch which replaces using '==' (in C) for
>> >> double by using memcmp() instead, something which is already
>> >> used for several other cases there, and hence seems more
>> >> consistent and in that way natural.
>> >>
>> >> The one thing even the new code would not differentiate is the
>> >> different NaN's (apart from NA) but they are not differentiable
>> >> on the R level either, AFAIK, at least AFAIU our language
>> >> specifications, we only want two things: NA and NaN
>>
DM> I don't understand what you are proposing now. The different NaN's have
DM> different bit patterns, so wouldn't memcmp() see a difference? And
DM> taking your literalist point of view, the fact that it is hard to detect
DM> the difference at the R level (requiring C code support to do it)
DM> doesn't mean there is no difference, there's just a very subtle, rarely
DM> detectable difference, like the one between +0 and -0.
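
The bit-pattern question above can be looked at from the R level without C code. What follows is only a rough sketch, assuming IEEE 754 doubles. The helper names `bytes()` and `same_bits()` are hypothetical and not part of R or of the patch being discussed. The byte-wise test only imitates what a memcmp()-style comparison would see and does not reproduce the C code of identical().

```r
# Raw bytes of a double, most significant byte first
# (hypothetical helper, for illustration only)
bytes <- function(x) writeBin(x, raw(), endian = "big")

bytes(0)     # 00 00 00 00 00 00 00 00
bytes(-0)    # 80 00 00 00 00 00 00 00  (only the sign bit differs)

# Byte-wise comparison of two doubles, in the spirit of memcmp()
# (hypothetical helper, for illustration only)
same_bits <- function(x, y) identical(bytes(x), bytes(y))

0 == -0              # TRUE:  numerically equal
same_bits(0, -0)     # FALSE: different bit patterns

# At the R level only two kinds of missing double are told apart:
is.na(NaN)           # TRUE
is.nan(NA_real_)     # FALSE
bytes(NA_real_)      # inspect the payload that marks NA
bytes(NaN)           # exact bytes may vary by platform
```

A strictly byte-wise comparison would therefore separate +0 from -0, and it would also separate any NaN values whose payloads differ, which is the distinction between the "oneNaN" and "strict" variants proposed earlier in this message.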
>>
DM> Duncan Murdoch
>>
>> >> Martin
>> >>
>> >> ______________________________________________
>> >> R-devel@r-project.org mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/r-devel