On Thu, 25 Mar 2010 06:26:11 am Mark Dickinson wrote:
> Here's an interesting recent blog post on this subject, from the
> creator of Eiffel:
>
> http://bertrandmeyer.com/2010/02/06/reflexivity-and-other-pillars-of-civilization/
Sorry, but he lost me right at the beginning, when he quoted someone else:

"there is no reason to believe that the result of one calculation with unclear value should match that of another calculation with unclear value"

and then argued:

"The exact same argument can be used to assert that the result should not be False: … there is no reason to believe that the result of one calculation with unclear value should not match that of another calculation with unclear value. Just as convincing! Both arguments complement each other: there is no compelling reason for demanding that the values be equal; and there is no compelling argument either to demand that they be different. If you ignore one of the two sides, you are biased."

This whole argument is invalid on at least three levels. I'll get the first two out of the way briefly.

#1: Bertrand starts by treating NANs as "unclear values", and concludes that we shouldn't prefer "two unclear values are different" as more compelling than "two unclear values are the same". But this is ridiculous -- if you ask me a pair of questions, and I answer "I'm not sure" to both of them, why would you assume that the right answer to both questions is actually the same?

#2: But in fact NANs aren't "unclear values"; they are not values at all. The answer to "what is the non-complex logarithm of -1?" is not "I'm not sure" but "there is no such value". Bertrand spends an awful lot of time trying to demonstrate why the reflexivity of equality (RoE: every x is equal to itself) should apply to NANs as well as the other floats, but RoE is a property of equivalence relations, which does not (and should not) hold for "there is no such value". By analogy: the Lizard King of Russia does not exist; the Vampire Queen of New Orleans also does not exist. We don't therefore conclude that the Lizard King and the Vampire Queen are the same person.

#3: We could, if we wish, violate the IEEE standard and treat equality of NANs as an equivalence relation.
It's our language, we're free to follow whatever standards we like, and reflexivity of equality is a very useful axiom to have. Since it applies to all non-NAN floats (and virtually every object in Python, other than those with funny __eq__ methods), perhaps we should extend it to NANs as well?

I hope to convince you that the cure is worse than the disease. Since NANs are usually found in mathematical contexts, we should follow the IEEE standard even at the cost of rare anomalies in non-mathematical code containing NANs. Simply put: we should treat "two unclear values are different" as more compelling than "two unclear values are the same", as it leads to fewer, smaller errors. Consider:

    log(-1) = NAN    # maths equality, not assignment
    log(-2) = NAN

If we allow NAN = NAN, then we permit the error:

    log(-1) = NAN = log(-2)
    therefore log(-1) = log(-2)
    and 1 = 2

But if we make NAN != NAN, then we get:

    log(-1) != log(-2)

and all of mathematics does not collapse into a pile of rubble. I think that is a fairly compelling reason to prefer inequality over equality.

One objection might be that while log(-1) and log(-2) should be considered different NANs, surely NANs should be equal to themselves?

    -1 = -1 implies log(-1) = log(-1)

But consider the practicalities: there are far more floats than available NAN payloads. We simply can't map every invalid calculation to a unique NAN, and therefore there *must* be cases like:

    log(-123.456789e-8) = log(-9.876e47)
    implies 123.456789e-8 = 9.876e47

So we mustn't consider NANs equal just because their payloads are equal.

What about identity? Even if we don't dare allow this:

    x = log(-1)  # assignment
    y = log(-1)  # another NAN with the same payload
    assert x is not y
    assert x == y

surely we can allow this?

    assert x == x

But this is dangerous. Don't be fooled by the simplicity of the above example.
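For concreteness, here is a minimal sketch of how these cases behave today, assuming CPython on a platform with IEEE 754 binary64 doubles. Python's `math.log(-1)` raises `ValueError` rather than returning a NAN, so the sketch makes NANs from INF-INF and from raw bit patterns instead; `nan_with_payload` is a hypothetical helper, not a standard function. It shows that `==` is never reflexive for NANs whatever the payload, that a double has roughly 2047 non-NAN bit patterns for every NAN pattern, and that list comparison and `in` nonetheless treat the *same* NAN object as equal, because they check identity before falling back to `==`.

```python
import math
import struct

# Assumes IEEE 754 binary64 doubles, as on essentially all CPython builds.
QUIET_NAN_BITS = 0x7FF8000000000000  # exponent all ones, quiet bit set


def nan_with_payload(payload: int) -> float:
    """Hypothetical helper: a quiet NAN whose low 51 mantissa bits hold `payload`."""
    if not 0 <= payload < (1 << 51):
        raise ValueError("payload must fit in 51 bits")
    return struct.unpack("<d", struct.pack("<Q", QUIET_NAN_BITS | payload))[0]


# A double is a NAN when its exponent is all ones and its 52-bit mantissa is
# non-zero; either sign bit is allowed.
NAN_PATTERNS = 2 * ((1 << 52) - 1)
NON_NAN_PATTERNS = (1 << 64) - NAN_PATTERNS

a = nan_with_payload(1)
b = nan_with_payload(2)
c = float("inf") - float("inf")  # INF-INF is a NAN in Python too

print(math.isnan(a), math.isnan(b), math.isnan(c))  # True True True
print(a == a, a == b, c == c)  # False False False: == is never reflexive for NANs
print([a] == [a], a in [a])    # True True: containers check identity first
print([a] == [b])              # False: different objects fall back to ==
print(NON_NAN_PATTERNS // NAN_PATTERNS)  # 2047
```

The last two comparisons are the crux: whether `[a] == [a]` holds depends only on whether the same object happened to be reused, which is exactly the kind of implementation accident this message argues equality should not depend on.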
Just because you have two references to the same NAN (the same object, in the sense of identity) doesn't mean they represent "the same thing" or came from the same place:

    data = [1, 2, float('nan'), float('nan'), 3]
    x = harmonic_mean(data)
    y = 1 - geometric_mean(data)

It is an accident of implementation whether x and y happen to be the same object or not. Why should their inequality depend on such a fragile thing? In fact, identity of NANs is itself an implementation quirk of programming languages like Python: logically, NANs don't have identity at all.

To put it another way: all ONEs are the same ONE, even if they come from different sources, are in different memory locations, or have different identities; but all NANs are different, even if they come from the same source, are in the same memory location, or have the same identity.

The fundamental problem here is that NANs are not values. If you treat them as if they were values, then you want reflexivity of equality. But they're not -- they're *signals* for "your calculation has gone screwy and the result you get is garbage", so to speak. You shouldn't even think of a specific NAN as a piece of specific garbage, but merely as a label on the *kind* of garbage you've got (the payload): INF-INF is, in some sense, a different kind of error to log(-1). In the same way you might say "INF-INF could be any number at all, therefore we return NAN", you might say "since INF-INF could be anything, there's no reason to think that INF-INF == INF-INF."

-- 
Steven D'Aprano
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev