Steven D'Aprano <ste...@remove.this.cybersource.com.au> wrote:

> To prove my claim, all you need is two domains with a mutually
> incompatible definition of equality. That's not so difficult, surely? How
> about equality of integers, versus equality of integers modulo some N?

No, that's not an example.

The integers modulo N form a ring Z/NZ of residue classes.  Such residue
classes are distinct from the integers -- e.g., an integer 3 (say) is not
the same as the set

    3 + NZ = { ..., 3 - 2N, 3 - N, 3, 3 + N, 3 + 2N, ... }

-- but there is a homomorphism from Z to Z/NZ under which 3 + NZ is the
image of 3.

If we decide to define the == operator such that 3 == 3 + NZ and
3 + N == 3 + NZ then == is not an equivalence relation (in particular,
transitivity fails, since 3 != 3 + N).  But that's just an artifact of
the definition.  If we distinguish 3 from 3 + NZ then everything is fine.
3 + NZ == (3 + N) + NZ correctly, but 3 != 3 + N, and all is well.

Here, at least, the problem is not that == as an equivalence relation
fails in some particular domain -- because in both Z and Z/NZ it can be a
perfectly fine equivalence relation -- but that it can potentially fail on
the boundaries between domains.  Easy answer: don't mess it up at the
boundaries.

Proposition.  Let U, U' be disjoint sets, and let E, E' be equivalence
relations on U, U' respectively.  Define E^ on U union U' as
E^ = E union E', i.e.,

    E^(x, y) iff    (x in U  and y in U  and E(x, y))
                 or (x in U' and y in U' and E'(x, y))

Then E^ is an equivalence relation.

Proof.  Reflexivity and symmetry are trivial; transitivity follows from
disjointness of U and U': if E^(x, y) and E^(y, z), then y lies in exactly
one of U and U', so both facts come from the same relation, which is
itself transitive.

> It *can* be a problem, if you insist on using == on arbitrary types
> while still expecting it to be an equivalence relation.

Unfortunately, from the surrounding discussion, it seems that container
types particularly want to be able to contain arbitrary objects, and the
failure of == to be an equivalence relation breaks this.  The problem is
that objects with wacky == operators are still more or less quacking like
the more usual kinds of ducks; but they turn out to taste very different.

> Let's denote regular, case-sensitive strings using "abc", and special,
> case-insensitive strings using i"abc".
> So for regular strings, equality is an e-r; for case-insensitive
> strings, equality is also an e-r (I trust that the truth of this is
> obvious). But if you try to use equality on *both* regular and
> case-insensitive strings, it fails to be an e-r:
>
> i"abc" =~ "ABC" returns True if you use the case-insensitive definition
> of equality, but returns False if you use the case-sensitive definition.
> There is no single definition of equality that is *simultaneously*
> case-sensitive and case-insensitive.

A case-sensitive string is /not the same/ as a case-insensitive string.
One's a duck, the other's a goose.  I'd claim here that i"abc" =~ "ABC"
must be False, because i"abc" =~ "abc" must be False also!  To define it
otherwise leads to the incoherence you describe.  But the above
proposition provides an easy answer.

> > A valuable property might be that x =~ y if x and y are
> > indistinguishable without using `is'.
>
> That's a little strong, because it implies that equality must look at
> *everything* about a particular object, not just whatever bits of data
> are relevant for the problem domain.

Yes.  That's one of the reasons that =~ isn't the same as ==.  I've been
thinking on my feet in this thread, so I haven't thought everything
through.  And as I mention below, there are /many/ useful equality
predicates on values.  As I didn't mention (but hope is obvious), having a
massively-parametrized equality predicate is daft, and providing enough
predicates to suit every possible application equally so.  But we might be
able to do well enough with just one or two -- or maybe by just leaving
things as they are.

> For example, consider storing data in a dict.
>
> >>> D1 = {-1: 0, -2: 0}
> >>> D2 = {-2: 0}
> >>> D2[-1] = 0
> >>> D1 == D2
> True
>
>
> We certainly want D1 and D2 to be equal.

Do we?  If we're using my `indistinguishable without using ``is'''
criterion from above, then D1 and D2 are certainly different!
To detect the difference, mutate one and see if the other changes:

    def distinct_dictionaries_p(D1, D2):
        """
        Return True if D1 and D2 are distinct dictionary objects.

        Works by mutating D1 and checking whether D2 changes too.
        Not threadsafe.
        """
        magic = []                  # fresh object: a unique marker value
        more_magic = [magic]        # second marker: means `key was absent'
        old = D1.get('mumble', more_magic)
        D1['mumble'] = magic
        # If D2 now sees the marker, D1 and D2 are one and the same dict.
        result = D2.get('mumble', more_magic) is not magic
        # Put D1 back the way we found it.
        if old is more_magic:
            del D1['mumble']
        else:
            D1['mumble'] = old
        return result

But that criterion was a suggestion -- a way of defining a coherent
equivalence relation on the whole of the Python value space which is
coarser than `is' and maybe more useful.  My primary purpose in proposing
it was to stimulate discussion: what /do/ we want from equality
predicates?  We already have `is', which is too fine-grained to be widely
useful: it distinguishes between different instances of the number 500000,
for example, and I can't for the life of me see why that's a useful
behaviour.  (The `is' operator is a fine thing, and I wouldn't want it any
other way: it trades away some useful semantics for the sake of speed, and
that was the /right/ decision.)

My criterion succeeds in distinguishing 1 from 1.0 (they have different
types), which may be considered good.  It doesn't distinguish a quiet NaN
from another quiet NaN: that's definitely good.  (It'd be bogus for a
numeric equality operator, but we've already got one of those, so we don't
need to define another.)  But you're probably right: it's still too
fine-grained for some purposes.

> But their history is different, and that makes their internal details
> different, which has detectable consequences:
>
> >>> D1
> {-2: 0, -1: 0}
> >>> D2
> {-1: 0, -2: 0}

So in this case, `str' also works as a distinguisher.  Fine.

> There may be problem domains where the order of elements in a list (or
> tree structure) *is* important, and other problem domains where order is
> irrelevant. One single relation can't cover all such conflicting
> requirements.

Absolutely.  This is why Common Lisp provides four(!)
out of the box (eq, eql, equal and equalp) and it still isn't enough.
Python provides one (`is') and a half (`==' when it's behaving) and is
actually coping remarkably well considering.

But this /is/ causing problems, and so thinking about solutions seems
reasonable.

I'm not trying to change the language.  I don't have a pet feature I want
added.  I do think the discussion is interesting and worthwhile, though.

-- [mdw]
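
A minimal sketch of the proposition above, in Python.  Every name in it
(`IStr', `same', `sloppy', `is_equivalence') is made up for illustration
and is not anything from Python itself or from this thread.  It assumes
that case-insensitive strings live in their own class, disjoint from
`str', and it only checks the equivalence-relation laws by brute force on
a small sample, which is evidence rather than proof.

```python
from itertools import product


class IStr:
    """Hypothetical case-insensitive string; deliberately not a str subclass."""

    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return "i%r" % self.text


def same(x, y):
    """E^ from the proposition: E on str, E' on IStr, False across domains."""
    if type(x) is str and type(y) is str:
        return x == y                               # E: case-sensitive
    if type(x) is IStr and type(y) is IStr:
        return x.text.lower() == y.text.lower()     # E': case-insensitive
    return False                                    # never equal at the boundary


def sloppy(x, y):
    """Boundary-crossing variant: fold case if either side is an IStr."""
    if type(x) is IStr or type(y) is IStr:
        tx = x.text if type(x) is IStr else x
        ty = y.text if type(y) is IStr else y
        return tx.lower() == ty.lower()
    return x == y


def is_equivalence(rel, values):
    """Check reflexivity, symmetry and transitivity on a finite sample."""
    reflexive = all(rel(x, x) for x in values)
    symmetric = all(rel(x, y) == rel(y, x)
                    for x, y in product(values, repeat=2))
    transitive = all(rel(x, z)
                     for x, y, z in product(values, repeat=3)
                     if rel(x, y) and rel(y, z))
    return reflexive and symmetric and transitive


sample = ["abc", "ABC", IStr("abc"), IStr("ABC"), IStr("xyz")]
print(is_equivalence(same, sample))     # True
print(is_equivalence(sloppy, sample))   # False
```

The `sloppy' relation fails on exactly the chain described earlier:
"abc" matches i"abc", i"abc" matches "ABC", but "abc" does not match
"ABC".  Keeping the two domains apart, as `same' does, avoids the problem
at the cost of i"abc" never matching any plain string.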