Three weeks ago, I read this and thought, "well, you have two options for a default comparison, one based on identity and one on value, both are useful sometimes and Guido prefers identity, and it's OK." But today I understood that I still think otherwise.
In two sentences: sometimes you wish to compare objects according to "identity", and sometimes you wish to compare objects according to "values". Identity-based comparison is done by the "is" operator; value-based comparison should be done by the == operator.

Let's take the car example, and expand it a bit. Let's say wheels have attributes - say, diameter and manufacturer. Let's say those can't change (which is reasonable), to make wheels hashable. There are two ways to compare wheels: by value and by identity. Two wheels may have the same value, that is, they have the same diameter and were created by the same manufacturer. Two wheels may have the same identity, that is, they are actually the same wheel.

We may want to compare wheels based on value, for example to make sure that all the car's wheels fit together nicely:

    assert car.wheel1 == car.wheel2 == car.wheel3 == car.wheel4

We may want to compare wheels based on identity, for example to make sure that we actually bought four wheels in order to assemble the car:

    assert car.wheel1 is not car.wheel2 and car.wheel3 is not car.wheel1 and car.wheel3 is not car.wheel2...

We may want to associate values with wheels based on their values. For example, it's reasonable to suppose that the price of every wheel of the same model is the same. In that case, we'll write:

    price[wheel] = 25

We may want to associate values with wheels based on their identities. For example, we may want to note that a specific wheel is broken. For this, I'll first define a general class (I defined it earlier in one of the discussions, because I believe it's useful):

    class Ref(object):
        def __init__(self, obj):
            self._obj = obj
        def __call__(self):
            return self._obj
        def __eq__(self, other):
            # Two Refs are equal only if they wrap the very same object.
            return isinstance(other, Ref) and self._obj is other._obj
        def __hash__(self):
            return id(self._obj) ^ 0xBEEF

Now again, how will we say that a specific wheel is broken?
Like this:

    broken[Ref(wheel)] = True

Note that the Ref class also allows us to group wheels of the same kind in a set, regardless of their __hash__ method.

I think that most objects, especially most user-defined objects, have a *value*. I don't have an exact definition, but a hint is that two objects that were created in the same way have the same value.

Sometimes we wish to compare objects based on their identity - in those cases we use the "is" operator. Sometimes we wish to compare objects based on their value - and that's what the == operator is for.

Sometimes we wish to use the value of objects as a dictionary key or as a set member, and that's easy. Sometimes we wish to use the identity of objects as a dictionary key or as a set member - and I claim that we should do that by using the Ref class, whose *value* is the object's *identity*, or by using a dict/set subclass, and not by misusing the __hash__ and __eq__ methods.

I think that whenever value-based comparison is meaningful, __eq__ and __hash__ should be value-based. Treating objects by identity should be done explicitly, by the one who uses the objects, by using the "is" operator or the Ref class. It should not be the job of the object to decide which method (value or identity) is more useful - it should allow the user to use both methods, by defining __eq__ and __hash__ based on value.

Please give me examples which prove me wrong. I currently think that the only objects for which value-based comparison is not meaningful are objects which represent entities which are "outside" of the process, or in other words, entities which are not "computational". This includes files, sockets, possibly user-interface objects, loggers, etc.

I think that objects that represent purely "data" have a "value" that they can be compared according to. Even wheels that don't have any attributes are simply equal to other wheels, and not equal to other objects.
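The two kinds of dictionary key described above can be put side by side in a small sketch. The Wheel class here, its attribute names and the sample values are assumptions made up for illustration; Ref is the class from earlier in this message, with the isinstance check written against Ref.

```python
class Wheel:
    """Hypothetical wheel whose __eq__ and __hash__ are value-based."""

    def __init__(self, diameter, manufacturer):
        # Treated as fixed after construction, so hashing by value is safe.
        self.diameter = diameter
        self.manufacturer = manufacturer

    def _key(self):
        return (self.diameter, self.manufacturer)

    def __eq__(self, other):
        if not isinstance(other, Wheel):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        return hash(self._key())


class Ref:
    """Wrapper whose value is the identity of the wrapped object."""

    def __init__(self, obj):
        self._obj = obj

    def __call__(self):
        return self._obj

    def __eq__(self, other):
        return isinstance(other, Ref) and self._obj is other._obj

    def __hash__(self):
        return id(self._obj) ^ 0xBEEF


wheel1 = Wheel(16, "Acme")
wheel2 = Wheel(16, "Acme")

# Same value, different identity.
assert wheel1 == wheel2
assert wheel1 is not wheel2

# Keyed by value: any wheel of the same model finds the price.
price = {wheel1: 25}
assert price[wheel2] == 25

# Keyed by identity: only the specific wheel is marked as broken.
broken = {Ref(wheel1): True}
assert Ref(wheel1) in broken
assert Ref(wheel2) not in broken
```

The point of the sketch is that the user of Wheel picks the comparison at the call site: plain keys give value semantics, and wrapping in Ref gives identity semantics, without Wheel having to choose one.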
Since user-defined classes can interact with the "environment" only through other objects or functions, it is reasonable to suggest that they should get a value-based equality operator. Many times the value is defined by the __dict__ and __slots__ members, so it seems to me a reasonable default.

I would greatly appreciate replies that find a tiny bit of reason in what I said (even if they don't agree), and that don't dismiss it all as a complete load of rubbish.

Thanks,
Noam
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev