Status: New
Owner: ----
Labels: Type-Defect Priority-Medium
New issue 306 by matt.gi...@gmail.com: Python: Message objects should not
be hashable
http://code.google.com/p/protobuf/issues/detail?id=306
What steps will reproduce the problem?
1. Create a simple .proto file; anything will do:
package test;
message Person {
  required string name = 1;
}
2. Create two Message objects and set their fields identically:
import test_pb2
p = test_pb2.Person()
q = test_pb2.Person()
p.name = "Fred"
q.name = "Fred"
3. Note that the two objects compare equal, but their hashes produce
different results:
p == q
True
hash(p) == hash(q)
False
What is the expected output?
hash(p) == hash(q)
TypeError: unhashable type: 'Person'
Rationale
The specification for hashing in Python
(http://docs.python.org/reference/datamodel.html#object.__hash__) specifies
that "The only required property is that objects which compare equal have
the same hash value". Therefore, it is a violation of Python's semantics to
have p and q not hash to the same value. In practice, if p and q are both
inserted into a set or used as dictionary keys, it is undefined whether
both will be stored or whether one will overwrite the other (depending on
the hash buckets used).
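The mismatch can be sketched without protobuf installed. The Person class below is a hypothetical stand-in, not the generated test_pb2.Person; it assumes only that a message compares by field value while keeping the default identity-based hash. On CPython, which compares stored hash values before calling __eq__, both objects should end up in the set even though they compare equal:

```python
class Person(object):
    """Hypothetical stand-in for a generated message class."""

    def __init__(self, name=""):
        self.name = name

    def __eq__(self, other):
        # Value-based equality, as Message objects provide.
        return isinstance(other, Person) and self.name == other.name

    def __ne__(self, other):
        return not self == other

    # Keep the identity-based default hash explicitly. Python 3 would
    # otherwise set __hash__ to None as soon as __eq__ is defined.
    __hash__ = object.__hash__


p = Person("Fred")
q = Person("Fred")

equal = (p == q)                  # the objects compare equal
same_hash = (hash(p) == hash(q))  # but their hashes differ
people = {p, q}                   # so the set keeps both "equal" objects
```

A membership test with a third equal object, such as Person("Fred") in people, is likewise expected to miss, because its hash matches neither stored entry.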
Unfortunately, it is not appropriate to override __hash__ and have the two
objects hash equally when they are considered equal, because they are
mutable. The above specification continues, "If a class defines mutable
objects and implements a __cmp__() or __eq__() method, it should not
implement __hash__(), since hashable collection implementations require
that a object’s hash value is immutable (if the object’s hash value
changes, it will be in the wrong hash bucket)."
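The "wrong hash bucket" problem can be illustrated with a minimal sketch. HashedPerson is a hypothetical class, not part of protobuf, that hashes by value even though it is mutable. Once the field changes after insertion, the set should no longer be able to find the object it still contains:

```python
class HashedPerson(object):
    """Hypothetical mutable class with a value-based hash (the anti-pattern)."""

    def __init__(self, name=""):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, HashedPerson) and self.name == other.name

    def __ne__(self, other):
        return not self == other

    def __hash__(self):
        # The hash depends on mutable state, so it can change over time.
        return hash(self.name)


p = HashedPerson("Fred")
people = {p}
found_before = p in people   # looked up with the hash of "Fred"

p.name = "Barney"            # mutate after insertion
found_after = p in people    # looked up with the hash of "Barney"
still_stored = any(item is p for item in people)
```

The object is still physically in the set (iteration finds it), but lookups by hash miss it, which is why a value-based __hash__ is not a safe fix for mutable messages.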
The only valid solution is for Message objects to be unhashable (which can
be accomplished by setting __hash__ = None in the Message class). This is
the approach taken by all mutable built-in types in the Python standard
library (e.g., list, set and dict).
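As a rough sketch of the proposed behaviour, again using a hypothetical stand-in class rather than the real Message base class, setting __hash__ = None makes hash() raise TypeError, just as it does for a list:

```python
class UnhashablePerson(object):
    """Hypothetical stand-in showing the proposed __hash__ = None."""

    def __init__(self, name=""):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, UnhashablePerson) and self.name == other.name

    def __ne__(self, other):
        return not self == other

    # Explicitly mark instances as unhashable.
    __hash__ = None


def hash_error(obj):
    """Return the TypeError raised by hash(obj), or None if hashing works."""
    try:
        hash(obj)
    except TypeError as exc:
        return exc
    return None


person_error = hash_error(UnhashablePerson("Fred"))
list_error = hash_error([])       # built-in mutable type, for comparison
string_error = hash_error("Fred") # immutable, so hashing succeeds
```

With this in place, attempts to put such an object into a set or use it as a dictionary key fail immediately instead of behaving inconsistently.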
This may break existing code, so perhaps it could be introduced as an
option in protoc (which would set __hash__ = None on all of the generated
classes). This would be a useful option, since all code which relies on the
hashability of Message objects is potentially buggy, due to the undefined
behaviour when inserting Messages into hash tables described above.