Status: New
Owner: ----
Labels: Type-Defect Priority-Medium

New issue 306 by matt.gi...@gmail.com: Python: Message objects should not be hashable
http://code.google.com/p/protobuf/issues/detail?id=306

What steps will reproduce the problem?
1. Create a simple .proto file; anything will do:

package test;
message Person {
    required string name = 1;
}

2. Create two Message objects and set their fields identically:

import test_pb2
p = test_pb2.Person()
q = test_pb2.Person()
p.name = "Fred"
q.name = "Fred"

3. Note that the two objects compare equal, but their hashes differ:

p == q
True
hash(p) == hash(q)
False

What is the expected output?

hash(p) == hash(q)
TypeError: unhashable type: 'Person'

Rationale

The specification for hashing in Python (http://docs.python.org/reference/datamodel.html#object.__hash__) states that "The only required property is that objects which compare equal have the same hash value". It is therefore a violation of Python's semantics for p and q to hash to different values. A practical consequence is that if p and q are both inserted into a set, or both used as dictionary keys, it is not well defined whether both will be stored or whether one will overwrite the other (depending on the hash buckets used).
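
A minimal sketch of this consequence, runnable without protobuf. FakePerson is a hypothetical stand-in for a generated class, not the real test_pb2.Person; it pairs value-based equality with the default identity-based hash, which is the combination reported above. On current CPython both objects end up in the set even though they compare equal, but nothing should rely on that.

```python
class FakePerson:
    """Hypothetical stand-in for a generated Message class (not protobuf)."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # Value-based equality, as Message objects provide.
        return isinstance(other, FakePerson) and self.name == other.name

    # Identity-based hash, mirroring the behaviour reported in this issue.
    # (Python 3 would otherwise set __hash__ to None once __eq__ is defined.)
    __hash__ = object.__hash__


p = FakePerson("Fred")
q = FakePerson("Fred")
people = {p, q}

print(p == q)              # the two objects compare equal
print(hash(p) == hash(q))  # but their hashes differ
print(len(people))         # so the set keeps both "equal" objects
```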

Unfortunately, overriding __hash__ so that equal objects hash equally is not appropriate either, because Message objects are mutable. The above specification continues, "If a class defines mutable objects and implements a __cmp__() or __eq__() method, it should not implement __hash__(), since hashable collection implementations require that a object’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket)."
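
The "wrong hash bucket" problem can be sketched as follows. ValueHashedPerson is a hypothetical class invented for illustration (it is not how protobuf behaves today); it hashes on its contents, and is then mutated after being placed in a set.

```python
class ValueHashedPerson:
    """Hypothetical mutable class whose hash follows its contents."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, ValueHashedPerson) and self.name == other.name

    def __hash__(self):
        # Hash derived from a mutable field: this is the problematic choice.
        return hash(self.name)


p = ValueHashedPerson("Fred")
people = {p}
p.name = "Barney"  # mutate after insertion; the set is not notified

# The entry was filed under the hash of "Fred" but now equals only "Barney",
# so a lookup by either value is expected to miss it.
found_by_old_value = ValueHashedPerson("Fred") in people
found_by_new_value = ValueHashedPerson("Barney") in people

# Iteration does not use hashing, so the object is still physically present.
still_stored = any(item is p for item in people)

print(found_by_old_value, found_by_new_value, still_stored)
```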

The only valid solution is for Message objects to be unhashable (which can be accomplished by setting __hash__ = None in the Message class). This is the approach taken by all mutable built-in types in the Python standard library (e.g., list, set and dict).
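
A minimal sketch of the proposed behaviour, again using a hypothetical stand-in class (UnhashablePerson) rather than real generated code. It assumes Python 2.6 or later, where assigning None to __hash__ marks a class as unhashable.

```python
class UnhashablePerson:
    """Hypothetical stand-in showing the proposed __hash__ = None approach."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, UnhashablePerson) and self.name == other.name

    # Explicitly mark instances as unhashable, as list, set and dict are.
    __hash__ = None


p = UnhashablePerson("Fred")

error = None
try:
    hash(p)
except TypeError as exc:
    error = str(exc)

list_error = None
try:
    hash([])  # built-in mutable types fail the same way
except TypeError as exc:
    list_error = str(exc)

print(error)
print(list_error)
```

With this in place, an attempt to put such an object into a set or use it as a dictionary key fails immediately with a TypeError instead of silently misbehaving.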

This may break existing code, so perhaps it could be introduced as an option in protoc (which would set __hash__ = None on all of the generated classes). This would be a useful option, since any code that relies on the hashability of Message objects is potentially buggy, because of the undefined behaviour described above when Messages are inserted into hash tables.
