Well, to sidestep the whole discussion, we could just eliminate types from hashcode completely and work strictly on field hashcodes. The type check is there in equals(), hashmaps will function correctly, and hash collisions are already allowed between inequal objects, they're a fact of life. Return 1; would be legal, it would just be a really bad idea for anything that actually uses the hashcode.
I figured getClass().getName() would provide the additional hash spread benefit in the majority of cases and HashMaps and the like could fall back on equals() as they always do upon the rare collision between differently-typed-but-same-fields objects. Descriptor has a bunch of stuff attached to it, which would lead to a potentially complicated/expensive hashCode() implementation and I didn't understand that code well enough to suggest a patch. I also don't know the protobuf compiler well enough to have an opinion on generating a constant per-class as you suggest, I could see it leading to trouble when you regenerate classes that haven't changed unless it's deterministic somehow. Just going with the KISS principle. On May 11, 4:54 pm, Ben Wright <[email protected]> wrote: > Jay: > > Using the class name to generate the hashcode is logically incorrect > because the class name can be derived by the options java_package_ > name and java_outer_classname. > > Additionally (although less likely to matter), separate protocol > buffer files can define an identical class names with different > protocol buffers. > > Lastly, and most importantly... > > If the same Message is being used with generated code and with dynamic > code, the hash code for the descriptor would still be identical if > generated from the descriptor instance, whereas the dynamic usage does > not have a classname from which to derive a hashcode. While in your > case this should not matter, it does matter for other users of > protobuf. The hashcode function would be better served by being > implemented correctly from state data for the descriptor. > Additionally, in generated code it seems that this hashcode could be > pre-computed by the compiler and Descriptor.hashcode() could return a > constant integer - which would be much more efficient than any other > method. > > On May 11, 3:02 pm, Jay Booth <[email protected]> wrote: > > > > > > > > > It can be legitimate, especially in the case of Object.hashCode(), but > > it's supposed to be in sync with equals() by contract. As it stands, > > two objects which are equal() will produce different hashes, or the > > same logical object will produce different hashes across JVMs. That > > breaks the contract.. if the equals() method simply did return (other > > == this), then it'd be fine, albeit a little useless. > > > I created an issue and posted a 1-liner patch that would eliminate the > > problem by using getClass().getName().hashCode() to incorporate type > > information into the hashCode without depending on a Descriptor > > object's memory address. > > > On May 11, 12:01 am, Dmitriy Ryaboy <[email protected]> wrote: > > > > Hi Jay, > > > > I encountered that before. Unfortunately this is a legitimate thing to > > > do, as documented in Object.hashCode() > > > > I have a write-up of the problem and how we wound up solving it (not > > > elegant.. suggestions welcome) > > > here:http://squarecog.wordpress.com/2011/02/20/hadoop-requires-stable-hash... > > > > D > > > > On Mon, May 9, 2011 at 8:25 AM, Jay Booth <[email protected]> wrote: > > > > I'm testing an on-disk hashtable with Protobufs and noticed that with > > > > the java generated hashcode function, it seems to return a different > > > > hashcode across JVM invocations for the same logically equivalent > > > > object (tested with a single string protobuf, same string for both > > > > instances). > > > > > Is this known behavior? Bit busy right now backporting this to work > > > > with String keys instead but I could provide a bit of command line > > > > code that demonstrates the issue when I get a chance. > > > > > Glancing at the generated hashcode() function, it looks like the > > > > difference comes from etiher getDescriptorForType().hashCode() or > > > > getUnknownFields().hashCode(), both of which are incorporated. > > > > > -- > > > > You received this message because you are subscribed to the Google > > > > Groups "Protocol Buffers" group. > > > > To post to this group, send email to [email protected]. > > > > To unsubscribe from this group, send email to > > > > [email protected]. > > > > For more options, visit this group > > > > athttp://groups.google.com/group/protobuf?hl=en. -- You received this message because you are subscribed to the Google Groups "Protocol Buffers" group. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/protobuf?hl=en.
