Ah, I didn't get this message. How does this break use in a HashMap? Given two protobufs of the same type in the same JVM, with the same contents, does hashCode() not return the same value?
On Wed, May 18, 2011 at 11:50 AM, Jay Booth <jaybo...@gmail.com> wrote: > I'd also note that this precludes usage of protobuf objects in > HashSets or as keys in HashMaps -- any time someone does that, it will > be a (subtle and confusing) bug, as hashcode() isn't aligned with > equals(). > > On May 18, 2:48 pm, Jay Booth <jaybo...@gmail.com> wrote: > > Well, that's your prerogative, I guess, but why even implement > > hashcode at all then? Just inherit from object and you're getting > > effectively the same behavior. Is that what you're intending? > > > > On May 16, 10:03 am, Pherl Liu <liuj...@google.com> wrote: > > > > > > > > > > > > > > > > > We discussed internally and decided not to make the hashCode() > > > return deterministic result. If you need consistent hashcode in > different > > > runs, use toByteString().hashCode(). > > > > > Quoted from Kenton: > > > > > Hashing the content of the descriptor would actually be incorrect, > because > > > two descriptors with exactly the same content are still considered > different > > > types. Descriptors are compared by identity, hence they are hashed by > > > pointer. > > > > > Removing the descriptor from the calculation would indeed make > hashCode() > > > consistent between two runs of the same binary, and probably > insignificant > > > runtime cost. Of course, once you do that, you will never be able to > > > introduce non-determinism again because people will depend on it. > > > > > But there's a much bigger risk. People may actually start depending on > > > hashCode() returning consistent results between two different versions > of > > > the binary, or two completely separate binaries that compile in the > same > > > protocol, or -- most dangerously -- two different versions of the same > > > protocol (e.g. with fields added or removed). I think it would be very > > > difficult and limiting to make these guarantees, so I would be > extremely > > > cautious about this. > > > > > Certainly, there is no implementation of hashCode() that would be any > safer > > > than .toByteString().hashCode(). So, I'd advise steering people to the > > > latter. Note that if unknown fields are present, the results may still > be > > > inconsistent. However, there is no reasonable way to implement a > hashCode() > > > that is consistent in the presence of unknown fields. > > > > > On Thu, May 12, 2011 at 5:32 AM, Ben Wright <compuware...@gmail.com> > wrote: > > > > I think we wrote those replies at the same time : ) > > > > > > You're right, at the cost of some additional hash collisions, the > > > > simplest solution is to simply not include the type / descriptor in > > > > the hash calculation at all. > > > > > > The best / least-collision solutions with good performance would be > > > > what I wrote in my previous post, but that requires that someone > > > > (presumably a current committer) with sufficient knowledge of the > > > > Descriptor types to have enough time to update the compiler and java > > > > libraries accordingly. > > > > > > Any input from a committer for this issue? Seems the simple solution > > > > would take less than an hour to push into the stream and could make > it > > > > into the next release. > > > > > > On May 11, 5:25 pm, Ben Wright <compuware...@gmail.com> wrote: > > > > > Alternatively... instead of putting the onus on the compiler, the > > > > > hashcode could be computed by the JVM at initialization time for > the > > > > > Descriptor instance, (which would also help performance of > dynamically > > > > > parsed Descriptor instance hashcode calls). > > > > > > > i.e. > > > > > > > private final int computedHashcode; > > > > > > > public Descriptor() { > > > > > //initialization > > > > > > > computedHashcode = do_compute_hashCode(); > > > > > > > } > > > > > > > public int hashCode() { > > > > > return computedHashcode; > > > > > > > } > > > > > > > punlic int do_compute_hashCode(){ > > > > > return // compute hashcode > > > > > > > } > > > > > > > This is all talking towards optimum performance implementation... > the > > > > > real problem is the need for a hashCode implementation for > Descriptor > > > > > based on the actual Descriptor's content... > > > > > > > On May 11, 4:54 pm, Ben Wright <compuware...@gmail.com> wrote: > > > > > > > > Jay: > > > > > > > > Using the class name to generate the hashcode is logically > incorrect > > > > > > because the class name can be derived by the options > java_package_ > > > > > > name and java_outer_classname. > > > > > > > > Additionally (although less likely to matter), separate protocol > > > > > > buffer files can define an identical class names with different > > > > > > protocol buffers. > > > > > > > > Lastly, and most importantly... > > > > > > > > If the same Message is being used with generated code and with > dynamic > > > > > > code, the hash code for the descriptor would still be identical > if > > > > > > generated from the descriptor instance, whereas the dynamic usage > does > > > > > > not have a classname from which to derive a hashcode. While in > your > > > > > > case this should not matter, it does matter for other users of > > > > > > protobuf. The hashcode function would be better served by being > > > > > > implemented correctly from state data for the descriptor. > > > > > > Additionally, in generated code it seems that this hashcode could > be > > > > > > pre-computed by the compiler and Descriptor.hashcode() could > return a > > > > > > constant integer - which would be much more efficient than any > other > > > > > > method. > > > > > > > > On May 11, 3:02 pm, Jay Booth <jaybo...@gmail.com> wrote: > > > > > > > > > It can be legitimate, especially in the case of > Object.hashCode(), > > > > but > > > > > > > it's supposed to be in sync with equals() by contract. As it > stands, > > > > > > > two objects which are equal() will produce different hashes, or > the > > > > > > > same logical object will produce different hashes across JVMs. > That > > > > > > > breaks the contract.. if the equals() method simply did return > > > > (other > > > > > > > == this), then it'd be fine, albeit a little useless. > > > > > > > > > I created an issue and posted a 1-liner patch that would > eliminate > > > > the > > > > > > > problem by using getClass().getName().hashCode() to incorporate > type > > > > > > > information into the hashCode without depending on a Descriptor > > > > > > > object's memory address. > > > > > > > > > On May 11, 12:01 am, Dmitriy Ryaboy <dvrya...@gmail.com> > wrote: > > > > > > > > > > Hi Jay, > > > > > > > > > > I encountered that before. Unfortunately this is a legitimate > thing > > > > to > > > > > > > > do, as documented in Object.hashCode() > > > > > > > > > > I have a write-up of the problem and how we wound up solving > it > > > > (not > > > > > > > > elegant.. suggestions welcome) here: > > > > > http://squarecog.wordpress.com/2011/02/20/hadoop-requires-stable-hash... > > > > > > > > > > D > > > > > > > > > > On Mon, May 9, 2011 at 8:25 AM, Jay Booth < > jaybo...@gmail.com> > > > > wrote: > > > > > > > > > I'm testing an on-disk hashtable with Protobufs and noticed > that > > > > with > > > > > > > > > the java generated hashcode function, it seems to return a > > > > different > > > > > > > > > hashcode across JVM invocations for the same logically > equivalent > > > > > > > > > object (tested with a single string protobuf, same string > for > > > > both > > > > > > > > > instances). > > > > > > > > > > > Is this known behavior? Bit busy right now backporting > this to > > > > work > > > > > > > > > with String keys instead but I could provide a bit of > command > > > > line > > > > > > > > > code that demonstrates the issue when I get a chance. > > > > > > > > > > > Glancing at the generated hashcode() function, it looks > like the > > > > > > > > > difference comes from etiher > getDescriptorForType().hashCode() or > > > > > > > > > getUnknownFields().hashCode(), both of which are > incorporated. > > > > > > > > > > > -- > > > > > > > > > You received this message because you are subscribed to the > > > > Google Groups "Protocol Buffers" group. > > > > > > > > > To post to this group, send email to > protobuf@googlegroups.com. > > > > > > > > > To unsubscribe from this group, send email to > > > > protobuf+unsubscr...@googlegroups.com. > > > > > > > > > For more options, visit this group athttp:// > > > > groups.google.com/group/protobuf?hl=en. > > > > > > -- > > > > You received this message because you are subscribed to the Google > Groups > > > > "Protocol Buffers" group. > > > > To post to this group, send email to protobuf@googlegroups.com. > > > > To unsubscribe from this group, send email to > > > > protobuf+unsubscr...@googlegroups.com. > > > > For more options, visit this group at > > > >http://groups.google.com/group/protobuf?hl=en. > > -- > You received this message because you are subscribed to the Google Groups > "Protocol Buffers" group. > To post to this group, send email to protobuf@googlegroups.com. > To unsubscribe from this group, send email to > protobuf+unsubscr...@googlegroups.com. > For more options, visit this group at > http://groups.google.com/group/protobuf?hl=en. > > -- You received this message because you are subscribed to the Google Groups "Protocol Buffers" group. To post to this group, send email to protobuf@googlegroups.com. To unsubscribe from this group, send email to protobuf+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/protobuf?hl=en.