I assume the use case of passing the hashCode is to be able to put the object directly into a hashmap bucket without constructing its state.

Would it be realistic to do something like this: the first time *ever* an object arrives for a given cache (or when the first object arrives while the cache is still empty), reconstruct the object and ask for its hashCode. If it mismatches the one transmitted over the wire, complain violently. A non-portable hashCode would in most cases be revealed immediately.
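Roughly this shape, in pseudo-Java (the class and method names are made up; this is not a claim about how Ignite's internals actually look, and I'm assuming the receiving node sees both the serialized key and the sender's hashCode):

import java.util.concurrent.atomic.AtomicBoolean;

// Rough sketch only -- NOT actual Ignite internals, just the shape of the check.
public class FirstKeyHashCodeCheck {

    // Pay the deserialization cost only once per cache.
    private final AtomicBoolean verified = new AtomicBoolean();

    public void verifyFirstKey(int wireHashCode, Object deserializedKey) {
        if (verified.compareAndSet(false, true)) {
            int localHashCode = deserializedKey.hashCode();

            if (localHashCode != wireHashCode)
                // "Complain violently": a non-portable hashCode is almost
                // always caught on the very first key that arrives.
                throw new IllegalStateException("hashCode of key " + deserializedKey
                    + " differs between nodes: sender=" + wireHashCode
                    + ", receiver=" + localHashCode
                    + "; cache keys must have a cluster-wide consistent hashCode");
        }
    }
}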
Kristian

2016-06-23 11:20 GMT+02:00 Denis Magda <[email protected]>:
> It seems that this.getClass().hashCode() executed on different VMs can
> produce different results (but it will always produce the same result on a
> single VM, which doesn't violate the JVM specification). Ignite requires
> the hashCode of a key to be consistent cluster-wide, so Ignite has an even
> stronger requirement than the JVM spec.
>
> —
> Denis
>
>> On Jun 23, 2016, at 11:30 AM, Kristian Rosenvold <[email protected]>
>> wrote:
>>
>> We think the issue may concern the portability of the hashCode across
>> nodes, because the hashCode in question included the hashCode of a
>> class (in other words this.getClass().hashCode() as opposed to the
>> more robust this.getClass().getName().hashCode()).
>>
>> Does Ignite require the hashCode of a key to be cluster-wide consistent?
>>
>> (This would actually be a violation of the javadoc contract for
>> hashCode, which states "This integer need not remain consistent from
>> one execution of an application to another execution of the same
>> application." But it should be possible to actually test for this if it
>> is a constraint required by Ignite.)
>>
>> If this does not appear to be the problem, I can supply the code in question.
>>
>> Kristian
>>
>>
>>
>> 2016-06-23 10:05 GMT+02:00 Denis Magda <[email protected]>:
>>> Hi Kristian,
>>>
>>> Could you share the source of a class that has an inconsistent
>>> equals/hashCode implementation? Perhaps we will be able to detect your
>>> case internally somehow and print a warning.
>>>
>>> —
>>> Denis
>>>
>>>> On Jun 17, 2016, at 10:27 PM, Kristian Rosenvold <[email protected]>
>>>> wrote:
>>>>
>>>> This whole issue was caused by an inconsistent equals/hashCode on a
>>>> cache key, which apparently has the capability of stopping replication
>>>> dead in its tracks. Nailing this one after 3-4 days of a very nagging
>>>> "select is broken" feeling was great. You guys helping us here might
>>>> want to be particularly aware of this, since it undeniably gives a
>>>> newbie the impression that Ignite is broken while it's really my code :)
>>>>
>>>> Thanks for the help!
>>>>
>>>> Kristian
>>>>
>>>>
>>>> 2016-06-17 20:00 GMT+02:00 Alexey Goncharuk <[email protected]>:
>>>>> Kristian,
>>>>>
>>>>> Are you sure you are using the latest 1.7-SNAPSHOT for your production
>>>>> data? Did you build the binaries yourself? Can you confirm the commit#
>>>>> of the binaries you are using? The issue you are reporting seems to be
>>>>> the same as IGNITE-3305 and, since the fix was committed only a couple
>>>>> of days ago, it might not have made it into the nightly snapshot.
>>>>>
>>>>> 2016-06-17 9:06 GMT-07:00 Kristian Rosenvold <[email protected]>:
>>>>>>
>>>>>> Sigh, this has all the hallmarks of a thread-safety issue or race
>>>>>> condition.
>>>>>>
>>>>>> I had a perfect testcase that replicated the problem 100% of the time,
>>>>>> but only when running on distinct nodes (it never occurs on the same
>>>>>> box), with 2 distinct caches and with Ignite 1.5; I just expanded the
>>>>>> testcase I posted initially. Typically I'd be missing the last 10-20
>>>>>> elements in the cache. I was about 2 seconds from reporting an issue,
>>>>>> and then I switched to yesterday's 1.7-SNAPSHOT version and it went
>>>>>> away. Unfortunately 1.7-SNAPSHOT exhibits the same behaviour with my
>>>>>> production data; it just broke my testcase :( Presumably I just need
>>>>>> to tweak the cache sizes or element counts to hit some kind of
>>>>>> non-sweet spot, and then it probably fails on my machine too.
>>>>>>
>>>>>> The testcase always worked on a single box, which led me to think
>>>>>> about socket-related issues. But it also required 2 caches to fail,
>>>>>> which led me to think about race conditions such as the rebalance
>>>>>> terminating once the first node finishes.
>>>>>>
>>>>>> I'm no stranger to reading bug reports like this myself, and I must
>>>>>> admit this one seems pretty tough to diagnose.
>>>>>>
>>>>>> Kristian
>>>>>>
>>>>>>
>>>>>> 2016-06-17 14:57 GMT+02:00 Denis Magda <[email protected]>:
>>>>>>> Hi Kristian,
>>>>>>>
>>>>>>> Your test looks absolutely correct to me. However, I didn't manage
>>>>>>> to reproduce this issue on my side either.
>>>>>>>
>>>>>>> Alex G., do you have any ideas on what the reason could be? Can you
>>>>>>> recommend that Kristian enable DEBUG/TRACE log levels for particular
>>>>>>> modules? Perhaps advanced logging will let us pinpoint the issue
>>>>>>> that happens in Kristian's environment.
>>>>>>>
>>>>>>> —
>>>>>>> Denis
>>>>>>>
>>>>>>> On Jun 17, 2016, at 10:02 AM, Kristian Rosenvold <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>> For Ignite 1.5, 1.6 and 1.7-SNAPSHOT, I see the same behaviour. Since
>>>>>>> REPLICATED caches seem to be broken on 1.6 and beyond, I am testing
>>>>>>> this on 1.5:
>>>>>>>
>>>>>>> I can reliably start two nodes and get consistent, correct results;
>>>>>>> let's say each node has 1.5 million elements in a given cache.
>>>>>>>
>>>>>>> Once I start a third or fourth node in the same cluster, it
>>>>>>> consistently gets a random, incorrect number of elements in the same
>>>>>>> cache, typically 1.1 million or so.
>>>>>>>
>>>>>>> I tried to create a testcase to reproduce this on my local machine
>>>>>>> (https://github.com/krosenvold/ignite/commit/4fb3f20f51280d8381e331b7bcdb2bae95b76b95),
>>>>>>> but it fails to reproduce the problem.
>>>>>>>
>>>>>>> I have two nodes in 2 different datacenters, so there will invariably
>>>>>>> be some differences in latencies/response times between the existing
>>>>>>> 2 nodes and the newly started node.
>>>>>>>
>>>>>>> This sounds like some kind of timing-related bug, any tips? Is there
>>>>>>> any way I can skew the timing in the testcase?
>>>>>>>
>>>>>>>
>>>>>>> Kristian
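PS: in case it helps anyone else hitting this, the key pattern discussed above looks roughly like the following. The class and field are invented for illustration; it is not the actual key from our code:

import java.util.Objects;

// Illustrative only -- not the real class from this thread.
public class CacheKey {

    private final String id;

    public CacheKey(String id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CacheKey && Objects.equals(id, ((CacheKey) o).id);
    }

    @Override
    public int hashCode() {
        // Broken for a distributed cache: Class.hashCode() is the default
        // identity hash, so it can differ between JVMs for the same class:
        //   return 31 * getClass().hashCode() + id.hashCode();

        // Portable: the class *name* hashes to the same value on every JVM.
        return 31 * getClass().getName().hashCode() + id.hashCode();
    }
}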
