That sounds very useful as a "what not to do" example. Could you please give a little more detail (in broad strokes) on how the business code managed to starve the Ignite thread pool? And since you were using entry processors, how was it that the operations were not executed atomically, i.e. what made the race condition possible?
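To make the first question concrete, here is the shape of starvation I have in mind. This is a self-contained sketch using a plain JDK fixed pool as a stand-in for Ignite's pools; the class and method names are made up for illustration, not taken from your application:

```java
import java.util.concurrent.*;

// Sketch of the classic pool-starvation anti-pattern: a task submits a
// second task to the SAME bounded pool and blocks waiting for it. Once
// every pool thread is blocked this way, the inner tasks can never run.
public class PoolStarvation {

    // Returns true if all outer tasks finished, false if the pool starved.
    static boolean runScenario(int poolSize, int outerTasks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            CountDownLatch done = new CountDownLatch(outerTasks);
            for (int i = 0; i < outerTasks; i++) {
                pool.submit(() -> {
                    // Each outer task submits an inner task to the same pool
                    // and synchronously waits for its result.
                    Future<?> inner = pool.submit(() -> { /* business work */ });
                    try {
                        inner.get();          // blocks an already-scarce thread
                        done.countDown();
                    } catch (Exception ignored) {
                        // interrupted during shutdown in the starved case
                    }
                });
            }
            // If every thread is parked in inner.get(), the latch never
            // reaches zero and we time out.
            return done.await(2, TimeUnit.SECONDS);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Spare threads available: inner tasks run, outer tasks complete.
        System.out.println("pool=4, tasks=2 -> " + runScenario(4, 2));
        // All threads blocked waiting on queued inner tasks: starvation.
        System.out.println("pool=2, tasks=2 -> " + runScenario(2, 2));
    }
}
```

I am guessing the equivalent in your case was blocking on cache operations or futures from inside code already running on an Ignite thread, but that is exactly what I would like you to confirm.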
Thank you.

On Wed, Jun 5, 2019 at 1:10 AM kimec.ethome.sk <ki...@ethome.sk> wrote:
> Hi Ilya,
>
> I have tracked down this issue to racy behavior in the business code
> and Ignite thread pool starvation caused by the application code.
>
> Sorry for the false alarm.
>
> ---
> Best regards,
>
> Kamil Mišúth
>
> On 2019-05-22 18:46, Ilya Kasnacheev wrote:
> > Hello!
> >
> > Do you have a reproducer for this behavior? Have you tried the same
> > scenario on 2.7? I doubt anyone will take the effort to debug 2.6.
> >
> > Regards,
> >
> > --
> >
> > Ilya Kasnacheev
> >
> > Thu, Apr 25, 2019 at 18:59, kimec.ethome.sk [1]
> > <ki...@ethome.sk>:
> >
> >> Greetings,
> >>
> >> we've been chasing a weird issue in a two node cluster for a few
> >> days now. We have a Spring Boot application bundled with an Ignite
> >> server node.
> >>
> >> We use invokeAsync on a TRANSACTIONAL PARTITIONED cache with 1
> >> backup. We assume that each node in the two node cluster has a copy
> >> of the other node's data. In a way, this mimics a REPLICATED cache
> >> configuration. Our business logic is written within an
> >> EntryProcessor. The "business code" in the EntryProcessor is
> >> idempotent and arguments to the processor are fixed. At the end of
> >> the "invokeAsync" call, i.e. when the IgniteFuture is resolved, we
> >> return the value returned from the EntryProcessor via REST to the
> >> caller of our API.
> >>
> >> The problem occurs when one of the two nodes is restarted
> >> (triggering rebalancing) and we simultaneously receive a call to
> >> our REST API launching a business computation in the EntryProcessor.
> >> The code in the EntryProcessor properly computes a new value that
> >> we want to store in the cache. No exception is thrown, so we leak
> >> it out to the REST caller as a return value, but when rebalancing
> >> finishes, the value is not in the cache anymore.
> >> Yet the caller "saw" and stored the value we returned from our
> >> EntryProcessor.
> >>
> >> We did experiment with various cache settings but the problem
> >> simply persists. In fact we initially used a REPLICATED cache
> >> configuration but the behavior was pretty much the same.
> >>
> >> We have currently settled on a rather extreme configuration, but
> >> the data is still lost during rebalancing from time to time. We are
> >> using Ignite 2.6 and Gatling for REST load testing.
> >> The load on the REST API, and consequently on Ignite, is not very
> >> high.
> >>
> >> setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL)
> >> setCacheMode(CacheMode.PARTITIONED)
> >> setBackups(1)
> >> setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC)
> >> setRebalanceMode(CacheRebalanceMode.SYNC)
> >> setPartitionLossPolicy(PartitionLossPolicy.READ_WRITE_SAFE)
> >> setAffinity(new RendezvousAffinityFunction().setPartitions(2))
> >>
> >> I would appreciate any pointers on what may be wrong with our
> >> setup/config.
> >>
> >> Thank you.
> >>
> >> Kamil
> >
> > Links:
> > ------
> > [1] http://kimec.ethome.sk
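Regarding the second question: since a single EntryProcessor invocation is applied atomically per entry, my guess is that the race lived outside it, e.g. a read-modify-write split across separate cache calls. A minimal sketch of that kind of lost update, with ConcurrentHashMap standing in for the cache and compute() playing the role of the atomic per-key operation (all names are illustrative, not from your code):

```java
import java.util.concurrent.*;

// Sketch of a lost-update race: a get-then-put done as two separate steps
// loses increments under concurrency, while an atomic per-key operation
// (compute(), analogous to a single EntryProcessor invocation) does not.
public class LostUpdateDemo {

    // Read-modify-write as two separate cache calls: racy.
    static int racyIncrements(int threads, int perThread) throws Exception {
        ConcurrentHashMap<String, Integer> cache = new ConcurrentHashMap<>();
        cache.put("k", 0);
        runConcurrently(threads, perThread, () -> {
            int v = cache.get("k");   // read ...
            cache.put("k", v + 1);    // ... then write: another thread can interleave
        });
        return cache.get("k");
    }

    // Same update done atomically per key: never loses increments.
    static int atomicIncrements(int threads, int perThread) throws Exception {
        ConcurrentHashMap<String, Integer> cache = new ConcurrentHashMap<>();
        cache.put("k", 0);
        runConcurrently(threads, perThread, () ->
            cache.compute("k", (k, v) -> v + 1));
        return cache.get("k");
    }

    private static void runConcurrently(int threads, int perThread, Runnable op)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) op.run();
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("racy:   " + racyIncrements(8, 10_000));  // usually < 80000
        System.out.println("atomic: " + atomicIncrements(8, 10_000)); // always 80000
    }
}
```

If your race looked different from this, I would be very interested in the actual shape of it.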