Re: Timeouts in atomic cache
Guys, I have edited https://issues.apache.org/jira/browse/IGNITE-5811 and extended it a bit. Comments are welcome! --Yakov
Re: Timeouts in atomic cache
Here is the newbie ticket for removing the exception - https://issues.apache.org/jira/browse/IGNITE-5823. --Yakov
Re: Timeouts in atomic cache
Val, I think this should be something similar to deadlock detection, but with a different condition. --Yakov
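For illustration only, here is one possible reading of "similar to deadlock detection, but with a different condition". Instead of looking for lock cycles, a watchdog flags atomic operations that have been running longer than a threshold and hands them to a policy, such as notifying the user or stopping the node. The class `AtomicOpWatchdog`, its methods and the policy callback are hypothetical and are not Ignite API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Hypothetical sketch, not Ignite API: tracks in-flight atomic operations
// and reports the ones that exceed a time threshold.
class AtomicOpWatchdog {
    private final Map<Long, Long> startNanos = new ConcurrentHashMap<>();
    private final AtomicLong ids = new AtomicLong();
    private final long thresholdNanos;

    AtomicOpWatchdog(long thresholdNanos) {
        this.thresholdNanos = thresholdNanos;
    }

    // Called when an atomic operation starts; the returned id is passed to finish().
    long start() {
        long id = ids.incrementAndGet();
        startNanos.put(id, System.nanoTime());
        return id;
    }

    // Called when the operation completes (successfully or not).
    void finish(long id) {
        startNanos.remove(id);
    }

    // Meant to run periodically. Applies the policy to every operation running
    // longer than the threshold and returns how many were flagged.
    int check(long nowNanos, Consumer<Long> policy) {
        int hung = 0;
        for (Map.Entry<Long, Long> e : startNanos.entrySet()) {
            if (nowNanos - e.getValue() > thresholdNanos) {
                policy.accept(e.getKey());
                hung++;
            }
        }
        return hung;
    }
}
```

Unlike a per-operation timeout, this keeps the hot path cheap: each operation costs only a map insert and a map remove, and the threshold check runs off the critical path.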
Re: Timeouts in atomic cache
Yakov,

Thanks for the response. I definitely like the idea of detecting Java-level deadlocks. As for hangs caused by internal Ignite problems, do we have a ticket for this as well? Do you have any ideas on how this should be implemented?

-Val

On Mon, Jul 24, 2017 at 3:55 AM, Yakov Zhdanov wrote:
Re: Timeouts in atomic cache
Val, it seems you spotted an issue. Please file a ticket. I would suggest removing the exceptions entirely: as I understand it, timeout logic for atomic operations would add overhead, while most of the time atomic operations complete instantly. From the timeout perspective, what distinguishes an atomic operation from a transaction is that you cannot predict when the user will release a lock acquired inside a transaction, whereas an atomic operation should complete within a predictable time.

As for your example: currently this will lead to a Java-level deadlock on the synchronized sections for the cache entries (but once we move to pure thread-per-partition for atomic caches, this will no longer be an issue: https://issues.apache.org/jira/browse/IGNITE-4506). I would suggest we file a ticket to implement detection of Java-level deadlocks and allow the user to configure a policy that takes appropriate action on a deadlock wherever it happens: https://issues.apache.org/jira/browse/IGNITE-5811

Any other hang of an atomic operation seems to be caused by issues in Ignite's internal machinery, either a hung exchange or problems in message processing on some node (e.g. all threads are busy and/or deadlocked), which again should result in notifying the user and stopping the node (by default).

--Yakov
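Here is a minimal sketch of Java-level deadlock detection with a configurable policy. It assumes detection relies on the standard `ThreadMXBean.findDeadlockedThreads()`, which covers `synchronized` monitors. `DeadlockDetector`, `lockBoth` and the `Consumer`-based policy are hypothetical names, not part of Ignite. `lockBoth` only reproduces two putAll-like operations entering entry monitors in opposite order:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.function.Consumer;

// Hypothetical sketch, not Ignite API: JVM-wide deadlock detection with a pluggable policy.
class DeadlockDetector {
    private final ThreadMXBean mx = ManagementFactory.getThreadMXBean();

    // Returns how many threads are deadlocked and hands their details to the policy
    // (e.g. log thread dumps, notify the user, or stop the node).
    int detect(Consumer<ThreadInfo[]> policy) {
        // Covers synchronized monitors and ownable synchronizers; null means no deadlock.
        long[] ids = mx.findDeadlockedThreads();
        if (ids == null) {
            return 0;
        }
        ThreadInfo[] infos = mx.getThreadInfo(ids, true, true);
        policy.accept(infos);
        return ids.length;
    }

    // Two threads calling this with swapped monitors and a shared latch (count 2)
    // mimic two concurrent updates locking the same entries in opposite order.
    static Thread lockBoth(Object first, Object second, CountDownLatch bothHoldFirst) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothHoldFirst.countDown();
                try {
                    bothHoldFirst.await(); // wait until the other thread holds its first monitor
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                synchronized (second) {
                    // Never reached: each thread waits for the monitor the other one holds.
                }
            }
        });
        t.setDaemon(true); // deadlocked threads must not keep the JVM alive
        t.start();
        return t;
    }
}
```

A periodic task could call `detect(...)` and apply whatever action the user configured. Note that the detector can only report a cycle. Breaking it still requires an action such as stopping the node.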
Re: Timeouts in atomic cache
Any thoughts?

-Val

On Wed, Jul 19, 2017 at 4:21 PM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:
> Folks,
>
> Do we currently have any way to set a timeout for an atomic operation? I
> see neither a way to do this nor any related documentation.
>
> In the code there are CacheAtomicUpdateTimeoutException and
> CacheAtomicUpdateTimeoutCheckedException, but I can't find a single place
> where either of them is created or thrown. It looks like we used to have this
> functionality, but it's not there anymore. Is this really the case, or did I
> miss something?
>
> I think having a way to time out an atomic operation is very important. For
> example, two concurrent putAll operations with keys in different order can
> completely hang the whole cluster forever, which is unacceptable. Is it
> possible to time out one of the operations (or both of them) in this case?
>
> -Val
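A common mitigation for this particular hang, independent of any timeout, is to make every caller acquire entry locks in the same global order, for example by passing a sorted map such as a `TreeMap` to putAll. The toy store below is a hypothetical sketch of why that works, not a description of how Ignite locks entries internally. `OrderedLockStore` and its methods are illustrative names only:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical toy store with one lock per key; illustrates lock ordering only.
class OrderedLockStore<K extends Comparable<K>, V> {
    private final Map<K, V> data = new ConcurrentHashMap<>();
    private final Map<K, ReentrantLock> locks = new ConcurrentHashMap<>();

    void putAll(Map<K, V> batch) {
        // Sorting the keys means every caller locks them in the same order,
        // so two batches with overlapping keys can never wait on each other in a cycle.
        TreeMap<K, V> sorted = new TreeMap<>(batch);
        List<ReentrantLock> held = new ArrayList<>();
        try {
            for (K key : sorted.keySet()) {
                ReentrantLock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
                lock.lock();
                held.add(lock);
            }
            data.putAll(sorted);
        } finally {
            // Release in reverse acquisition order.
            for (int i = held.size() - 1; i >= 0; i--) {
                held.get(i).unlock();
            }
        }
    }

    V get(K key) {
        return data.get(key);
    }
}
```

With the sort removed, two threads passing the same keys in ascending and descending insertion order can each hold one lock and wait forever for the other, which is the scenario described above.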