Re: Timeouts in atomic cache

2017-07-25 Thread Yakov Zhdanov
Guys, I have edited https://issues.apache.org/jira/browse/IGNITE-5811 and
extended it a bit. Comments are welcome!

--Yakov


Re: Timeouts in atomic cache

2017-07-25 Thread Yakov Zhdanov
Here is the newbie ticket for removing the exception -
https://issues.apache.org/jira/browse/IGNITE-5823.

--Yakov


Re: Timeouts in atomic cache

2017-07-25 Thread Yakov Zhdanov
Val, I think this should be something similar to deadlock detection, but
with a different condition.

--Yakov


Re: Timeouts in atomic cache

2017-07-24 Thread Valentin Kulichenko
Yakov,

Thanks for the response. I definitely like the idea of detecting Java-level
deadlocks.

As for hangs caused by Ignite internal problems, do we have a ticket for
this as well? Do you have any idea about how this should be implemented?

-Val

On Mon, Jul 24, 2017 at 3:55 AM, Yakov Zhdanov wrote:

> Val, it seems you spotted an issue. Please file a ticket. I would suggest
> removing the exceptions entirely: as I understand it, timeout logic for
> atomic operations would add overhead, while most of the time atomic
> operations are instant. From a timeout perspective, what distinguishes an
> atomic operation from a transaction is that you cannot predict when the
> user will release a lock acquired inside a transaction, whereas an atomic
> operation should have a predictable timeout.
>
> As for your example: currently, this will lead to a Java-level deadlock on
> the synchronized sections for the cache entries (once we move to pure
> thread-per-partition for atomic caches this will no longer be an issue, see
> https://issues.apache.org/jira/browse/IGNITE-4506). I would suggest we file
> a ticket to implement detection of Java-level deadlocks and allow the user
> to configure a policy that takes appropriate action on a deadlock wherever
> it happens: https://issues.apache.org/jira/browse/IGNITE-5811
>
> Any other hang of an atomic operation seems to be caused by issues in
> Ignite's internal machinery: either a hung exchange or problems in message
> processing on some node (e.g. all threads are busy and/or deadlocked),
> which again should result in notifying the user and stopping the node (by
> default).
>
> --Yakov
>


Re: Timeouts in atomic cache

2017-07-24 Thread Yakov Zhdanov
Val, it seems you spotted an issue. Please file a ticket. I would suggest
removing the exceptions entirely: as I understand it, timeout logic for
atomic operations would add overhead, while most of the time atomic
operations are instant. From a timeout perspective, what distinguishes an
atomic operation from a transaction is that you cannot predict when the
user will release a lock acquired inside a transaction, whereas an atomic
operation should have a predictable timeout.

As for your example: currently, this will lead to a Java-level deadlock on
the synchronized sections for the cache entries (once we move to pure
thread-per-partition for atomic caches this will no longer be an issue, see
https://issues.apache.org/jira/browse/IGNITE-4506). I would suggest we file
a ticket to implement detection of Java-level deadlocks and allow the user
to configure a policy that takes appropriate action on a deadlock wherever
it happens: https://issues.apache.org/jira/browse/IGNITE-5811

Any other hang of an atomic operation seems to be caused by issues in
Ignite's internal machinery: either a hung exchange or problems in message
processing on some node (e.g. all threads are busy and/or deadlocked),
which again should result in notifying the user and stopping the node (by
default).

--Yakov
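
For reference, Java-level deadlock detection of the kind proposed above can
be sketched with the standard JMX thread bean. This is only an illustration
and not how Ignite implements (or plans to implement) IGNITE-5811: the class
DeadlockProbe, its method names, and the two demo threads are hypothetical.
The only JDK calls relied on are ThreadMXBean.findDeadlockedThreads() and
getThreadInfo(long[]).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical helper for illustration only; not an Ignite class.
class DeadlockProbe {
    /** Names of threads the JVM currently reports as deadlocked (empty if none). */
    static List<String> deadlockedThreadNames() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null when no deadlock is found
        List<String> names = new ArrayList<>();
        if (ids == null)
            return names;
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info != null) // a thread may have terminated in the meantime
                names.add(info.getThreadName());
        }
        return names;
    }

    /** Starts two daemon threads that take the same two monitors in opposite order. */
    static void startDeadlockedPair(String name1, String name2) {
        Object a = new Object();
        Object b = new Object();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        Thread t1 = new Thread(() -> lockBoth(a, b, bothHoldFirst), name1);
        Thread t2 = new Thread(() -> lockBoth(b, a, bothHoldFirst), name2);
        t1.setDaemon(true); // daemon threads so the JVM can still exit
        t2.setDaemon(true);
        t1.start();
        t2.start();
    }

    private static void lockBoth(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await(); // wait until the other thread holds its first monitor
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) {
                // never reached: the other thread holds 'second' and waits for 'first'
            }
        }
    }

    /** Polls until a deadlock is reported or the timeout elapses. */
    static List<String> awaitDeadlock(long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        List<String> names = deadlockedThreadNames();
        while (names.isEmpty() && System.currentTimeMillis() < deadline) {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            names = deadlockedThreadNames();
        }
        return names;
    }

    public static void main(String[] args) {
        startDeadlockedPair("entry-lock-1", "entry-lock-2");
        System.out.println("Deadlocked threads: " + awaitDeadlock(5_000));
    }
}
```

A configurable policy, as suggested in the message above, would then decide
what to do with the reported threads (log, notify the user, stop the node).
Note that this bean only sees deadlocks inside a single JVM; it cannot detect
a hang that spans several nodes.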


Re: Timeouts in atomic cache

2017-07-21 Thread Valentin Kulichenko
Any thoughts?

-Val

On Wed, Jul 19, 2017 at 4:21 PM, Valentin Kulichenko <
valentin.kuliche...@gmail.com> wrote:

> Folks,
>
> Do we currently have any way to set a timeout for an atomic operation? I
> see neither a way to do this nor any related documentation.
>
> In the code there are CacheAtomicUpdateTimeoutException and
> CacheAtomicUpdateTimeoutCheckedException, but I can't find a single place
> where either is created and/or thrown. It looks like we used to have this
> functionality, but it's not there anymore. Is this really the case, or did
> I miss something?
>
> I think having a way to time out an atomic operation is very important.
> For example, two concurrent putAll operations with keys in different order
> can completely hang the whole cluster forever, which is unacceptable. Is it
> possible to time out one of the operations (or both of them) in this case?
>
> -Val
>
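
The putAll scenario described above is a classic lock-ordering problem. The
sketch below is a toy, single-JVM model and not Ignite code: OrderedBatchPut
and its per-key ReentrantLock map are assumptions made for illustration. It
shows why two batches with the same keys in opposite order cannot deadlock
once every caller acquires the per-key locks in one global (sorted) order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

// Toy model for illustration only; not how Ignite stores or locks entries.
class OrderedBatchPut {
    private final ConcurrentMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();
    private final ConcurrentMap<String, Integer> data = new ConcurrentHashMap<>();

    /** Writes the whole batch while holding every per-key lock, taken in sorted key order. */
    void putAll(Map<String, Integer> batch) {
        List<ReentrantLock> held = new ArrayList<>();
        try {
            // Sorting gives all callers the same acquisition order, so no lock cycle can form.
            for (String key : new TreeSet<>(batch.keySet())) {
                ReentrantLock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
                lock.lock();
                held.add(lock);
            }
            data.putAll(batch);
        } finally {
            // Release in reverse order of acquisition.
            for (int i = held.size() - 1; i >= 0; i--)
                held.get(i).unlock();
        }
    }

    Integer get(String key) {
        return data.get(key);
    }

    /** Runs two threads that submit the same keys in opposite order; true if both finish. */
    static boolean runOpposingBatches(OrderedBatchPut cache, int iterations) {
        Map<String, Integer> ab = new LinkedHashMap<>();
        ab.put("a", 1);
        ab.put("b", 1);
        Map<String, Integer> ba = new LinkedHashMap<>();
        ba.put("b", 2);
        ba.put("a", 2);
        Thread t1 = new Thread(() -> { for (int i = 0; i < iterations; i++) cache.putAll(ab); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < iterations; i++) cache.putAll(ba); });
        t1.setDaemon(true);
        t2.setDaemon(true);
        t1.start();
        t2.start();
        try {
            t1.join(10_000);
            t2.join(10_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !t1.isAlive() && !t2.isAlive();
    }

    public static void main(String[] args) {
        OrderedBatchPut cache = new OrderedBatchPut();
        System.out.println("Both batches finished: " + runOpposingBatches(cache, 10_000));
    }
}
```

If the locks were instead taken in the iteration order of each caller's map,
the two threads above could each hold one lock and wait forever for the
other. On the caller's side, the analogous precaution is to pass a map with
a consistent key order (for example a TreeMap) to putAll.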