Re: Deadlock on concurrent calls to getAll and invokeAll on cache with read-through

2019-12-13 Thread peter108418
That's good news, thanks for the update. 
We have a reasonable workaround in place already, but we'll definitely look
into it again when 2.8 is released :)







Re: Deadlock on concurrent calls to getAll and invokeAll on cache with read-through

2019-12-11 Thread Ilya Kasnacheev
Hello!

It seems to me the issue is fixed in our master branch and will be in the
upcoming 2.8, so if this scenario is important for you, the simplest option
may be to wait for the nearest available RC and use that.

I did not bisect to find the fixing commit, but you can do that if you like.

Regards,
-- 
Ilya Kasnacheev


Tue, 10 Dec 2019 at 15:29, peter108418:

> Thank you for looking into the issue; good to hear you were able to
> reproduce it.


Re: Deadlock on concurrent calls to getAll and invokeAll on cache with read-through

2019-12-10 Thread peter108418
Thank you for looking into the issue; good to hear you were able to
reproduce it.





Re: Deadlock on concurrent calls to getAll and invokeAll on cache with read-through

2019-12-09 Thread Ilya Kasnacheev
Hello!

Unfortunately, you seem to be right: this is an issue.

I have filed a ticket https://issues.apache.org/jira/browse/IGNITE-12425 to
track it.

Regards,
-- 
Ilya Kasnacheev


Mon, 2 Dec 2019 at 16:33, peter108418:

> Hi
>
> We have recently started to encounter what appear to be deadlocks on one
> of our new clusters. [...]


Deadlock on concurrent calls to getAll and invokeAll on cache with read-through

2019-12-02 Thread peter108418
Hi

We have recently started to encounter what appear to be deadlocks on one of
our new clusters. We believe it may be due to "data patterns" being slightly
different and denser than on our other existing (working) production
clusters. We have some workarounds, but we think this might be an issue with
Ignite. Hopefully someone is able to narrow down the cause further. :)

First, I'll describe the issue we are seeing and how to reproduce it. Then
I'll try to explain what we are trying to accomplish; maybe there is a
better solution to our problem?

*The problem:*
We have encountered an odd deadlock when concurrent calls are made to
"getAll" and "invokeAll" on the same cache with read-through enabled. We
sort the keys the same way in both calls.
Replacing one "side" with individual "get"s or "invoke"s seems to avoid the
problem, but performance is worse.

I have created a test case that can reproduce it. The test creates:
- 1 thread doing a getAll({1, 2})
- 2 threads doing an invokeAll({2, 3}) and an invokeAll({1, 3})

These 3 threads are executed and may or may not end up in a deadlock; the
test case usually captures the deadlock state within 50 repetitions.
Please see the attached sample Maven project to reproduce:
https://drive.google.com/open?id=1GJ78dsulJ0XG-erNkN_vm3ordKr0nqS6
Run it with "mvn clean test".
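
Roughly, the test has the following shape. This is only a minimal sketch:
the store, the entry processor and the cache configuration below are
placeholder stand-ins, not the actual code in the attached project.

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheEntryProcessor;
import org.apache.ignite.cache.store.CacheStoreAdapter;
import org.apache.ignite.configuration.CacheConfiguration;

import javax.cache.Cache;
import javax.cache.configuration.FactoryBuilder;
import javax.cache.processor.MutableEntry;
import java.util.Arrays;
import java.util.HashSet;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class GetAllInvokeAllRepro {
    /** Trivial read-through store standing in for the real backing data-store. */
    public static class DummyStore extends CacheStoreAdapter<Integer, String> {
        @Override public String load(Integer key) {
            return "loaded-" + key; // the real store queries the data-store here
        }
        @Override public void write(Cache.Entry<? extends Integer, ? extends String> e) { /* no write-through */ }
        @Override public void delete(Object key) { /* no write-through */ }
    }

    /** Entry processor that just touches the value, mirroring the invokeAll side. */
    public static class TouchProcessor implements CacheEntryProcessor<Integer, String, Void> {
        @Override public Void process(MutableEntry<Integer, String> e, Object... args) {
            e.setValue("updated-" + e.getKey());
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        try (Ignite ignite = Ignition.start()) {
            CacheConfiguration<Integer, String> cfg = new CacheConfiguration<>("repro");
            cfg.setReadThrough(true);
            cfg.setCacheStoreFactory(FactoryBuilder.factoryOf(DummyStore.class));

            IgniteCache<Integer, String> cache = ignite.getOrCreateCache(cfg);

            // Repeat the three concurrent calls; the deadlock usually shows up
            // within ~50 iterations.
            for (int i = 0; i < 50; i++) {
                cache.clear();

                ExecutorService pool = Executors.newFixedThreadPool(3);
                pool.submit(() -> cache.getAll(new HashSet<>(Arrays.asList(1, 2))));
                pool.submit(() -> cache.invokeAll(new HashSet<>(Arrays.asList(2, 3)), new TouchProcessor()));
                pool.submit(() -> cache.invokeAll(new HashSet<>(Arrays.asList(1, 3)), new TouchProcessor()));

                pool.shutdown();
                if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                    System.out.println("Possible deadlock at iteration " + i + "; take a thread dump.");
                    break;
                }
            }
        }
    }
}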

I have also posted the test code, (partial) log output and (partial)
stacktrace below.


*What we are trying to do:*
I believe our use case is fairly "normal". We use Ignite as a cache layer,
with a read-through adapter to a backing data-store. As data continuously
enters the backing data-store, we have a service that keeps the Ignite
cache up to date.

We have a large amount of historical data, going years back. The backing
data-store is the "master"; we are not using Ignite Persistence. We use
Ignite as a cache layer because we typically recalculate over the same data
multiple times. We key our data by time chunks, where the "value" is a
container/collection of records within the time-range defined by the key.
We decided to go with an IgniteCache with read-through enabled to automate
cache loading. To reduce the number of queries against the data-store, we
usually call "getAll" on the cache, as the resulting set of keys provided to
CacheStore.loadAll can often be merged into a smaller number of queries
(for example, joining the time-ranges "08:00:00 - 08:15:00" and "08:15:00 -
08:30:00" into the single larger time-range "08:00:00 - 08:30:00").

As we continuously load new data into the backing data-store, entries in
Ignite become inconsistent with the data-store, especially those around
"now", though out-of-order records also occur.
To handle this, we have a separate Ignite Service that fetches new records
from the data-store and updates the Ignite cache using invokeAll and an
entry processor.
Our reasoning here is to forward only the "new" records (on the scale of
tens of records) and merge them into the container (containing thousands of
records) "locally", instead of "getting" the container, merging, and then
"putting" it back, which would transfer a large amount of data back and
forth.
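
The entry processor is essentially the following (simplified; the key and
container types are placeholders, and the merge itself is reduced to an
addAll):

import org.apache.ignite.cache.CacheEntryProcessor;

import javax.cache.processor.MutableEntry;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/**
 * Sketch of the merge-on-server idea: only the handful of new records
 * travels with the invokeAll call, and the much larger container is updated
 * on the node that owns the entry. Key and value types are assumptions
 * (Long time-chunk id, List<String> container).
 */
public class MergeRecordsProcessor implements CacheEntryProcessor<Long, List<String>, Void> {

    @Override public Void process(MutableEntry<Long, List<String>> entry, Object... args) {
        // With read-through enabled, getValue() loads the container from the
        // backing store if it is not already in the cache.
        List<String> current = entry.getValue();
        List<String> container = current == null ? new ArrayList<>() : new ArrayList<>(current);

        @SuppressWarnings("unchecked")
        Collection<String> newRecords = (Collection<String>) args[0];

        container.addAll(newRecords); // real code would de-duplicate / order by timestamp
        entry.setValue(container);
        return null;
    }
}

Note that invokeAll passes the same arguments to every key, so the per-key
new records would typically be passed as a map and looked up via
entry.getKey().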





Relevant log fragment:


Dump of relevant threads:




