I see. It sounds like this would be a rough sequence of events for a happy
path successful case:
1. OuterEntryProcessor, running on sys-stripe-5 on server2,  invokes
InnerEntryProcessor
2. sys-stripe-5 on server2 waits for the above future
3. InnerEntryProcessor runs on sys-stripe-1 on server1
4. server1 communicates the result to server2
5. server2 receives the message and tells sys-stripe-1 on server2 to
process it
6. sys-stripe-1 on server2 completes the future that sys-stripe-5 on
server2 was waiting for.

This design is not problematic when stripe threads never execute anything
blocking, as you advised. Thank you for making that clear!

- Raymond


On Thu, Oct 31, 2024 at 9:11 AM Pavel Tupitsyn <ptupit...@apache.org> wrote:

> > both invocations use sys-stripe-6
>
> Ignite needs sys-stripe-6 to complete the future for the invocation, but
> the thread is blocked waiting on that future.
>
> On Thu, Oct 31, 2024 at 3:05 PM Raymond Liu <philosob...@gmail.com> wrote:
>
>> Each node has 32 sys-stripe threads. With 3 servers and 1 client, there
>> are a total of 128 sys-stripe threads. 127 out of 128 of those sys-stripe
>> threads are parked on StripeConcurrentQueue.take(). The remaining one,
>> sys-stripe-6 on server node 1, is parked on IgniteFutureImpl.get().
>> Specifically, that future.get is the OuterEntryProcessor on server1 for
>> outer key "outer1" waiting for the InnerEntryProcessor on server3 for inner
>> key "outer1-inner1" to complete. Logs show that particular inner invocation
>> completed on sys-stripe-6 on server3.
>>
>> Here is the thread state for sys-stripe-6 on server1:
>> Thread [name="sys-stripe-6-#7%server1%", id=40, state=WAITING,
>> blockCnt=29, waitCnt=6]
>>         at java.base@17.0.11/jdk.internal.misc.Unsafe.park(Native Method)
>>         at java.base@17.0.11
>> /java.util.concurrent.locks.LockSupport.park(LockSupport.java:341)
>>         at
>> app//o.a.i.i.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:181)
>>         at
>> app//o.a.i.i.util.future.GridFutureAdapter.get(GridFutureAdapter.java:144)
>>         at
>> app//o.a.i.i.util.future.IgniteFutureImpl.get(IgniteFutureImpl.java:152)
>>         at
>> app//com.philosobyte.igniteblockedthreadsample.BlockedThreadTest$OuterEntryProcessor.process(BlockedThreadTest.java:105)
>>         at
>> app//com.philosobyte.igniteblockedthreadsample.BlockedThreadTest$OuterEntryProcessor.process(BlockedThreadTest.java:75)
>>         at
>> app//o.a.i.i.processors.cache.EntryProcessorResourceInjectorProxy.process(EntryProcessorResourceInjectorProxy.java:68)
>>         at
>> app//o.a.i.i.processors.cache.GridCacheMapEntry$AtomicCacheUpdateClosure.runEntryProcessor(GridCacheMapEntry.java:6334)
>>         at
>> app//o.a.i.i.processors.cache.GridCacheMapEntry$AtomicCacheUpdateClosure.call(GridCacheMapEntry.java:5650)
>>         at
>> app//o.a.i.i.processors.cache.GridCacheMapEntry$AtomicCacheUpdateClosure.call(GridCacheMapEntry.java:5437)
>>         at
>> app//o.a.i.i.processors.cache.persistence.tree.BPlusTree$Invoke.invokeClosure(BPlusTree.java:4320)
>>         at
>> app//o.a.i.i.processors.cache.persistence.tree.BPlusTree$Invoke.access$5700(BPlusTree.java:4214)
>>         at
>> app//o.a.i.i.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:2292)
>>         at
>> app//o.a.i.i.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:2182)
>>         at
>> app//o.a.i.i.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1692)
>>         at
>> app//o.a.i.i.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1675)
>>         at
>> app//o.a.i.i.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:424)
>>         at
>> app//o.a.i.i.processors.cache.GridCacheMapEntry.innerUpdate(GridCacheMapEntry.java:1903)
>>         at
>> app//o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateSingle(GridDhtAtomicCache.java:2536)
>>         at
>> app//o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update(GridDhtAtomicCache.java:1996)
>>         at
>> app//o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1815)
>>         at
>> app//o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1688)
>>         at
>> app//o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3179)
>>         at
>> app//o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$200(GridDhtAtomicCache.java:147)
>>         at
>> app//o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$3.apply(GridDhtAtomicCache.java:270)
>>         at
>> app//o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$3.apply(GridDhtAtomicCache.java:265)
>>         at
>> app//o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1164)
>>         at
>> app//o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:605)
>>         at
>> app//o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:406)
>>         at
>> app//o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:324)
>>         at
>> app//o.a.i.i.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:112)
>>         at
>> app//o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:314)
>>         at
>> app//o.a.i.i.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1906)
>>         at
>> app//o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1527)
>>         at
>> app//o.a.i.i.managers.communication.GridIoManager.access$5300(GridIoManager.java:242)
>>         at
>> app//o.a.i.i.managers.communication.GridIoManager$9.execute(GridIoManager.java:1420)
>>         at
>> app//o.a.i.i.managers.communication.TraceRunnable.run(TraceRunnable.java:55)
>>         at
>> app//o.a.i.i.util.StripedExecutor$Stripe.body(StripedExecutor.java:637)
>>         at app//o.a.i.i.util.worker.GridWorker.run(GridWorker.java:125)
>>         at java.base@17.0.11/java.lang.Thread.run(Thread.java:840)
>>
>>     Locked synchronizers:
>>
>> PoolProcessor printed a log showing the contents of the queue:
>> 2024-10-31T08:31:33.330-04:00  WARN 34704 ---
>> [ignite-blocked-thread-sample] [er-#70%server1%] o.a.i.i.p.p.PoolProcessor
>>                : >>> Possible starvation in striped pool.
>>     Thread name: sys-stripe-6-#7%server1%
>>     Queue: [Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE,
>> topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false,
>> msg=GridNearAtomicUpdateResponse [nodeId=null, futId=3, errs=null,
>> ret=GridCacheReturn [v=null, cacheObj=null, success=false, invokeRes=true,
>> loc=true, cacheId=100355670], remapTopVer=null, nearUpdates=null,
>> partId=294, mapping=ArrayList [], nodeLeft=false, super=GridCacheIdMessage
>> [cacheId=100355670, super=GridCacheMessage [msgId=42, depInfo=null,
>> lastAffChangedTopVer=AffinityTopologyVersion [topVer=-1, minorTopVer=0],
>> err=null, skipPrepare=false]]]]]]
>>     Deadlock: false
>>     Completed: 0
>>
>>
>>
>> and here is the thread state for sys-stripe-6 on server 3, which finished
>> executing the inner entry processor. The other 126 sys-stripe thread states
>> look like this, as well:
>> Thread [name="sys-stripe-6-#300%server3%", id=343, state=WAITING,
>> blockCnt=17, waitCnt=2]
>>         at java.base@17.0.11/jdk.internal.misc.Unsafe.park(Native Method)
>>         at java.base@17.0.11
>> /java.util.concurrent.locks.LockSupport.park(LockSupport.java:341)
>>         at
>> app//o.a.i.i.util.StripedExecutor$StripeConcurrentQueue.take(StripedExecutor.java:819)
>>         at
>> app//o.a.i.i.util.StripedExecutor$Stripe.body(StripedExecutor.java:623)
>>         at app//o.a.i.i.util.worker.GridWorker.run(GridWorker.java:125)
>>         at java.base@17.0.11/java.lang.Thread.run(Thread.java:840)
>>
>> This is where I stopped and wrote my first e-mail. I figured that, to
>> debug further, I should get some tips about how processor completions are
>> normally communicated across the network from the called server to the
>> calling server.
>>
>> It does appear to matter that both invocations use sys-stripe-6 (albeit
>> one on server1 and one on server3). The other cross-server invocations
>> complete OK, e.g. outer0-inner0 is invoked from sys-stripe-5 on server2 and
>> completes on sys-stripe-1 on server1.
>> I have attached text logs including the full thread dump, in case that is
>> interesting.
>>
>> - Raymond
>>
>> On Thu, Oct 31, 2024 at 2:11 AM Pavel Tupitsyn <ptupit...@apache.org>
>> wrote:
>>
>>> Have you tried to get a thread dump to understand which code is blocking
>>> the sys-stripe thread?
>>>
>>> On Thu, Oct 31, 2024 at 5:56 AM Raymond Liu <philosob...@gmail.com>
>>> wrote:
>>>
>>>> Hm, the scenario you described doesn't sound accurate here. I guess
>>>> it's my fault for not including logs. I will paste them below. The logs
>>>> show all four *inner* entry processors complete without being blocked
>>>> - we see four "Entered InnerEntryProcessor" and four "Exiting
>>>> InnerEntryProcessor" lines. And yet, one of the OuterEntryProcessors does
>>>> not seem to get notified that the inner entry processors have all
>>>> completed. This is indicated by the presence of "OuterEntryProcessor with
>>>> key outer1 waiting for inner future 1" without a corresponding log stating
>>>> "OuterEntryProcessor with key outer1 done waiting for all inner futures".
>>>> -------
>>>> 2024-10-30T23:27:55.056-04:00  INFO 17056 ---
>>>> [ignite-blocked-thread-sample] [    Test worker] c.p.i.BlockedThreadTest
>>>>                : Starting test
>>>> 2024-10-30T23:27:55.057-04:00  INFO 17056 ---
>>>> [ignite-blocked-thread-sample] [    Test worker] c.p.i.BlockedThreadTest
>>>>                : Invoking OuterEntryProcessor with key outer0
>>>> 2024-10-30T23:27:55.077-04:00  INFO 17056 ---
>>>> [ignite-blocked-thread-sample] [mmunicationSpi%]
>>>> o.a.i.s.c.t.TcpCommunicationSpi          : Accepted incoming communication
>>>> connection [locAddr=/127.0.0.1:47101, rmtAddr=/127.0.0.1:61560]
>>>> 2024-10-30T23:27:55.077-04:00  INFO 17056 ---
>>>> [ignite-blocked-thread-sample] [mmunicationSpi%]
>>>> o.a.i.s.c.t.TcpCommunicationSpi          : Established outgoing
>>>> communication connection [locAddr=/127.0.0.1:61560, rmtAddr=/
>>>> 127.0.0.1:47101]
>>>> 2024-10-30T23:27:55.081-04:00  INFO 17056 ---
>>>> [ignite-blocked-thread-sample] [    Test worker] c.p.i.BlockedThreadTest
>>>>                : Invoked OuterEntryProcessor with key outer0
>>>> 2024-10-30T23:27:55.081-04:00  INFO 17056 ---
>>>> [ignite-blocked-thread-sample] [    Test worker] c.p.i.BlockedThreadTest
>>>>                : Invoking OuterEntryProcessor with key outer1
>>>> 2024-10-30T23:27:55.082-04:00  INFO 17056 ---
>>>> [ignite-blocked-thread-sample] [    Test worker] c.p.i.BlockedThreadTest
>>>>                : Invoked OuterEntryProcessor with key outer1
>>>> 2024-10-30T23:27:55.082-04:00  INFO 17056 ---
>>>> [ignite-blocked-thread-sample] [    Test worker] c.p.i.BlockedThreadTest
>>>>                : Done invoking OuterEntryProcessors. Waiting for them to
>>>> complete...
>>>> 2024-10-30T23:27:55.082-04:00  INFO 17056 ---
>>>> [ignite-blocked-thread-sample] [    Test worker] c.p.i.BlockedThreadTest
>>>>                : Test Runner waiting for outer future 0
>>>> 2024-10-30T23:27:55.093-04:00  INFO 17056 ---
>>>> [ignite-blocked-thread-sample] [e-6-#7%server1%] c.p.i.BlockedThreadTest
>>>>                : Entered OuterEntryProcessor with key outer1. Sleeping
>>>> before further actions...
>>>> 2024-10-30T23:27:55.093-04:00  INFO 17056 ---
>>>> [ignite-blocked-thread-sample] [5-#142%server2%] c.p.i.BlockedThreadTest
>>>>                : Entered OuterEntryProcessor with key outer0. Sleeping
>>>> before further actions...
>>>> 2024-10-30T23:27:55.195-04:00  INFO 17056 ---
>>>> [ignite-blocked-thread-sample] [e-6-#7%server1%] c.p.i.BlockedThreadTest
>>>>                : Invoking with inner key: outer1-inner0
>>>> 2024-10-30T23:27:55.195-04:00  INFO 17056 ---
>>>> [ignite-blocked-thread-sample] [5-#142%server2%] c.p.i.BlockedThreadTest
>>>>                : Invoking with inner key: outer0-inner0
>>>> 2024-10-30T23:27:55.209-04:00  INFO 17056 ---
>>>> [ignite-blocked-thread-sample] [e-6-#7%server1%] c.p.i.BlockedThreadTest
>>>>                : Invoked with inner key: outer1-inner0
>>>> 2024-10-30T23:27:55.209-04:00  INFO 17056 ---
>>>> [ignite-blocked-thread-sample] [5-#142%server2%] c.p.i.BlockedThreadTest
>>>>                : Invoked with inner key: outer0-inner0
>>>> 2024-10-30T23:27:55.209-04:00  INFO 17056 ---
>>>> [ignite-blocked-thread-sample] [e-6-#7%server1%] c.p.i.BlockedThreadTest
>>>>                : Invoking with inner key: outer1-inner1
>>>> 2024-10-30T23:27:55.209-04:00  INFO 17056 ---
>>>> [ignite-blocked-thread-sample] [5-#142%server2%] c.p.i.BlockedThreadTest
>>>>                : Invoking with inner key: outer0-inner1
>>>> 2024-10-30T23:27:55.209-04:00  INFO 17056 ---
>>>> [ignite-blocked-thread-sample] [5-#142%server2%] c.p.i.BlockedThreadTest
>>>>                : Invoked with inner key: outer0-inner1
>>>> 2024-10-30T23:27:55.209-04:00  INFO 17056 ---
>>>> [ignite-blocked-thread-sample] [e-6-#7%server1%] c.p.i.BlockedThreadTest
>>>>                : Invoked with inner key: outer1-inner1
>>>> 2024-10-30T23:27:55.209-04:00  INFO 17056 ---
>>>> [ignite-blocked-thread-sample] [e-6-#7%server1%] c.p.i.BlockedThreadTest
>>>>                : OuterEntryProcessor with key outer1 waiting for inner
>>>> future 0
>>>> 2024-10-30T23:27:55.209-04:00  INFO 17056 ---
>>>> [ignite-blocked-thread-sample] [5-#142%server2%] c.p.i.BlockedThreadTest
>>>>                : OuterEntryProcessor with key outer0 waiting for inner
>>>> future 0
>>>> 2024-10-30T23:27:55.209-04:00  INFO 17056 ---
>>>> [ignite-blocked-thread-sample] [e-1-#2%server1%] c.p.i.BlockedThreadTest
>>>>                : Entered InnerEntryProcessor with inner key: outer0-inner0
>>>> 2024-10-30T23:27:55.209-04:00  INFO 17056 ---
>>>> [ignite-blocked-thread-sample] [e-1-#2%server1%] c.p.i.BlockedThreadTest
>>>>                : Exiting InnerEntryProcessor with inner key: outer0-inner0
>>>> 2024-10-30T23:27:55.209-04:00  INFO 17056 ---
>>>> [ignite-blocked-thread-sample] [e-0-#1%server1%] c.p.i.BlockedThreadTest
>>>>                : Entered InnerEntryProcessor with inner key: outer0-inner1
>>>> 2024-10-30T23:27:55.209-04:00  INFO 17056 ---
>>>> [ignite-blocked-thread-sample] [e-0-#1%server1%] c.p.i.BlockedThreadTest
>>>>                : Exiting InnerEntryProcessor with inner key: outer0-inner1
>>>> 2024-10-30T23:27:55.210-04:00  INFO 17056 ---
>>>> [ignite-blocked-thread-sample] [6-#298%server3%] c.p.i.BlockedThreadTest
>>>>                : Entered InnerEntryProcessor with inner key: outer1-inner1
>>>> 2024-10-30T23:27:55.210-04:00  INFO 17056 ---
>>>> [ignite-blocked-thread-sample] [6-#298%server3%] c.p.i.BlockedThreadTest
>>>>                : Exiting InnerEntryProcessor with inner key: outer1-inner1
>>>> 2024-10-30T23:27:55.215-04:00  INFO 17056 ---
>>>> [ignite-blocked-thread-sample] [5-#142%server2%] c.p.i.BlockedThreadTest
>>>>                : OuterEntryProcessor with key outer0 waiting for inner
>>>> future 1
>>>> 2024-10-30T23:27:55.216-04:00  INFO 17056 ---
>>>> [ignite-blocked-thread-sample] [5-#142%server2%] c.p.i.BlockedThreadTest
>>>>                : OuterEntryProcessor with key outer0 done waiting for all
>>>> inner futures
>>>> 2024-10-30T23:27:55.216-04:00  INFO 17056 ---
>>>> [ignite-blocked-thread-sample] [5-#142%server2%] c.p.i.BlockedThreadTest
>>>>                : Entered InnerEntryProcessor with inner key: outer1-inner0
>>>> 2024-10-30T23:27:55.216-04:00  INFO 17056 ---
>>>> [ignite-blocked-thread-sample] [5-#142%server2%] c.p.i.BlockedThreadTest
>>>>                : Exiting InnerEntryProcessor with inner key: outer1-inner0
>>>> 2024-10-30T23:27:55.216-04:00  INFO 17056 ---
>>>> [ignite-blocked-thread-sample] [    Test worker] c.p.i.BlockedThreadTest
>>>>                : Test Runner waiting for outer future 1
>>>> 2024-10-30T23:27:55.217-04:00  INFO 17056 ---
>>>> [ignite-blocked-thread-sample] [e-6-#7%server1%] c.p.i.BlockedThreadTest
>>>>                : OuterEntryProcessor with key outer1 waiting for inner
>>>> future 1
>>>> 2024-10-30T23:27:59.621-04:00  INFO 17056 ---
>>>> [ignite-blocked-thread-sample] [-notifier-timer]
>>>> o.a.i.i.p.c.GridUpdateNotifier           : Your version is up to date.
>>>> 2024-10-30T23:28:01.283-04:00  INFO 17056 ---
>>>> [ignite-blocked-thread-sample] [-notifier-timer]
>>>> o.a.i.i.p.c.GridUpdateNotifier           : Your version is up to date.
>>>> 2024-10-30T23:28:02.877-04:00  INFO 17056 ---
>>>> [ignite-blocked-thread-sample] [-notifier-timer]
>>>> o.a.i.i.p.c.GridUpdateNotifier           : Your version is up to date.
>>>> 2024-10-30T23:28:04.399-04:00  INFO 17056 ---
>>>> [ignite-blocked-thread-sample] [-notifier-timer]
>>>> o.a.i.i.p.c.GridUpdateNotifier           : Your version is up to date.
>>>> 2024-10-30T23:28:08.508-04:00 ERROR 17056 ---
>>>> [ignite-blocked-thread-sample] [%-#118%server1%] o.a.i.i.u.t.G
>>>>                : Blocked system-critical thread has been detected. This can
>>>> lead to cluster-wide undefined behaviour [workerName=sys-stripe-6,
>>>> threadName=sys-stripe-6-#7%server1%, blockedFor=13s]
>>>> -------
>>>>
>>>> Thanks for the advice - we are probably using entry processors in a way
>>>> originally intended for compute jobs with transactions, and we will
>>>> probably move towards that direction eventually. But to learn from this and
>>>> prevent future issues, I'd like to understand exactly what the problem is,
>>>> and with these logs, I can't understand how it's a deadlock.
>>>>
>>>> - Raymond
>>>>
>>>>
>>>> On Wed, Oct 30, 2024 at 7:33 AM Pavel Tupitsyn <ptupit...@apache.org>
>>>> wrote:
>>>>
>>>>> Looks like a deadlock to me.
>>>>>
>>>>> Processor 1 invokes processor 2 and blocks on a future, stripe thread
>>>>> T is blocked
>>>>> Processor 2 invokes processor 3, which needs stripe thread T and waits
>>>>> forever
>>>>>
>>>>> General advice - do not block striped pool threads. Avoid IO, waiting
>>>>> for futures, etc. An entry processor is supposed to perform a local
>>>>> computation and return quickly.
>>>>>
>>>>> On Wed, Oct 30, 2024 at 7:05 AM Raymond Liu <philosob...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi team,
>>>>>>
>>>>>> We see a failure with nested asynchronous entry processor calls where 
>>>>>> the nested calls appear to complete, but
>>>>>> the parent entry processor is not aware the child entry processor is 
>>>>>> done.
>>>>>>
>>>>>> In this sample test setup...
>>>>>> 1. a test runner invokes an "outer" entry processor twice asynchronously.
>>>>>> 2. The two outer entry processor invocations each invoke two inner entry 
>>>>>> processors asynchronously.
>>>>>> 3. All four inner entry processor invocations appear to complete.
>>>>>> 4. One outer entry processor is stuck waiting (via IgniteFuture.get()) 
>>>>>> for the last inner entry processor to complete.
>>>>>>    The program hangs, prints thread dumps, and prints the following 
>>>>>> error:
>>>>>>    "Blocked system-critical thread has been detected. This can lead to 
>>>>>> cluster-wide undefined behaviour [workerName=sys-stripe-6, 
>>>>>> threadName=sys-stripe-6-#7%server1%, blockedFor=13s]"
>>>>>>
>>>>>> I've uploaded this in a Github repo called 
>>>>>> "ignite-blocked-thread-sample": 
>>>>>> https://github.com/Philosobyte/ignite-blocked-thread-sample/blob/main/src/test/java/com/philosobyte/igniteblockedthreadsample/BlockedThreadTest.java
>>>>>>
>>>>>> While this sample uses asynchronous invocations, a version of this test 
>>>>>> with synchronous invocations runs into the same
>>>>>> issue.
>>>>>>
>>>>>> This appears to happen only when there is more than one server node and 
>>>>>> the nested invocation goes to a different
>>>>>> server node than the outer invocation. A debug breakpoint shows that, on 
>>>>>> the server executing the inner entry processor,
>>>>>> the sys-stripe thread pool is idle; but on the server executing the 
>>>>>> outer entry processor, there is one sys-stripe
>>>>>> thread pool waiting on IgniteFuture.get().
>>>>>>
>>>>>> We originally suspected a deadlock, but this error also occurs when we 
>>>>>> try setting thread pool sizes to a large number.
>>>>>> This almost appears to be a failure to communicate that a future should 
>>>>>> be complete.
>>>>>>
>>>>>> Nested compute tasks, such as `IgniteCallable`s and `IgniteClosure`s 
>>>>>> executing on the public thread pool, do not run
>>>>>> into this problem. So, we worked around this issue by wrapping our entry 
>>>>>> processors inside `IgniteCallable`s. That alone
>>>>>> was already enough to resolve the issue; of course, at higher scale, 
>>>>>> we'd run into deadlocks, so we also followed the
>>>>>> advice in the "thread pools tuning" documentation page about using a 
>>>>>> custom thread pool.
>>>>>>
>>>>>> Regardless, we would like to make sure we understand the issue. Can 
>>>>>> anyone think of an explanation? Is this expected
>>>>>> behavior or possibly a bug?
>>>>>>
>>>>>> Thank you,
>>>>>> Raymond
>>>>>>
>>>>>>

Reply via email to