Re: Cluster went down after "Unable to await partitions release latch within timeout" WARN

userx Fri, 01 May 2020 01:53:23 -0700

Hi Pavel,

I am using 2.8 and still getting the same issue. Here is the setup:


19 Ignite servers (S1 to S19), each running with a 16 GB max JVM heap and
with persistence enabled.

96 Clients (C1 to C96)

There are 19 machines, with one Ignite server running on each. The clients
are evenly distributed across the machines.

C19 tries to create a cache and gets a timeout exception, since I have a
timeout of 5 minutes configured. When I looked at the coordinator logs, over
that 5-minute span the coordinator logs messages like this:


2020-04-24 15:37:09,434 WARN [exchange-worker-#45%S1%] {}
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture
- Unable to await partitions release latch within timeout. Some nodes have
not sent acknowledgement for latch completion. It's possible due to
unfinishined atomic updates, transactions or not released explicit locks on
that nodes. Please check logs for errors on nodes with ids reported in latch
`pendingAcks` collection [latch=ServerLatch [permits=4, pendingAcks=HashSet
[84b8416c-fa06-4544-9ce0-e3dfba41038a, 19bd7744-0ced-4123-a35f-ddf0cf9f55c4,
533af8f9-c0f6-44b6-92d4-658f86ffaca0, 1b31cb25-abbc-4864-88a3-5a4df37a0cf4],
super=CompletableLatch [id=CompletableLatchUid [id=exchange,
topVer=AffinityTopologyVersion [topVer=174, minorTopVer=1]]]]]

The 4 nodes that have not acknowledged latch completion are S14, S7, S18
and S4.
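
For anyone who wants to pull the node IDs out of such a WARN line so they can
be matched against the topology, a rough Python sketch follows. It is only an
illustration: the function name pending_ack_ids and the regular expressions are
my own and not part of Ignite, and it assumes the message keeps the
"pendingAcks=HashSet [...]" format shown above.

```python
import re

# Matches the "pendingAcks=<CollectionType> [ ... ]" part of the WARN message.
# Assumption: the collection contains no nested square brackets.
PENDING_ACKS = re.compile(r"pendingAcks=\w+\s*\[([^\]]*)\]")

# Standard 8-4-4-4-12 hexadecimal node ID.
NODE_ID = re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")


def pending_ack_ids(message):
    """Return the node IDs listed in pendingAcks, in the order logged."""
    # The mail client wraps long lines, so collapse all whitespace first.
    flat = " ".join(message.split())
    match = PENDING_ACKS.search(flat)
    return NODE_ID.findall(match.group(1)) if match else []


if __name__ == "__main__":
    sample = """`pendingAcks` collection [latch=ServerLatch [permits=4, pendingAcks=HashSet
    [84b8416c-fa06-4544-9ce0-e3dfba41038a, 19bd7744-0ced-4123-a35f-ddf0cf9f55c4,
    533af8f9-c0f6-44b6-92d4-658f86ffaca0, 1b31cb25-abbc-4864-88a3-5a4df37a0cf4],
    super=CompletableLatch [id=CompletableLatchUid [id=exchange,"""
    for node_id in pending_ack_ids(sample):
        print(node_id)
```

The IDs it returns still have to be looked up by hand in the node logs; that
is how I arrived at the server names above.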

I looked at the logs of S4. They only record C19 joining the topology and
then leaving it 5 minutes later. The only other thing I notice is that the
GC log consistently shows entries like "Total time for which application
threads were stopped: 0.0006225 seconds, Stopping threads took: 0.0000887
seconds".

I understand that clients cannot create caches through the coordinator until
all atomic updates and transactions have finished, but is there a way around
this?

So my question is: is this issue still present in 2.8?

--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
