Hello, it's not entirely clear from your message: did the exchange eventually finish, or were you getting this WARN message the whole time?
Fri, 1 May 2020 at 12:32, Ilya Kasnacheev <ilya.kasnach...@gmail.com>:

> Hello!
>
> This description sounds like a typical hanging Partition Map Exchange, but
> you should be able to see that in the logs. If you don't, you can collect
> thread dumps from all nodes with jstack and check them for any stalled
> operations (or share them with us).
>
> Regards,
> --
> Ilya Kasnacheev
>
> Fri, 1 May 2020 at 11:53, userx <gagan...@gmail.com>:
>
>> Hi Pavel,
>>
>> I am using 2.8 and still getting the same issue. Here is the ecosystem:
>>
>> 19 Ignite servers (S1 to S19), each running with 16 GB of max JVM heap
>> and in persistent mode.
>>
>> 96 clients (C1 to C96).
>>
>> There are 19 machines; one Ignite server is started on each machine. The
>> clients are evenly distributed across the machines.
>>
>> C19 tries to create a cache and gets a timeout exception, as I have a
>> 5-minute timeout. When I looked into the coordinator logs, within that
>> 5-minute span it logged the following:
>>
>> 2020-04-24 15:37:09,434 WARN [exchange-worker-#45%S1%] {}
>> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture
>> - Unable to await partitions release latch within timeout. Some nodes have
>> not sent acknowledgement for latch completion. It's possible due to
>> unfinished atomic updates, transactions or not released explicit locks on
>> that nodes. Please check logs for errors on nodes with ids reported in
>> latch `pendingAcks` collection [latch=ServerLatch [permits=4,
>> pendingAcks=HashSet [84b8416c-fa06-4544-9ce0-e3dfba41038a,
>> 19bd7744-0ced-4123-a35f-ddf0cf9f55c4,
>> 533af8f9-c0f6-44b6-92d4-658f86ffaca0,
>> 1b31cb25-abbc-4864-88a3-5a4df37a0cf4],
>> super=CompletableLatch [id=CompletableLatchUid [id=exchange,
>> topVer=AffinityTopologyVersion [topVer=174, minorTopVer=1]]]]]
>>
>> The 4 nodes that have not acknowledged latch completion are S14, S7, S18,
>> and S4.
>>
>> I went to look at the logs of S4; it just records the addition of C19 to
>> the topology and then C19 leaving it after 5 minutes. The only thing I see
>> consistently in the GC log is "Total time for which application threads
>> were stopped: 0.0006225 seconds, Stopping threads took: 0.0000887 seconds".
>>
>> I understand that until all the atomic updates and transactions are
>> finished, clients are not able to create caches by communicating with the
>> coordinator, but is there a way around this?
>>
>> So the question is: is this issue still present in 2.8?
>>
>> --
>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
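For reference, a minimal sketch of the diagnostics suggested above, assuming a JDK (`jps`/`jstack`) on PATH on each server node; the process-matching pattern and file names are assumptions to adapt to your setup:

```shell
#!/bin/sh
# Sketch: capture a few spaced thread dumps from the local Ignite server JVM,
# to be run on each of the nodes listed in `pendingAcks` (here S14, S7, S18, S4).
PID=$(jps -l 2>/dev/null | grep -i ignite | awk '{print $1}' | head -n 1)
if [ -n "$PID" ]; then
  for i in 1 2 3; do
    jstack "$PID" > "threads-$(hostname)-$i.txt"
    sleep 10   # spacing the dumps helps spot threads that never progress
  done
fi

# Long-running transactions can also hold the partitions release latch;
# control.sh ships in IGNITE_HOME/bin and its --tx command can list them, e.g.:
# $IGNITE_HOME/bin/control.sh --tx --min-duration 300
```

Comparing the dumps from the four non-acknowledging servers should show whether any thread is stuck in an atomic update, transaction, or explicit lock, as the WARN message suggests.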