Re: "Unable to await partitions release latch within timeout: ServerLatch" exception causing cluster freeze

2018-12-07 Thread userx
Hi Pavel, I am encountering the same issue: the server seems to have entered an infinite loop, and every 10 seconds I see the following message. 2018-12-06 15:49:23,188 WARN [exchange-worker-#122%5b9b0820-ec94-493c-ae58-bc31aac873c6%] {} org.apache.ignite.internal.processors

Re: "Unable to await partitions release latch within timeout: ServerLatch" exception causing cluster freeze

2018-08-02 Thread Pavel Kovalenko
Hello Ray, I'm glad that your problem was resolved. I just want to add that at the beginning of PME we wait for all current client operations to finish; new operations are frozen until PME ends. After a node finishes all ongoing client operations, it counts down the latch that you see in the logs, which i
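As a rough illustration of the latch behaviour described above, here is a minimal sketch using the JDK's `java.util.concurrent.CountDownLatch`. It is an analogy only, not Ignite's actual exchange code: the class `ReleaseLatchSketch`, the method `awaitRelease`, and the notion of a "slow node" that never counts down are all hypothetical names chosen for the example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: each "node" counts the latch down once it has finished
// its ongoing operations; the coordinator waits with a timeout.
public class ReleaseLatchSketch {

    /**
     * Simulates `nodes` participants, of which `slowNodes` never finish.
     * Returns true if every participant counted down before the timeout.
     */
    static boolean awaitRelease(int nodes, int slowNodes, long timeoutMs)
            throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(nodes);

        // Only the nodes that finish their operations count the latch down.
        for (int i = 0; i < nodes - slowNodes; i++) {
            new Thread(latch::countDown).start();
        }

        // await() returns false when the timeout elapses first, which is the
        // situation an "unable to await ... within timeout" warning reports.
        return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("all nodes finished: " + awaitRelease(3, 0, 1000));
        System.out.println("one node stuck:     " + awaitRelease(3, 1, 200));
    }
}
```

In this sketch a single participant that never counts down is enough to make the wait time out, which matches the symptom of one slow or unreachable node holding up the whole exchange.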

Re: "Unable to await partitions release latch within timeout: ServerLatch" exception causing cluster freeze

2018-08-02 Thread Ray
The root cause for this issue is the network throttle between client and servers. When I move the clients to run in the same cluster as the servers, there's no such problem any more.

Re: "Unable to await partitions release latch within timeout: ServerLatch" exception causing cluster freeze

2018-07-30 Thread Ray
Hello Pavel, PME got stuck again, and here are the detailed log and thread dump. node1.zip node2.zip node3.zip

Re: "Unable to await partitions release latch within timeout: ServerLatch" exception causing cluster freeze

2018-07-30 Thread Ray
Hello Pavel, I was able to reproduce this issue and I've attached the DEBUG log and thread dump for three nodes as you suggested. Archive.zip This time, there's no "no route to host" exception between server and client node

Re: "Unable to await partitions release latch within timeout: ServerLatch" exception causing cluster freeze

2018-07-26 Thread Pavel Kovalenko
Hello Ray, Without explicit errors in the log, it's not easy to guess what that was. Because I don't see any errors, it should be a recoverable failure (even if recovery takes a long time). If you have that option, could you please enable DEBUG log level for org.apache.ignite.internal.util.nio.GridTcpNioC

Re: "Unable to await partitions release latch within timeout: ServerLatch" exception causing cluster freeze

2018-07-26 Thread Ray
Hello Pavel, Thanks for the explanation, it's been a great help. Can you take a guess at why PME took so long because of communication issues between server nodes? From the logs, the "no route to host" exception happened because the server can't connect to the client's ports. But I didn't see any l

Re: "Unable to await partitions release latch within timeout: ServerLatch" exception causing cluster freeze

2018-07-26 Thread Pavel Kovalenko
Hello Ray, It's hard to say whether the issue you mentioned is the cause of your problem. To determine that, it would help if you could take thread dumps from both server and client nodes during the next such network glitch (e.g. using jstack). I'm not familiar with the implementation details of Ignite Spark DataFrames,
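Where running jstack against the process is not convenient, a thread dump can also be produced from inside the JVM with the standard `java.lang.management` API. The sketch below is an assumption about how one might do that, not something suggested in the thread: `ThreadDumpSketch` and `dump` are hypothetical names, the code has to run inside the node's own JVM, and `ThreadInfo.toString()` prints only a limited number of stack frames per thread, so jstack output is usually more complete.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: collects a thread dump of the current JVM as text.
public class ThreadDumpSketch {

    static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();

        // Request lock details only where the JVM supports them.
        boolean monitors = mx.isObjectMonitorUsageSupported();
        boolean synchronizers = mx.isSynchronizerUsageSupported();

        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : mx.dumpAllThreads(monitors, synchronizers)) {
            // Each entry includes thread name, state, and a truncated stack trace.
            sb.append(info);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Taking dumps on both the server and client nodes at the moment the exchange is stuck is what makes them comparable, so the helper would need to be triggered on each node around the same time.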

Re: "Unable to await partitions release latch within timeout: ServerLatch" exception causing cluster freeze

2018-07-25 Thread Ray
Hello Pavel, Here are the logs for node ids = [429edc2b-eb14-414f-a978-9bfe35443c8c, 6783732c-9a13-466f-800a-ad4c8d9be3bf]. 6783732c-9a13-466f-800a-ad4c8d9be3bf.zip 429edc2b-eb14-414f-a978-9bf

Re: "Unable to await partitions release latch within timeout: ServerLatch" exception causing cluster freeze

2018-07-25 Thread Pavel Kovalenko
Hello Ray, According to your attached log, it seems that you have some network problems. Could you please also share the logs from the nodes with temporary ids = [429edc2b-eb14-414f-a978-9bfe35443c8c, 6783732c-9a13-466f-800a-ad4c8d9be3bf]? The root cause should be on those nodes. 2018-07-25 13:03 GMT+03:

"Unable to await partitions release latch within timeout: ServerLatch" exception causing cluster freeze

2018-07-25 Thread Ray
I have a three-node Ignite 2.6 cluster set up with the following config.