I believe the reason behind the current behavior is that when a client's "gateway" server goes down there is no guarantee that the client will receive all the events it's subscribed to (cluster events, messages, cache updates for continuous queries). This means the client needs to be notified that it was disconnected (i.e. it might have missed some events) and reconnected (i.e. it needs to re-establish listeners, etc.).

There are cases when Ignite will reconnect a client "quietly", without a connection status change. One example is when the TCP connection was forcibly closed but the client and server are able to simply create new sockets; in that case Ignite can guarantee that all events are delivered, so the complete reconnection ceremony is not needed. One could find more cases where reconnection events are currently fired but could be avoided, but I'd say that's not as helpful as it sounds: there are (almost) always ways for your client to get disconnected and reconnected after a while, so you have to be aware of that and have the necessary code to handle it.

Stan
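For reference, a minimal Java sketch of the handling code described above, based on the public client-reconnect API (the config path and cache name are placeholders; the .NET client exposes an analogous exception and events):

    import javax.cache.CacheException;
    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.IgniteClientDisconnectedException;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.events.EventType;

    public class ClientReconnectExample {
        public static void main(String[] args) {
            // Placeholder config file; any client-mode configuration works.
            Ignite ignite = Ignition.start("client-config.xml");

            // React to the disconnect/reconnect lifecycle, e.g. to re-register
            // continuous-query listeners after a reconnect.
            ignite.events().localListen(evt -> {
                if (evt.type() == EventType.EVT_CLIENT_NODE_DISCONNECTED)
                    System.out.println("Client disconnected from the cluster.");
                else if (evt.type() == EventType.EVT_CLIENT_NODE_RECONNECTED)
                    System.out.println("Client reconnected; re-establish listeners here.");

                return true; // keep listening
            }, EventType.EVT_CLIENT_NODE_DISCONNECTED, EventType.EVT_CLIENT_NODE_RECONNECTED);

            IgniteCache<Integer, String> cache = ignite.getOrCreateCache("myCache");

            try {
                cache.put(1, "value");
            }
            catch (CacheException e) {
                // While the client is disconnected, cache operations fail with
                // IgniteClientDisconnectedException as the cause.
                if (e.getCause() instanceof IgniteClientDisconnectedException) {
                    IgniteClientDisconnectedException cause =
                        (IgniteClientDisconnectedException)e.getCause();

                    cause.reconnectFuture().get(); // blocks until the client reconnects

                    cache.put(1, "value"); // safe to retry now
                }
            }
        }
    }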
From: Bellenger, Dominique
Sent: March 19, 2018 19:56
To: [email protected]
Subject: RE: Question about client disco-/reconnect behaviour

Hey Stan,

alright, then my cluster is behaving like it should. Anyway: wouldn't it be nice to transition to another server node transparently when the "connected" server node goes away? Would that be feasible? One could spare a lot of code dealing with connection status ("Client disconnected" exception handling), disconnect reactions, and reconnect logic if the client stayed connected to the cluster as long as it is connected to at least one server.

Thank you for helping me out,
Dome

PS: I didn't have a closer look at the logs for that specific thing, so maybe you know it by heart: how do I find out which server the client is connected to at any given moment?

From: Stanislav Lukyanov <[email protected]>
Sent: March 19, 2018 17:00
To: [email protected]
Subject: RE: Question about client disco-/reconnect behaviour

Oh, sure, I guess I need to be more precise.

From the "clustering" point of view (i.e. considering the long-term connections that hold the cluster together), the client has a single server that it is connected to. The clustering part is handled by the Discovery SPI subsystem (and its default implementation, TcpDiscoverySpi).

During the cluster's lifetime, peer-to-peer connections are created between all nodes, clients and servers alike, and these transfer most of the data (all cache operations, etc.). They are created and closed as needed. This is handled by the Communication SPI subsystem (and its default implementation, TcpCommunicationSpi).

What you see in the logs below is the client connecting to both servers to perform cache or other operations (note the TcpCommunicationSpi class), but only one server is responsible for making sure the client is connected and alive.

About the documentation: I believe we don't have this explained in detail yet, but some work is going on to improve the docs on networking in Ignite a bit – stay tuned.

Thanks,
Stan
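To make the two subsystems concrete, here is a hedged Java configuration sketch (addresses and ports are illustrative; 47500 and 47100 are the defaults for discovery and communication, which is why ports 47100/47101 show up in the logs below):

    import java.util.Arrays;
    import org.apache.ignite.configuration.IgniteConfiguration;
    import org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi;
    import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
    import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;

    public class TwoSpiConfig {
        /** Builds a client configuration that spells out both SPIs explicitly. */
        static IgniteConfiguration clientConfig() {
            IgniteConfiguration cfg = new IgniteConfiguration();
            cfg.setClientMode(true);

            // Discovery SPI: the long-term "clustering" connections. A client
            // holds a single discovery connection to one server at a time.
            TcpDiscoverySpi discoSpi = new TcpDiscoverySpi();
            TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
            ipFinder.setAddresses(Arrays.asList("127.0.0.1:47500..47501"));
            discoSpi.setIpFinder(ipFinder);
            cfg.setDiscoverySpi(discoSpi);

            // Communication SPI: ad-hoc peer-to-peer connections for cache
            // operations and other data exchange, opened and closed as needed.
            TcpCommunicationSpi commSpi = new TcpCommunicationSpi();
            commSpi.setLocalPort(47100);
            cfg.setCommunicationSpi(commSpi);

            return cfg;
        }
    }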
From: Bellenger, Dominique
Sent: March 19, 2018 18:21
To: [email protected]
Subject: RE: Question about client disco-/reconnect behaviour

Hey Stan,

thank you for your answer. Does there exist some form of documentation about that behaviour (besides the code itself, of course 😊)?

My observations show that the client is indeed connecting to both servers. When both servers are running and I start the client, I get the following output from the server logs:

First try, Server1:

15:15:00.0941|DEBUG|TcpCommunicationSpi|Accepted new client connection: /127.0.0.1:54587
15:15:00.0941|INFO |TcpCommunicationSpi|Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:54587]
15:15:00.1097|DEBUG|TcpCommunicationSpi|Sending local node ID to newly accepted session: GridSelectorNioSessionImpl [ ... ]
15:15:00.1097|DEBUG|TcpCommunicationSpi|Remote node ID received: 41d75cf0-a42e-48e2-b23f-61f1284bd189
15:15:00.1097|DEBUG|TcpCommunicationSpi|Received handshake message [locNodeId=584e9592-b423-4da4-a37d-f65e080473cf, rmtNodeId=41d75cf0-a42e-48e2-b23f-61f1284bd189, msg=HandshakeMessage2 [connIdx=0]]

First try, Server2:

15:15:02.1256|DEBUG|TcpCommunicationSpi|Accepted new client connection: /127.0.0.1:54588
15:15:02.1256|INFO |TcpCommunicationSpi|Accepted incoming communication connection [locAddr=/127.0.0.1:47101, rmtAddr=/127.0.0.1:54588]
15:15:02.1722|DEBUG|TcpCommunicationSpi|Sending local node ID to newly accepted session: GridSelectorNioSessionImpl [ ... ]
15:15:02.1892|DEBUG|TcpCommunicationSpi|Remote node ID received: 41d75cf0-a42e-48e2-b23f-61f1284bd189
15:15:02.1892|DEBUG|TcpCommunicationSpi|Received handshake message [locNodeId=2c190af0-113f-4a12-90a2-757b5ab89220, rmtNodeId=41d75cf0-a42e-48e2-b23f-61f1284bd189, msg=HandshakeMessage2 [connIdx=0]]

Second try, Server1:

16:12:18.1313|DEBUG|TcpCommunicationSpi|Accepted new client connection: /0:0:0:0:0:0:0:1:60600
16:12:18.1313|INFO |TcpCommunicationSpi|Accepted incoming communication connection [locAddr=/0:0:0:0:0:0:0:1:47101, rmtAddr=/0:0:0:0:0:0:0:1:60600]
16:12:18.1438|DEBUG|TcpCommunicationSpi|Sending local node ID to newly accepted session: GridSelectorNioSessionImpl [ ... ]
16:12:18.2365|DEBUG|TcpCommunicationSpi|Remote node ID received: f0c43cea-709d-420d-8c88-420a7ac3998d
16:12:18.2551|DEBUG|TcpCommunicationSpi|Received handshake message [locNodeId=465739c3-1c7c-4cb3-812b-de0c05315304, rmtNodeId=f0c43cea-709d-420d-8c88-420a7ac3998d, msg=HandshakeMessage2 [connIdx=0]]

Second try, Server2:

16:12:17.8776|INFO |TcpCommunicationSpi|Accepted incoming communication connection [locAddr=/0:0:0:0:0:0:0:1:47100, rmtAddr=/0:0:0:0:0:0:0:1:60599]
16:12:17.8776|DEBUG|TcpCommunicationSpi|Sending local node ID to newly accepted session: GridSelectorNioSessionImpl [ ... ]
16:12:17.8916|DEBUG|TcpCommunicationSpi|Remote node ID received: f0c43cea-709d-420d-8c88-420a7ac3998d
16:12:17.9397|DEBUG|TcpCommunicationSpi|Received handshake message [locNodeId=be1a9f68-2e46-4e9f-8397-9b25a066d9cc, rmtNodeId=f0c43cea-709d-420d-8c88-420a7ac3998d, msg=HandshakeMessage2 [connIdx=0]]

Note that, using the exact same configuration, the communication is established over IPv4 the first time and over IPv6 the second time.

Dome
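On the IPv4/IPv6 flip-flopping in these logs: the Ignite documentation recommends starting every node with the JVM flag -Djava.net.preferIPv4Stack=true so that all nodes resolve addresses over the same stack. A sketch of the same idea in code (the config path is a placeholder; the property must be set before any networking classes load, so the JVM flag is the more reliable option):

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;

    public class ServerStarter {
        public static void main(String[] args) {
            // Prefer the IPv4 stack, per the Ignite networking recommendations.
            // Must run before any networking classes are initialized.
            System.setProperty("java.net.preferIPv4Stack", "true");

            Ignite ignite = Ignition.start("server-config.xml"); // placeholder path
        }
    }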
From: Stanislav Lukyanov <[email protected]>
Sent: March 19, 2018 13:52
To: [email protected]
Subject: RE: Question about client disco-/reconnect behaviour

Hi,

Yes, that's the expected behavior. The client is connected to a single server, not both. If the server it's connected to is killed, the client will reconnect, producing the events in the process. When you kill one of the servers and the client doesn't get disconnected, it's probably because you've killed the wrong one (not the one the client is connected to).

Stan

From: Bellenger, Dominique
Sent: March 19, 2018 15:32
To: [email protected]
Subject: Question about client disco-/reconnect behaviour

Hello Igniters,

I have a question about the expected client reconnection behaviour. I have two server nodes and one client node. When everything is connected and one of the servers fails (because it is killed), is the client supposed to:

1) connect to the remaining server transparently, with no disconnect event and no reconnect event;
2) do nothing, because it is connected to both servers and switches to the remaining connection silently (again, no disconnect event and no reconnect event); or
3) raise a disconnect event, connect to the remaining server, and raise a reconnect event?

I observe the following behaviour and just wanted to know whether it is the expected one, using Apache Ignite.NET 2.3 (the same happens on 2.4). With everything started, I kill one of the server nodes, and sometimes the client is disconnected and reconnects to the remaining server after a while. It does, however, not happen every time I kill one of the servers. In most cases I can force a reconnect by doing the following:

- Start one server and the client (the order doesn't matter)
- Start the second server and wait until everything is connected
- Kill the first server
- The client gets disconnected / reconnected

So: is that the desired behaviour?

Thanks in advance,
Dome
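As a footnote to Stan's "you've probably killed the wrong one" and to the PS question earlier in the thread, below is a sketch of one way to see which server a client is routed through. It relies on TcpDiscoveryNode, which lives in an .internal package and is not a supported public contract, so treat it as a debugging aid only (inspecting TcpDiscoverySpi DEBUG logs is an alternative):

    import java.util.UUID;
    import org.apache.ignite.Ignite;
    import org.apache.ignite.cluster.ClusterNode;
    import org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNode;

    public class RouterNodeCheck {
        /** Prints the ID and addresses of the server this client node is routed through. */
        static void printRouter(Ignite ignite) {
            ClusterNode locNode = ignite.cluster().localNode();

            // Internal API: this works with the default TcpDiscoverySpi, but the
            // class and method may change between releases without notice.
            if (locNode instanceof TcpDiscoveryNode) {
                UUID routerId = ((TcpDiscoveryNode)locNode).clientRouterNodeId();
                ClusterNode router = ignite.cluster().node(routerId);

                System.out.println("Routed through server " + routerId
                    + (router != null ? " at " + router.addresses() : ""));
            }
        }
    }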
