I believe the reason behind the current behavior is that when a client's "gateway" server goes down there is no guarantee that the client will receive all the events it's subscribed to (cluster events, messages, cache updates for continuous queries). This means the client needs to be notified that it was disconnected (i.e. it might have missed some events) and reconnected (i.e. it needs to re-establish listeners, etc.).

There are cases when Ignite will reconnect a client "quietly", without a connection status change. One example is when the TCP connection was forcibly closed but the client and server are able to simply create new sockets; in that case Ignite can guarantee that all events are delivered, so the complete reconnection ceremony is not needed. One could find more cases where reconnection events are currently fired but could be avoided, but I'd say that's not as helpful as it sounds: there are (almost) always ways for your client to get disconnected and reconnected after a while, so you have to be aware of that and have the necessary code to handle it.

Stan
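For reference, a minimal Java sketch of the handling code described above, based on the public client-reconnect API (the config path and cache name are placeholders; the .NET client exposes an analogous exception and events):

    import javax.cache.CacheException;
    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.IgniteClientDisconnectedException;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.events.EventType;

    public class ClientReconnectExample {
        public static void main(String[] args) {
            // Placeholder config file; any client-mode configuration works.
            Ignite ignite = Ignition.start("client-config.xml");

            // React to the disconnect/reconnect lifecycle, e.g. to re-register
            // continuous-query listeners after a reconnect.
            ignite.events().localListen(evt -> {
                if (evt.type() == EventType.EVT_CLIENT_NODE_DISCONNECTED)
                    System.out.println("Client disconnected from the cluster.");
                else if (evt.type() == EventType.EVT_CLIENT_NODE_RECONNECTED)
                    System.out.println("Client reconnected; re-establish listeners here.");

                return true; // keep listening
            }, EventType.EVT_CLIENT_NODE_DISCONNECTED, EventType.EVT_CLIENT_NODE_RECONNECTED);

            IgniteCache<Integer, String> cache = ignite.getOrCreateCache("myCache");

            try {
                cache.put(1, "value");
            }
            catch (CacheException e) {
                // While the client is disconnected, cache operations fail with
                // IgniteClientDisconnectedException as the cause.
                if (e.getCause() instanceof IgniteClientDisconnectedException) {
                    IgniteClientDisconnectedException cause =
                        (IgniteClientDisconnectedException)e.getCause();

                    cause.reconnectFuture().get(); // blocks until the client reconnects

                    cache.put(1, "value"); // safe to retry now
                }
            }
        }
    }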
From: Bellenger, Dominique
Sent: March 19, 2018 19:56
To: [email protected]
Subject: RE: Question about client disco-/reconnect behaviour

Hey Stan,

alright, then my cluster is behaving like it should. Anyway: wouldn't it be nice to transition to another server node transparently when the "connected" server node goes away? Would that be feasible? One could spare a lot of code dealing with connection status ("Client disconnected" exception handling), disconnect reactions, and reconnect logic if the client stayed connected to the cluster as long as it is connected to at least one server.

Thank you for helping me out,
Dome

PS: I didn't have a closer look at the logs for that specific thing, so maybe you know it by heart: how do I find out which server the client is connected to at any given moment?

From: Stanislav Lukyanov <[email protected]>
Sent: March 19, 2018 17:00
To: [email protected]
Subject: RE: Question about client disco-/reconnect behaviour

Oh, sure, I guess I need to be more precise.

From the "clustering" point of view (i.e. considering the long-term connections that hold the cluster together), the client has a single server that it is connected to. The clustering part is handled by the Discovery SPI subsystem (and its default implementation, TcpDiscoverySpi).

During the cluster's lifetime, peer-to-peer connections are created between all nodes, clients and servers alike, and these transfer most of the data (all cache operations, etc.). They are created and closed as needed. This is handled by the Communication SPI subsystem (and its default implementation, TcpCommunicationSpi).

What you see in the logs below is the client connecting to both servers to perform cache or other operations (note the TcpCommunicationSpi class), but only one server is responsible for making sure the client is connected and alive.

About the documentation: I believe we don't have this explained in detail yet, but some work is going on to improve the docs on networking in Ignite a bit – stay tuned.

Thanks,
Stan
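To make the two subsystems concrete, here is a hedged Java configuration sketch (addresses and ports are illustrative; 47500 and 47100 are the defaults for discovery and communication, which is why ports 47100/47101 show up in the logs below):

    import java.util.Arrays;
    import org.apache.ignite.configuration.IgniteConfiguration;
    import org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi;
    import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
    import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;

    public class TwoSpiConfig {
        /** Builds a client configuration that spells out both SPIs explicitly. */
        static IgniteConfiguration clientConfig() {
            IgniteConfiguration cfg = new IgniteConfiguration();
            cfg.setClientMode(true);

            // Discovery SPI: the long-term "clustering" connections. A client
            // holds a single discovery connection to one server at a time.
            TcpDiscoverySpi discoSpi = new TcpDiscoverySpi();
            TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
            ipFinder.setAddresses(Arrays.asList("127.0.0.1:47500..47501"));
            discoSpi.setIpFinder(ipFinder);
            cfg.setDiscoverySpi(discoSpi);

            // Communication SPI: ad-hoc peer-to-peer connections for cache
            // operations and other data exchange, opened and closed as needed.
            TcpCommunicationSpi commSpi = new TcpCommunicationSpi();
            commSpi.setLocalPort(47100);
            cfg.setCommunicationSpi(commSpi);

            return cfg;
        }
    }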
From: Bellenger, Dominique
Sent: March 19, 2018 18:21
To: [email protected]
Subject: RE: Question about client disco-/reconnect behaviour

Hey Stan,

thank you for your answer. Does there exist some form of documentation about that behaviour (besides the code itself, of course 😊)?

My observations show that the client is indeed connecting to both servers. When both servers are running and I start the client, I get the following output from the server logs:

First try, Server1:

15:15:00.0941|DEBUG|TcpCommunicationSpi|Accepted new client connection: /127.0.0.1:54587
15:15:00.0941|INFO |TcpCommunicationSpi|Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:54587]
15:15:00.1097|DEBUG|TcpCommunicationSpi|Sending local node ID to newly accepted session: GridSelectorNioSessionImpl [ ... ]
15:15:00.1097|DEBUG|TcpCommunicationSpi|Remote node ID received: 41d75cf0-a42e-48e2-b23f-61f1284bd189
15:15:00.1097|DEBUG|TcpCommunicationSpi|Received handshake message [locNodeId=584e9592-b423-4da4-a37d-f65e080473cf, rmtNodeId=41d75cf0-a42e-48e2-b23f-61f1284bd189, msg=HandshakeMessage2 [connIdx=0]]

First try, Server2:

15:15:02.1256|DEBUG|TcpCommunicationSpi|Accepted new client connection: /127.0.0.1:54588
15:15:02.1256|INFO |TcpCommunicationSpi|Accepted incoming communication connection [locAddr=/127.0.0.1:47101, rmtAddr=/127.0.0.1:54588]
15:15:02.1722|DEBUG|TcpCommunicationSpi|Sending local node ID to newly accepted session: GridSelectorNioSessionImpl [ ... ]
15:15:02.1892|DEBUG|TcpCommunicationSpi|Remote node ID received: 41d75cf0-a42e-48e2-b23f-61f1284bd189
15:15:02.1892|DEBUG|TcpCommunicationSpi|Received handshake message [locNodeId=2c190af0-113f-4a12-90a2-757b5ab89220, rmtNodeId=41d75cf0-a42e-48e2-b23f-61f1284bd189, msg=HandshakeMessage2 [connIdx=0]]

Second try, Server1:

16:12:18.1313|DEBUG|TcpCommunicationSpi|Accepted new client connection: /0:0:0:0:0:0:0:1:60600
16:12:18.1313|INFO |TcpCommunicationSpi|Accepted incoming communication connection [locAddr=/0:0:0:0:0:0:0:1:47101, rmtAddr=/0:0:0:0:0:0:0:1:60600]
16:12:18.1438|DEBUG|TcpCommunicationSpi|Sending local node ID to newly accepted session: GridSelectorNioSessionImpl [ ... ]
16:12:18.2365|DEBUG|TcpCommunicationSpi|Remote node ID received: f0c43cea-709d-420d-8c88-420a7ac3998d
16:12:18.2551|DEBUG|TcpCommunicationSpi|Received handshake message [locNodeId=465739c3-1c7c-4cb3-812b-de0c05315304, rmtNodeId=f0c43cea-709d-420d-8c88-420a7ac3998d, msg=HandshakeMessage2 [connIdx=0]]

Second try, Server2:

16:12:17.8776|INFO |TcpCommunicationSpi|Accepted incoming communication connection [locAddr=/0:0:0:0:0:0:0:1:47100, rmtAddr=/0:0:0:0:0:0:0:1:60599]
16:12:17.8776|DEBUG|TcpCommunicationSpi|Sending local node ID to newly accepted session: GridSelectorNioSessionImpl [ ... ]
16:12:17.8916|DEBUG|TcpCommunicationSpi|Remote node ID received: f0c43cea-709d-420d-8c88-420a7ac3998d
16:12:17.9397|DEBUG|TcpCommunicationSpi|Received handshake message [locNodeId=be1a9f68-2e46-4e9f-8397-9b25a066d9cc, rmtNodeId=f0c43cea-709d-420d-8c88-420a7ac3998d, msg=HandshakeMessage2 [connIdx=0]]

Note that, using the exact same configuration, the communication is established over IPv4 the first time and over IPv6 the second time.

Dome
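On the IPv4/IPv6 flip-flopping in these logs: the Ignite documentation recommends starting every node with the JVM flag -Djava.net.preferIPv4Stack=true so that all nodes resolve addresses over the same stack. A sketch of the same idea in code (the config path is a placeholder; the property must be set before any networking classes load, so the JVM flag is the more reliable option):

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;

    public class ServerStarter {
        public static void main(String[] args) {
            // Prefer the IPv4 stack, per the Ignite networking recommendations.
            // Must run before any networking classes are initialized.
            System.setProperty("java.net.preferIPv4Stack", "true");

            Ignite ignite = Ignition.start("server-config.xml"); // placeholder path
        }
    }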
From: Stanislav Lukyanov <[email protected]>
Sent: March 19, 2018 13:52
To: [email protected]
Subject: RE: Question about client disco-/reconnect behaviour

Hi,

Yes, that's the expected behavior. The client is connected to a single server, not both. If the server it's connected to is killed, the client will reconnect, producing the events in the process. When you kill one of the servers and the client doesn't get disconnected, it's probably because you've killed the wrong one (not the one the client is connected to).

Stan

From: Bellenger, Dominique
Sent: March 19, 2018 15:32
To: [email protected]
Subject: Question about client disco-/reconnect behaviour

Hello Igniters,

I have a question about the expected client reconnection behaviour. I have two server nodes and one client node. When everything is connected and one of the servers fails (because it is killed), is the client supposed to:

1) connect to the remaining server transparently, with no disconnect event and no reconnect event;
2) do nothing, because it is connected to both servers and switches to the remaining connection silently (again, no disconnect event and no reconnect event); or
3) raise a disconnect event, connect to the remaining server, and raise a reconnect event?

I observe the following behaviour and just wanted to know whether it is the expected one, using Apache Ignite.NET 2.3 (the same happens on 2.4). With everything started, I kill one of the server nodes, and sometimes the client is disconnected and reconnects to the remaining server after a while. It does, however, not happen every time I kill one of the servers. In most cases I can force a reconnect by doing the following:

- Start one server and the client (the order doesn't matter)
- Start the second server and wait until everything is connected
- Kill the first server
- The client gets disconnected / reconnected

So: is that the desired behaviour?

Thanks in advance,
Dome
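As a footnote to Stan's "you've probably killed the wrong one" and to the PS question earlier in the thread, below is a sketch of one way to see which server a client is routed through. It relies on TcpDiscoveryNode, which lives in an .internal package and is not a supported public contract, so treat it as a debugging aid only (inspecting TcpDiscoverySpi DEBUG logs is an alternative):

    import java.util.UUID;
    import org.apache.ignite.Ignite;
    import org.apache.ignite.cluster.ClusterNode;
    import org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNode;

    public class RouterNodeCheck {
        /** Prints the ID and addresses of the server this client node is routed through. */
        static void printRouter(Ignite ignite) {
            ClusterNode locNode = ignite.cluster().localNode();

            // Internal API: this works with the default TcpDiscoverySpi, but the
            // class and method may change between releases without notice.
            if (locNode instanceof TcpDiscoveryNode) {
                UUID routerId = ((TcpDiscoveryNode)locNode).clientRouterNodeId();
                ClusterNode router = ignite.cluster().node(routerId);

                System.out.println("Routed through server " + routerId
                    + (router != null ? " at " + router.addresses() : ""));
            }
        }
    }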
