[jira] [Updated] (IGNITE-20076) Improve networking shutdown implementation

2023-08-01 Thread Vyacheslav Koptilin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vyacheslav Koptilin updated IGNITE-20076:
-
Labels: ignite-3  (was: igntie-3)

> Improve networking shutdown implementation
> --
>
> Key: IGNITE-20076
> URL: https://issues.apache.org/jira/browse/IGNITE-20076
> Project: Ignite
> Issue Type: Bug
> Reporter: Roman Puchkovskiy
> Assignee: Roman Puchkovskiy
> Priority: Major
> Labels: ignite-3
> Fix For: 3.0.0-beta2
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> Currently, when initiating an Ignite node's shutdown, we first stop
> ScaleCube's cluster (so that it sends a LEAVING message) and only when it has
> completely shut down do we shut down the connection manager. As a result,
> there is some interval when the node's networking thinks it's still alive (and
> hence it tries to restore connections with other nodes), but other nodes think
> the node has already left (as they received that LEAVING message from it), so
> they don't let it establish connections. The first node sees that it is
> rejected and tries to handle this as a critical failure. Currently, it just
> logs a scary message, but, when we implement a proper failure handler, this
> will kill the node. This is not OK for a graceful stop scenario.
> The idea is to first (before stopping the ScaleCube local cluster) tell the 
> connection manager that it is now in the 'stopping' state. In this state, it 
> does not try to establish new connections (and does not attempt to reconnect) 
> and does not allow any incoming connections; also, it does not handle 
> rejections by other nodes as critical failures in this state.
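
The proposed ordering can be illustrated with a minimal Java sketch. It is
not the actual Ignite 3 code: the class names StoppableConnectionManager and
ShutdownSequence, and all method names, are hypothetical and exist only to
show the idea of entering a 'stopping' state before the membership layer
announces LEAVING.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a connection manager; not the real Ignite 3 API. */
class StoppableConnectionManager {
    /** How a rejection by a remote node is treated. */
    enum RejectionOutcome { CRITICAL_FAILURE, IGNORED }

    /** Set once shutdown has been initiated; never reset. */
    private final AtomicBoolean stopping = new AtomicBoolean(false);

    /** Switches the manager to the 'stopping' state. */
    void initiateStopping() {
        stopping.set(true);
    }

    /** New outgoing connections and reconnects are allowed only while running. */
    boolean mayOpenOutgoing() {
        return !stopping.get();
    }

    /** Incoming connections are refused once stopping has started. */
    boolean mayAcceptIncoming() {
        return !stopping.get();
    }

    /** A rejection is critical while running, but expected while stopping. */
    RejectionOutcome onRejectedByPeer() {
        return stopping.get() ? RejectionOutcome.IGNORED : RejectionOutcome.CRITICAL_FAILURE;
    }
}

/** Hypothetical helper that fixes the order of the shutdown steps. */
final class ShutdownSequence {
    private ShutdownSequence() {
    }

    static void stopGracefully(
            StoppableConnectionManager connectionManager,
            Runnable stopMembership,    // assumed to send LEAVING and stop the cluster
            Runnable closeConnections   // assumed to close the remaining channels
    ) {
        // 1. Enter the 'stopping' state first, so later rejections are harmless.
        connectionManager.initiateStopping();
        // 2. Only now let the membership layer announce that the node is leaving.
        stopMembership.run();
        // 3. Finally, close whatever connections are still open.
        closeConnections.run();
    }
}
```

With this ordering, a rejection that arrives after other nodes have processed
the LEAVING message is reported as IGNORED rather than CRITICAL_FAILURE, so a
future failure handler would have no reason to kill a node that is stopping
gracefully.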



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (IGNITE-20076) Improve networking shutdown implementation

2023-07-28 Thread Roman Puchkovskiy (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Puchkovskiy updated IGNITE-20076:
---
Description: 
Currently, when initiating an Ignite node's shutdown, we first stop ScaleCube's
cluster (so that it sends a LEAVING message) and only when it has completely
shut down do we shut down the connection manager. As a result, there is some
interval when the node's networking thinks it's still alive (and hence it tries
to restore connections with other nodes), but other nodes think the node has
already left (as they received that LEAVING message from it), so they don't let
it establish connections. The first node sees that it is rejected and tries to
handle this as a critical failure. Currently, it just logs a scary message,
but, when we implement a proper failure handler, this will kill the node. This
is not OK for a graceful stop scenario.

The idea is to first (before stopping the ScaleCube local cluster) tell the 
connection manager that it is now in the 'stopping' state. In this state, it 
does not try to establish new connections (and does not attempt to reconnect) 
and does not allow any incoming connections; also, it does not handle 
rejections by other nodes as critical failures in this state.

  was:
Currently, when initiating an Ignite node's shutdown, we first stop ScaleCube's
cluster (so that it sends a LEAVING message) and only when it has completely
shut down do we shut down the connection manager. As a result, there is some
interval when the node's networking thinks it's still alive (and hence it tries
to restore connections with other nodes), but other nodes think the node has
already left (as they received that LEAVING message from it), so they don't let
it establish connections. The first node sees that it is rejected and tries to
handle this as a critical failure. Currently, it just logs a scary message,
but, when we implement a proper failure handler, this will kill the node. This
is not OK for a graceful stop scenario.

The idea is to first (before stopping the ScaleCube local cluster) is tell the 
connection manager that it is now in the 'stopping' state. In this state, it 
does not try to establish new connections (and does not attempt to reconnect) 
and does not allow any incoming connections; also, it does not handle 
rejections by other nodes as critical failures in this state.


> Improve networking shutdown implementation
> --
>
> Key: IGNITE-20076
> URL: https://issues.apache.org/jira/browse/IGNITE-20076
> Project: Ignite
> Issue Type: Bug
> Reporter: Roman Puchkovskiy
> Assignee: Roman Puchkovskiy
> Priority: Major
> Labels: igntie-3
>
> Currently, when initiating an Ignite node's shutdown, we first stop
> ScaleCube's cluster (so that it sends a LEAVING message) and only when it has
> completely shut down do we shut down the connection manager. As a result,
> there is some interval when the node's networking thinks it's still alive (and
> hence it tries to restore connections with other nodes), but other nodes think
> the node has already left (as they received that LEAVING message from it), so
> they don't let it establish connections. The first node sees that it is
> rejected and tries to handle this as a critical failure. Currently, it just
> logs a scary message, but, when we implement a proper failure handler, this
> will kill the node. This is not OK for a graceful stop scenario.
> The idea is to first (before stopping the ScaleCube local cluster) tell the 
> connection manager that it is now in the 'stopping' state. In this state, it 
> does not try to establish new connections (and does not attempt to reconnect) 
> and does not allow any incoming connections; also, it does not handle 
> rejections by other nodes as critical failures in this state.





[jira] [Updated] (IGNITE-20076) Improve networking shutdown implementation

2023-07-28 Thread Roman Puchkovskiy (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-20076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Puchkovskiy updated IGNITE-20076:
---
Description: 
Currently, when initiating an Ignite node's shutdown, we first stop ScaleCube's
cluster (so that it sends a LEAVING message) and only when it has completely
shut down do we shut down the connection manager. As a result, there is some
interval when the node's networking thinks it's still alive (and hence it tries
to restore connections with other nodes), but other nodes think the node has
already left (as they received that LEAVING message from it), so they don't let
it establish connections. The first node sees that it is rejected and tries to
handle this as a critical failure. Currently, it just logs a scary message,
but, when we implement a proper failure handler, this will kill the node. This
is not OK for a graceful stop scenario.

The idea is to first (before stopping the ScaleCube local cluster) is tell the 
connection manager that it is now in the 'stopping' state. In this state, it 
does not try to establish new connections (and does not attempt to reconnect) 
and does not allow any incoming connections; also, it does not handle 
rejections by other nodes as critical failures in this state.

  was:
Currently, when initiating an Ignite node's shutdown, we first stop ScaleCube's
cluster (so that it sends a LEAVING message) and only when it has completely
shut down do we shut down the connection manager. As a result, there is some
interval when the node's networking thinks it's still alive (and hence it tries
to restore connections with other nodes), but other nodes think the node has
already left (as they received that LEAVING message from it), so they don't let
it establish connections. The first node sees that it is rejected and tries to
handle this as a critical failure. Currently, it just logs a scary message,
but, when we implement a proper failure handler, this will kill the node. This
is not OK for a graceful stop scenario.
The idea is to first (before stopping the ScaleCube local cluster) is to tell 
the connection manager that it is now in the 'stopping' state. In this state, 
it does not try to establish new connections (and does not attempt to 
reconnect) and does not allow any incoming connections; also, it does not 
handle rejections by other nodes as critical failures in this state.


> Improve networking shutdown implementation
> --
>
> Key: IGNITE-20076
> URL: https://issues.apache.org/jira/browse/IGNITE-20076
> Project: Ignite
> Issue Type: Bug
> Reporter: Roman Puchkovskiy
> Assignee: Roman Puchkovskiy
> Priority: Major
> Labels: igntie-3
>
> Currently, when initiating an Ignite node's shutdown, we first stop
> ScaleCube's cluster (so that it sends a LEAVING message) and only when it has
> completely shut down do we shut down the connection manager. As a result,
> there is some interval when the node's networking thinks it's still alive (and
> hence it tries to restore connections with other nodes), but other nodes think
> the node has already left (as they received that LEAVING message from it), so
> they don't let it establish connections. The first node sees that it is
> rejected and tries to handle this as a critical failure. Currently, it just
> logs a scary message, but, when we implement a proper failure handler, this
> will kill the node. This is not OK for a graceful stop scenario.
> The idea is to first (before stopping the ScaleCube local cluster) is tell 
> the connection manager that it is now in the 'stopping' state. In this state, 
> it does not try to establish new connections (and does not attempt to 
> reconnect) and does not allow any incoming connections; also, it does not 
> handle rejections by other nodes as critical failures in this state.


