[ 
https://issues.apache.org/jira/browse/GEODE-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anilkumar Gingade updated GEODE-2064:
-------------------------------------
    Description: 
If the system shuts down (forced disconnect) while a message send is in 
progress, the send (message delivery to peers) reports a connect exception 
instead of detecting the shutdown and throwing a SystemDisconnect exception. 

"DirectChannel.getConnection()" checks "mgr.shutdownInProgress()" and returns 
a ConnectException to the caller, "GMSMembershipManager.directChannelSend()".

In a client/server scenario, if the client is performing a cache operation, 
the operation may succeed on the server that is going down while the message 
fails to reach the other peers/servers. The client still sees the operation 
complete successfully. 

As a result, a client can miss an event, leaving the client and server data 
out of sync.

Consider this scenario:
-- 2 servers, server1 and server2, in a distributed system.
-- client1 has subscribed for events, with its primary queue on server1 and 
its secondary queue on server2.
-- client2 does a put with Key1 on server1. 
-- server1 sends the event to client1 (client1 now has Key1) and, while 
sending the message to server2, gets disconnected. While shutdown is in 
progress, it is unable to deliver the message to server2 and, instead of 
throwing the exception to the client, it responds to the client with success. 
client1 now fails over to server2; it is out of sync with the server2 cache 
(which doesn't have Key1). 
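
The scenario can be replayed with a small stand-alone model. This is only a 
sketch: Server, put() and inSyncAfterFailover() are hypothetical names made up 
for illustration, not Geode classes, and IllegalStateException stands in for 
whatever disconnect exception the real code would throw.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Toy model of the scenario; none of these types exist in Geode.
final class FailoverScenario {

  /** A server with its own cache and a "forced disconnect in progress" flag. */
  static final class Server {
    final Map<String, String> cache = new HashMap<>();
    boolean shuttingDown;
  }

  /**
   * Applies a put on the primary, delivers the event to the subscribed client,
   * then tries to replicate to the peer. Returns true when success is reported
   * to the calling client.
   */
  static boolean put(Server primary, Server peer, Map<String, String> subscriber,
      String key, String value, boolean propagateDisconnect) {
    primary.cache.put(key, value);
    subscriber.put(key, value); // event reaches client1 through its primary queue
    if (primary.shuttingDown) {
      if (propagateDisconnect) {
        throw new IllegalStateException("server disconnected during send");
      }
      return true; // reported problem: the failed delivery to the peer is swallowed
    }
    peer.cache.put(key, value);
    return true;
  }

  /** Returns whether client1 and server2 agree on Key1 after client1 fails over. */
  static boolean inSyncAfterFailover(boolean propagateDisconnect) {
    Server server1 = new Server();
    Server server2 = new Server();
    Map<String, String> client1 = new HashMap<>();
    server1.shuttingDown = true; // forced disconnect starts during the operation
    try {
      put(server1, server2, client1, "Key1", "v1", propagateDisconnect);
    } catch (IllegalStateException e) {
      // client2 sees the failure and retries on the live server
      server2.cache.put("Key1", "v1");
      client1.put("Key1", "v1");
    }
    return Objects.equals(client1.get("Key1"), server2.cache.get("Key1"));
  }

  public static void main(String[] args) {
    System.out.println("failure swallowed:  in sync = " + inSyncAfterFailover(false));
    System.out.println("failure propagated: in sync = " + inSyncAfterFailover(true));
  }
}
```

With the failure swallowed, client1 holds Key1 while server2 does not; with 
the failure propagated, the retry brings server2 in line with client1.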

Solution: Throw the disconnect exception back to the caller. This enables 
the caller (client) to retry the operation on a live server.
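
A minimal sketch of the idea, not the actual change: Membership, Channel, 
SystemDisconnectedException and send() below are hypothetical stand-ins, and 
only shutdownInProgress() mirrors a name from the description. Returning false 
for an unreachable peer outside of shutdown is likewise an assumption.

```java
import java.net.ConnectException;

// Hypothetical sketch; these are not Geode types.
final class ShutdownAwareSend {

  /** Stand-in for a "system disconnected" exception (assumed name). */
  static final class SystemDisconnectedException extends RuntimeException {
    SystemDisconnectedException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  /** Minimal view of a membership manager (assumed interface). */
  interface Membership {
    boolean shutdownInProgress();
  }

  /** Stand-in for the low-level channel that delivers to one peer. */
  interface Channel {
    void deliver(String peer, String message) throws ConnectException;
  }

  /**
   * Sends a message to a peer. Returns true if delivered, false if the peer
   * was unreachable while this member is still healthy (assumed handling).
   * Throws SystemDisconnectedException if this member is shutting down, so
   * the caller can fail the operation instead of reporting success.
   */
  static boolean send(Membership mgr, Channel channel, String peer, String message) {
    try {
      channel.deliver(peer, message);
      return true;
    } catch (ConnectException e) {
      if (mgr.shutdownInProgress()) {
        // Surface the shutdown rather than treating it as a peer failure.
        throw new SystemDisconnectedException(
            "shutdown in progress while sending to " + peer, e);
      }
      return false;
    }
  }

  public static void main(String[] args) {
    Channel failing = (peer, msg) -> {
      throw new ConnectException("connection refused");
    };
    try {
      send(() -> true, failing, "server2", "put Key1");
    } catch (SystemDisconnectedException e) {
      System.out.println("caller can retry elsewhere: " + e.getMessage());
    }
  }
}
```

The point of the sketch is the branch inside the catch block: the same 
ConnectException is handled differently depending on whether the local member 
is the one going away.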


> Unable to detect system shutdown during message delivery.
> ---------------------------------------------------------
>
>                 Key: GEODE-2064
>                 URL: https://issues.apache.org/jira/browse/GEODE-2064
>             Project: Geode
>          Issue Type: Bug
>          Components: messaging
>            Reporter: Anilkumar Gingade
>            Assignee: Anilkumar Gingade
>
> If the system shuts down (forced disconnect) while a message send is in 
> progress, the send (message delivery to peers) reports a connect exception 
> instead of detecting the shutdown and throwing a SystemDisconnect exception. 
> "DirectChannel.getConnection()" checks "mgr.shutdownInProgress()" and 
> returns a ConnectException to the caller, 
> "GMSMembershipManager.directChannelSend()".
> In a client/server scenario, if the client is performing a cache operation, 
> the operation may succeed on the server that is going down while the message 
> fails to reach the other peers/servers. The client still sees the operation 
> complete successfully. 
> As a result, a client can miss an event, leaving the client and server data 
> out of sync.
> Consider this scenario:
> -- 2 servers, server1 and server2, in a distributed system.
> -- client1 has subscribed for events, with its primary queue on server1 and 
> its secondary queue on server2.
> -- client2 does a put with Key1 on server1. 
> -- server1 sends the event to client1 (client1 now has Key1) and, while 
> sending the message to server2, gets disconnected. While shutdown is in 
> progress, it is unable to deliver the message to server2 and, instead of 
> throwing the exception to the client, it responds to the client with 
> success. client1 now fails over to server2; it is out of sync with the 
> server2 cache (which doesn't have Key1). 
> Solution: Throw the disconnect exception back to the caller. This enables 
> the caller (client) to retry the operation on a live server.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)