Re: Review Request 59057: GEODE-2193 a member is kicked out immediately after joining

2017-05-08 Thread Bruce Schuchardt


> On May 8, 2017, 5:44 p.m., Hitesh Khamesra wrote:
> > geode-core/src/main/java/org/apache/geode/distributed/internal/membership/gms/membership/GMSJoinLeave.java
> > Line 830 (original)
> > 
> >
> > I think the problem here is that we send the shutdown message using the
> > TCP layer. In that case, "receiver1" gets the shutdown message and passes
> > that info to the membership layer. Then "receiver1" becomes coordinator
> > (the legal coordinator) by removing the current coordinator. Now if the
> > current coordinator sends a new view, the cluster just ignores it, because
> > the cluster already has a new view from "receiver1".

Thanks Hitesh.  I agree.  Last week I had removed the random number addition 
to the view number in becomeCoordinator.  This morning I couldn't remember why 
I'd done that, so I reverted the change.  I'm going to put the removal back in 
because it keeps the prepared view from being ignored.
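
To make the scenario concrete, here is a minimal sketch of how a view can end 
up ignored.  It assumes a member only installs a view whose number is higher 
than the one it already holds; that rule, the class name ViewIdSketch and both 
method names are hypothetical and are not taken from GMSJoinLeave.

```java
/** Hypothetical model of view numbering; not Geode's actual classes. */
final class ViewIdSketch {

  /** Assumed rule: install a candidate view only if it is newer than the installed one. */
  static boolean accepts(int installedViewId, int candidateViewId) {
    return candidateViewId > installedViewId;
  }

  /** Number a coordinator picks for its next view: one past the current number, plus a bump. */
  static int nextViewId(int currentViewId, int extraBump) {
    return currentViewId + 1 + extraBump;
  }

  public static void main(String[] args) {
    int installed = 5;

    // The departing coordinator prepared its view before shutdown began.
    int prepared = nextViewId(installed, 0);

    // "receiver1" takes over and bumps its view number by an extra amount.
    int takeover = nextViewId(installed, 7);

    // A member that has not yet seen the takeover view still accepts the prepared one.
    System.out.println(accepts(installed, prepared));

    // A member that already installed the takeover view treats the prepared one as stale.
    System.out.println(accepts(takeover, prepared));
  }
}
```

Under this assumed rule, the larger the bump, the further the prepared view 
falls behind the takeover view.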


- Bruce


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59057/#review174198
---


On May 8, 2017, 5:23 p.m., Bruce Schuchardt wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59057/
> ---
> 
> (Updated May 8, 2017, 5:23 p.m.)
> 
> 
> Review request for geode, Galen O'Sullivan, Hitesh Khamesra, and Udo 
> Kohlmeyer.
> 
> 
> Bugs: GEODE-2193
> https://issues.apache.org/jira/browse/GEODE-2193
> 
> 
> Repository: geode
> 
> 
> Description
> ---
> 
> The previous fix for this ticket introduced a shutdown problem that caused 
> servers to pause waiting for ShutdownMessage to be sent to another server 
> that had already exited.  We reduced the pause time but this change set fixes 
> the problem by transmitting the message over UDP instead of TCP/IP stream 
> sockets.
> 
> Another change in GMSJoinLeave prepareView/sendView allows a membership 
> coordinator that is shutting down to complete the sending out of a new view 
> if it has already prepared the view when shutdown begins.
> 
> 
> Diffs
> ---
> 
>   geode-core/src/main/java/org/apache/geode/distributed/internal/membership/gms/membership/GMSJoinLeave.java e0c0ba29a5c74614d2430fb78d972e306a355845 
>   geode-core/src/main/java/org/apache/geode/distributed/internal/membership/gms/mgr/GMSMembershipManager.java 8ae66d0b6839cfbd46b479d896104f54fd11a68d 
>   geode-core/src/main/java/org/apache/geode/internal/util/PluckStacks.java 357812a6ec0cb09a88fa727a4bf828f18794264d 
> 
> 
> Diff: https://reviews.apache.org/r/59057/diff/2/
> 
> 
> Testing
> ---
> 
> precheckin plus 1000 runs of the test that was hitting this issue at least 4% 
> of the time
> 
> 
> Thanks,
> 
> Bruce Schuchardt
> 
>



Re: Review Request 59057: GEODE-2193 a member is kicked out immediately after joining

2017-05-08 Thread Hitesh Khamesra


> On May 8, 2017, 5:44 p.m., Hitesh Khamesra wrote:
> >

How about sending the pending joinRequest (for the new member) along with the 
shutdown message, and letting the new coordinator take care of it?


- Hitesh


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59057/#review174198
---





Re: Review Request 59057: GEODE-2193 a member is kicked out immediately after joining

2017-05-08 Thread Hitesh Khamesra

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59057/#review174199
---




geode-core/src/main/java/org/apache/geode/distributed/internal/membership/gms/mgr/GMSMembershipManager.java
Line 1865 (original), 1865 (patched)


I am not sure this will help either, as it is similar to the real problem 
(described earlier), where the receiver becomes the new coordinator and then 
creates a new view by removing the current coordinator.


- Hitesh Khamesra





Re: Review Request 59057: GEODE-2193 a member is kicked out immediately after joining

2017-05-08 Thread Hitesh Khamesra

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59057/#review174198
---




geode-core/src/main/java/org/apache/geode/distributed/internal/membership/gms/membership/GMSJoinLeave.java
Line 830 (original)


I think the problem here is that we send the shutdown message using the TCP 
layer. In that case, "receiver1" gets the shutdown message and passes that info 
to the membership layer. Then "receiver1" becomes coordinator (the legal 
coordinator) by removing the current coordinator. Now if the current 
coordinator sends a new view, the cluster just ignores it, because the cluster 
already has a new view from "receiver1".


- Hitesh Khamesra





Review Request 59057: GEODE-2193 a member is kicked out immediately after joining

2017-05-08 Thread Bruce Schuchardt

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59057/
---

Review request for geode, Galen O'Sullivan, Hitesh Khamesra, and Udo Kohlmeyer.


Bugs: GEODE-2193
https://issues.apache.org/jira/browse/GEODE-2193


Repository: geode


Description
---

The previous fix for this ticket introduced a shutdown problem that caused 
servers to pause waiting for ShutdownMessage to be sent to another server that 
had already exited.  We reduced the pause time but this change set fixes the 
problem by transmitting the message over UDP instead of TCP/IP stream sockets.
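
As a rough illustration of why a datagram avoids the pause: sending over UDP 
needs no connection to the peer, so the sender does not wait on a server that 
has already exited.  The sketch below uses plain java.net sockets on loopback; 
it is not Geode's messenger, and the class and method names (UdpNoticeSketch, 
sendNotice, loopbackRoundTrip) are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration only; Geode's messaging layer is not shown in this thread. */
final class UdpNoticeSketch {

  /** Sends one datagram; no connection is set up and no reply is awaited. */
  static void sendNotice(InetAddress host, int port, String text) {
    byte[] payload = text.getBytes(StandardCharsets.UTF_8);
    try (DatagramSocket socket = new DatagramSocket()) {
      socket.send(new DatagramPacket(payload, payload.length, host, port));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Loopback demo: bind a receiver, send it a notice, and return what arrived. */
  static String loopbackRoundTrip(String text) {
    InetAddress loopback = InetAddress.getLoopbackAddress();
    try (DatagramSocket receiver = new DatagramSocket(0, loopback)) {
      receiver.setSoTimeout(2000); // give up after two seconds rather than block forever
      sendNotice(loopback, receiver.getLocalPort(), text);
      DatagramPacket packet = new DatagramPacket(new byte[512], 512);
      receiver.receive(packet);
      return new String(packet.getData(), 0, packet.getLength(), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(loopbackRoundTrip("shutdown"));
  }
}
```

The trade-off is that a bare datagram carries no delivery guarantee, so any 
reliability has to come from the layer above it.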

Another change in GMSJoinLeave prepareView/sendView allows a membership 
coordinator that is shutting down to complete the sending out of a new view if 
it has already prepared the view when shutdown begins.
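
The intent of the prepareView/sendView change can be sketched roughly as 
follows.  This is a simplified model under my own assumptions, not the code in 
the diff: the class ViewChangeSketch, its fields and its method signatures are 
hypothetical.

```java
/** Hypothetical model of a coordinator's two-phase view change; names are illustrative. */
final class ViewChangeSketch {
  private boolean shuttingDown;
  private boolean viewPrepared;

  /** Marks the coordinator as shutting down. */
  synchronized void beginShutdown() {
    shuttingDown = true;
  }

  /** Phase 1: do not start a new view change once shutdown has begun. */
  synchronized boolean prepareView() {
    if (shuttingDown) {
      return false;
    }
    viewPrepared = true;
    return true;
  }

  /** Phase 2: a view that was prepared before shutdown began is still sent. */
  synchronized boolean sendView() {
    if (!viewPrepared) {
      return false;
    }
    viewPrepared = false;
    return true;
  }

  public static void main(String[] args) {
    ViewChangeSketch coordinator = new ViewChangeSketch();
    coordinator.prepareView();
    coordinator.beginShutdown(); // shutdown starts between the two phases
    System.out.println(coordinator.sendView()); // true: the prepared view still goes out
  }
}
```

In this model, a coordinator that begins shutdown before preparing anything 
sends no view at all.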


Diffs
---

  geode-core/src/main/java/org/apache/geode/distributed/internal/DistributionManager.java df880a076739509fe48394dd224ae2ea33c60dd5 
  geode-core/src/main/java/org/apache/geode/distributed/internal/membership/gms/membership/GMSJoinLeave.java e0c0ba29a5c74614d2430fb78d972e306a355845 
  geode-core/src/main/java/org/apache/geode/distributed/internal/membership/gms/mgr/GMSMembershipManager.java 8ae66d0b6839cfbd46b479d896104f54fd11a68d 
  geode-core/src/main/java/org/apache/geode/internal/util/PluckStacks.java 357812a6ec0cb09a88fa727a4bf828f18794264d 


Diff: https://reviews.apache.org/r/59057/diff/1/


Testing
---

precheckin plus 1000 runs of the test that was hitting this issue at least 4% 
of the time


Thanks,

Bruce Schuchardt