Re: Review Request 59925: GEODE-3052 Restarting 2 locators within 1s of each other causes potential locator split brain

2017-06-08 Thread Galen O'Sullivan


> On June 8, 2017, 11:45 p.m., Galen O'Sullivan wrote:
> > This makes sense to me: we remove the locators if we can't connect to them.
> > 
> > I wonder what happens if the two locators can't talk to each other (at 
> > first, anyways) but can talk to the rest of the cluster. I imagine this is 
> > handled by our view management and as long as the cluster is otherwise 
> > healthy, it will be fine.
> > 
> > As an aside, I'm curious about weight and failure -- do we expire servers 
> > from the weighting for split-brain detection after a while?

I seem to have missed the button: SHIP IT! Looks good!


- Galen


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59925/#review177422
---


On June 8, 2017, 6:36 p.m., Bruce Schuchardt wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59925/
> ---
> 
> (Updated June 8, 2017, 6:36 p.m.)
> 
> 
> Review request for geode, Alexander Murmann, Galen O'Sullivan, Hitesh 
> Khamesra, Udo Kohlmeyer, and Brian Rowe.
> 
> 
> Repository: geode
> 
> 
> Description
> ---
> 
> When restarting from a locatorView.dat file we should ignore any locator 
> entries in the view.  Recovery tries to get this state from other locators 
> before resorting to the persisted view, so by that point we know all of the 
> locator entries in the view are invalid.  This allows the locators to quickly 
> move into the concurrent-startup algorithm and find each other.
> 
> I removed the Flaky categorization of the test that I modified to reproduce 
> the problem.  A subclass's use of the test was reported as a Flaky failure 
> but I found that the ticket was closed.
> 
> 
> Diffs
> ---
> 
>   geode-core/src/main/java/org/apache/geode/distributed/internal/membership/gms/locator/GMSLocator.java e3635f2d93aae212cbff2f2058b6dc728a04776a 
>   geode-core/src/test/java/org/apache/geode/distributed/LocatorDUnitTest.java 8ff9b67e13dd50499d861ff62ddae3fb8668dd28 
>   geode-core/src/test/java/org/apache/geode/distributed/LocatorUDPSecurityDUnitTest.java 9d49d30abfb8acccd8a5547ba0ee3c7bcf9e7970 
> 
> 
> Diff: https://reviews.apache.org/r/59925/diff/1/
> 
> 
> Testing
> ---
> 
> The problem was easily reproduced using LocatorDUnitTest.testStartTwoLocators 
> by repeating the cycling of the locators.  It failed every time I ran it.
> 
> 
> Thanks,
> 
> Bruce Schuchardt
> 
>
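
For readers following the thread, the idea in the Description (drop locator entries from a view recovered from locatorView.dat, since recovery only falls back to that file after failing to get state from other locators) might look roughly like the sketch below. This is not the GMSLocator change itself: the class, the Member type, its fields, and dropLocatorEntries are all hypothetical names made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only; none of these names come from Geode.
class RecoveredViewSketch {

    // Stand-in for a membership view entry (assumption, not Geode's type).
    static final class Member {
        final String name;
        final boolean isLocator;

        Member(String name, boolean isLocator) {
            this.name = name;
            this.isLocator = isLocator;
        }
    }

    // Returns a new list holding only the non-locator entries of the
    // persisted view. The input list is left unmodified.
    static List<Member> dropLocatorEntries(List<Member> persistedView) {
        List<Member> kept = new ArrayList<>();
        for (Member m : persistedView) {
            if (!m.isLocator) {
                kept.add(m);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Member> persisted = Arrays.asList(
            new Member("locator-a", true),
            new Member("server-1", false),
            new Member("locator-b", true));

        // Locator entries are discarded; other members are retained.
        for (Member m : dropLocatorEntries(persisted)) {
            System.out.println(m.name);
        }
    }
}
```

With the stale locator entries gone, a restarting locator has no reason to wait on a peer that is itself restarting, which matches the Description's point about moving quickly into the concurrent-startup algorithm.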



Re: Review Request 59925: GEODE-3052 Restarting 2 locators within 1s of each other causes potential locator split brain

2017-06-08 Thread Udo Kohlmeyer

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59925/#review177419
---


Ship it!

- Udo Kohlmeyer





Re: Review Request 59925: GEODE-3052 Restarting 2 locators within 1s of each other causes potential locator split brain

2017-06-08 Thread Brian Rowe

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59925/#review177371
---


Ship it!

- Brian Rowe

