[geode] branch develop updated: GEODE-8385: hang recovering from disk with cyclic dependencies (#5403)

2020-07-29 Thread bschuchardt
This is an automated email from the ASF dual-hosted git repository.

bschuchardt pushed a commit to branch develop
in repository https://gitbox.apache.org/repos/asf/geode.git


The following commit(s) were added to refs/heads/develop by this push:
 new 08316aa  GEODE-8385: hang recovering from disk with cyclic 
dependencies (#5403)
08316aa is described below

commit 08316aa05198704d96aefc5497e483052c27a378
Author: Bruce Schuchardt 
AuthorDate: Wed Jul 29 07:58:39 2020 -0700

GEODE-8385: hang recovering from disk with cyclic dependencies (#5403)

* GEODE-8385: hang recovering from disk with cyclic dependencies

This restores the point at which we notify membership listeners of
departures.  We used to do this (in 1.12 and earlier) when a ShutdownMessage
is received instead of waiting for a new membership view announcing the 
departure.
Membership views can take some time to form and install, which can cause
persistent (disk store) views to be updated later than they used to be.

In the case of this ticket the disk store of one member was being
closed while another was shutting down.  The member closing its disk
store did not see the view announcing that shutdown until most of its
disk store regions had closed their persistence advisors.  This left the
disk store thinking that the other member was still up at the time it
was closed.
---
 .../ClusterDistributionManagerDUnitTest.java   | 28 
 .../internal/ClusterDistributionManager.java   | 76 ++
 .../distributed/internal/StartupOperation.java |  1 -
 .../internal/membership/gms/GMSMemberData.java |  4 +-
 .../internal/membership/gms/GMSMembership.java |  6 +-
 5 files changed, 69 insertions(+), 46 deletions(-)

diff --git 
a/geode-core/src/distributedTest/java/org/apache/geode/distributed/internal/ClusterDistributionManagerDUnitTest.java
 
b/geode-core/src/distributedTest/java/org/apache/geode/distributed/internal/ClusterDistributionManagerDUnitTest.java
index 9d3bbd9..e28f15ab 100644
--- 
a/geode-core/src/distributedTest/java/org/apache/geode/distributed/internal/ClusterDistributionManagerDUnitTest.java
+++ 
b/geode-core/src/distributedTest/java/org/apache/geode/distributed/internal/ClusterDistributionManagerDUnitTest.java
@@ -211,6 +211,34 @@ public class ClusterDistributionManagerDUnitTest extends 
CacheTestCase {
 .until(() -> !membershipManager.isSurpriseMember(member));
   }
 
+  @Test
+  public void shutdownMessageCausesListenerInvocation() {
+final AtomicBoolean listenerInvoked = new AtomicBoolean();
+vm1.invoke("join the cluster", () -> getSystem().getDistributedMember()); 
// lead member
+system = getSystem(); // non-lead member
+// this membership listener will be invoked when the shutdown message is 
received
+system.getDistributionManager().addMembershipListener(new 
MembershipListener() {
+  @Override
+  public void memberDeparted(DistributionManager distributionManager,
+  InternalDistributedMember id, boolean crashed) {
+assertThat(crashed).isFalse();
+listenerInvoked.set(Boolean.TRUE);
+  }
+});
+final InternalDistributedMember memberID = system.getDistributedMember();
+locatorvm.invoke("send a shutdown message", () -> {
+  final DistributionManager distributionManager =
+  ((InternalDistributedSystem) 
Locator.getLocator().getDistributedSystem())
+  .getDistributionManager();
+  final ShutdownMessage shutdownMessage = new ShutdownMessage();
+  shutdownMessage.setRecipient(memberID);
+  
shutdownMessage.setDistributionManagerId(distributionManager.getDistributionManagerId());
+  distributionManager.putOutgoing(shutdownMessage);
+});
+await().until(() -> listenerInvoked.get());
+  }
+
+
   /**
* Tests that a severe-level alert is generated if a member does not respond 
with an ack quickly
* enough. vm0 and vm1 create a region and set ack-severe-alert-threshold. 
vm1 has a cache
diff --git 
a/geode-core/src/main/java/org/apache/geode/distributed/internal/ClusterDistributionManager.java
 
b/geode-core/src/main/java/org/apache/geode/distributed/internal/ClusterDistributionManager.java
index 9a52f60..1f58344 100644
--- 
a/geode-core/src/main/java/org/apache/geode/distributed/internal/ClusterDistributionManager.java
+++ 
b/geode-core/src/main/java/org/apache/geode/distributed/internal/ClusterDistributionManager.java
@@ -1782,24 +1782,6 @@ public class ClusterDistributionManager implements 
DistributionManager {
   }
 
   /**
-   * Returns true if id was removed. Returns false if it was not in the list 
of managers.
-   */
-  private boolean removeManager(InternalDistributedMember theId, boolean 
crashed, String p_reason) {
-String reason = p_reason;
-
-reason = prettifyReason(reason);
-if (logger.isDebugEnabled()) {
-  logger.debug("DistributionManager: removing member <{}>; crashed {}; 

[geode] branch develop updated: GEODE-8385: hang recovering from disk with cyclic dependencies (#5403)

2020-07-29 Thread bschuchardt
This is an automated email from the ASF dual-hosted git repository.

bschuchardt pushed a commit to branch develop
in repository https://gitbox.apache.org/repos/asf/geode.git


The following commit(s) were added to refs/heads/develop by this push:
 new 08316aa  GEODE-8385: hang recovering from disk with cyclic 
dependencies (#5403)
08316aa is described below

commit 08316aa05198704d96aefc5497e483052c27a378
Author: Bruce Schuchardt 
AuthorDate: Wed Jul 29 07:58:39 2020 -0700

GEODE-8385: hang recovering from disk with cyclic dependencies (#5403)

* GEODE-8385: hang recovering from disk with cyclic dependencies

This restores the point at which we notify membership listeners of
departures.  We used to do this (in 1.12 and earlier) when a ShutdownMessage
is received instead of waiting for a new membership view announcing the 
departure.
Membership views can take some time to form and install, which can cause
persistent (disk store) views to be updated later than they used to be.

In the case of this ticket the disk store of one member was being
closed while another was shutting down.  The member closing its disk
store did not see the view announcing that shutdown until most of its
disk store regions had closed their persistence advisors.  This left the
disk store thinking that the other member was still up at the time it
was closed.
---
 .../ClusterDistributionManagerDUnitTest.java   | 28 
 .../internal/ClusterDistributionManager.java   | 76 ++
 .../distributed/internal/StartupOperation.java |  1 -
 .../internal/membership/gms/GMSMemberData.java |  4 +-
 .../internal/membership/gms/GMSMembership.java |  6 +-
 5 files changed, 69 insertions(+), 46 deletions(-)

diff --git 
a/geode-core/src/distributedTest/java/org/apache/geode/distributed/internal/ClusterDistributionManagerDUnitTest.java
 
b/geode-core/src/distributedTest/java/org/apache/geode/distributed/internal/ClusterDistributionManagerDUnitTest.java
index 9d3bbd9..e28f15ab 100644
--- 
a/geode-core/src/distributedTest/java/org/apache/geode/distributed/internal/ClusterDistributionManagerDUnitTest.java
+++ 
b/geode-core/src/distributedTest/java/org/apache/geode/distributed/internal/ClusterDistributionManagerDUnitTest.java
@@ -211,6 +211,34 @@ public class ClusterDistributionManagerDUnitTest extends 
CacheTestCase {
 .until(() -> !membershipManager.isSurpriseMember(member));
   }
 
+  @Test
+  public void shutdownMessageCausesListenerInvocation() {
+final AtomicBoolean listenerInvoked = new AtomicBoolean();
+vm1.invoke("join the cluster", () -> getSystem().getDistributedMember()); 
// lead member
+system = getSystem(); // non-lead member
+// this membership listener will be invoked when the shutdown message is 
received
+system.getDistributionManager().addMembershipListener(new 
MembershipListener() {
+  @Override
+  public void memberDeparted(DistributionManager distributionManager,
+  InternalDistributedMember id, boolean crashed) {
+assertThat(crashed).isFalse();
+listenerInvoked.set(Boolean.TRUE);
+  }
+});
+final InternalDistributedMember memberID = system.getDistributedMember();
+locatorvm.invoke("send a shutdown message", () -> {
+  final DistributionManager distributionManager =
+  ((InternalDistributedSystem) 
Locator.getLocator().getDistributedSystem())
+  .getDistributionManager();
+  final ShutdownMessage shutdownMessage = new ShutdownMessage();
+  shutdownMessage.setRecipient(memberID);
+  
shutdownMessage.setDistributionManagerId(distributionManager.getDistributionManagerId());
+  distributionManager.putOutgoing(shutdownMessage);
+});
+await().until(() -> listenerInvoked.get());
+  }
+
+
   /**
* Tests that a severe-level alert is generated if a member does not respond 
with an ack quickly
* enough. vm0 and vm1 create a region and set ack-severe-alert-threshold. 
vm1 has a cache
diff --git 
a/geode-core/src/main/java/org/apache/geode/distributed/internal/ClusterDistributionManager.java
 
b/geode-core/src/main/java/org/apache/geode/distributed/internal/ClusterDistributionManager.java
index 9a52f60..1f58344 100644
--- 
a/geode-core/src/main/java/org/apache/geode/distributed/internal/ClusterDistributionManager.java
+++ 
b/geode-core/src/main/java/org/apache/geode/distributed/internal/ClusterDistributionManager.java
@@ -1782,24 +1782,6 @@ public class ClusterDistributionManager implements 
DistributionManager {
   }
 
   /**
-   * Returns true if id was removed. Returns false if it was not in the list 
of managers.
-   */
-  private boolean removeManager(InternalDistributedMember theId, boolean 
crashed, String p_reason) {
-String reason = p_reason;
-
-reason = prettifyReason(reason);
-if (logger.isDebugEnabled()) {
-  logger.debug("DistributionManager: removing member <{}>; crashed {};