[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check
[ https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953975#comment-16953975 ]

Bruce J Schuchardt commented on GEODE-3780:
---

re: "Commit 2104c9bba5cd2b57e41e5c9259d08de31fb8ea3b", that was for ticket GEODE-7311, not GEODE-3780

> suspected member is never watched again after passing final check
> -
>
> Key: GEODE-3780
> URL: https://issues.apache.org/jira/browse/GEODE-3780
> Project: Geode
> Issue Type: Bug
> Components: membership
> Reporter: Bruce J Schuchardt
> Assignee: Bruce J Schuchardt
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.11.0
>
> Time Spent: 3h
> Remaining Estimate: 0h
>
> In a network-down test we saw a node on the losing side of the network
> partition perform final checks on members on the winning side. One of the
> final checks mysteriously succeeded
> [info 2017/09/17 12:24:45.552 PDT
> gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941 Detection thread 4> tid=0x128] Final check failed but detected recent message
> traffic for suspect member
> 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026
> [info 2017/09/17 12:24:45.552 PDT
> gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941 Detection thread 4> tid=0x128] Final check passed for suspect member
> 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026
> After this the suspected member was never checked again and the losing side
> failed to shut down.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check
[ https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953921#comment-16953921 ]

ASF subversion and git services commented on GEODE-3780:
---

Commit 2104c9bba5cd2b57e41e5c9259d08de31fb8ea3b in geode's branch refs/heads/develop from Bruce Schuchardt
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=2104c9b ]

GEODE-3780: disabling test that is failing in CI

The test runs fine in IntelliJ and in local Gradle builds but is failing in CI runs.
[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check
[ https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949831#comment-16949831 ]

ASF subversion and git services commented on GEODE-3780:
---

Commit b70dcebef7ab4703058546b67dc3c395fc5d3b39 in geode's branch refs/heads/develop from Bruce Schuchardt
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=b70dceb ]

GEODE-3780: Fixing monitoring of suspected member after passed final check

- Suspected member is never watched again after passing final check
- This restores our original (pre-1.0) behavior of performing a final check on a member if UDP communications with that member fail.
- We now also send exoneration messages to all other members if a suspect is cleared. We need to do that because another member may have sent a Suspect message that was ignored because the suspect was already undergoing a final check.
- I also noticed that our TCP/IP final check loop was performing more than one check in many cases because the nanosecond clock has a coarse granularity. A socket so-timeout based on the millisecond clock was timing out but the nanosecond clock didn't line up with that timeout and caused the "for" loop to make another attempt. I changed that loop to convert the nanosecond clock value to milliseconds.
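The clock mismatch described in the last item of the commit message above can be illustrated with a small sketch. This is not the Geode code: the class `FinalCheckLoopSketch`, the method `countAttempts`, and its parameters are hypothetical names, and the injected clock stands in for `System.nanoTime()` so the behavior can be reproduced deterministically. It assumes each attempt blocks for roughly the socket timeout and that the loop repeats while time remains.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

/** Hypothetical sketch of a retry loop bounded by a timeout; not the Geode implementation. */
final class FinalCheckLoopSketch {
  private FinalCheckLoopSketch() {}

  /**
   * Runs attempt until it succeeds or the timeout has elapsed, returning the number of attempts.
   * When compareInMillis is true, every clock reading is converted to milliseconds before the
   * comparison, so a sub-millisecond shortfall is less likely to trigger another pass.
   */
  static int countAttempts(long timeoutMillis, LongSupplier nanoClock,
      BooleanSupplier attempt, boolean compareInMillis) {
    long startNanos = nanoClock.getAsLong();
    long budget = compareInMillis ? timeoutMillis : TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    int attempts = 0;
    do {
      attempts++;
      if (attempt.getAsBoolean()) {
        break; // the check succeeded, no retry needed
      }
    } while (elapsed(startNanos, nanoClock.getAsLong(), compareInMillis) < budget);
    return attempts;
  }

  /** Elapsed time in nanoseconds, or in whole milliseconds when compareInMillis is true. */
  private static long elapsed(long startNanos, long nowNanos, boolean compareInMillis) {
    if (compareInMillis) {
      return TimeUnit.NANOSECONDS.toMillis(nowNanos) - TimeUnit.NANOSECONDS.toMillis(startNanos);
    }
    return nowNanos - startNanos;
  }
}
```

With a fake clock in which a 100 ms attempt returns 0.1 ms early by the nanosecond clock, the nanosecond comparison makes a second attempt while the millisecond comparison stops after one. Truncation only narrows the window; a reading that straddles a millisecond boundary can still cause an extra pass in this sketch.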
[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check
[ https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949618#comment-16949618 ]

ASF subversion and git services commented on GEODE-3780:
---

Commit e527d21d91b6f6b5c1b79a418256fdb9b48a45de in geode's branch refs/heads/feature/GEODE-3780 from Bruce Schuchardt
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=e527d21 ]

GEODE-3780 suspected member is never watched again after passing final check

This restores our original (pre-1.0) behavior of performing a final check on a member if UDP communications with that member fail.

We now also send exoneration messages to all other members if a suspect is cleared. We need to do that because another member may have sent a Suspect message that was ignored because the suspect was already undergoing a final check.

I also noticed that our TCP/IP final check loop was performing more than one check in many cases because the nanosecond clock has a coarse granularity. A socket so-timeout based on the millisecond clock was timing out but the nanosecond clock didn't line up with that timeout and caused the "for" loop to make another attempt. I changed that loop to convert the nanosecond clock value to milliseconds.
[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check
[ https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16944722#comment-16944722 ]

ASF subversion and git services commented on GEODE-3780:
---

Commit 977aaf2fcc5c0f74a38742aa51e0feeb5296472d in geode's branch refs/heads/feature/GEODE-3780 from Bruce Schuchardt
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=977aaf2 ]

GEODE-3780 suspected member is never watched again after passing final check

This restores our original (pre-1.0) behavior of performing a final check on a member if UDP communications with that member fail.

We now also send exoneration messages to all other members if a suspect is cleared. We need to do that because another member may have sent a Suspect message that was ignored because the suspect was already undergoing a final check.

I also noticed that our TCP/IP final check loop was performing more than one check in many cases because the nanosecond clock has a coarse granularity. A socket so-timeout based on the millisecond clock was timing out but the nanosecond clock didn't line up with that timeout and caused the "for" loop to make another attempt. I changed that loop to convert the nanosecond clock value to milliseconds.
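The exoneration step in the commit message above can be sketched as follows. This is a simplified model, not the Geode implementation: the class `ExonerationSketch`, its method names, and the use of plain strings as member identifiers are all hypothetical, and it assumes "all other members" means every member of the current view except the local one.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of suspect handling with exoneration; not the Geode implementation. */
final class ExonerationSketch {
  private final String self;
  private final List<String> view;
  private final Set<String> underFinalCheck = new HashSet<>();

  ExonerationSketch(String self, List<String> view) {
    this.self = self;
    this.view = new ArrayList<>(view);
  }

  /**
   * Handles a Suspect message. Returns true if a final check is started, or false if the
   * message is ignored because the suspect is already undergoing a final check.
   */
  boolean onSuspectMessage(String suspect) {
    return underFinalCheck.add(suspect);
  }

  /**
   * Called when the final check passes. Clears the suspect and returns the members that should
   * be told the suspect was exonerated, so that a member whose Suspect message was ignored
   * learns the outcome.
   */
  List<String> onFinalCheckPassed(String suspect) {
    underFinalCheck.remove(suspect);
    List<String> recipients = new ArrayList<>();
    for (String member : view) {
      if (!member.equals(self)) {
        recipients.add(member);
      }
    }
    return recipients;
  }
}
```

Because the suspect is removed from the in-progress set when it is cleared, a later Suspect message for the same member starts a new final check instead of being dropped.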
[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check
[ https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910857#comment-16910857 ]

ASF subversion and git services commented on GEODE-3780:
---

Commit cd59ee83c00646da12316ebc7c5805d9ef904036 in geode's branch refs/heads/release/1.10.0 from Bruce Schuchardt
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=cd59ee8 ]

GEODE-3780 suspected member is never watched again after passing fina… (#3949)

(cherry picked from commit 8e9b04470264983d0aa1c7900f6e9be2374549d9)
[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check
[ https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910742#comment-16910742 ]

ASF subversion and git services commented on GEODE-3780:
---

Commit 9975d1e10a905b040edeefa0ecb2210d1a1c1525 in geode's branch refs/heads/feature/merge_geode_3780 from Bruce Schuchardt
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=9975d1e ]

GEODE-3780 suspected member is never watched again after passing final check (#3917)

* GEODE-3780 suspected member is never watched again after passing final check

After passing a "final check" a member will be subject to suspect processing again but we weren't processing the suspect message locally. This caused JoinLeave to never be notified of the suspect so that removal could be initiated.

I also noticed that a method in HealthMonitor was misnamed. It claimed to return the set of members that had failed availability checks but instead it was returning a set of members currently under suspicion. I renamed the method for clarity.

* empty commit

* removing getSuspectMembers - it could kick out a suspect member too easily

* removing unused method and commented-out code

* revising test

(cherry picked from commit 8e9b04470264983d0aa1c7900f6e9be2374549d9)
[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check
[ https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909273#comment-16909273 ]

ASF subversion and git services commented on GEODE-3780:
---

Commit 8e9b04470264983d0aa1c7900f6e9be2374549d9 in geode's branch refs/heads/feature/GEODE-7066 from Bruce Schuchardt
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=8e9b044 ]

GEODE-3780 suspected member is never watched again after passing final check (#3917)

* GEODE-3780 suspected member is never watched again after passing final check

After passing a "final check" a member will be subject to suspect processing again but we weren't processing the suspect message locally. This caused JoinLeave to never be notified of the suspect so that removal could be initiated.

I also noticed that a method in HealthMonitor was misnamed. It claimed to return the set of members that had failed availability checks but instead it was returning a set of members currently under suspicion. I renamed the method for clarity.

* empty commit

* removing getSuspectMembers - it could kick out a suspect member too easily

* removing unused method and commented-out code

* revising test
[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check
[ https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908470#comment-16908470 ]

ASF subversion and git services commented on GEODE-3780:
---

Commit 8e9b04470264983d0aa1c7900f6e9be2374549d9 in geode's branch refs/heads/develop from Bruce Schuchardt
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=8e9b044 ]

GEODE-3780 suspected member is never watched again after passing final check (#3917)

* GEODE-3780 suspected member is never watched again after passing final check

After passing a "final check" a member will be subject to suspect processing again but we weren't processing the suspect message locally. This caused JoinLeave to never be notified of the suspect so that removal could be initiated.

I also noticed that a method in HealthMonitor was misnamed. It claimed to return the set of members that had failed availability checks but instead it was returning a set of members currently under suspicion. I renamed the method for clarity.

* empty commit

* removing getSuspectMembers - it could kick out a suspect member too easily

* removing unused method and commented-out code

* revising test
[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check
[ https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906397#comment-16906397 ]

ASF subversion and git services commented on GEODE-3780:
---

Commit 50bcc90c779c5d74c20b75d84ebc4522dd421caa in geode's branch refs/heads/feature/GEODE-3780 from Bruce Schuchardt
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=50bcc90 ]

GEODE-3780 suspected member is never watched again after passing final check

After passing a "final check" a member will be subject to suspect processing again but we weren't processing the suspect message locally. This caused JoinLeave to never be notified of the suspect so that removal could be initiated.

I also noticed that a method in HealthMonitor was misnamed. It claimed to return the set of members that had failed availability checks but instead it was returning a set of members currently under suspicion. I renamed the method for clarity.
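The missing local step described in the commit message above can be pictured with a short sketch. It is an illustration under assumptions, not the Geode code: the class `LocalSuspectDeliverySketch`, the method `raiseSuspicion`, the string member identifiers, and the choice of remote recipients (every view member except the sender and the suspect) are all hypothetical. The `localHandler` callback stands in for whatever notifies the component that initiates removal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch of raising a suspicion both remotely and locally; not the Geode code. */
final class LocalSuspectDeliverySketch {
  private LocalSuspectDeliverySketch() {}

  /**
   * Computes the remote recipients of a Suspect message and also hands the suspicion to the
   * local handler. Skipping the local call models the behavior the commit message describes:
   * the sender's own removal logic would never hear about the suspect.
   */
  static List<String> raiseSuspicion(String self, List<String> view, String suspect,
      Consumer<String> localHandler) {
    List<String> remoteRecipients = new ArrayList<>();
    for (String member : view) {
      if (!member.equals(self) && !member.equals(suspect)) {
        remoteRecipients.add(member);
      }
    }
    localHandler.accept(suspect); // process the suspicion locally as well
    return remoteRecipients;
  }
}
```

In this model the caller would send the Suspect message to the returned recipients, while the local handler has already been informed.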
[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check
[ https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597841#comment-16597841 ]

ASF subversion and git services commented on GEODE-3780:
---

Commit 9ee8cbfe3000df4db87a7388d0123aa40e42b7ec in geode's branch refs/heads/windows-heavy-lifter from [~bschuchardt]
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=9ee8cbf ]

GEODE-3780 member is not considered suspect after failing final check

Ensure that a member is in the suspects collection after it fails a final check. This allows processSuspectMessage to know if it should perform the duty of a membership-coordinator and initiate final checks based on Suspect messages.

I've also done a little bit of refactoring in processSuspectMessage and have removed commented-out code.

This closes #2380
[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check
[ https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597595#comment-16597595 ] ASF subversion and git services commented on GEODE-3780: Commit 9ee8cbfe3000df4db87a7388d0123aa40e42b7ec in geode's branch refs/heads/develop from [~bschuchardt] [ https://gitbox.apache.org/repos/asf?p=geode.git;h=9ee8cbf ] GEODE-3780 member is not considered suspect after failing final check Ensure that a member is in the suspects collection after it fails a final check. This allows processSuspectMessage to know if it should perform the duty of a membership-coordinator and initiate final checks based on Suspect messages. I've also done a little bit of refactoring in processSuspectMessage and have removed commented-out code. This closes #2380 > suspected member is never watched again after passing final check > - > > Key: GEODE-3780 > URL: https://issues.apache.org/jira/browse/GEODE-3780 > Project: Geode > Issue Type: Bug > Components: membership >Reporter: Bruce Schuchardt >Assignee: Bruce Schuchardt >Priority: Major > Labels: pull-request-available > Fix For: 1.7.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > In a network-down test we saw a node on the losing side of the network > partition perform final checks on members on the winning side. 
One of the > final checks mysteriously succeeded > [info 2017/09/17 12:24:45.552 PDT > gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941 Detection thread 4> tid=0x128] Final check failed but detected recent message > traffic for suspect member > 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026 > [info 2017/09/17 12:24:45.552 PDT > gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941 Detection thread 4> tid=0x128] Final check passed for suspect member > 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026 > After this the suspected member was never checked again and the losing side > failed to shut down. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check
[ https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16581375#comment-16581375 ] ASF subversion and git services commented on GEODE-3780: Commit cd3f372906501f0f936ba96d6907d972d5c2d478 in geode's branch refs/heads/develop from [~bschuchardt] [ https://gitbox.apache.org/repos/asf?p=geode.git;h=cd3f372 ] GEODE-3780 suspected member is never watched again after passing final check Consolidated "unsuspect" processing into a memberUnsuspected() method. Modified "final check" method to not unsuspect a member that fails the check. > suspected member is never watched again after passing final check > - > > Key: GEODE-3780 > URL: https://issues.apache.org/jira/browse/GEODE-3780 > Project: Geode > Issue Type: Bug > Components: membership >Reporter: Bruce Schuchardt >Assignee: Bruce Schuchardt >Priority: Major > Labels: pull-request-available > Fix For: 1.4.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > In a network-down test we saw a node on the losing side of the network > partition perform final checks on members on the winning side. One of the > final checks mysteriously succeeded > [info 2017/09/17 12:24:45.552 PDT > gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941 Detection thread 4> tid=0x128] Final check failed but detected recent message > traffic for suspect member > 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026 > [info 2017/09/17 12:24:45.552 PDT > gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941 Detection thread 4> tid=0x128] Final check passed for suspect member > 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026 > After this the suspected member was never checked again and the losing side > failed to shut down. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check
[ https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16580397#comment-16580397 ] ASF subversion and git services commented on GEODE-3780: Commit 38b75a90b2164c0dfd3deb8ef21b059befc9168b in geode's branch refs/heads/feature/GEODE-3780 from [~bschuchardt] [ https://gitbox.apache.org/repos/asf?p=geode.git;h=38b75a9 ] GEODE-3780 suspected member is never watched again after passing final check Changes to address Darrel's review comments > suspected member is never watched again after passing final check > - > > Key: GEODE-3780 > URL: https://issues.apache.org/jira/browse/GEODE-3780 > Project: Geode > Issue Type: Bug > Components: membership >Reporter: Bruce Schuchardt >Assignee: Bruce Schuchardt >Priority: Major > Labels: pull-request-available > Fix For: 1.4.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > In a network-down test we saw a node on the losing side of the network > partition perform final checks on members on the winning side. One of the > final checks mysteriously succeeded > [info 2017/09/17 12:24:45.552 PDT > gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941 Detection thread 4> tid=0x128] Final check failed but detected recent message > traffic for suspect member > 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026 > [info 2017/09/17 12:24:45.552 PDT > gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941 Detection thread 4> tid=0x128] Final check passed for suspect member > 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026 > After this the suspected member was never checked again and the losing side > failed to shut down. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check
[ https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16209857#comment-16209857 ] ASF subversion and git services commented on GEODE-3780: Commit 2636bd842d4b87992ffda45c5d2683060d20c05f in geode's branch refs/heads/develop from [~bschuchardt] [ https://gitbox.apache.org/repos/asf?p=geode.git;h=2636bd8 ] GEODE-3841 CI Failure : WanCommandListDUnitTest.testListGatewaySenderGatewayReceiver_group GEODE-3780 suspected member is never watched again after passing final check Added FinalCheckPassedMessage to the DSFID registry and added a test to ensure that it's possible to serialize and deserialize one of these objects. > suspected member is never watched again after passing final check > - > > Key: GEODE-3780 > URL: https://issues.apache.org/jira/browse/GEODE-3780 > Project: Geode > Issue Type: Bug > Components: membership >Reporter: Bruce Schuchardt > Fix For: 1.4.0 > > > In a network-down test we saw a node on the losing side of the network > partition perform final checks on members on the winning side. One of the > final checks mysteriously succeeded > [info 2017/09/17 12:24:45.552 PDT > gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941 Detection thread 4> tid=0x128] Final check failed but detected recent message > traffic for suspect member > 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026 > [info 2017/09/17 12:24:45.552 PDT > gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941 Detection thread 4> tid=0x128] Final check passed for suspect member > 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026 > After this the suspected member was never checked again and the losing side > failed to shut down. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
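The commit above mentions a test confirming that the new message can be serialized and deserialized. The following is a minimal sketch of such a round-trip check, using only java.io streams. DemoMessage, its single "suspect" field, and the roundTrip helper are hypothetical stand-ins, not Geode's FinalCheckPassedMessage or its DSFID registration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch; names and fields are assumptions, not Geode code.
class RoundTripSketch {

  // Stand-in message carrying the id of the formerly suspect member.
  static final class DemoMessage {
    String suspect;

    DemoMessage() {}

    DemoMessage(String suspect) {
      this.suspect = suspect;
    }

    void toData(DataOutput out) throws IOException {
      out.writeUTF(suspect);
    }

    void fromData(DataInput in) throws IOException {
      suspect = in.readUTF();
    }
  }

  // Writes the message to a byte array and reads it back into a new instance.
  static DemoMessage roundTrip(DemoMessage original) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (DataOutputStream out = new DataOutputStream(bytes)) {
        original.toData(out);
      }
      DemoMessage copy = new DemoMessage();
      try (DataInputStream in =
          new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        copy.fromData(in);
      }
      return copy;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A test built on this would compare the fields of the copy against the original; a mismatch or an exception points at a toData/fromData pair that has drifted apart.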
[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check
[ https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16203682#comment-16203682 ] ASF subversion and git services commented on GEODE-3780: Commit 081bccd8391709381f2b3c4cce3b1cf6df49b1ce in geode's branch refs/heads/develop from [~bschuchardt] [ https://gitbox.apache.org/repos/asf?p=geode.git;h=081bccd ] GEODE-3780 suspected member is never watched again after passing final check A member going through a final health check is now put in the suspected members collection and a new "neighbour" is selected for the background monitor thread, ensuring that it doesn't interfere with the health check. Once the health check is done the member is removed from the suspected members collection and a new "neighbour" is selected, allowing the monitor thread to once again consider the suspected member. A message is also sent to the node that initiated suspicion so that it also will resume watching the formerly suspect member. > suspected member is never watched again after passing final check > - > > Key: GEODE-3780 > URL: https://issues.apache.org/jira/browse/GEODE-3780 > Project: Geode > Issue Type: Bug > Components: membership >Reporter: Bruce Schuchardt > > In a network-down test we saw a node on the losing side of the network > partition perform final checks on members on the winning side. 
One of the > final checks mysteriously succeeded > [info 2017/09/17 12:24:45.552 PDT > gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941 Detection thread 4> tid=0x128] Final check failed but detected recent message > traffic for suspect member > 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026 > [info 2017/09/17 12:24:45.552 PDT > gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941 Detection thread 4> tid=0x128] Final check passed for suspect member > 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026 > After this the suspected member was never checked again and the losing side > failed to shut down. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
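The behaviour described in commit 081bccd can be sketched as a small model: a member under final check is placed in the suspects collection and skipped when the monitor picks its next neighbour, then becomes eligible again once the check passes. This is an illustrative sketch only. The class, its methods, and the ring-order neighbour selection are assumptions and are not taken from Geode's GMSHealthMonitor.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of suspect tracking; not Geode code.
class SuspectTrackingSketch {
  private final List<String> view; // membership view, treated as a ring
  private final String self;       // assumed to be present in the view
  private final Set<String> suspects = new HashSet<>();
  private String nextNeighbour;

  SuspectTrackingSketch(List<String> view, String self) {
    this.view = new ArrayList<>(view);
    this.self = self;
    selectNextNeighbour();
  }

  // Picks the first member after self, in ring order, that is not suspected.
  private void selectNextNeighbour() {
    int start = view.indexOf(self);
    nextNeighbour = null;
    for (int i = 1; i < view.size(); i++) {
      String candidate = view.get((start + i) % view.size());
      if (!suspects.contains(candidate)) {
        nextNeighbour = candidate;
        return;
      }
    }
  }

  // Called when a final check starts: the background monitor must leave
  // this member alone, so it is marked suspect and a new neighbour chosen.
  void beginFinalCheck(String member) {
    suspects.add(member);
    selectNextNeighbour();
  }

  // Called when the final check passes: the member is no longer suspect
  // and can be chosen as the monitored neighbour again.
  void finalCheckPassed(String member) {
    suspects.remove(member);
    selectNextNeighbour();
  }

  String getNextNeighbour() {
    return nextNeighbour;
  }

  boolean isSuspected(String member) {
    return suspects.contains(member);
  }
}
```

The point of the sketch is the second call to selectNextNeighbour(): without it, a member that passed its final check would stay out of the monitor's rotation, which is the symptom reported in this ticket.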
[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check
[ https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16202152#comment-16202152 ] ASF subversion and git services commented on GEODE-3780: Commit b448c1174bf3ab77b161761e230741d6b978becc in geode's branch refs/heads/feature/GEODE-3780b from [~bschuchardt] [ https://gitbox.apache.org/repos/asf?p=geode.git;h=b448c11 ] GEODE-3780 GEODE-3780 suspected member is never watched again after passing final check updated sanctionedDataSerializables.txt for the new message introduced on this branch > suspected member is never watched again after passing final check > - > > Key: GEODE-3780 > URL: https://issues.apache.org/jira/browse/GEODE-3780 > Project: Geode > Issue Type: Bug > Components: membership >Reporter: Bruce Schuchardt > > In a network-down test we saw a node on the losing side of the network > partition perform final checks on members on the winning side. One of the > final checks mysteriously succeeded > [info 2017/09/17 12:24:45.552 PDT > gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941 Detection thread 4> tid=0x128] Final check failed but detected recent message > traffic for suspect member > 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026 > [info 2017/09/17 12:24:45.552 PDT > gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941 Detection thread 4> tid=0x128] Final check passed for suspect member > 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026 > After this the suspected member was never checked again and the losing side > failed to shut down. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check
[ https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201167#comment-16201167 ] ASF subversion and git services commented on GEODE-3780: Commit 4d0ef238bceecb995a0e8b1fa9501cc5908c9810 in geode's branch refs/heads/feature/GEODE-3780b from [~bschuchardt] [ https://gitbox.apache.org/repos/asf?p=geode.git;h=4d0ef23 ] GEODE-3780 suspected member is never watched again after passing final check spotless > suspected member is never watched again after passing final check > - > > Key: GEODE-3780 > URL: https://issues.apache.org/jira/browse/GEODE-3780 > Project: Geode > Issue Type: Bug > Components: membership >Reporter: Bruce Schuchardt > > In a network-down test we saw a node on the losing side of the network > partition perform final checks on members on the winning side. One of the > final checks mysteriously succeeded > [info 2017/09/17 12:24:45.552 PDT > gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941 Detection thread 4> tid=0x128] Final check failed but detected recent message > traffic for suspect member > 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026 > [info 2017/09/17 12:24:45.552 PDT > gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941 Detection thread 4> tid=0x128] Final check passed for suspect member > 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026 > After this the suspected member was never checked again and the losing side > failed to shut down. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check
[ https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201123#comment-16201123 ] ASF subversion and git services commented on GEODE-3780: Commit ad6eff1a0a6ca9700587d5291e9a3fb2e5cee87a in geode's branch refs/heads/feature/GEODE-3780b from [~bschuchardt] [ https://gitbox.apache.org/repos/asf?p=geode.git;h=ad6eff1 ] GEODE-3780 suspected member is never watched again after passing final check and git bit me again - this contains the content of the new message and the handling of it. > suspected member is never watched again after passing final check > - > > Key: GEODE-3780 > URL: https://issues.apache.org/jira/browse/GEODE-3780 > Project: Geode > Issue Type: Bug > Components: membership >Reporter: Bruce Schuchardt > > In a network-down test we saw a node on the losing side of the network > partition perform final checks on members on the winning side. One of the > final checks mysteriously succeeded > [info 2017/09/17 12:24:45.552 PDT > gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941 Detection thread 4> tid=0x128] Final check failed but detected recent message > traffic for suspect member > 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026 > [info 2017/09/17 12:24:45.552 PDT > gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941 Detection thread 4> tid=0x128] Final check passed for suspect member > 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026 > After this the suspected member was never checked again and the losing side > failed to shut down. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check
[ https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201113#comment-16201113 ] ASF subversion and git services commented on GEODE-3780: Commit 5214474f63737648922a32e9778a33edb8128752 in geode's branch refs/heads/feature/GEODE-3780b from [~bschuchardt] [ https://gitbox.apache.org/repos/asf?p=geode.git;h=5214474 ] GEODE-3780 suspected member is never watched again after passing final check removed unnecessary "GMSHealthMonitor.this" > suspected member is never watched again after passing final check > - > > Key: GEODE-3780 > URL: https://issues.apache.org/jira/browse/GEODE-3780 > Project: Geode > Issue Type: Bug > Components: membership >Reporter: Bruce Schuchardt > > In a network-down test we saw a node on the losing side of the network > partition perform final checks on members on the winning side. One of the > final checks mysteriously succeeded > [info 2017/09/17 12:24:45.552 PDT > gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941 Detection thread 4> tid=0x128] Final check failed but detected recent message > traffic for suspect member > 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026 > [info 2017/09/17 12:24:45.552 PDT > gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941 Detection thread 4> tid=0x128] Final check passed for suspect member > 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026 > After this the suspected member was never checked again and the losing side > failed to shut down. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check
[ https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201114#comment-16201114 ] ASF subversion and git services commented on GEODE-3780: Commit 12416bb99ff7803e6f7d84c6038bc1995be4c1a0 in geode's branch refs/heads/feature/GEODE-3780b from [~bschuchardt] [ https://gitbox.apache.org/repos/asf?p=geode.git;h=12416bb ] GEODE-3780 suspected member is never watched again after passing final check incorporating review feedback from Hitesh. We want members other than the coordinator to also reconsider the suspected member. The Monitor will now invoke setNextNeighbor at the end of its run() method and a final check that passes will result in a message being sent to the initiator so that it can start watching the suspect again. > suspected member is never watched again after passing final check > - > > Key: GEODE-3780 > URL: https://issues.apache.org/jira/browse/GEODE-3780 > Project: Geode > Issue Type: Bug > Components: membership >Reporter: Bruce Schuchardt > > In a network-down test we saw a node on the losing side of the network > partition perform final checks on members on the winning side. One of the > final checks mysteriously succeeded > [info 2017/09/17 12:24:45.552 PDT > gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941 Detection thread 4> tid=0x128] Final check failed but detected recent message > traffic for suspect member > 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026 > [info 2017/09/17 12:24:45.552 PDT > gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941 Detection thread 4> tid=0x128] Final check passed for suspect member > 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026 > After this the suspected member was never checked again and the losing side > failed to shut down. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
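The review feedback in commit 12416bb (a passing final check notifies the initiator so it resumes watching the suspect) can be sketched as follows. This is a hedged illustration: PassedNotice, Initiator, and finalCheck are hypothetical names, the direct method call stands in for a network send, and the probe predicate stands in for whatever liveness test the real final check performs. A failed check leaves the member in the suspects collection, in line with the later commits on this ticket.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch of the notify-on-pass flow; not Geode code.
class FinalCheckNotifySketch {

  // Stand-in for the message sent to the node that raised suspicion.
  static final class PassedNotice {
    final String suspect;

    PassedNotice(String suspect) {
      this.suspect = suspect;
    }
  }

  // The node that initiated suspicion. It stops watching a member while
  // that member is in its suspects collection.
  static final class Initiator {
    private final Set<String> suspects = new HashSet<>();

    void suspect(String member) {
      suspects.add(member);
    }

    // Receiving the notice clears suspicion so monitoring can resume.
    void onPassedNotice(PassedNotice notice) {
      suspects.remove(notice.suspect);
    }

    boolean isWatching(String member) {
      return !suspects.contains(member);
    }
  }

  // Runs the final check. Only a passing check notifies the initiator;
  // a failing check leaves the member suspected.
  static boolean finalCheck(String suspect, Predicate<String> probe, Initiator initiator) {
    boolean passed = probe.test(suspect);
    if (passed) {
      initiator.onPassedNotice(new PassedNotice(suspect));
    }
    return passed;
  }
}
```

In use, an initiator that has called suspect("B") reports isWatching("B") as false until finalCheck("B", probe, initiator) returns true.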