[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check

2019-10-17 Thread Bruce J Schuchardt (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953975#comment-16953975
 ] 

Bruce J Schuchardt commented on GEODE-3780:
---

Re: commit 2104c9bba5cd2b57e41e5c9259d08de31fb8ea3b: that commit was for ticket 
GEODE-7311, not GEODE-3780.

> suspected member is never watched again after passing final check
> -
>
> Key: GEODE-3780
> URL: https://issues.apache.org/jira/browse/GEODE-3780
> Project: Geode
>  Issue Type: Bug
>  Components: membership
>Reporter: Bruce J Schuchardt
>Assignee: Bruce J Schuchardt
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> In a network-down test we saw a node on the losing side of the network 
> partition perform final checks on members on the winning side.  One of the 
> final checks mysteriously succeeded:
>
> [info 2017/09/17 12:24:45.552 PDT gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941  Detection thread 4> tid=0x128] Final check failed but detected recent message traffic for suspect member 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026
> [info 2017/09/17 12:24:45.552 PDT gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941  Detection thread 4> tid=0x128] Final check passed for suspect member 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026
>
> After this the suspected member was never checked again and the losing side 
> failed to shut down.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check

2019-10-17 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953921#comment-16953921
 ] 

ASF subversion and git services commented on GEODE-3780:


Commit 2104c9bba5cd2b57e41e5c9259d08de31fb8ea3b in geode's branch 
refs/heads/develop from Bruce Schuchardt
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=2104c9b ]

GEODE-3780: disabling test that is failing in CI

The test runs fine in IntelliJ and in local Gradle builds but is
failing in CI runs.


> suspected member is never watched again after passing final check
> -
>
> Key: GEODE-3780
> URL: https://issues.apache.org/jira/browse/GEODE-3780
> Project: Geode
>  Issue Type: Bug
>  Components: membership
>Reporter: Bruce J Schuchardt
>Assignee: Bruce J Schuchardt
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h


[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check

2019-10-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949831#comment-16949831
 ] 

ASF subversion and git services commented on GEODE-3780:


Commit b70dcebef7ab4703058546b67dc3c395fc5d3b39 in geode's branch 
refs/heads/develop from Bruce Schuchardt
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=b70dceb ]

GEODE-3780: Fixing monitoring of suspected member after passed final check

- Suspected member is never watched again after passing final check

- This restores our original (pre-1.0) behavior of performing a
final check on a member if UDP communications with that member fail.

- We now also send exoneration messages to all other members if a suspect is
cleared.  We need to do that because another member may have sent a
Suspect message that was ignored because the suspect was already
undergoing a final check.

- I also noticed that our tcp/ip final check loop was performing more than
one check in many cases because the nanosecond clock has a coarse
granularity.  A socket so-timeout based on the millisecond clock was
timing out but the nanosecond clock didn't line up with that timeout and
caused the "for" loop to make another attempt.  I changed that loop to
convert the nanosecond clock value to milliseconds.
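
The timing problem in the last item can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the code from the commit: the class FinalCheckDeadline, the Attempt interface and the simulate helper are hypothetical names, and the fake attempt models a socket so-timeout that fires as soon as the millisecond clock has advanced by the requested amount. With a deadline kept in nanoseconds the loop sees a few hundred microseconds left and tries again; with the clock value converted to milliseconds it stops after one attempt.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

public class FinalCheckDeadline {

  /** Hypothetical stand-in for one blocking TCP attempt; returns true on success. */
  public interface Attempt {
    boolean tryOnce(long soTimeoutMillis);
  }

  /** Deadline tracked in nanoseconds: can loop again after the socket timed out. */
  public static int attemptsWithNanoDeadline(LongSupplier nanoClock, long timeoutMillis,
      Attempt attempt) {
    long deadlineNanos = nanoClock.getAsLong() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    int attempts = 0;
    long remainingNanos = deadlineNanos - nanoClock.getAsLong();
    while (remainingNanos > 0) {
      attempts++;
      // A so-timeout of 0 would mean "wait forever", so use at least 1 ms.
      long soTimeout = Math.max(1, TimeUnit.NANOSECONDS.toMillis(remainingNanos));
      if (attempt.tryOnce(soTimeout)) {
        break;
      }
      remainingNanos = deadlineNanos - nanoClock.getAsLong();
    }
    return attempts;
  }

  /** Clock value converted to milliseconds before comparing with the deadline. */
  public static int attemptsWithMillisDeadline(LongSupplier nanoClock, long timeoutMillis,
      Attempt attempt) {
    long deadlineMillis = TimeUnit.NANOSECONDS.toMillis(nanoClock.getAsLong()) + timeoutMillis;
    int attempts = 0;
    long remainingMillis = deadlineMillis - TimeUnit.NANOSECONDS.toMillis(nanoClock.getAsLong());
    while (remainingMillis > 0) {
      attempts++;
      if (attempt.tryOnce(remainingMillis)) {
        break;
      }
      remainingMillis = deadlineMillis - TimeUnit.NANOSECONDS.toMillis(nanoClock.getAsLong());
    }
    return attempts;
  }

  /** Runs one simulated final check against a fake clock and a peer that never answers. */
  public static int simulate(boolean useMillisDeadline) {
    long[] fakeNanos = {5_400_000L}; // the clock starts 0.4 ms past a millisecond tick
    LongSupplier clock = () -> fakeNanos[0];
    // Assumed model: the so-timeout expires when the millisecond clock has advanced
    // by soTimeoutMillis, which is slightly early when measured in nanoseconds.
    Attempt neverAnswers = soTimeoutMillis -> {
      long nowMillis = TimeUnit.NANOSECONDS.toMillis(fakeNanos[0]);
      fakeNanos[0] = TimeUnit.MILLISECONDS.toNanos(nowMillis + soTimeoutMillis);
      return false;
    };
    return useMillisDeadline
        ? attemptsWithMillisDeadline(clock, 100, neverAnswers)
        : attemptsWithNanoDeadline(clock, 100, neverAnswers);
  }

  public static void main(String[] args) {
    System.out.println("nanosecond deadline attempts:  " + simulate(false));
    System.out.println("millisecond deadline attempts: " + simulate(true));
  }
}
```

In this simulation the nanosecond version makes a second, very short attempt, which matches the extra checks described above; the real loop in Geode may differ in detail.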



> suspected member is never watched again after passing final check
> -
>
> Key: GEODE-3780
> URL: https://issues.apache.org/jira/browse/GEODE-3780
> Project: Geode
>  Issue Type: Bug
>  Components: membership
>Reporter: Bruce J Schuchardt
>Assignee: Bruce J Schuchardt
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h


[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check

2019-10-11 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949618#comment-16949618
 ] 

ASF subversion and git services commented on GEODE-3780:


Commit e527d21d91b6f6b5c1b79a418256fdb9b48a45de in geode's branch 
refs/heads/feature/GEODE-3780 from Bruce Schuchardt
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=e527d21 ]

GEODE-3780 suspected member is never watched again after passing final check

This restores our original (pre-1.0) behavior of performing a
final check on a member if UDP communications with that member fail.

We now also send exoneration messages to all other members if a suspect is
cleared.  We need to do that because another member may have sent a
Suspect message that was ignored because the suspect was already
undergoing a final check.
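
A minimal sketch of why the broadcast is needed, assuming a much simplified model: ExonerationSketch, Messenger and the "EXONERATE" message text are hypothetical names for illustration, not Geode's actual classes or wire format. A Suspect message that arrives while the member is already under final check is dropped, so when the check passes every other member has to be told the suspect is cleared.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ExonerationSketch {

  /** Hypothetical transport; the real messaging layer is not modeled here. */
  public interface Messenger {
    void send(String recipient, String message);
  }

  private final String self;
  private final List<String> view; // current membership, including self
  private final Messenger messenger;
  private final Set<String> membersInFinalCheck = new HashSet<>();

  public ExonerationSketch(String self, List<String> view, Messenger messenger) {
    this.self = self;
    this.view = view;
    this.messenger = messenger;
  }

  /**
   * Handles a Suspect message. Returns true if a final check was started and
   * false if the message was ignored because a check is already in progress.
   */
  public boolean onSuspect(String suspect) {
    return membersInFinalCheck.add(suspect);
  }

  /** Called when the final check passes: clear the suspect and tell everyone else. */
  public void onFinalCheckPassed(String suspect) {
    membersInFinalCheck.remove(suspect);
    for (String member : view) {
      if (!member.equals(self)) {
        messenger.send(member, "EXONERATE " + suspect);
      }
    }
  }
}
```

Without the broadcast, a member whose Suspect message was ignored would keep waiting for an outcome that never arrives.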

I also noticed that our tcp/ip final check loop was performing more than
one check in many cases because the nanosecond clock has a coarse
granularity.  A socket so-timeout based on the millisecond clock was
timing out but the nanosecond clock didn't line up with that timeout and
caused the "for" loop to make another attempt.  I changed that loop to
convert the nanosecond clock value to milliseconds.


> suspected member is never watched again after passing final check
> -
>
> Key: GEODE-3780
> URL: https://issues.apache.org/jira/browse/GEODE-3780
> Project: Geode
>  Issue Type: Bug
>  Components: membership
>Reporter: Bruce J Schuchardt
>Assignee: Bruce J Schuchardt
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h


[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check

2019-10-04 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16944722#comment-16944722
 ] 

ASF subversion and git services commented on GEODE-3780:


Commit 977aaf2fcc5c0f74a38742aa51e0feeb5296472d in geode's branch 
refs/heads/feature/GEODE-3780 from Bruce Schuchardt
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=977aaf2 ]

GEODE-3780 suspected member is never watched again after passing final check

This restores our original (pre-1.0) behavior of performing a
final check on a member if UDP communications with that member fail.

We now also send exoneration messages to all other members if a suspect is
cleared.  We need to do that because another member may have sent a
Suspect message that was ignored because the suspect was already
undergoing a final check.

I also noticed that our tcp/ip final check loop was performing more than
one check in many cases because the nanosecond clock has a coarse
granularity.  A socket so-timeout based on the millisecond clock was
timing out but the nanosecond clock didn't line up with that timeout and
caused the "for" loop to make another attempt.  I changed that loop to
convert the nanosecond clock value to milliseconds.


> suspected member is never watched again after passing final check
> -
>
> Key: GEODE-3780
> URL: https://issues.apache.org/jira/browse/GEODE-3780
> Project: Geode
>  Issue Type: Bug
>  Components: membership
>Reporter: Bruce J Schuchardt
>Assignee: Bruce J Schuchardt
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h


[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check

2019-08-19 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910857#comment-16910857
 ] 

ASF subversion and git services commented on GEODE-3780:


Commit cd59ee83c00646da12316ebc7c5805d9ef904036 in geode's branch 
refs/heads/release/1.10.0 from Bruce Schuchardt
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=cd59ee8 ]

GEODE-3780 suspected member is never watched again after passing fina… (#3949)

(cherry picked from commit 8e9b04470264983d0aa1c7900f6e9be2374549d9)

> suspected member is never watched again after passing final check
> -
>
> Key: GEODE-3780
> URL: https://issues.apache.org/jira/browse/GEODE-3780
> Project: Geode
>  Issue Type: Bug
>  Components: membership
>Reporter: Bruce Schuchardt
>Assignee: Bruce Schuchardt
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h


[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check

2019-08-19 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910742#comment-16910742
 ] 

ASF subversion and git services commented on GEODE-3780:


Commit 9975d1e10a905b040edeefa0ecb2210d1a1c1525 in geode's branch 
refs/heads/feature/merge_geode_3780 from Bruce Schuchardt
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=9975d1e ]

GEODE-3780 suspected member is never watched again after passing final check 
(#3917)

* GEODE-3780 suspected member is never watched again after passing final check

After passing a "final check" a member will be subject to suspect
processing again but we weren't processing the suspect message locally.
As a result, JoinLeave was never notified of the suspect, so removal could
not be initiated.

I also noticed that a method in HealthMonitor was misnamed.  It claimed
to return the set of members that had failed availability checks but
instead it was returning a set of members currently under suspicion.  I
renamed the method for clarity.

* empty commit

* removing getSuspectMembers - it could kick out a suspect member too easily

* removing unused method and commented-out code

* revising test

(cherry picked from commit 8e9b04470264983d0aa1c7900f6e9be2374549d9)


> suspected member is never watched again after passing final check
> -
>
> Key: GEODE-3780
> URL: https://issues.apache.org/jira/browse/GEODE-3780
> Project: Geode
>  Issue Type: Bug
>  Components: membership
>Reporter: Bruce Schuchardt
>Assignee: Bruce Schuchardt
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h


[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check

2019-08-16 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909273#comment-16909273
 ] 

ASF subversion and git services commented on GEODE-3780:


Commit 8e9b04470264983d0aa1c7900f6e9be2374549d9 in geode's branch 
refs/heads/feature/GEODE-7066 from Bruce Schuchardt
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=8e9b044 ]

GEODE-3780 suspected member is never watched again after passing final check 
(#3917)

* GEODE-3780 suspected member is never watched again after passing final check

After passing a "final check" a member will be subject to suspect
processing again but we weren't processing the suspect message locally.
As a result, JoinLeave was never notified of the suspect, so removal could
not be initiated.

I also noticed that a method in HealthMonitor was misnamed.  It claimed
to return the set of members that had failed availability checks but
instead it was returning a set of members currently under suspicion.  I
renamed the method for clarity.

* empty commit

* removing getSuspectMembers - it could kick out a suspect member too easily

* removing unused method and commented-out code

* revising test


> suspected member is never watched again after passing final check
> -
>
> Key: GEODE-3780
> URL: https://issues.apache.org/jira/browse/GEODE-3780
> Project: Geode
>  Issue Type: Bug
>  Components: membership
>Reporter: Bruce Schuchardt
>Assignee: Bruce Schuchardt
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h


[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check

2019-08-15 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908470#comment-16908470
 ] 

ASF subversion and git services commented on GEODE-3780:


Commit 8e9b04470264983d0aa1c7900f6e9be2374549d9 in geode's branch 
refs/heads/develop from Bruce Schuchardt
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=8e9b044 ]

GEODE-3780 suspected member is never watched again after passing final check 
(#3917)

* GEODE-3780 suspected member is never watched again after passing final check

After passing a "final check" a member will be subject to suspect
processing again but we weren't processing the suspect message locally.
As a result, JoinLeave was never notified of the suspect, so removal could
not be initiated.

I also noticed that a method in HealthMonitor was misnamed.  It claimed
to return the set of members that had failed availability checks but
instead it was returning a set of members currently under suspicion.  I
renamed the method for clarity.

* empty commit

* removing getSuspectMembers - it could kick out a suspect member too easily

* removing unused method and commented-out code

* revising test


> suspected member is never watched again after passing final check
> -
>
> Key: GEODE-3780
> URL: https://issues.apache.org/jira/browse/GEODE-3780
> Project: Geode
>  Issue Type: Bug
>  Components: membership
>Reporter: Bruce Schuchardt
>Assignee: Bruce Schuchardt
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.7.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h


[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check

2019-08-13 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906397#comment-16906397
 ] 

ASF subversion and git services commented on GEODE-3780:


Commit 50bcc90c779c5d74c20b75d84ebc4522dd421caa in geode's branch 
refs/heads/feature/GEODE-3780 from Bruce Schuchardt
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=50bcc90 ]

GEODE-3780 suspected member is never watched again after passing final check

After passing a "final check" a member will be subject to suspect
processing again but we weren't processing the suspect message locally.
As a result, JoinLeave was never notified of the suspect, so removal could
not be initiated.
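
The missing step can be sketched as follows. This is an illustration under assumptions, not the actual HealthMonitor code: LocalSuspectProcessing, JoinLeaveListener, Messenger and their methods are hypothetical names. The point is only that the member raising the suspicion must handle its own Suspect message in addition to sending it to the others.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class LocalSuspectProcessing {

  /** Hypothetical callback standing in for the JoinLeave service. */
  public interface JoinLeaveListener {
    void memberSuspected(String initiator, String suspect);
  }

  /** Hypothetical transport used to tell the other members about the suspect. */
  public interface Messenger {
    void broadcastSuspect(String initiator, String suspect);
  }

  private final String self;
  private final Messenger messenger;
  private final JoinLeaveListener joinLeave;
  private final Set<String> suspects = new LinkedHashSet<>();

  public LocalSuspectProcessing(String self, Messenger messenger, JoinLeaveListener joinLeave) {
    this.self = self;
    this.messenger = messenger;
    this.joinLeave = joinLeave;
  }

  /** Raises suspicion about a member: notify the others, then process the message here too. */
  public void initiateSuspicion(String suspect) {
    messenger.broadcastSuspect(self, suspect);
    // The local step: without it, the local JoinLeave never hears about the suspect.
    processSuspectMessage(self, suspect);
  }

  /** Handles a Suspect message, whether it came from a peer or from this member. */
  public void processSuspectMessage(String initiator, String suspect) {
    suspects.add(suspect);
    joinLeave.memberSuspected(initiator, suspect);
  }

  public Set<String> currentSuspects() {
    return suspects;
  }
}
```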

I also noticed that a method in HealthMonitor was misnamed.  It claimed
to return the set of members that had failed availability checks but
instead it was returning a set of members currently under suspicion.  I
renamed the method for clarity.


> suspected member is never watched again after passing final check
> -
>
> Key: GEODE-3780
> URL: https://issues.apache.org/jira/browse/GEODE-3780
> Project: Geode
>  Issue Type: Bug
>  Components: membership
>Reporter: Bruce Schuchardt
>Assignee: Bruce Schuchardt
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.7.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> In a network-down test we saw a node on the losing side of the network 
> partition perform final checks on members on the winning side.  One of the 
> final checks mysteriously succeeded
> [info 2017/09/17 12:24:45.552 PDT 
> gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941  Detection thread 4> tid=0x128] Final check failed but detected recent message 
> traffic for suspect member 
> 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026
> [info 2017/09/17 12:24:45.552 PDT 
> gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941  Detection thread 4> tid=0x128] Final check passed for suspect member 
> 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026
> After this the suspected member was never checked again and the losing side 
> failed to shut down.





[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check

2018-08-30 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597841#comment-16597841
 ] 

ASF subversion and git services commented on GEODE-3780:


Commit 9ee8cbfe3000df4db87a7388d0123aa40e42b7ec in geode's branch 
refs/heads/windows-heavy-lifter from [~bschuchardt]
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=9ee8cbf ]

GEODE-3780 member is not considered suspect after failing final check

Ensure that a member is in the suspects collection after it fails a
final check.  This allows processSuspectMessage to know if it should
perform the duty of a membership-coordinator and initiate final
checks based on Suspect messages.
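
The role of the suspects collection described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the membership view is ordered oldest-first and that the acting coordinator is the first member not under suspicion. The class and method names are hypothetical and are not taken from Geode's processSuspectMessage.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical illustration only; not Geode code. */
class SuspectCoordinatorSketch {

    /** Returns the first member of the view that is not a suspect, or null if all are. */
    static String actingCoordinator(List<String> view, Set<String> suspects) {
        for (String member : view) {
            if (!suspects.contains(member)) {
                return member;
            }
        }
        return null;
    }

    /** A node takes on coordinator duties only if it is the acting coordinator. */
    static boolean shouldInitiateFinalChecks(String self, List<String> view,
                                             Set<String> suspects) {
        return self.equals(actingCoordinator(view, suspects));
    }
}
```

Under these assumptions, if a member that failed its final check were missing from the suspects set, the helper would still name it as coordinator and the local node would never take over the final checks.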

I've also done a little bit of refactoring in processSuspectMessage and
have removed commented-out code.

This closes #2380


> suspected member is never watched again after passing final check
> -
>
> Key: GEODE-3780
> URL: https://issues.apache.org/jira/browse/GEODE-3780
> Project: Geode
>  Issue Type: Bug
>  Components: membership
>Reporter: Bruce Schuchardt
>Assignee: Bruce Schuchardt
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.7.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> In a network-down test we saw a node on the losing side of the network 
> partition perform final checks on members on the winning side.  One of the 
> final checks mysteriously succeeded
> [info 2017/09/17 12:24:45.552 PDT 
> gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941  Detection thread 4> tid=0x128] Final check failed but detected recent message 
> traffic for suspect member 
> 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026
> [info 2017/09/17 12:24:45.552 PDT 
> gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941  Detection thread 4> tid=0x128] Final check passed for suspect member 
> 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026
> After this the suspected member was never checked again and the losing side 
> failed to shut down.





[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check

2018-08-30 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597595#comment-16597595
 ] 

ASF subversion and git services commented on GEODE-3780:


Commit 9ee8cbfe3000df4db87a7388d0123aa40e42b7ec in geode's branch 
refs/heads/develop from [~bschuchardt]
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=9ee8cbf ]

GEODE-3780 member is not considered suspect after failing final check

Ensure that a member is in the suspects collection after it fails a
final check.  This allows processSuspectMessage to know if it should
perform the duty of a membership-coordinator and initiate final
checks based on Suspect messages.

I've also done a little bit of refactoring in processSuspectMessage and
have removed commented-out code.

This closes #2380


> suspected member is never watched again after passing final check
> -
>
> Key: GEODE-3780
> URL: https://issues.apache.org/jira/browse/GEODE-3780
> Project: Geode
>  Issue Type: Bug
>  Components: membership
>Reporter: Bruce Schuchardt
>Assignee: Bruce Schuchardt
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.7.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> In a network-down test we saw a node on the losing side of the network 
> partition perform final checks on members on the winning side.  One of the 
> final checks mysteriously succeeded
> [info 2017/09/17 12:24:45.552 PDT 
> gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941  Detection thread 4> tid=0x128] Final check failed but detected recent message 
> traffic for suspect member 
> 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026
> [info 2017/09/17 12:24:45.552 PDT 
> gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941  Detection thread 4> tid=0x128] Final check passed for suspect member 
> 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026
> After this the suspected member was never checked again and the losing side 
> failed to shut down.





[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check

2018-08-15 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16581375#comment-16581375
 ] 

ASF subversion and git services commented on GEODE-3780:


Commit cd3f372906501f0f936ba96d6907d972d5c2d478 in geode's branch 
refs/heads/develop from [~bschuchardt]
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=cd3f372 ]

GEODE-3780 suspected member is never watched again after passing final check

Consolidated "unsuspect" processing into a memberUnsuspected() method.
Modified "final check" method to not unsuspect a member that fails the check.


> suspected member is never watched again after passing final check
> -
>
> Key: GEODE-3780
> URL: https://issues.apache.org/jira/browse/GEODE-3780
> Project: Geode
>  Issue Type: Bug
>  Components: membership
>Reporter: Bruce Schuchardt
>Assignee: Bruce Schuchardt
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.4.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> In a network-down test we saw a node on the losing side of the network 
> partition perform final checks on members on the winning side.  One of the 
> final checks mysteriously succeeded
> [info 2017/09/17 12:24:45.552 PDT 
> gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941  Detection thread 4> tid=0x128] Final check failed but detected recent message 
> traffic for suspect member 
> 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026
> [info 2017/09/17 12:24:45.552 PDT 
> gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941  Detection thread 4> tid=0x128] Final check passed for suspect member 
> 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026
> After this the suspected member was never checked again and the losing side 
> failed to shut down.





[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check

2018-08-14 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16580397#comment-16580397
 ] 

ASF subversion and git services commented on GEODE-3780:


Commit 38b75a90b2164c0dfd3deb8ef21b059befc9168b in geode's branch 
refs/heads/feature/GEODE-3780 from [~bschuchardt]
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=38b75a9 ]

GEODE-3780 suspected member is never watched again after passing final check

Changes to address Darrel's review comments


> suspected member is never watched again after passing final check
> -
>
> Key: GEODE-3780
> URL: https://issues.apache.org/jira/browse/GEODE-3780
> Project: Geode
>  Issue Type: Bug
>  Components: membership
>Reporter: Bruce Schuchardt
>Assignee: Bruce Schuchardt
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.4.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> In a network-down test we saw a node on the losing side of the network 
> partition perform final checks on members on the winning side.  One of the 
> final checks mysteriously succeeded
> [info 2017/09/17 12:24:45.552 PDT 
> gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941  Detection thread 4> tid=0x128] Final check failed but detected recent message 
> traffic for suspect member 
> 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026
> [info 2017/09/17 12:24:45.552 PDT 
> gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941  Detection thread 4> tid=0x128] Final check passed for suspect member 
> 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026
> After this the suspected member was never checked again and the losing side 
> failed to shut down.





[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check

2017-10-18 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16209857#comment-16209857
 ] 

ASF subversion and git services commented on GEODE-3780:


Commit 2636bd842d4b87992ffda45c5d2683060d20c05f in geode's branch 
refs/heads/develop from [~bschuchardt]
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=2636bd8 ]

GEODE-3841 CI Failure : 
WanCommandListDUnitTest.testListGatewaySenderGatewayReceiver_group
GEODE-3780 suspected member is never watched again after passing final check

Added FinalCheckPassedMessage to the DSFID registry and added a test
to ensure that it's possible to serialize and deserialize one of these
objects.
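
A round-trip test of the kind mentioned above might look roughly like the following sketch. The message class here is a hypothetical stand-in with a single String field. The real FinalCheckPassedMessage, its fields, and the DSFID registry are not reproduced, and plain java.io streams are assumed instead of Geode's serialization framework.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for a membership message; not Geode code. */
class FinalCheckPassedSketch {
    private String suspect;

    FinalCheckPassedSketch() {}            // needed so a receiver can call fromData

    FinalCheckPassedSketch(String suspect) {
        this.suspect = suspect;
    }

    void toData(DataOutput out) throws IOException {
        out.writeUTF(suspect);
    }

    void fromData(DataInput in) throws IOException {
        suspect = in.readUTF();
    }

    String getSuspect() {
        return suspect;
    }

    /** Serializes the message to bytes and reads it back into a fresh instance. */
    static FinalCheckPassedSketch roundTrip(FinalCheckPassedSketch original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                original.toData(out);
            }
            FinalCheckPassedSketch copy = new FinalCheckPassedSketch();
            try (DataInputStream in =
                     new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                copy.fromData(in);
            }
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A test would then assert that the copy returned by roundTrip carries the same suspect as the original.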


> suspected member is never watched again after passing final check
> -
>
> Key: GEODE-3780
> URL: https://issues.apache.org/jira/browse/GEODE-3780
> Project: Geode
>  Issue Type: Bug
>  Components: membership
>Reporter: Bruce Schuchardt
> Fix For: 1.4.0
>
>
> In a network-down test we saw a node on the losing side of the network 
> partition perform final checks on members on the winning side.  One of the 
> final checks mysteriously succeeded
> [info 2017/09/17 12:24:45.552 PDT 
> gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941  Detection thread 4> tid=0x128] Final check failed but detected recent message 
> traffic for suspect member 
> 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026
> [info 2017/09/17 12:24:45.552 PDT 
> gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941  Detection thread 4> tid=0x128] Final check passed for suspect member 
> 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026
> After this the suspected member was never checked again and the losing side 
> failed to shut down.





[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check

2017-10-13 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16203682#comment-16203682
 ] 

ASF subversion and git services commented on GEODE-3780:


Commit 081bccd8391709381f2b3c4cce3b1cf6df49b1ce in geode's branch 
refs/heads/develop from [~bschuchardt]
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=081bccd ]

GEODE-3780 suspected member is never watched again after passing final check

A member going through a final health check is now put in the
suspected members collection and a new "neighbour" is selected for
the background monitor thread, ensuring that it doesn't interfere
with the health check.  Once the health check is done the member is
removed from the suspected members collection and a new "neighbour"
is selected, allowing the monitor thread to once again consider the
suspected member.

A message is also sent to the node that initiated suspicion so that
it also will resume watching the formerly suspect member.
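
The bookkeeping described above can be sketched as follows. This is a simplified model under stated assumptions: members are plain strings, the view is treated as a ring, and the "neighbour" is the next member after the local one that is not under suspicion. All names are hypothetical, and the notification sent to the initiator is omitted.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of suspect tracking; not Geode's GMSHealthMonitor. */
class NeighbourMonitorSketch {
    private final List<String> view;               // membership view in ring order
    private final String self;
    private final Set<String> suspects = new HashSet<>();
    private String neighbour;                      // member watched by the background monitor

    NeighbourMonitorSketch(List<String> view, String self) {
        this.view = new ArrayList<>(view);
        this.self = self;
        selectNeighbour();
    }

    /** Picks the next member after self in the ring that is not a suspect. */
    private void selectNeighbour() {
        int start = view.indexOf(self);
        neighbour = null;
        for (int i = 1; i < view.size(); i++) {
            String candidate = view.get((start + i) % view.size());
            if (!suspects.contains(candidate)) {
                neighbour = candidate;
                return;
            }
        }
    }

    /** Marks the member as suspect so the monitor moves on to someone else. */
    void beginFinalCheck(String member) {
        suspects.add(member);
        selectNeighbour();
    }

    /** Clears suspicion so the monitor can consider the member again. */
    void finalCheckPassed(String member) {
        suspects.remove(member);
        selectNeighbour();
    }

    String neighbour() {
        return neighbour;
    }

    boolean isSuspect(String member) {
        return suspects.contains(member);
    }
}
```

In this model the member is skipped only while its final check is in progress. Once the check passes, the next neighbour selection can pick it up again, which is the behaviour the ticket title says was missing.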


> suspected member is never watched again after passing final check
> -
>
> Key: GEODE-3780
> URL: https://issues.apache.org/jira/browse/GEODE-3780
> Project: Geode
>  Issue Type: Bug
>  Components: membership
>Reporter: Bruce Schuchardt
>
> In a network-down test we saw a node on the losing side of the network 
> partition perform final checks on members on the winning side.  One of the 
> final checks mysteriously succeeded
> [info 2017/09/17 12:24:45.552 PDT 
> gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941  Detection thread 4> tid=0x128] Final check failed but detected recent message 
> traffic for suspect member 
> 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026
> [info 2017/09/17 12:24:45.552 PDT 
> gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941  Detection thread 4> tid=0x128] Final check passed for suspect member 
> 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026
> After this the suspected member was never checked again and the losing side 
> failed to shut down.





[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check

2017-10-12 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16202152#comment-16202152
 ] 

ASF subversion and git services commented on GEODE-3780:


Commit b448c1174bf3ab77b161761e230741d6b978becc in geode's branch 
refs/heads/feature/GEODE-3780b from [~bschuchardt]
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=b448c11 ]

GEODE-3780 suspected member is never watched again after passing final check

updated sanctionedDataSerializables.txt for the new message introduced on
this branch


> suspected member is never watched again after passing final check
> -
>
> Key: GEODE-3780
> URL: https://issues.apache.org/jira/browse/GEODE-3780
> Project: Geode
>  Issue Type: Bug
>  Components: membership
>Reporter: Bruce Schuchardt
>
> In a network-down test we saw a node on the losing side of the network 
> partition perform final checks on members on the winning side.  One of the 
> final checks mysteriously succeeded
> [info 2017/09/17 12:24:45.552 PDT 
> gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941  Detection thread 4> tid=0x128] Final check failed but detected recent message 
> traffic for suspect member 
> 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026
> [info 2017/09/17 12:24:45.552 PDT 
> gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941  Detection thread 4> tid=0x128] Final check passed for suspect member 
> 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026
> After this the suspected member was never checked again and the losing side 
> failed to shut down.





[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check

2017-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201167#comment-16201167
 ] 

ASF subversion and git services commented on GEODE-3780:


Commit 4d0ef238bceecb995a0e8b1fa9501cc5908c9810 in geode's branch 
refs/heads/feature/GEODE-3780b from [~bschuchardt]
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=4d0ef23 ]

GEODE-3780 suspected member is never watched again after passing final check

spotless


> suspected member is never watched again after passing final check
> -
>
> Key: GEODE-3780
> URL: https://issues.apache.org/jira/browse/GEODE-3780
> Project: Geode
>  Issue Type: Bug
>  Components: membership
>Reporter: Bruce Schuchardt
>
> In a network-down test we saw a node on the losing side of the network 
> partition perform final checks on members on the winning side.  One of the 
> final checks mysteriously succeeded
> [info 2017/09/17 12:24:45.552 PDT 
> gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941  Detection thread 4> tid=0x128] Final check failed but detected recent message 
> traffic for suspect member 
> 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026
> [info 2017/09/17 12:24:45.552 PDT 
> gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941  Detection thread 4> tid=0x128] Final check passed for suspect member 
> 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026
> After this the suspected member was never checked again and the losing side 
> failed to shut down.





[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check

2017-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201123#comment-16201123
 ] 

ASF subversion and git services commented on GEODE-3780:


Commit ad6eff1a0a6ca9700587d5291e9a3fb2e5cee87a in geode's branch 
refs/heads/feature/GEODE-3780b from [~bschuchardt]
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=ad6eff1 ]

GEODE-3780 suspected member is never watched again after passing final check

and git bit me again - this contains the content of the new message and
the handling of it.


> suspected member is never watched again after passing final check
> -
>
> Key: GEODE-3780
> URL: https://issues.apache.org/jira/browse/GEODE-3780
> Project: Geode
>  Issue Type: Bug
>  Components: membership
>Reporter: Bruce Schuchardt
>
> In a network-down test we saw a node on the losing side of the network 
> partition perform final checks on members on the winning side.  One of the 
> final checks mysteriously succeeded
> [info 2017/09/17 12:24:45.552 PDT 
> gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941  Detection thread 4> tid=0x128] Final check failed but detected recent message 
> traffic for suspect member 
> 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026
> [info 2017/09/17 12:24:45.552 PDT 
> gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941  Detection thread 4> tid=0x128] Final check passed for suspect member 
> 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026
> After this the suspected member was never checked again and the losing side 
> failed to shut down.





[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check

2017-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201113#comment-16201113
 ] 

ASF subversion and git services commented on GEODE-3780:


Commit 5214474f63737648922a32e9778a33edb8128752 in geode's branch 
refs/heads/feature/GEODE-3780b from [~bschuchardt]
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=5214474 ]

GEODE-3780 suspected member is never watched again after passing final check

removed unnecessary "GMSHealthMonitor.this"


> suspected member is never watched again after passing final check
> -
>
> Key: GEODE-3780
> URL: https://issues.apache.org/jira/browse/GEODE-3780
> Project: Geode
>  Issue Type: Bug
>  Components: membership
>Reporter: Bruce Schuchardt
>
> In a network-down test we saw a node on the losing side of the network 
> partition perform final checks on members on the winning side.  One of the 
> final checks mysteriously succeeded
> [info 2017/09/17 12:24:45.552 PDT 
> gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941  Detection thread 4> tid=0x128] Final check failed but detected recent message 
> traffic for suspect member 
> 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026
> [info 2017/09/17 12:24:45.552 PDT 
> gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941  Detection thread 4> tid=0x128] Final check passed for suspect member 
> 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026
> After this the suspected member was never checked again and the losing side 
> failed to shut down.





[jira] [Commented] (GEODE-3780) suspected member is never watched again after passing final check

2017-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201114#comment-16201114
 ] 

ASF subversion and git services commented on GEODE-3780:


Commit 12416bb99ff7803e6f7d84c6038bc1995be4c1a0 in geode's branch 
refs/heads/feature/GEODE-3780b from [~bschuchardt]
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=12416bb ]

GEODE-3780 suspected member is never watched again after passing final check

incorporating review feedback from Hitesh.  We want members other than
the coordinator to also reconsider the suspected member.  The
Monitor will now invoke setNextNeighbor at the end of its run() method
and a final check that passes will result in a message being sent to
the initiator so that it can start watching the suspect again.


> suspected member is never watched again after passing final check
> -
>
> Key: GEODE-3780
> URL: https://issues.apache.org/jira/browse/GEODE-3780
> Project: Geode
>  Issue Type: Bug
>  Components: membership
>Reporter: Bruce Schuchardt
>
> In a network-down test we saw a node on the losing side of the network 
> partition perform final checks on members on the winning side.  One of the 
> final checks mysteriously succeeded
> [info 2017/09/17 12:24:45.552 PDT 
> gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941  Detection thread 4> tid=0x128] Final check failed but detected recent message 
> traffic for suspect member 
> 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026
> [info 2017/09/17 12:24:45.552 PDT 
> gemfire1_rs-FullRegression-2017-09-15-21-00-35-client-10_8941  Detection thread 4> tid=0x128] Final check passed for suspect member 
> 10.32.109.252(gemfire3_rs-FullRegression-2017-09-15-21-00-35-client-16_6135:6135):1026
> After this the suspected member was never checked again and the losing side 
> failed to shut down.


