[ https://issues.apache.org/jira/browse/ASTERIXDB-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363524#comment-16363524 ]
ASF subversion and git services commented on ASTERIXDB-2284:
------------------------------------------------------------

Commit bf74a319dbdfa3fea3007d3286f14a77fecac178 in asterixdb's branch refs/heads/master from [~mhubail]
[ https://git-wip-us.apache.org/repos/asf?p=asterixdb.git;h=bf74a31 ]

[ASTERIXDB-2284][CLUS] Ensure Node Failure on Heartbeat Miss

- user model changes: no
- storage format changes: no
- interface changes: no

Details:
- Request that the node which exceeded its allowed heartbeat misses shut down, to ensure that it actually fails.
- Ensure thread safety of lastHeartbeatNanoTime in NodeControllerState.

Change-Id: I121f85fd858484377a9d888d18c3069c239f00fc
Reviewed-on: https://asterix-gerrit.ics.uci.edu/2390
Sonar-Qube: Jenkins <jenk...@fulliautomatix.ics.uci.edu>
Tested-by: Jenkins <jenk...@fulliautomatix.ics.uci.edu>
Contrib: Jenkins <jenk...@fulliautomatix.ics.uci.edu>
Integration-Tests: Jenkins <jenk...@fulliautomatix.ics.uci.edu>
Reviewed-by: Michael Blow <mb...@apache.org>

> Ensure Node Failure on Heartbeat Misses
> ---------------------------------------
>
>                 Key: ASTERIXDB-2284
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2284
>             Project: Apache AsterixDB
>          Issue Type: Improvement
>            Reporter: Murtadha Hubail
>            Assignee: Murtadha Hubail
>            Priority: Major
>
> Currently, an NC may exceed the allowed period for sending its heartbeat
> (e.g., due to a garbage collection pause) and still stay up, which leaves
> the cluster state unusable indefinitely. The proposal is to make sure the
> failed node has really failed by asking it to shut down. If the shutdown
> succeeds, the NC will be restarted, and the cluster state will become
> active again when the NC rejoins.
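The two changes in the commit (a thread-safe heartbeat timestamp, and asking a node that exceeded its allowed heartbeat misses to shut down) can be sketched roughly as follows. This is a minimal illustration, not AsterixDB's code: the class and method names (HeartbeatMonitorSketch, NodeState, ShutdownRequester, checkHeartbeats) and the "period times max misses" rule are hypothetical assumptions. Only the field name lastHeartbeatNanoTime comes from the commit. Marking the field volatile is one way to make a single long safely shared between the heartbeat-receiving thread and the monitoring thread, because a non-volatile long may be written non-atomically (JLS §17.7). The actual commit may use a different mechanism.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

public class HeartbeatMonitorSketch {

    /** Hypothetical per-node state; only the timestamp field name mirrors the commit. */
    static final class NodeState {
        // volatile: written by the heartbeat-receiving thread, read by the monitor thread,
        // and guarantees atomic, visible writes of the 64-bit value.
        private volatile long lastHeartbeatNanoTime;

        NodeState(long now) { this.lastHeartbeatNanoTime = now; }

        void notifyHeartbeat(long now) { lastHeartbeatNanoTime = now; }

        // Compare nanoTime values by subtraction, never by absolute value.
        long nanosSinceLastHeartbeat(long now) { return now - lastHeartbeatNanoTime; }
    }

    /** Hypothetical hook for asking a node controller to shut down. */
    interface ShutdownRequester {
        void requestShutdown(String nodeId);
    }

    private final Map<String, NodeState> nodes = new ConcurrentHashMap<>();
    private final long maxSilenceNanos;
    private final ShutdownRequester requester;

    HeartbeatMonitorSketch(long heartbeatPeriodMillis, int maxMisses, ShutdownRequester requester) {
        // Assumed rule: a node fails once it is silent longer than period * maxMisses.
        this.maxSilenceNanos = TimeUnit.MILLISECONDS.toNanos(heartbeatPeriodMillis) * maxMisses;
        this.requester = requester;
    }

    void register(String nodeId, long now) { nodes.put(nodeId, new NodeState(now)); }

    void onHeartbeat(String nodeId, long now) {
        NodeState state = nodes.get(nodeId);
        if (state != null) {
            state.notifyHeartbeat(now);
        }
    }

    /** Removes nodes that exceeded the allowed silence and asks each one to shut down. */
    List<String> checkHeartbeats(long now) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, NodeState> e : nodes.entrySet()) {
            if (e.getValue().nanosSinceLastHeartbeat(now) > maxSilenceNanos) {
                failed.add(e.getKey());
            }
        }
        for (String nodeId : failed) {
            nodes.remove(nodeId);                 // treat the node as failed from now on
            requester.requestShutdown(nodeId);    // ensure it really stops, so it can rejoin cleanly
        }
        return failed;
    }

    public static void main(String[] args) {
        List<String> shutdownRequests = new ArrayList<>();
        HeartbeatMonitorSketch monitor = new HeartbeatMonitorSketch(1000, 3, shutdownRequests::add);
        monitor.register("nc1", 0);
        monitor.register("nc2", 0);
        long now = TimeUnit.SECONDS.toNanos(4);
        monitor.onHeartbeat("nc1", now);               // nc1 is alive; nc2 has been silent for 4 s
        System.out.println(monitor.checkHeartbeats(now)); // prints [nc2]
    }
}
```

Timestamps are passed in explicitly so the example is deterministic; a real monitor would pass System.nanoTime(). The node is dropped from the active set before the shutdown request is sent. As a result, it counts as failed even if the request is lost, and it is only reinstated when it registers again after a restart.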