[jira] [Commented] (HBASE-26468) Region Server doesn't exit cleanly incase it crashes.

2021-12-01 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17452182#comment-17452182
 ] 

Hudson commented on HBASE-26468:


Results for branch master
[build #457 on 
builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/457/]:
 (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/457/General_20Nightly_20Build_20Report/]






(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/457/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(x) {color:red}-1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/457/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Region Server doesn't exit cleanly incase it crashes.
> -
>
> Key: HBASE-26468
> URL: https://issues.apache.org/jira/browse/HBASE-26468
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.6.0
>Reporter: Rushabh Shah
>Assignee: Rushabh Shah
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-2, 1.7.2, 2.4.9
>
>
> We observed this in our production cluster running version 1.6.
> The RS crashed for some reason, but the process kept running. On further 
> debugging, we found one non-daemon thread still running, which prevented the 
> RS from exiting cleanly. Our clusters are managed by Ambari, which has an 
> auto-restart capability. But since the process was still running and the pid 
> file was present, Ambari couldn't do much either. There will be bugs where 
> we fail to stop some non-daemon thread. The shutdown hook is not called 
> until the JVM shuts down, and the Java virtual machine shuts down in 
> response to two kinds of events:
> # The program exits normally, when the last non-daemon thread exits or when 
> the exit (equivalently, System.exit) method is invoked, or
> # The virtual machine is terminated in response to a user interrupt, such as 
> typing ^C, or a system-wide event, such as user logoff or system shutdown.
> Consider the first condition: the last non-daemon thread exits or the exit 
> method is invoked.
> Below is the code snippet from 
> [HRegionServerCommandLine.java|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServerCommandLine.java#L51]
> {code:java}
>   private int start() throws Exception {
>     try {
>       if (LocalHBaseCluster.isLocal(conf)) {
>         // Ignore this.
>       } else {
>         HRegionServer hrs = HRegionServer.constructRegionServer(regionServerClass, conf);
>         hrs.start();
>         hrs.join();
>         if (hrs.isAborted()) {
>           throw new RuntimeException("HRegionServer Aborted");
>         }
>       }
>     } catch (Throwable t) {
>       LOG.error("Region server exiting", t);
>       return 1;
>     }
>     return 0;
>   }
> {code}
> Within HRegionServer, there is a subtle difference between a server being 
> aborted and a server being stopped. If it is stopped, isAborted returns 
> false and start() returns 0.
> Below is the code from 
> [ServerCommandLine.java|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/util/ServerCommandLine.java#L147]
> {code:java}
>   public void doMain(String args[]) {
>     try {
>       int ret = ToolRunner.run(HBaseConfiguration.create(), this, args);
>       if (ret != 0) {
>         System.exit(ret);
>       }
>     } catch (Exception e) {
>       LOG.error("Failed to run", e);
>       System.exit(-1);
>     }
>   }
> {code}
> If the return code is 0, doMain does not call System.exit. The JVM then 
> waits for all non-daemon threads to stop before running the shutdown hook, 
> which means an infinite wait if we don't stop every non-daemon thread cleanly.
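
When diagnosing a hang like the one described above, it can help to list which non-daemon threads are still alive. The following is a minimal sketch, not code from HBase: the class name NonDaemonThreadCheck, the method lingeringNonDaemonThreads, and the thread name lingering-worker are hypothetical. It relies only on the JDK calls Thread.getAllStackTraces(), Thread.isAlive(), and Thread.isDaemon().

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.stream.Collectors;

public class NonDaemonThreadCheck {

  /**
   * Returns the names of all live non-daemon threads other than the caller.
   * Any thread in this list keeps the JVM from exiting normally.
   */
  public static List<String> lingeringNonDaemonThreads() {
    Thread current = Thread.currentThread();
    return Thread.getAllStackTraces().keySet().stream()
        .filter(t -> t != current && t.isAlive() && !t.isDaemon())
        .map(Thread::getName)
        .sorted()
        .collect(Collectors.toList());
  }

  public static void main(String[] args) throws InterruptedException {
    // Simulate a non-daemon thread that was never stopped (hypothetical name).
    CountDownLatch release = new CountDownLatch(1);
    Thread worker = new Thread(() -> {
      try {
        release.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, "lingering-worker");
    worker.start();

    // While the worker is blocked, it shows up as a thread blocking JVM exit.
    System.out.println("Blocking JVM exit: " + lingeringNonDaemonThreads());

    // Once the worker is released and joined, it no longer appears.
    release.countDown();
    worker.join();
    System.out.println("After join: " + lingeringNonDaemonThreads());
  }
}
```

Logging such a list just before main returns with code 0 would show which thread, if any, is keeping the process alive.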



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HBASE-26468) Region Server doesn't exit cleanly incase it crashes.

2021-12-01 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17452163#comment-17452163
 ] 

Hudson commented on HBASE-26468:


Results for branch branch-2.4
[build #248 on 
builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/248/]:
 (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/248/General_20Nightly_20Build_20Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/248/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/248/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.4/248/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}




[jira] [Commented] (HBASE-26468) Region Server doesn't exit cleanly incase it crashes.

2021-12-01 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17451960#comment-17451960
 ] 

Hudson commented on HBASE-26468:


Results for branch branch-2
[build #406 on 
builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/406/]:
 (x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/406/General_20Nightly_20Build_20Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/406/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/406/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(x) {color:red}-1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2/406/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}




[jira] [Commented] (HBASE-26468) Region Server doesn't exit cleanly incase it crashes.

2021-12-01 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17451895#comment-17451895
 ] 

Hudson commented on HBASE-26468:


Results for branch branch-1
[build #186 on 
builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-1/186/]:
 (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-1/186//General_Nightly_Build_Report/]


(x) {color:red}-1 jdk7 checks{color}
-- For more information [see jdk7 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-1/186//JDK7_Nightly_Build_Report/]


(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-1/186//JDK8_Nightly_Build_Report_(Hadoop2)/]




(/) {color:green}+1 source release artifact{color}
-- See build output for details.




[jira] [Commented] (HBASE-26468) Region Server doesn't exit cleanly incase it crashes.

2021-12-01 Thread Rushabh Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17451737#comment-17451737
 ] 

Rushabh Shah commented on HBASE-26468:
--

Thank you [~vjasani] for the review and the merge!
Thank you [~zhangduo] and [~gjacoby] for the review and feedback!



[jira] [Commented] (HBASE-26468) Region Server doesn't exit cleanly incase it crashes.

2021-11-23 Thread Rushabh Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17448017#comment-17448017
 ] 

Rushabh Shah commented on HBASE-26468:
--

> Maybe we could add a delay? For example, if the process does not exit for 30 
> seconds, we call System.exit to force quit, and the return value should be 
> something other than 0 to indicate that this is a force terminate.

Sounds like a good idea. Thank you [~zhangduo] !

> Region Server doesn't exit cleanly incase it crashes.
> -
>
> Key: HBASE-26468
> URL: https://issues.apache.org/jira/browse/HBASE-26468
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.6.0
>Reporter: Rushabh Shah
>Assignee: Rushabh Shah
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-2, 1.7.2, 2.3.8, 2.4.9


[jira] [Commented] (HBASE-26468) Region Server doesn't exit cleanly incase it crashes.

2021-11-22 Thread Duo Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447467#comment-17447467
 ] 

Duo Zhang commented on HBASE-26468:
---

Maybe we could add a delay? For example, if the process does not exit for 30 
seconds, we call System.exit to force quit, and the return value should be 
something other than 0 to indicate that this is a force terminate.
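
The delay suggested above could be sketched roughly as follows. This is an illustration under stated assumptions, not the patch that was merged for this issue: the class name ForcedExitWatchdog, the method arm, and the exit code 2 are hypothetical. The watchdog is itself a daemon thread, so it does not keep the JVM alive; if every non-daemon thread finishes in time, the JVM shuts down normally and the watchdog never fires.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.IntConsumer;

public class ForcedExitWatchdog {

  /** Hypothetical non-zero code that marks a forced termination. */
  public static final int FORCED_EXIT_CODE = 2;

  /**
   * Starts a daemon thread that invokes exitAction with FORCED_EXIT_CODE
   * once the timeout elapses. Interrupting the returned thread disarms it.
   */
  public static Thread arm(long timeout, TimeUnit unit, IntConsumer exitAction) {
    Thread watchdog = new Thread(() -> {
      try {
        unit.sleep(timeout);
      } catch (InterruptedException e) {
        return; // disarmed before the timeout elapsed
      }
      exitAction.accept(FORCED_EXIT_CODE);
    }, "forced-exit-watchdog");
    watchdog.setDaemon(true); // must not block JVM exit itself
    watchdog.start();
    return watchdog;
  }

  public static void main(String[] args) {
    // Simulate a non-daemon thread that was never stopped.
    Thread stuck = new Thread(() -> {
      try {
        Thread.sleep(Long.MAX_VALUE);
      } catch (InterruptedException ignored) {
        // exit the thread if interrupted
      }
    }, "stuck-worker");
    stuck.start();

    // Without the watchdog the JVM would hang after main returns.
    // With it, the process is terminated after 2 seconds.
    arm(2, TimeUnit.SECONDS, System::exit);
  }
}
```

Passing the exit action as an IntConsumer keeps the sketch testable; production code would pass System::exit and would probably log the lingering threads first, so that the underlying bug is still visible.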

> Region Server doesn't exit cleanly incase it crashes.
> -
>
> Key: HBASE-26468
> URL: https://issues.apache.org/jira/browse/HBASE-26468
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.6.0
>Reporter: Rushabh Shah
>Assignee: Rushabh Shah
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-2, 1.7.2, 2.3.8, 2.4.9


[jira] [Commented] (HBASE-26468) Region Server doesn't exit cleanly incase it crashes.

2021-11-22 Thread Viraj Jasani (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447451#comment-17447451
 ] 

Viraj Jasani commented on HBASE-26468:
--

[~zhangduo] Although HBASE-26480 will have its own fix, I think we might 
still want to make this change. A graceful JVM exit with status code 0 is a 
behaviour change, but I feel it's for the better.

With this behaviour, we might not learn of other non-daemon threads that fail 
to shut down properly but, on the other hand, we will not see zombie 
processes either (which prevent CD/monitoring systems from automatically 
restarting services when a significant number of regionservers are affected).

> Region Server doesn't exit cleanly incase it crashes.
> -
>
> Key: HBASE-26468
> URL: https://issues.apache.org/jira/browse/HBASE-26468
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.6.0
>Reporter: Rushabh Shah
>Assignee: Rushabh Shah
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-2, 1.7.2, 2.3.8, 2.4.9
>
>
> Observed this in our production cluster running 1.6 version.
> RS crashed due to some reason but the process was still running. On debugging 
> more, found out there was 1 non-daemon thread running and that was not 
> allowing RS to exit cleanly. Our clusters are managed by Ambari and have auto 
> restart capability within them. But since the process was running and pid 
> file was present, Ambari also couldn't do much. There will be some bug where 
> we will miss to stop some non daemon thread. Shutdown hook will not be called 
> unless one of the following 2 conditions are met:
> # The Java virtual machine shuts down in response to two kinds of events:
> The program exits normally, when the last non-daemon thread exits or when the 
> exit (equivalently, System.exit) method is invoked, or
> # The virtual machine is terminated in response to a user interrupt, such as 
> typing ^C, or a system-wide event, such as user logoff or system shutdown.
> Considering the first condition, when the last non-daemon thread exits or 
> when the exit method is invoked.
> Below is the code snippet from 
> [HRegionServerCommandLine.java|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServerCommandLine.java#L51]
> {code:java}
>   private int start() throws Exception {
> try {
>   if (LocalHBaseCluster.isLocal(conf)) {
>  // Ignore this.
>   } else {
> HRegionServer hrs = 
> HRegionServer.constructRegionServer(regionServerClass, conf);
> hrs.start();
> hrs.join();
> if (hrs.isAborted()) {
>   throw new RuntimeException("HRegionServer Aborted");
> }
>   }
> } catch (Throwable t) {
>   LOG.error("Region server exiting", t);
>   return 1;
> }
> return 0;
>   }
> {code}
> Within HRegionServer, there is a subtle difference between when a server is 
> aborted v/s when it is stopped. If it is stopped, then isAborted will return 
> false and it will exit with return code 0.
> Below is the code from 
> [ServerCommandLine.java|https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/util/ServerCommandLine.java#L147]
> {code:java}
>   public void doMain(String args[]) {
>     try {
>       int ret = ToolRunner.run(HBaseConfiguration.create(), this, args);
>       if (ret != 0) {
>         System.exit(ret);
>       }
>     } catch (Exception e) {
>       LOG.error("Failed to run", e);
>       System.exit(-1);
>     }
>   }
> {code}
> If the return code is 0, doMain does not call System.exit. The JVM then waits 
> for all non-daemon threads to finish before it runs the shutdown hooks, so it 
> waits forever if we fail to stop every non-daemon thread cleanly.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HBASE-26468) Region Server doesn't exit cleanly incase it crashes.

2021-11-22 Thread Rushabh Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447429#comment-17447429
 ] 

Rushabh Shah commented on HBASE-26468:
--

[~zhangduo] Created HBASE-26480 with more details on which non-daemon thread 
is involved.

> Region Server doesn't exit cleanly incase it crashes.
> -
>
> Key: HBASE-26468
> URL: https://issues.apache.org/jira/browse/HBASE-26468
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.6.0
>Reporter: Rushabh Shah
>Assignee: Rushabh Shah
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-2, 1.7.2, 2.3.8, 2.4.9
>





[jira] [Commented] (HBASE-26468) Region Server doesn't exit cleanly incase it crashes.

2021-11-22 Thread Duo Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447292#comment-17447292
 ] 

Duo Zhang commented on HBASE-26468:
---

So which thread does not exit cleanly?

> Region Server doesn't exit cleanly incase it crashes.
> -
>
> Key: HBASE-26468
> URL: https://issues.apache.org/jira/browse/HBASE-26468
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.6.0
>Reporter: Rushabh Shah
>Assignee: Rushabh Shah
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-2, 1.7.2, 2.3.8, 2.4.9
>





[jira] [Commented] (HBASE-26468) Region Server doesn't exit cleanly incase it crashes.

2021-11-22 Thread Viraj Jasani (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447284#comment-17447284
 ] 

Viraj Jasani commented on HBASE-26468:
--

FYI [~zhangduo] 

> Region Server doesn't exit cleanly incase it crashes.
> -
>
> Key: HBASE-26468
> URL: https://issues.apache.org/jira/browse/HBASE-26468
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.6.0
>Reporter: Rushabh Shah
>Assignee: Rushabh Shah
>Priority: Major
>





[jira] [Commented] (HBASE-26468) Region Server doesn't exit cleanly incase it crashes.

2021-11-20 Thread Rushabh Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17446829#comment-17446829
 ] 

Rushabh Shah commented on HBASE-26468:
--

Created PR for master, branch-2 and branch-1. Please review. 

> Region Server doesn't exit cleanly incase it crashes.
> -
>
> Key: HBASE-26468
> URL: https://issues.apache.org/jira/browse/HBASE-26468
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.6.0
>Reporter: Rushabh Shah
>Assignee: Rushabh Shah
>Priority: Major
>


