Rushabh Shah created HBASE-26468:
------------------------------------

             Summary: Region Server doesn't exit cleanly in case it crashes.
                 Key: HBASE-26468
                 URL: https://issues.apache.org/jira/browse/HBASE-26468
             Project: HBase
          Issue Type: Bug
          Components: regionserver
    Affects Versions: 1.6.0
            Reporter: Rushabh Shah


We observed this in our production cluster running version 1.6.
The RS crashed for some reason, but the process kept running. On further 
debugging, we found one non-daemon thread still alive, which prevented the RS 
from exiting cleanly. Our clusters are managed by Ambari, which can restart 
services automatically, but because the process was still running and the pid 
file was still present, Ambari could not do much. There will always be some 
bug where we fail to stop a non-daemon thread, so there should be a maximum 
amount of time we wait before forcing the process to exit.

Relevant code: 
[HRegionServerCommandLine.java|https://github.com/apache/hbase/blob/branch-2/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServerCommandLine.java]


{code:java}
        logProcessInfo(getConf());
        HRegionServer hrs = HRegionServer.constructRegionServer(regionServerClass, conf);
        hrs.start();
        hrs.join(); // <-- This should be a timed join.
        if (hrs.isAborted()) {
          throw new RuntimeException("HRegionServer Aborted");
        }
      }
{code}
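
A minimal sketch of the timed-join idea, written against plain java.lang.Thread 
rather than the HBase classes. The class name TimedJoinSketch, the helper 
joinWithTimeout, and the 500 ms timeout are all hypothetical and chosen only 
for illustration; this is not the actual fix. Note that a timed join only 
unblocks the calling thread. The JVM still waits for any remaining non-daemon 
thread, so the sketch pairs the timeout with an explicit System.exit.

```java
class TimedJoinSketch {

  /**
   * Waits at most timeoutMillis for the worker to terminate.
   * Returns true if the worker is no longer alive when the wait ends.
   */
  static boolean joinWithTimeout(Thread worker, long timeoutMillis) {
    if (timeoutMillis <= 0) {
      // Thread.join(0) means "wait forever", which is what we want to avoid.
      throw new IllegalArgumentException("timeoutMillis must be positive");
    }
    try {
      worker.join(timeoutMillis);
    } catch (InterruptedException e) {
      // Preserve the interrupt status for the caller.
      Thread.currentThread().interrupt();
    }
    return !worker.isAlive();
  }

  public static void main(String[] args) {
    // Stand-in for a non-daemon thread that never stops on its own.
    Thread stuck = new Thread(() -> {
      try {
        Thread.sleep(Long.MAX_VALUE);
      } catch (InterruptedException e) {
        // Nothing to clean up in this demo.
      }
    }, "stuck-non-daemon");
    stuck.start();

    if (!joinWithTimeout(stuck, 500)) {
      System.err.println("Thread still alive after timeout; forcing JVM exit");
      // Status 0 keeps the demo itself exiting cleanly; a real server would
      // more likely report a non-zero status here.
      System.exit(0);
    }
  }
}
```

In a real server the timeout would presumably be configurable, and 
Runtime.getRuntime().halt(status) is an alternative to System.exit when 
shutdown hooks themselves might hang.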



