[jira] [Updated] (HBASE-21535) Zombie Master detector is not working

2019-01-29 Thread Guanghao Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-21535:
---
   Resolution: Fixed
Fix Version/s: 2.3.0
   2.0.5
   2.1.3
   Status: Resolved  (was: Patch Available)

Pushed to branch-2+. Thanks [~pankaj2461] for contributing.

> Zombie Master detector is not working
> -
>
> Key: HBASE-21535
> URL: https://issues.apache.org/jira/browse/HBASE-21535
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.3, 2.0.5, 2.3.0
>
> Attachments: HBASE-21535.branch-2.patch, HBASE-21535.branch-2.patch, 
> HBASE-21535.patch, HBASE-21535.v2.patch
>
>
> We have InitializationMonitor thread in HMaster which detects Zombie Hmaster 
> based on _hbase.master.initializationmonitor.timeout _and halts if 
> _hbase.master.initializationmonitor.haltontimeout_ set _true_.
> After HBASE-19694, HMaster initialization order was correted. Hmaster is set 
> active after Initializing ZK system trackers as follows,
> {noformat}
>  status.setStatus("Initializing ZK system trackers");
>  initializeZKBasedSystemTrackers();
>  status.setStatus("Loading last flushed sequence id of regions");
>  try {
>  this.serverManager.loadLastFlushedSequenceIds();
>  } catch (IOException e) {
>  LOG.debug("Failed to load last flushed sequence id of regions"
>  + " from file system", e);
>  }
>  // Set ourselves as active Master now our claim has succeeded up in zk.
>  this.activeMaster = true;
> {noformat}
> But Zombie detector thread is started at the begining phase of 
> finishActiveMasterInitialization(),
> {noformat}
>  private void finishActiveMasterInitialization(MonitoredTask status) throws 
> IOException,
>  InterruptedException, KeeperException, ReplicationException {
>  Thread zombieDetector = new Thread(new InitializationMonitor(this),
>  "ActiveMasterInitializationMonitor-" + System.currentTimeMillis());
>  zombieDetector.setDaemon(true);
>  zombieDetector.start();
> {noformat}
> During zombieDetector execution "master.isActiveMaster()" will be false, so 
> it won't wait and cant detect zombie master.
> {noformat}
>  @Override
>  public void run() {
>  try {
>  while (!master.isStopped() && master.isActiveMaster()) {
>  Thread.sleep(timeout);
>  if (master.isInitialized()) {
>  LOG.debug("Initialization completed within allotted tolerance. Monitor 
> exiting.");
>  } else {
>  LOG.error("Master failed to complete initialization after " + timeout + "ms. 
> Please"
>  + " consider submitting a bug report including a thread dump of this 
> process.");
>  if (haltOnTimeout) {
>  LOG.error("Zombie Master exiting. Thread dump to stdout");
>  Threads.printThreadInfo(System.out, "Zombie HMaster");
>  System.exit(-1);
>  }
>  }
>  }
>  } catch (InterruptedException ie) {
>  LOG.trace("InitMonitor thread interrupted. Existing.");
>  }
>  }
>  }
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21535) Zombie Master detector is not working

2018-12-17 Thread Pankaj Kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pankaj Kumar updated HBASE-21535:
-
Attachment: HBASE-21535.v2.patch

> Zombie Master detector is not working
> -
>
> Key: HBASE-21535
> URL: https://issues.apache.org/jira/browse/HBASE-21535
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Critical
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21535.branch-2.patch, HBASE-21535.branch-2.patch, 
> HBASE-21535.patch, HBASE-21535.v2.patch
>
>
> We have InitializationMonitor thread in HMaster which detects Zombie Hmaster 
> based on _hbase.master.initializationmonitor.timeout _and halts if 
> _hbase.master.initializationmonitor.haltontimeout_ set _true_.
> After HBASE-19694, HMaster initialization order was correted. Hmaster is set 
> active after Initializing ZK system trackers as follows,
> {noformat}
>  status.setStatus("Initializing ZK system trackers");
>  initializeZKBasedSystemTrackers();
>  status.setStatus("Loading last flushed sequence id of regions");
>  try {
>  this.serverManager.loadLastFlushedSequenceIds();
>  } catch (IOException e) {
>  LOG.debug("Failed to load last flushed sequence id of regions"
>  + " from file system", e);
>  }
>  // Set ourselves as active Master now our claim has succeeded up in zk.
>  this.activeMaster = true;
> {noformat}
> But Zombie detector thread is started at the begining phase of 
> finishActiveMasterInitialization(),
> {noformat}
>  private void finishActiveMasterInitialization(MonitoredTask status) throws 
> IOException,
>  InterruptedException, KeeperException, ReplicationException {
>  Thread zombieDetector = new Thread(new InitializationMonitor(this),
>  "ActiveMasterInitializationMonitor-" + System.currentTimeMillis());
>  zombieDetector.setDaemon(true);
>  zombieDetector.start();
> {noformat}
> During zombieDetector execution "master.isActiveMaster()" will be false, so 
> it won't wait and cant detect zombie master.
> {noformat}
>  @Override
>  public void run() {
>  try {
>  while (!master.isStopped() && master.isActiveMaster()) {
>  Thread.sleep(timeout);
>  if (master.isInitialized()) {
>  LOG.debug("Initialization completed within allotted tolerance. Monitor 
> exiting.");
>  } else {
>  LOG.error("Master failed to complete initialization after " + timeout + "ms. 
> Please"
>  + " consider submitting a bug report including a thread dump of this 
> process.");
>  if (haltOnTimeout) {
>  LOG.error("Zombie Master exiting. Thread dump to stdout");
>  Threads.printThreadInfo(System.out, "Zombie HMaster");
>  System.exit(-1);
>  }
>  }
>  }
>  } catch (InterruptedException ie) {
>  LOG.trace("InitMonitor thread interrupted. Existing.");
>  }
>  }
>  }
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21535) Zombie Master detector is not working

2018-12-03 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21535:
--
Attachment: HBASE-21535.branch-2.patch

> Zombie Master detector is not working
> -
>
> Key: HBASE-21535
> URL: https://issues.apache.org/jira/browse/HBASE-21535
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Critical
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21535.branch-2.patch, HBASE-21535.branch-2.patch, 
> HBASE-21535.patch
>
>
> We have InitializationMonitor thread in HMaster which detects Zombie Hmaster 
> based on _hbase.master.initializationmonitor.timeout _and halts if 
> _hbase.master.initializationmonitor.haltontimeout_ set _true_.
> After HBASE-19694, HMaster initialization order was correted. Hmaster is set 
> active after Initializing ZK system trackers as follows,
> {noformat}
>  status.setStatus("Initializing ZK system trackers");
>  initializeZKBasedSystemTrackers();
>  status.setStatus("Loading last flushed sequence id of regions");
>  try {
>  this.serverManager.loadLastFlushedSequenceIds();
>  } catch (IOException e) {
>  LOG.debug("Failed to load last flushed sequence id of regions"
>  + " from file system", e);
>  }
>  // Set ourselves as active Master now our claim has succeeded up in zk.
>  this.activeMaster = true;
> {noformat}
> But Zombie detector thread is started at the begining phase of 
> finishActiveMasterInitialization(),
> {noformat}
>  private void finishActiveMasterInitialization(MonitoredTask status) throws 
> IOException,
>  InterruptedException, KeeperException, ReplicationException {
>  Thread zombieDetector = new Thread(new InitializationMonitor(this),
>  "ActiveMasterInitializationMonitor-" + System.currentTimeMillis());
>  zombieDetector.setDaemon(true);
>  zombieDetector.start();
> {noformat}
> During zombieDetector execution "master.isActiveMaster()" will be false, so 
> it won't wait and cant detect zombie master.
> {noformat}
>  @Override
>  public void run() {
>  try {
>  while (!master.isStopped() && master.isActiveMaster()) {
>  Thread.sleep(timeout);
>  if (master.isInitialized()) {
>  LOG.debug("Initialization completed within allotted tolerance. Monitor 
> exiting.");
>  } else {
>  LOG.error("Master failed to complete initialization after " + timeout + "ms. 
> Please"
>  + " consider submitting a bug report including a thread dump of this 
> process.");
>  if (haltOnTimeout) {
>  LOG.error("Zombie Master exiting. Thread dump to stdout");
>  Threads.printThreadInfo(System.out, "Zombie HMaster");
>  System.exit(-1);
>  }
>  }
>  }
>  } catch (InterruptedException ie) {
>  LOG.trace("InitMonitor thread interrupted. Existing.");
>  }
>  }
>  }
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21535) Zombie Master detector is not working

2018-12-03 Thread Pankaj Kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pankaj Kumar updated HBASE-21535:
-
Attachment: HBASE-21535.branch-2.patch

> Zombie Master detector is not working
> -
>
> Key: HBASE-21535
> URL: https://issues.apache.org/jira/browse/HBASE-21535
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Critical
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21535.branch-2.patch, HBASE-21535.patch
>
>
> We have InitializationMonitor thread in HMaster which detects Zombie Hmaster 
> based on _hbase.master.initializationmonitor.timeout _and halts if 
> _hbase.master.initializationmonitor.haltontimeout_ set _true_.
> After HBASE-19694, HMaster initialization order was correted. Hmaster is set 
> active after Initializing ZK system trackers as follows,
> {noformat}
>  status.setStatus("Initializing ZK system trackers");
>  initializeZKBasedSystemTrackers();
>  status.setStatus("Loading last flushed sequence id of regions");
>  try {
>  this.serverManager.loadLastFlushedSequenceIds();
>  } catch (IOException e) {
>  LOG.debug("Failed to load last flushed sequence id of regions"
>  + " from file system", e);
>  }
>  // Set ourselves as active Master now our claim has succeeded up in zk.
>  this.activeMaster = true;
> {noformat}
> But Zombie detector thread is started at the begining phase of 
> finishActiveMasterInitialization(),
> {noformat}
>  private void finishActiveMasterInitialization(MonitoredTask status) throws 
> IOException,
>  InterruptedException, KeeperException, ReplicationException {
>  Thread zombieDetector = new Thread(new InitializationMonitor(this),
>  "ActiveMasterInitializationMonitor-" + System.currentTimeMillis());
>  zombieDetector.setDaemon(true);
>  zombieDetector.start();
> {noformat}
> During zombieDetector execution "master.isActiveMaster()" will be false, so 
> it won't wait and cant detect zombie master.
> {noformat}
>  @Override
>  public void run() {
>  try {
>  while (!master.isStopped() && master.isActiveMaster()) {
>  Thread.sleep(timeout);
>  if (master.isInitialized()) {
>  LOG.debug("Initialization completed within allotted tolerance. Monitor 
> exiting.");
>  } else {
>  LOG.error("Master failed to complete initialization after " + timeout + "ms. 
> Please"
>  + " consider submitting a bug report including a thread dump of this 
> process.");
>  if (haltOnTimeout) {
>  LOG.error("Zombie Master exiting. Thread dump to stdout");
>  Threads.printThreadInfo(System.out, "Zombie HMaster");
>  System.exit(-1);
>  }
>  }
>  }
>  } catch (InterruptedException ie) {
>  LOG.trace("InitMonitor thread interrupted. Existing.");
>  }
>  }
>  }
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21535) Zombie Master detector is not working

2018-12-03 Thread Pankaj Kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pankaj Kumar updated HBASE-21535:
-
Attachment: (was: HBASE-21535-branch-2.patch)

> Zombie Master detector is not working
> -
>
> Key: HBASE-21535
> URL: https://issues.apache.org/jira/browse/HBASE-21535
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Critical
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21535.branch-2.patch, HBASE-21535.patch
>
>
> We have InitializationMonitor thread in HMaster which detects Zombie Hmaster 
> based on _hbase.master.initializationmonitor.timeout _and halts if 
> _hbase.master.initializationmonitor.haltontimeout_ set _true_.
> After HBASE-19694, HMaster initialization order was correted. Hmaster is set 
> active after Initializing ZK system trackers as follows,
> {noformat}
>  status.setStatus("Initializing ZK system trackers");
>  initializeZKBasedSystemTrackers();
>  status.setStatus("Loading last flushed sequence id of regions");
>  try {
>  this.serverManager.loadLastFlushedSequenceIds();
>  } catch (IOException e) {
>  LOG.debug("Failed to load last flushed sequence id of regions"
>  + " from file system", e);
>  }
>  // Set ourselves as active Master now our claim has succeeded up in zk.
>  this.activeMaster = true;
> {noformat}
> But Zombie detector thread is started at the begining phase of 
> finishActiveMasterInitialization(),
> {noformat}
>  private void finishActiveMasterInitialization(MonitoredTask status) throws 
> IOException,
>  InterruptedException, KeeperException, ReplicationException {
>  Thread zombieDetector = new Thread(new InitializationMonitor(this),
>  "ActiveMasterInitializationMonitor-" + System.currentTimeMillis());
>  zombieDetector.setDaemon(true);
>  zombieDetector.start();
> {noformat}
> During zombieDetector execution "master.isActiveMaster()" will be false, so 
> it won't wait and cant detect zombie master.
> {noformat}
>  @Override
>  public void run() {
>  try {
>  while (!master.isStopped() && master.isActiveMaster()) {
>  Thread.sleep(timeout);
>  if (master.isInitialized()) {
>  LOG.debug("Initialization completed within allotted tolerance. Monitor 
> exiting.");
>  } else {
>  LOG.error("Master failed to complete initialization after " + timeout + "ms. 
> Please"
>  + " consider submitting a bug report including a thread dump of this 
> process.");
>  if (haltOnTimeout) {
>  LOG.error("Zombie Master exiting. Thread dump to stdout");
>  Threads.printThreadInfo(System.out, "Zombie HMaster");
>  System.exit(-1);
>  }
>  }
>  }
>  } catch (InterruptedException ie) {
>  LOG.trace("InitMonitor thread interrupted. Existing.");
>  }
>  }
>  }
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21535) Zombie Master detector is not working

2018-12-03 Thread Pankaj Kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pankaj Kumar updated HBASE-21535:
-
Attachment: HBASE-21535-branch-2.patch

> Zombie Master detector is not working
> -
>
> Key: HBASE-21535
> URL: https://issues.apache.org/jira/browse/HBASE-21535
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Critical
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21535-branch-2.patch, HBASE-21535.patch
>
>
> We have InitializationMonitor thread in HMaster which detects Zombie Hmaster 
> based on _hbase.master.initializationmonitor.timeout _and halts if 
> _hbase.master.initializationmonitor.haltontimeout_ set _true_.
> After HBASE-19694, HMaster initialization order was correted. Hmaster is set 
> active after Initializing ZK system trackers as follows,
> {noformat}
>  status.setStatus("Initializing ZK system trackers");
>  initializeZKBasedSystemTrackers();
>  status.setStatus("Loading last flushed sequence id of regions");
>  try {
>  this.serverManager.loadLastFlushedSequenceIds();
>  } catch (IOException e) {
>  LOG.debug("Failed to load last flushed sequence id of regions"
>  + " from file system", e);
>  }
>  // Set ourselves as active Master now our claim has succeeded up in zk.
>  this.activeMaster = true;
> {noformat}
> But Zombie detector thread is started at the begining phase of 
> finishActiveMasterInitialization(),
> {noformat}
>  private void finishActiveMasterInitialization(MonitoredTask status) throws 
> IOException,
>  InterruptedException, KeeperException, ReplicationException {
>  Thread zombieDetector = new Thread(new InitializationMonitor(this),
>  "ActiveMasterInitializationMonitor-" + System.currentTimeMillis());
>  zombieDetector.setDaemon(true);
>  zombieDetector.start();
> {noformat}
> During zombieDetector execution "master.isActiveMaster()" will be false, so 
> it won't wait and cant detect zombie master.
> {noformat}
>  @Override
>  public void run() {
>  try {
>  while (!master.isStopped() && master.isActiveMaster()) {
>  Thread.sleep(timeout);
>  if (master.isInitialized()) {
>  LOG.debug("Initialization completed within allotted tolerance. Monitor 
> exiting.");
>  } else {
>  LOG.error("Master failed to complete initialization after " + timeout + "ms. 
> Please"
>  + " consider submitting a bug report including a thread dump of this 
> process.");
>  if (haltOnTimeout) {
>  LOG.error("Zombie Master exiting. Thread dump to stdout");
>  Threads.printThreadInfo(System.out, "Zombie HMaster");
>  System.exit(-1);
>  }
>  }
>  }
>  } catch (InterruptedException ie) {
>  LOG.trace("InitMonitor thread interrupted. Existing.");
>  }
>  }
>  }
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21535) Zombie Master detector is not working

2018-12-03 Thread Pankaj Kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pankaj Kumar updated HBASE-21535:
-
Fix Version/s: 2.2.0

> Zombie Master detector is not working
> -
>
> Key: HBASE-21535
> URL: https://issues.apache.org/jira/browse/HBASE-21535
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Critical
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21535-branch-2.patch, HBASE-21535.patch
>
>
> We have InitializationMonitor thread in HMaster which detects Zombie Hmaster 
> based on _hbase.master.initializationmonitor.timeout _and halts if 
> _hbase.master.initializationmonitor.haltontimeout_ set _true_.
> After HBASE-19694, HMaster initialization order was correted. Hmaster is set 
> active after Initializing ZK system trackers as follows,
> {noformat}
>  status.setStatus("Initializing ZK system trackers");
>  initializeZKBasedSystemTrackers();
>  status.setStatus("Loading last flushed sequence id of regions");
>  try {
>  this.serverManager.loadLastFlushedSequenceIds();
>  } catch (IOException e) {
>  LOG.debug("Failed to load last flushed sequence id of regions"
>  + " from file system", e);
>  }
>  // Set ourselves as active Master now our claim has succeeded up in zk.
>  this.activeMaster = true;
> {noformat}
> But Zombie detector thread is started at the begining phase of 
> finishActiveMasterInitialization(),
> {noformat}
>  private void finishActiveMasterInitialization(MonitoredTask status) throws 
> IOException,
>  InterruptedException, KeeperException, ReplicationException {
>  Thread zombieDetector = new Thread(new InitializationMonitor(this),
>  "ActiveMasterInitializationMonitor-" + System.currentTimeMillis());
>  zombieDetector.setDaemon(true);
>  zombieDetector.start();
> {noformat}
> During zombieDetector execution "master.isActiveMaster()" will be false, so 
> it won't wait and cant detect zombie master.
> {noformat}
>  @Override
>  public void run() {
>  try {
>  while (!master.isStopped() && master.isActiveMaster()) {
>  Thread.sleep(timeout);
>  if (master.isInitialized()) {
>  LOG.debug("Initialization completed within allotted tolerance. Monitor 
> exiting.");
>  } else {
>  LOG.error("Master failed to complete initialization after " + timeout + "ms. 
> Please"
>  + " consider submitting a bug report including a thread dump of this 
> process.");
>  if (haltOnTimeout) {
>  LOG.error("Zombie Master exiting. Thread dump to stdout");
>  Threads.printThreadInfo(System.out, "Zombie HMaster");
>  System.exit(-1);
>  }
>  }
>  }
>  } catch (InterruptedException ie) {
>  LOG.trace("InitMonitor thread interrupted. Existing.");
>  }
>  }
>  }
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21535) Zombie Master detector is not working

2018-12-03 Thread Pankaj Kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pankaj Kumar updated HBASE-21535:
-
Fix Version/s: 3.0.0
Affects Version/s: 2.2.0
   3.0.0
   2.1.1
   2.0.3
   Status: Patch Available  (was: Open)

> Zombie Master detector is not working
> -
>
> Key: HBASE-21535
> URL: https://issues.apache.org/jira/browse/HBASE-21535
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 2.0.3, 2.1.1, 3.0.0, 2.2.0
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HBASE-21535.patch
>
>
> We have InitializationMonitor thread in HMaster which detects Zombie Hmaster 
> based on _hbase.master.initializationmonitor.timeout _and halts if 
> _hbase.master.initializationmonitor.haltontimeout_ set _true_.
> After HBASE-19694, HMaster initialization order was correted. Hmaster is set 
> active after Initializing ZK system trackers as follows,
> {noformat}
>  status.setStatus("Initializing ZK system trackers");
>  initializeZKBasedSystemTrackers();
>  status.setStatus("Loading last flushed sequence id of regions");
>  try {
>  this.serverManager.loadLastFlushedSequenceIds();
>  } catch (IOException e) {
>  LOG.debug("Failed to load last flushed sequence id of regions"
>  + " from file system", e);
>  }
>  // Set ourselves as active Master now our claim has succeeded up in zk.
>  this.activeMaster = true;
> {noformat}
> But Zombie detector thread is started at the begining phase of 
> finishActiveMasterInitialization(),
> {noformat}
>  private void finishActiveMasterInitialization(MonitoredTask status) throws 
> IOException,
>  InterruptedException, KeeperException, ReplicationException {
>  Thread zombieDetector = new Thread(new InitializationMonitor(this),
>  "ActiveMasterInitializationMonitor-" + System.currentTimeMillis());
>  zombieDetector.setDaemon(true);
>  zombieDetector.start();
> {noformat}
> During zombieDetector execution "master.isActiveMaster()" will be false, so 
> it won't wait and cant detect zombie master.
> {noformat}
>  @Override
>  public void run() {
>  try {
>  while (!master.isStopped() && master.isActiveMaster()) {
>  Thread.sleep(timeout);
>  if (master.isInitialized()) {
>  LOG.debug("Initialization completed within allotted tolerance. Monitor 
> exiting.");
>  } else {
>  LOG.error("Master failed to complete initialization after " + timeout + "ms. 
> Please"
>  + " consider submitting a bug report including a thread dump of this 
> process.");
>  if (haltOnTimeout) {
>  LOG.error("Zombie Master exiting. Thread dump to stdout");
>  Threads.printThreadInfo(System.out, "Zombie HMaster");
>  System.exit(-1);
>  }
>  }
>  }
>  } catch (InterruptedException ie) {
>  LOG.trace("InitMonitor thread interrupted. Existing.");
>  }
>  }
>  }
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21535) Zombie Master detector is not working

2018-12-03 Thread Pankaj Kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pankaj Kumar updated HBASE-21535:
-
Attachment: HBASE-21535.patch

> Zombie Master detector is not working
> -
>
> Key: HBASE-21535
> URL: https://issues.apache.org/jira/browse/HBASE-21535
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Critical
> Attachments: HBASE-21535.patch
>
>
> We have InitializationMonitor thread in HMaster which detects Zombie Hmaster 
> based on _hbase.master.initializationmonitor.timeout _and halts if 
> _hbase.master.initializationmonitor.haltontimeout_ set _true_.
> After HBASE-19694, HMaster initialization order was correted. Hmaster is set 
> active after Initializing ZK system trackers as follows,
> {noformat}
>  status.setStatus("Initializing ZK system trackers");
>  initializeZKBasedSystemTrackers();
>  status.setStatus("Loading last flushed sequence id of regions");
>  try {
>  this.serverManager.loadLastFlushedSequenceIds();
>  } catch (IOException e) {
>  LOG.debug("Failed to load last flushed sequence id of regions"
>  + " from file system", e);
>  }
>  // Set ourselves as active Master now our claim has succeeded up in zk.
>  this.activeMaster = true;
> {noformat}
> But Zombie detector thread is started at the begining phase of 
> finishActiveMasterInitialization(),
> {noformat}
>  private void finishActiveMasterInitialization(MonitoredTask status) throws 
> IOException,
>  InterruptedException, KeeperException, ReplicationException {
>  Thread zombieDetector = new Thread(new InitializationMonitor(this),
>  "ActiveMasterInitializationMonitor-" + System.currentTimeMillis());
>  zombieDetector.setDaemon(true);
>  zombieDetector.start();
> {noformat}
> During zombieDetector execution "master.isActiveMaster()" will be false, so 
> it won't wait and cant detect zombie master.
> {noformat}
>  @Override
>  public void run() {
>  try {
>  while (!master.isStopped() && master.isActiveMaster()) {
>  Thread.sleep(timeout);
>  if (master.isInitialized()) {
>  LOG.debug("Initialization completed within allotted tolerance. Monitor 
> exiting.");
>  } else {
>  LOG.error("Master failed to complete initialization after " + timeout + "ms. 
> Please"
>  + " consider submitting a bug report including a thread dump of this 
> process.");
>  if (haltOnTimeout) {
>  LOG.error("Zombie Master exiting. Thread dump to stdout");
>  Threads.printThreadInfo(System.out, "Zombie HMaster");
>  System.exit(-1);
>  }
>  }
>  }
>  } catch (InterruptedException ie) {
>  LOG.trace("InitMonitor thread interrupted. Existing.");
>  }
>  }
>  }
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21535) Zombie Master detector is not working

2018-11-30 Thread Pankaj Kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pankaj Kumar updated HBASE-21535:
-
External issue URL:   (was: 
https://issues.apache.org/jira/browse/HBASE-19694)

> Zombie Master detector is not working
> -
>
> Key: HBASE-21535
> URL: https://issues.apache.org/jira/browse/HBASE-21535
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Critical
>
> We have InitializationMonitor thread in HMaster which detects Zombie Hmaster 
> based on _hbase.master.initializationmonitor.timeout _and halts if 
> _hbase.master.initializationmonitor.haltontimeout_ set _true_.
> After HBASE-19694, HMaster initialization order was correted. Hmaster is set 
> active after Initializing ZK system trackers as follows,
> {noformat}
>  status.setStatus("Initializing ZK system trackers");
>  initializeZKBasedSystemTrackers();
>  status.setStatus("Loading last flushed sequence id of regions");
>  try {
>  this.serverManager.loadLastFlushedSequenceIds();
>  } catch (IOException e) {
>  LOG.debug("Failed to load last flushed sequence id of regions"
>  + " from file system", e);
>  }
>  // Set ourselves as active Master now our claim has succeeded up in zk.
>  this.activeMaster = true;
> {noformat}
> But Zombie detector thread is started at the begining phase of 
> finishActiveMasterInitialization(),
> {noformat}
>  private void finishActiveMasterInitialization(MonitoredTask status) throws 
> IOException,
>  InterruptedException, KeeperException, ReplicationException {
>  Thread zombieDetector = new Thread(new InitializationMonitor(this),
>  "ActiveMasterInitializationMonitor-" + System.currentTimeMillis());
>  zombieDetector.setDaemon(true);
>  zombieDetector.start();
> {noformat}
> During zombieDetector execution "master.isActiveMaster()" will be false, so 
> it won't wait and cant detect zombie master.
> {noformat}
>  @Override
>  public void run() {
>  try {
>  while (!master.isStopped() && master.isActiveMaster()) {
>  Thread.sleep(timeout);
>  if (master.isInitialized()) {
>  LOG.debug("Initialization completed within allotted tolerance. Monitor 
> exiting.");
>  } else {
>  LOG.error("Master failed to complete initialization after " + timeout + "ms. 
> Please"
>  + " consider submitting a bug report including a thread dump of this 
> process.");
>  if (haltOnTimeout) {
>  LOG.error("Zombie Master exiting. Thread dump to stdout");
>  Threads.printThreadInfo(System.out, "Zombie HMaster");
>  System.exit(-1);
>  }
>  }
>  }
>  } catch (InterruptedException ie) {
>  LOG.trace("InitMonitor thread interrupted. Existing.");
>  }
>  }
>  }
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21535) Zombie Master detector is not working

2018-11-30 Thread Pankaj Kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pankaj Kumar updated HBASE-21535:
-
External issue URL: https://issues.apache.org/jira/browse/HBASE-19694

> Zombie Master detector is not working
> -
>
> Key: HBASE-21535
> URL: https://issues.apache.org/jira/browse/HBASE-21535
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Critical
>
> We have InitializationMonitor thread in HMaster which detects Zombie Hmaster 
> based on _hbase.master.initializationmonitor.timeout _and halts if 
> _hbase.master.initializationmonitor.haltontimeout_ set _true_.
> After HBASE-19694, HMaster initialization order was correted. Hmaster is set 
> active after Initializing ZK system trackers as follows,
> {noformat}
>  status.setStatus("Initializing ZK system trackers");
>  initializeZKBasedSystemTrackers();
>  status.setStatus("Loading last flushed sequence id of regions");
>  try {
>  this.serverManager.loadLastFlushedSequenceIds();
>  } catch (IOException e) {
>  LOG.debug("Failed to load last flushed sequence id of regions"
>  + " from file system", e);
>  }
>  // Set ourselves as active Master now our claim has succeeded up in zk.
>  this.activeMaster = true;
> {noformat}
> But Zombie detector thread is started at the begining phase of 
> finishActiveMasterInitialization(),
> {noformat}
>  private void finishActiveMasterInitialization(MonitoredTask status) throws 
> IOException,
>  InterruptedException, KeeperException, ReplicationException {
>  Thread zombieDetector = new Thread(new InitializationMonitor(this),
>  "ActiveMasterInitializationMonitor-" + System.currentTimeMillis());
>  zombieDetector.setDaemon(true);
>  zombieDetector.start();
> {noformat}
> During zombieDetector execution "master.isActiveMaster()" will be false, so 
> it won't wait and cant detect zombie master.
> {noformat}
>  @Override
>  public void run() {
>  try {
>  while (!master.isStopped() && master.isActiveMaster()) {
>  Thread.sleep(timeout);
>  if (master.isInitialized()) {
>  LOG.debug("Initialization completed within allotted tolerance. Monitor 
> exiting.");
>  } else {
>  LOG.error("Master failed to complete initialization after " + timeout + "ms. 
> Please"
>  + " consider submitting a bug report including a thread dump of this 
> process.");
>  if (haltOnTimeout) {
>  LOG.error("Zombie Master exiting. Thread dump to stdout");
>  Threads.printThreadInfo(System.out, "Zombie HMaster");
>  System.exit(-1);
>  }
>  }
>  }
>  } catch (InterruptedException ie) {
>  LOG.trace("InitMonitor thread interrupted. Existing.");
>  }
>  }
>  }
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)