[jira] [Updated] (HBASE-16807) RegionServer will fail to report new active Hmaster until HMaster/RegionServer failover
     [ https://issues.apache.org/jira/browse/HBASE-16807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-16807:
-----------------------------------
    Fix Version/s:     (was: 1.4.0)

> RegionServer will fail to report new active Hmaster until HMaster/RegionServer failover
> ---------------------------------------------------------------------------------------
>
>                 Key: HBASE-16807
>                 URL: https://issues.apache.org/jira/browse/HBASE-16807
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>            Reporter: Pankaj Kumar
>            Assignee: Pankaj Kumar
>             Fix For: 2.0.0, 1.3.0, 1.2.4, 0.98.24, 1.1.8
>
>         Attachments: HBASE-16807-0.98.patch, HBASE-16807-branch-1.1.patch,
>                      HBASE-16807-branch-1.2.patch, HBASE-16807-branch-1.3.patch,
>                      HBASE-16807-branch-1.patch, HBASE-16807.patch
>
>
> It's a little odd, but in a production environment a few RegionServers missed
> the master znode creation notification during a master failover. In that case
> ZooKeeperNodeTracker does not refresh its cached data, and MasterAddressTracker
> keeps returning the old active HMaster's details to the RegionServer after a
> ServiceException.
> We do re-create the region server status stub on failure, but without
> refreshing the MasterAddressTracker data.
> In HRegionServer.createRegionServerStatusStub():
> {code}
> boolean refresh = false; // for the first time, use cached data
> RegionServerStatusService.BlockingInterface intf = null;
> boolean interrupted = false;
> try {
>   while (keepLooping()) {
>     sn = this.masterAddressTracker.getMasterAddress(refresh);
>     if (sn == null) {
>       if (!keepLooping()) {
>         // give up with no connection.
>         LOG.debug("No master found and cluster is stopped; bailing out");
>         return null;
>       }
>       if (System.currentTimeMillis() > (previousLogTime + 1000)) {
>         LOG.debug("No master found; retry");
>         previousLogTime = System.currentTimeMillis();
>       }
>       refresh = true; // let's try pull it from ZK directly
>       if (sleep(200)) {
>         interrupted = true;
>       }
>       continue;
>     }
> {code}
> Here the node is refreshed only when 'sn' is null; otherwise the same cached
> data is used.
> So in the above case the RegionServer will never successfully report to the
> active HMaster until an HMaster failover or a RegionServer restart.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
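
The retry behaviour described in this issue can be illustrated with a small, self-contained sketch. Everything in it is hypothetical: FakeMasterTracker, report() and attemptsToReport() are stand-ins invented for illustration and are not HBase classes or methods, and the refreshOnFailure flag shows only one possible remedy (forcing a refresh after a failed report), which is not necessarily what the attached patches do.

```java
/** Illustrative sketch only; none of these types exist in HBase. */
final class StaleMasterSketch {

  /** Hypothetical tracker whose cache can fall behind "ZooKeeper". */
  static final class FakeMasterTracker {
    private volatile String zkValue; // what the master znode currently holds
    private volatile String cached;  // what the tracker last observed

    FakeMasterTracker(String initialMaster) {
      this.zkValue = initialMaster;
      this.cached = initialMaster;
    }

    /** Master failover whose watcher notification is lost: the cache is not updated. */
    void failoverWithMissedNotification(String newMaster) {
      this.zkValue = newMaster;
    }

    /** refresh=false returns the cache; refresh=true re-reads the znode value first. */
    String getMasterAddress(boolean refresh) {
      if (refresh) {
        cached = zkValue;
      }
      return cached;
    }
  }

  /** Pretend RPC: succeeds only when the address is the live master. */
  static boolean report(String address, String liveMaster) {
    return address != null && address.equals(liveMaster);
  }

  /**
   * Returns the attempt number on which the report succeeded, or -1 if
   * maxAttempts was exhausted. With refreshOnFailure=false the loop refreshes
   * only when the address is null, as in the excerpt quoted in the issue.
   */
  static int attemptsToReport(FakeMasterTracker tracker, String liveMaster,
                              boolean refreshOnFailure, int maxAttempts) {
    boolean refresh = false; // for the first time, use cached data
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      String sn = tracker.getMasterAddress(refresh);
      if (sn == null) {
        refresh = true; // no address cached: read it directly next time
        continue;
      }
      if (report(sn, liveMaster)) {
        return attempt;
      }
      if (refreshOnFailure) {
        refresh = true; // assumed remedy: distrust the cache after a failed report
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    FakeMasterTracker stale = new FakeMasterTracker("hm1:16000");
    stale.failoverWithMissedNotification("hm2:16000");
    // Cache is never refreshed because sn is non-null: prints -1
    System.out.println(attemptsToReport(stale, "hm2:16000", false, 5));

    FakeMasterTracker fixed = new FakeMasterTracker("hm1:16000");
    fixed.failoverWithMissedNotification("hm2:16000");
    // First attempt hits the old master, second attempt refreshes: prints 2
    System.out.println(attemptsToReport(fixed, "hm2:16000", true, 5));
  }
}
```

With refreshOnFailure set to false the loop keeps returning the cached, stale address because 'sn' is never null, which matches the symptom reported above.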
[jira] [Updated] (HBASE-16807) RegionServer will fail to report new active Hmaster until HMaster/RegionServer failover
Mikhail Antonov updated HBASE-16807:
------------------------------------
    Fix Version/s:     (was: 1.3.1)
                       1.3.0

>             Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.5, 0.98.24, 1.1.8
>
>         Attachments: HBASE-16807-0.98.patch, HBASE-16807-branch-1.1.patch,
>                      HBASE-16807-branch-1.2.patch, HBASE-16807-branch-1.3.patch,
>                      HBASE-16807-branch-1.patch, HBASE-16807.patch
[jira] [Updated] (HBASE-16807) RegionServer will fail to report new active Hmaster until HMaster/RegionServer failover
Heng Chen updated HBASE-16807:
------------------------------
    Fix Version/s: 1.1.8
                   1.2.5

>             Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.5, 0.98.24, 1.1.8
>
>         Attachments: HBASE-16807-0.98.patch, HBASE-16807-branch-1.1.patch,
>                      HBASE-16807-branch-1.2.patch, HBASE-16807-branch-1.3.patch,
>                      HBASE-16807-branch-1.patch, HBASE-16807.patch
[jira] [Updated] (HBASE-16807) RegionServer will fail to report new active Hmaster until HMaster/RegionServer failover
Pankaj Kumar updated HBASE-16807:
---------------------------------
    Attachment: HBASE-16807-branch-1.2.patch
                HBASE-16807-branch-1.1.patch

>             Fix For: 2.0.0, 1.4.0, 1.3.1, 0.98.24
>
>         Attachments: HBASE-16807-0.98.patch, HBASE-16807-branch-1.1.patch,
>                      HBASE-16807-branch-1.2.patch, HBASE-16807-branch-1.3.patch,
>                      HBASE-16807-branch-1.patch, HBASE-16807.patch
[jira] [Updated] (HBASE-16807) RegionServer will fail to report new active Hmaster until HMaster/RegionServer failover
Heng Chen updated HBASE-16807:
------------------------------
    Fix Version/s: 0.98.24
                   1.3.1
                   1.4.0

>             Fix For: 2.0.0, 1.4.0, 1.3.1, 0.98.24
>
>         Attachments: HBASE-16807-0.98.patch, HBASE-16807-branch-1.3.patch,
>                      HBASE-16807-branch-1.patch, HBASE-16807.patch
[jira] [Updated] (HBASE-16807) RegionServer will fail to report new active Hmaster until HMaster/RegionServer failover
Pankaj Kumar updated HBASE-16807:
---------------------------------
    Attachment: HBASE-16807-branch-1.3.patch
                HBASE-16807-branch-1.patch
                HBASE-16807-0.98.patch

>             Fix For: 2.0.0
>
>         Attachments: HBASE-16807-0.98.patch, HBASE-16807-branch-1.3.patch,
>                      HBASE-16807-branch-1.patch, HBASE-16807.patch
[jira] [Updated] (HBASE-16807) RegionServer will fail to report new active Hmaster until HMaster/RegionServer failover
Ashish Singhi updated HBASE-16807:
----------------------------------
    Release Note:   (was: push to master. Thanks all the guys!)

>             Fix For: 2.0.0
>
>         Attachments: HBASE-16807.patch
[jira] [Updated] (HBASE-16807) RegionServer will fail to report new active Hmaster until HMaster/RegionServer failover
Heng Chen updated HBASE-16807:
------------------------------
      Resolution: Fixed
    Release Note: push to master. Thanks all the guys!
          Status: Resolved  (was: Patch Available)

>             Fix For: 2.0.0
>
>         Attachments: HBASE-16807.patch
[jira] [Updated] (HBASE-16807) RegionServer will fail to report new active Hmaster until HMaster/RegionServer failover
Pankaj Kumar updated HBASE-16807:
---------------------------------
    Component/s: regionserver