[jira] [Updated] (HBASE-16807) RegionServer will fail to report new active Hmaster until HMaster/RegionServer failover

2017-11-08 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-16807:
---
Fix Version/s: (was: 1.4.0)

> RegionServer will fail to report new active Hmaster until 
> HMaster/RegionServer failover
> ---
>
> Key: HBASE-16807
> URL: https://issues.apache.org/jira/browse/HBASE-16807
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Reporter: Pankaj Kumar
> Assignee: Pankaj Kumar
> Fix For: 2.0.0, 1.3.0, 1.2.4, 0.98.24, 1.1.8
>
> Attachments: HBASE-16807-0.98.patch, HBASE-16807-branch-1.1.patch, 
> HBASE-16807-branch-1.2.patch, HBASE-16807-branch-1.3.patch, 
> HBASE-16807-branch-1.patch, HBASE-16807.patch
>
>
> It is a little odd, but in a production environment a few RegionServers 
> missed the master znode creation notification during a master failover. In 
> that case ZooKeeperNodeTracker does not refresh its cached data, so 
> MasterAddressTracker keeps returning the old active HMaster's details to the 
> RegionServer on every ServiceException.
> We do recreate the RegionServer status stub on failure, but without 
> refreshing the MasterAddressTracker data.
> In HRegionServer.createRegionServerStatusStub():
> {code}
>     boolean refresh = false; // for the first time, use cached data
>     RegionServerStatusService.BlockingInterface intf = null;
>     boolean interrupted = false;
>     try {
>       while (keepLooping()) {
>         sn = this.masterAddressTracker.getMasterAddress(refresh);
>         if (sn == null) {
>           if (!keepLooping()) {
>             // give up with no connection.
>             LOG.debug("No master found and cluster is stopped; bailing out");
>             return null;
>           }
>           if (System.currentTimeMillis() > (previousLogTime + 1000)) {
>             LOG.debug("No master found; retry");
>             previousLogTime = System.currentTimeMillis();
>           }
>           refresh = true; // let's try pull it from ZK directly
>           if (sleep(200)) {
>             interrupted = true;
>           }
>           continue;
>         }
> {code}
> Here we refresh the node data only when 'sn' is null; otherwise the same 
> cached data is reused. 
> So in the case above, the RegionServer never reports to the active HMaster 
> successfully until the next HMaster failover or a RegionServer restart.
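
The behavior described above can be reproduced in a small standalone sketch. This is not the attached patch and does not use HBase classes: Tracker, report, connect, staleTracker and the host names are all hypothetical stand-ins, and the only assumption taken from the description is that getMasterAddress(refresh) returns cached data unless refresh is true. The sketch contrasts a loop that refreshes only on a null address with one that also refreshes after a failed call.

```java
import java.util.concurrent.atomic.AtomicReference;

public class MasterLookupSketch {

  /** Hypothetical stand-in for a ZooKeeper-backed tracker that keeps a local cache. */
  static final class Tracker {
    private final AtomicReference<String> znode; // what ZooKeeper currently holds
    private String cached;                       // what this server last saw

    Tracker(AtomicReference<String> znode) {
      this.znode = znode;
      this.cached = znode.get();
    }

    /** Returns the cached address, re-reading the znode first when refresh is true. */
    String getMasterAddress(boolean refresh) {
      if (refresh) {
        cached = znode.get();
      }
      return cached;
    }
  }

  /** Signals that the contacted server is not the active master. */
  static final class ServiceException extends Exception {
    ServiceException(String message) {
      super(message);
    }
  }

  /** Fake RPC: succeeds only when the target is the active master. */
  static void report(String target, String activeMaster) throws ServiceException {
    if (!target.equals(activeMaster)) {
      throw new ServiceException(target + " is not the active master");
    }
  }

  /**
   * Retry loop. With refreshOnFailure=false it mirrors the behavior described
   * in the issue; with true it bypasses the cache after a failed call.
   * Returns the address that accepted the report, or null if every attempt failed.
   */
  static String connect(Tracker tracker, String activeMaster,
                        boolean refreshOnFailure, int maxAttempts) {
    boolean refresh = false; // first attempt uses cached data
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      String sn = tracker.getMasterAddress(refresh);
      if (sn == null) {
        refresh = true; // no master known: read the znode directly
        continue;
      }
      try {
        report(sn, activeMaster);
        return sn;
      } catch (ServiceException e) {
        if (refreshOnFailure) {
          refresh = true; // cached address was rejected: read the znode directly
        }
      }
    }
    return null;
  }

  /** Builds a tracker whose cache still points at the old master after a failover. */
  static Tracker staleTracker(String oldMaster, String newMaster) {
    AtomicReference<String> znode = new AtomicReference<>(oldMaster);
    Tracker tracker = new Tracker(znode);
    znode.set(newMaster); // failover happens; the tracker misses the notification
    return tracker;
  }

  public static void main(String[] args) {
    String oldMaster = "master-a:16000";
    String newMaster = "master-b:16000";
    // Refresh only on null: every attempt hits the stale address.
    System.out.println(connect(staleTracker(oldMaster, newMaster), newMaster, false, 5));
    // Refresh after a failed call: the second attempt reaches the new master.
    System.out.println(connect(staleTracker(oldMaster, newMaster), newMaster, true, 5));
  }
}
```

In this sketch the first call exhausts its attempts against the stale address, while the second recovers on its next attempt because the failed call sets the refresh flag. Whether the real fix sets the flag in the catch block or elsewhere is not stated in this thread.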



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-16807) RegionServer will fail to report new active Hmaster until HMaster/RegionServer failover

2016-10-21 Thread Mikhail Antonov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Antonov updated HBASE-16807:

Fix Version/s: (was: 1.3.1)
   1.3.0

> RegionServer will fail to report new active Hmaster until 
> HMaster/RegionServer failover
> ---
>
> Key: HBASE-16807
> URL: https://issues.apache.org/jira/browse/HBASE-16807
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Reporter: Pankaj Kumar
> Assignee: Pankaj Kumar
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.5, 0.98.24, 1.1.8
>
> Attachments: HBASE-16807-0.98.patch, HBASE-16807-branch-1.1.patch, 
> HBASE-16807-branch-1.2.patch, HBASE-16807-branch-1.3.patch, 
> HBASE-16807-branch-1.patch, HBASE-16807.patch
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16807) RegionServer will fail to report new active Hmaster until HMaster/RegionServer failover

2016-10-13 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen updated HBASE-16807:
--
Fix Version/s: 1.1.8
   1.2.5

> RegionServer will fail to report new active Hmaster until 
> HMaster/RegionServer failover
> ---
>
> Key: HBASE-16807
> URL: https://issues.apache.org/jira/browse/HBASE-16807
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Reporter: Pankaj Kumar
> Assignee: Pankaj Kumar
> Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.5, 0.98.24, 1.1.8
>
> Attachments: HBASE-16807-0.98.patch, HBASE-16807-branch-1.1.patch, 
> HBASE-16807-branch-1.2.patch, HBASE-16807-branch-1.3.patch, 
> HBASE-16807-branch-1.patch, HBASE-16807.patch
>
>





[jira] [Updated] (HBASE-16807) RegionServer will fail to report new active Hmaster until HMaster/RegionServer failover

2016-10-13 Thread Pankaj Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pankaj Kumar updated HBASE-16807:
-
Attachment: HBASE-16807-branch-1.2.patch
HBASE-16807-branch-1.1.patch

> RegionServer will fail to report new active Hmaster until 
> HMaster/RegionServer failover
> ---
>
> Key: HBASE-16807
> URL: https://issues.apache.org/jira/browse/HBASE-16807
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Reporter: Pankaj Kumar
> Assignee: Pankaj Kumar
> Fix For: 2.0.0, 1.4.0, 1.3.1, 0.98.24
>
> Attachments: HBASE-16807-0.98.patch, HBASE-16807-branch-1.1.patch, 
> HBASE-16807-branch-1.2.patch, HBASE-16807-branch-1.3.patch, 
> HBASE-16807-branch-1.patch, HBASE-16807.patch
>
>





[jira] [Updated] (HBASE-16807) RegionServer will fail to report new active Hmaster until HMaster/RegionServer failover

2016-10-13 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen updated HBASE-16807:
--
Fix Version/s: 0.98.24
   1.3.1
   1.4.0

> RegionServer will fail to report new active Hmaster until 
> HMaster/RegionServer failover
> ---
>
> Key: HBASE-16807
> URL: https://issues.apache.org/jira/browse/HBASE-16807
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Reporter: Pankaj Kumar
> Assignee: Pankaj Kumar
> Fix For: 2.0.0, 1.4.0, 1.3.1, 0.98.24
>
> Attachments: HBASE-16807-0.98.patch, HBASE-16807-branch-1.3.patch, 
> HBASE-16807-branch-1.patch, HBASE-16807.patch
>
>





[jira] [Updated] (HBASE-16807) RegionServer will fail to report new active Hmaster until HMaster/RegionServer failover

2016-10-13 Thread Pankaj Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pankaj Kumar updated HBASE-16807:
-
Attachment: HBASE-16807-branch-1.3.patch
HBASE-16807-branch-1.patch
HBASE-16807-0.98.patch

> RegionServer will fail to report new active Hmaster until 
> HMaster/RegionServer failover
> ---
>
> Key: HBASE-16807
> URL: https://issues.apache.org/jira/browse/HBASE-16807
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Reporter: Pankaj Kumar
> Assignee: Pankaj Kumar
> Fix For: 2.0.0
>
> Attachments: HBASE-16807-0.98.patch, HBASE-16807-branch-1.3.patch, 
> HBASE-16807-branch-1.patch, HBASE-16807.patch
>
>





[jira] [Updated] (HBASE-16807) RegionServer will fail to report new active Hmaster until HMaster/RegionServer failover

2016-10-13 Thread Ashish Singhi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Singhi updated HBASE-16807:
--
Release Note:   (was: push to master. Thanks all the guys!)

> RegionServer will fail to report new active Hmaster until 
> HMaster/RegionServer failover
> ---
>
> Key: HBASE-16807
> URL: https://issues.apache.org/jira/browse/HBASE-16807
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Reporter: Pankaj Kumar
> Assignee: Pankaj Kumar
> Fix For: 2.0.0
>
> Attachments: HBASE-16807.patch
>
>





[jira] [Updated] (HBASE-16807) RegionServer will fail to report new active Hmaster until HMaster/RegionServer failover

2016-10-13 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen updated HBASE-16807:
--
  Resolution: Fixed
Release Note: push to master. Thanks all the guys!
  Status: Resolved  (was: Patch Available)

> RegionServer will fail to report new active Hmaster until 
> HMaster/RegionServer failover
> ---
>
> Key: HBASE-16807
> URL: https://issues.apache.org/jira/browse/HBASE-16807
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Reporter: Pankaj Kumar
> Assignee: Pankaj Kumar
> Fix For: 2.0.0
>
> Attachments: HBASE-16807.patch
>
>





[jira] [Updated] (HBASE-16807) RegionServer will fail to report new active Hmaster until HMaster/RegionServer failover

2016-10-11 Thread Pankaj Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pankaj Kumar updated HBASE-16807:
-
Component/s: regionserver

> RegionServer will fail to report new active Hmaster until 
> HMaster/RegionServer failover
> ---
>
> Key: HBASE-16807
> URL: https://issues.apache.org/jira/browse/HBASE-16807
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Reporter: Pankaj Kumar
> Assignee: Pankaj Kumar
>


