[jira] [Updated] (HBASE-13845) Expire of one region server carrying meta can bring down the master

2015-06-09 Thread Jerry He (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerry He updated HBASE-13845:
-
Attachment: HBASE-13845-branch-1.patch

 Expire of one region server carrying meta can bring down the master
 ---

 Key: HBASE-13845
 URL: https://issues.apache.org/jira/browse/HBASE-13845
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 2.0.0, 1.1.0, 1.2.0
Reporter: Jerry He
Assignee: Jerry He
 Fix For: 2.0.0, 1.2.0, 1.1.1

 Attachments: HBASE-13845-branch-1.1.patch, HBASE-13845-branch-1.patch


 There appears to be a bug that, in certain cases, lets the expiration of a single 
 region server carrying meta bring down the master.
 Here is the sequence of events:
 a) The master detects the expiration of a region server in ZK and starts to 
 expire that region server.
 b) Since the failed region server carries meta, the shutdown handler calls 
 verifyAndAssignMetaWithRetries() while processing the expired rs.
 c) verifyAndAssignMeta() contains the following logic around verifyMetaRegionLocation:
 {code}
 if (!server.getMetaTableLocator().verifyMetaRegionLocation(server.getConnection(),
     this.server.getZooKeeper(), timeout)) {
   // Verification of the meta location failed: assign meta.
   this.services.getAssignmentManager().assignMeta(HRegionInfo.FIRST_META_REGIONINFO);
 } else if (serverName.equals(server.getMetaTableLocator().getMetaRegionLocation(
     this.server.getZooKeeper()))) {
   // Meta still verifies on the server being expired: fail this attempt.
   throw new IOException("hbase:meta is onlined on the dead server "
       + serverName);
 }
 {code}
 If the meta region still appears to be online on the expired rs, we throw an 
 exception.
 verifyAndAssignMeta is retried (default: 10 retries at 1000 ms intervals).
 If the exception is still thrown after the retries, we abort the master.
 {code}
 2015-05-27 06:58:30,156 FATAL [MASTER_META_SERVER_OPERATIONS-bdvs1163:6-0] master.HMaster: Master server abort: loaded coprocessors are: []
 2015-05-27 06:58:30,156 FATAL [MASTER_META_SERVER_OPERATIONS-bdvs1163:6-0] master.HMaster: verifyAndAssignMeta failed after10 times retries, aborting
 java.io.IOException: hbase:meta is onlined on the dead server bdvs1164.svl.ibm.com,16020,1432681743203
 at org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.verifyAndAssignMeta(MetaServerShutdownHandler.java:162)
 at org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.verifyAndAssignMetaWithRetries(MetaServerShutdownHandler.java:184)
 at org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.process(MetaServerShutdownHandler.java:93)
 at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 2015-05-27 06:58:30,156 INFO [MASTER_META_SERVER_OPERATIONS-bdvs1163:6-0] regionserver.HRegionServer: STOPPED: verifyAndAssignMeta failed after10 times retries, aborting
 {code}
 The problem happens when the expired region server is slow to process its own 
 expiration, or dies slowly, and can still respond to the master's meta 
 verification in the meantime.
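
 To make the race easier to see, here is a minimal, self-contained simulation 
 of the behavior described above. It is a sketch under stated assumptions, not 
 the HBase code: DyingServer, Outcome, stillAnswers() and pause() are 
 hypothetical names introduced for illustration, and no HBase API is called. 
 The only things taken from the report are the method names, the exception 
 message, and the retry budget of 10 attempts.

```java
import java.io.IOException;

final class MetaVerifyRetrySketch {

    /** Hypothetical stand-in for an expired region server that dies slowly. */
    static final class DyingServer {
        private int responsiveChecks; // verification calls it will still answer

        DyingServer(int responsiveChecks) {
            this.responsiveChecks = responsiveChecks;
        }

        /** Returns true while the server still answers meta verification. */
        boolean stillAnswers() {
            if (responsiveChecks > 0) {
                responsiveChecks--;
                return true;
            }
            return false;
        }
    }

    enum Outcome { META_REASSIGNED, MASTER_ABORTED }

    /** One attempt: succeed only once the dead server stops answering. */
    static void verifyAndAssignMeta(DyingServer rs) throws IOException {
        if (!rs.stillAnswers()) {
            return; // verification failed, so meta can be assigned elsewhere
        }
        throw new IOException("hbase:meta is onlined on the dead server");
    }

    /** Retries a bounded number of times, then gives up (master abort). */
    static Outcome verifyAndAssignMetaWithRetries(DyingServer rs, int maxAttempts, long waitMs) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                verifyAndAssignMeta(rs);
                return Outcome.META_REASSIGNED;
            } catch (IOException e) {
                if (attempt < maxAttempts) {
                    pause(waitMs); // wait before the next attempt
                }
            }
        }
        return Outcome.MASTER_ABORTED;
    }

    /** Sleeps without leaking the checked InterruptedException. */
    private static void pause(long waitMs) {
        try {
            Thread.sleep(waitMs);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // The server stops answering within the retry budget.
        System.out.println(verifyAndAssignMetaWithRetries(new DyingServer(3), 10, 1L));
        // The server outlives the retry budget.
        System.out.println(verifyAndAssignMetaWithRetries(new DyingServer(60), 10, 1L));
    }
}
```

 In this sketch the outcome depends only on whether the dying server keeps 
 answering for longer than the retry budget, which is the condition the 
 report identifies as the trigger for the master abort.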



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13845) Expire of one region server carrying meta can bring down the master

2015-06-09 Thread Jerry He (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerry He updated HBASE-13845:
-
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

 Expire of one region server carrying meta can bring down the master
 ---

 Key: HBASE-13845
 URL: https://issues.apache.org/jira/browse/HBASE-13845
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 2.0.0, 1.1.0, 1.2.0
Reporter: Jerry He
Assignee: Jerry He
 Fix For: 2.0.0, 1.2.0, 1.1.1

 Attachments: HBASE-13845-branch-1.1.patch, 
 HBASE-13845-branch-1.patch, HBASE-13845-master-test-case-only.patch



[jira] [Updated] (HBASE-13845) Expire of one region server carrying meta can bring down the master

2015-06-09 Thread Jerry He (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerry He updated HBASE-13845:
-
Attachment: HBASE-13845-master-test-case-only.patch

 Expire of one region server carrying meta can bring down the master
 ---

 Key: HBASE-13845
 URL: https://issues.apache.org/jira/browse/HBASE-13845
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 2.0.0, 1.1.0, 1.2.0
Reporter: Jerry He
Assignee: Jerry He
 Fix For: 2.0.0, 1.2.0, 1.1.1

 Attachments: HBASE-13845-branch-1.1.patch, 
 HBASE-13845-branch-1.patch, HBASE-13845-master-test-case-only.patch



[jira] [Updated] (HBASE-13845) Expire of one region server carrying meta can bring down the master

2015-06-08 Thread Jerry He (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerry He updated HBASE-13845:
-
Fix Version/s: 1.1.1
   1.2.0
   2.0.0
 Assignee: Jerry He
Affects Version/s: 1.2.0
   2.0.0
   Status: Patch Available  (was: Open)

 Expire of one region server carrying meta can bring down the master
 ---

 Key: HBASE-13845
 URL: https://issues.apache.org/jira/browse/HBASE-13845
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 1.1.0, 2.0.0, 1.2.0
Reporter: Jerry He
Assignee: Jerry He
 Fix For: 2.0.0, 1.2.0, 1.1.1

 Attachments: HBASE-13845-branch-1.1.patch



[jira] [Updated] (HBASE-13845) Expire of one region server carrying meta can bring down the master

2015-06-08 Thread Jerry He (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerry He updated HBASE-13845:
-
Attachment: HBASE-13845-branch-1.1.patch

 Expire of one region server carrying meta can bring down the master
 ---

 Key: HBASE-13845
 URL: https://issues.apache.org/jira/browse/HBASE-13845
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 1.1.0
Reporter: Jerry He
 Attachments: HBASE-13845-branch-1.1.patch



[jira] [Updated] (HBASE-13845) Expire of one region server carrying meta can bring down the master

2015-06-08 Thread Jerry He (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerry He updated HBASE-13845:
-
Attachment: HBASE-13845-branch-1.1.patch

 Expire of one region server carrying meta can bring down the master
 ---

 Key: HBASE-13845
 URL: https://issues.apache.org/jira/browse/HBASE-13845
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 1.1.0
Reporter: Jerry He
 Attachments: HBASE-13845-branch-1.1.patch



[jira] [Updated] (HBASE-13845) Expire of one region server carrying meta can bring down the master

2015-06-08 Thread Jerry He (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerry He updated HBASE-13845:
-
Attachment: (was: HBASE-13845-branch-1.1.patch)

 Expire of one region server carrying meta can bring down the master
 ---

 Key: HBASE-13845
 URL: https://issues.apache.org/jira/browse/HBASE-13845
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 1.1.0
Reporter: Jerry He
 Attachments: HBASE-13845-branch-1.1.patch

