[jira] [Updated] (HBASE-11536) Puts of region location to Meta may be out of order which causes inconsistent of region location

2014-08-25 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-11536:
--

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed to all branches.

 Puts of region location to Meta may be out of order which causes inconsistent 
 of region location
 

 Key: HBASE-11536
 URL: https://issues.apache.org/jira/browse/HBASE-11536
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Reporter: Liu Shaohui
Assignee: Liu Shaohui
Priority: Critical
 Fix For: 0.99.0, 2.0.0, 0.94.23, 0.98.6

 Attachments: 10.237.12.13.log, 10.237.12.15.log, 11536-trunk.txt, 
 HBASE-11536-0.94-v1.diff


 In a production HBase cluster, we found an inconsistent region location in the 
 meta table. Region cdfa2ed711bbdf054d9733a92fd43eb5 is online on 
 regionserver 10.237.12.13:11600, but the region location in the meta table is 
 10.237.12.15:11600.
 This is caused by out-of-order puts to the meta table.
 # HMaster tries to assign the region to 10.237.12.15:11600.
 # RegionServer 10.237.12.15:11600: while opening the region, the put of the 
 region location (10.237.12.15:11600) to the meta table times out (60s) and the 
 HTable retries a second time. (The regionserver serving meta has already 
 received the put request. The timeout happens because there is a bad disk in 
 that regionserver and the HLog sync is very slow.)
 During the retry in the HTable, the OpenRegionHandler times out (100s) and the 
 PostOpenDeployTasksThread is interrupted. Though the HTable is finally closed 
 in the MetaEditor, the shared connection the HTable used is not closed and 
 the put call to the meta table is still in flight on that connection. Call 
 this in-flight put to meta call A.
 # RegionServer 10.237.12.15:11600: because of the OpenRegionHandler timeout, 
 the OpenRegionHandler marks the assignment state of this region as FAILED_OPEN.
 # HMaster watches for this FAILED_OPEN event and assigns the region to another 
 regionserver, 10.237.12.13:11600.
 # RegionServer 10.237.12.13:11600: this regionserver opens the region 
 successfully. Call its put of the region location (10.237.12.13:11600) to 
 the meta table call B.
 There is no ordering guarantee between calls A and B. If call A is processed 
 after call B by the regionserver serving the meta region, the region location 
 in the meta table will be wrong.
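 The race can be reproduced in a toy model. The sketch below is not HBase code: MetaCell is a hypothetical stand-in for the versioned info:server cell, and it assumes the put carries no client timestamp, so the regionserver serving meta stamps each version with its own clock when it applies the put. The timestamps are the ones from the raw scan in this report.

```python
from dataclasses import dataclass, field


@dataclass
class MetaCell:
    """Hypothetical model of one versioned cell (info:server) in meta."""
    versions: list = field(default_factory=list)  # (timestamp_ms, value)

    def put(self, value, server_clock_ms):
        # Assumption: no client-supplied timestamp, so the serving
        # regionserver stamps the version at the moment it applies the put.
        self.versions.append((server_clock_ms, value))

    def latest(self):
        # Readers see the version with the highest timestamp.
        return max(self.versions)[1]


cell = MetaCell()
cell.put("10.237.12.15:11600", 1404885353122)  # first put from .15
cell.put("10.237.12.13:11600", 1404885456731)  # call B from .13
cell.put("10.237.12.15:11600", 1404885460553)  # call A, delayed retry from .15

stale_location = cell.latest()
print(stale_location)
```

 Because call A is applied last, it receives the newest timestamp and hides the correct location written by call B.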
 From a raw scan of the meta table we found:
 {code}
 scan '.META.', {RAW => true, LIMIT => 1, VERSIONS => 10, STARTROW => 
 'xxx.adfa2ed711bbdf054d9733a92fd43eb5.'} 
 {code}
 {quote}
 xxx.adfa2ed711bbdf054d9733a92fd43eb5. column=info:server, 
 timestamp=1404885460553(= Wed Jul 09 13:57:40 +0800 2014), 
 value=10.237.12.15:11600 -- Retry put from 10.237.12.15
 xxx.adfa2ed711bbdf054d9733a92fd43eb5. column=info:server, 
 timestamp=1404885456731(= Wed Jul 09 13:57:36 +0800 2014), 
 value=10.237.12.13:11600 -- put from 10.237.12.13
 
 xxx.adfa2ed711bbdf054d9733a92fd43eb5. column=info:server, 
 timestamp=1404885353122(= Wed Jul 09 13:55:53 +0800 2014), 
 value=10.237.12.15:11600  -- First put from 10.237.12.15
 {quote}
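 The three timestamps in the scan can be checked with a little arithmetic. The sketch below only uses the values quoted above; the labels and helper name are illustrative. It converts each timestamp to +0800 local time and computes the gaps between consecutive versions.

```python
from datetime import datetime, timedelta, timezone

UTC_PLUS_8 = timezone(timedelta(hours=8))

# (label, timestamp in ms) in increasing timestamp order, from the raw scan
puts = [
    ("first put from 10.237.12.15", 1404885353122),
    ("put from 10.237.12.13", 1404885456731),
    ("retry put from 10.237.12.15", 1404885460553),
]


def local_time(ts_ms):
    """Format a millisecond epoch timestamp as HH:MM:SS in +0800."""
    return datetime.fromtimestamp(ts_ms // 1000, tz=UTC_PLUS_8).strftime("%H:%M:%S")


times = [local_time(ts) for _, ts in puts]
gaps_ms = [later[1] - earlier[1] for earlier, later in zip(puts, puts[1:])]

for (label, _), t in zip(puts, times):
    print(t, label)
print(gaps_ms)
```

 The gap between the first put and the put from 10.237.12.13 is about 103.6 s, which fits the 100s OpenRegionHandler timeout followed by reassignment. The retry put then lands about 3.8 s after the correct one.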
 The related HBase logs are attached to this issue, and discussion is welcome.
 Since there is no ordering guarantee for puts from different HTables, one 
 solution to this issue is to assign an increasing id to each assignment of a 
 region and use this id as the timestamp of the region-location put to the meta 
 table. HBase clients will then get the region location with the largest 
 assign id.
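 A minimal sketch of the proposed idea, assuming each put carries the assign id of the assignment that produced it as its version. The ids 1 and 2 and the function name are hypothetical. It checks that the visible location is the same for every possible arrival order of the puts.

```python
from itertools import permutations


def visible_location(puts):
    """Return the location with the highest version, as a reader would see it."""
    return max(puts, key=lambda p: p[0])[1]


# (assign_id, location); the ids are illustrative, not taken from HBase
puts = [
    (1, "10.237.12.15:11600"),  # first put, assignment 1
    (2, "10.237.12.13:11600"),  # call B, assignment 2
    (1, "10.237.12.15:11600"),  # call A, a retry that still carries assignment 1
]

# Try every arrival order at the regionserver serving meta.
results = {visible_location(order) for order in permutations(puts)}
print(results)
```

 Since the version is fixed by the assignment rather than by arrival time, a delayed retry of an older assignment can no longer shadow a newer one.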



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-11536) Puts of region location to Meta may be out of order which causes inconsistent of region location

2014-08-24 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-11536:
--

Attachment: 11536-trunk.txt

Here's what I think this should be in trunk. MetaEditor is gone, I assume 
MetaTableAccessor is the right spot.

I would like this in before 0.94.23.

 Puts of region location to Meta may be out of order which causes inconsistent 
 of region location
 

 Key: HBASE-11536
 URL: https://issues.apache.org/jira/browse/HBASE-11536
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Reporter: Liu Shaohui
Assignee: Liu Shaohui
Priority: Critical
 Fix For: 0.99.0, 2.0.0, 0.94.23, 0.98.6

 Attachments: 10.237.12.13.log, 10.237.12.15.log, 11536-trunk.txt, 
 HBASE-11536-0.94-v1.diff




[jira] [Updated] (HBASE-11536) Puts of region location to Meta may be out of order which causes inconsistent of region location

2014-08-24 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-11536:
--

Status: Patch Available  (was: Open)

 Puts of region location to Meta may be out of order which causes inconsistent 
 of region location
 

 Key: HBASE-11536
 URL: https://issues.apache.org/jira/browse/HBASE-11536
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Reporter: Liu Shaohui
Assignee: Liu Shaohui
Priority: Critical
 Fix For: 0.99.0, 2.0.0, 0.94.23, 0.98.6

 Attachments: 10.237.12.13.log, 10.237.12.15.log, 11536-trunk.txt, 
 HBASE-11536-0.94-v1.diff




[jira] [Updated] (HBASE-11536) Puts of region location to Meta may be out of order which causes inconsistent of region location

2014-08-22 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-11536:
--

Assignee: Liu Shaohui

 Puts of region location to Meta may be out of order which causes inconsistent 
 of region location
 

 Key: HBASE-11536
 URL: https://issues.apache.org/jira/browse/HBASE-11536
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Reporter: Liu Shaohui
Assignee: Liu Shaohui
Priority: Critical
 Fix For: 0.99.0, 2.0.0, 0.94.23, 0.98.6

 Attachments: 10.237.12.13.log, 10.237.12.15.log, 
 HBASE-11536-0.94-v1.diff




[jira] [Updated] (HBASE-11536) Puts of region location to Meta may be out of order which causes inconsistent of region location

2014-07-31 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-11536:
---

Fix Version/s: 0.98.6
   0.94.23
   2.0.0
   0.99.0

Setting fix versions. Please change if you disagree

 Puts of region location to Meta may be out of order which causes inconsistent 
 of region location
 

 Key: HBASE-11536
 URL: https://issues.apache.org/jira/browse/HBASE-11536
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Reporter: Liu Shaohui
Priority: Critical
 Fix For: 0.99.0, 2.0.0, 0.94.23, 0.98.6

 Attachments: 10.237.12.13.log, 10.237.12.15.log, 
 HBASE-11536-0.94-v1.diff




[jira] [Updated] (HBASE-11536) Puts of region location to Meta may be out of order which causes inconsistent of region location

2014-07-25 Thread Liu Shaohui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu Shaohui updated HBASE-11536:


Attachment: HBASE-11536-0.94-v1.diff

A patch for 0.94 using the regionserver timestamp as the version of meta put.
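One reading of "regionserver timestamp" is that the opening regionserver stamps the put with its own clock when it builds it, so a retried put carries the same version. The sketch below illustrates that reading only; it is not the attached patch, the function names are hypothetical, and the timestamp values are illustrative.

```python
import time


def build_location_put(region, server, now_ms=None):
    """Build the put once; the version is fixed on the sender side (assumption)."""
    ts = int(time.time() * 1000) if now_ms is None else now_ms
    return {"row": region, "column": "info:server", "ts": ts, "value": server}


def visible_value(applied_puts):
    """Meta keeps every version; readers get the one with the highest timestamp."""
    return max(applied_puts, key=lambda p: p["ts"])["value"]


put_a = build_location_put("region", "10.237.12.15:11600", now_ms=1404885353122)
put_b = build_location_put("region", "10.237.12.13:11600", now_ms=1404885456731)

# The retry of A arrives after B but reuses put_a unchanged, timestamp included.
visible = visible_value([put_a, put_b, put_a])
print(visible)
```

Under this reading the result depends on the regionserver clocks being roughly in sync: a server whose clock runs far ahead could still shadow a later assignment.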

 Puts of region location to Meta may be out of order which causes inconsistent 
 of region location
 

 Key: HBASE-11536
 URL: https://issues.apache.org/jira/browse/HBASE-11536
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Reporter: Liu Shaohui
Priority: Critical
 Attachments: 10.237.12.13.log, 10.237.12.15.log, 
 HBASE-11536-0.94-v1.diff




[jira] [Updated] (HBASE-11536) Puts of region location to Meta may be out of order which causes inconsistent of region location

2014-07-17 Thread Liu Shaohui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu Shaohui updated HBASE-11536:


Attachment: 10.237.12.15.log
10.237.12.13.log

HRegionServer log for this issue.

 Puts of region location to Meta may be out of order which causes inconsistent 
 of region location
 

 Key: HBASE-11536
 URL: https://issues.apache.org/jira/browse/HBASE-11536
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Reporter: Liu Shaohui
Priority: Critical
 Attachments: 10.237.12.13.log, 10.237.12.15.log

