[jira] [Updated] (HBASE-9480) Regions are unexpectedly made offline in certain failure conditions

2013-09-14 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-9480:
---

   Resolution: Fixed
Fix Version/s: 0.98.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Integrated into 0.96 and trunk. Thanks.

 Regions are unexpectedly made offline in certain failure conditions
 ---

 Key: HBASE-9480
 URL: https://issues.apache.org/jira/browse/HBASE-9480
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Jimmy Xiang
 Fix For: 0.98.0, 0.96.0

 Attachments: 9480-1.txt, trunk-9480.patch, trunk-9480_v1.1.patch, 
 trunk-9480_v1.2.patch, trunk-9480_v2.patch


 Came across this issue (HBASE-9338 test):
 1. A client issues a request to move a region from ServerA to ServerB.
 2. ServerA is compacting that region and does not close it immediately. In 
 fact, it takes a while to complete the request.
 3. In the meantime, the master sends another close request.
 4. ServerA responds with a NotServingRegionException.
 5. The master handles the exception, deletes the znode, and invokes 
 regionOffline for the region.
 6. ServerA fails to operate on ZK in the CloseRegionHandler because the znode 
 has been deleted.
 As a result, the region is permanently offline.
 There are potentially other situations where, when a RegionServer is offline 
 and a client asks to move a region off that server, the master makes the 
 region offline.
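
 The sequence above can be sketched as a toy simulation. This is not HBase
 code: the classes, method names, and state strings below are hypothetical
 stand-ins, and ZooKeeper is reduced to a plain dictionary. It only shows how
 deleting the znode in step 5 leaves the deferred close in step 6 with nothing
 to update.

```python
# Toy model of the race described above. All names are illustrative
# assumptions, not the real HBase or ZooKeeper APIs.


class NotServingRegionException(Exception):
    """Stand-in for the HBase exception of the same name."""


class RegionServer:
    def __init__(self, znodes):
        self.znodes = znodes      # shared dict standing in for ZooKeeper
        self.online = set()       # regions this server is serving
        self.closing = set()      # regions with a close still in progress

    def close_region(self, region):
        if region in self.closing or region not in self.online:
            # Step 4: a second close request is rejected.
            raise NotServingRegionException(region)
        # Step 2: the close is deferred (for example, by a compaction).
        self.closing.add(region)

    def finish_close(self, region):
        # Step 6: the deferred close handler finally runs.
        self.closing.discard(region)
        self.online.discard(region)
        if region not in self.znodes:
            return False          # znode is gone, the close cannot be reported
        self.znodes[region] = "CLOSED"
        return True


class Master:
    def __init__(self, znodes):
        self.znodes = znodes
        self.state = {}

    def unassign(self, server, region):
        self.znodes.setdefault(region, "CLOSING")
        self.state[region] = "PENDING_CLOSE"
        try:
            server.close_region(region)
        except NotServingRegionException:
            # Step 5 (the problematic handling): drop the znode and
            # mark the region offline.
            self.znodes.pop(region, None)
            self.state[region] = "OFFLINE"


def run_scenario():
    znodes = {}
    server_a = RegionServer(znodes)
    server_a.online.add("r1")
    master = Master(znodes)
    master.unassign(server_a, "r1")          # steps 1-2
    master.unassign(server_a, "r1")          # steps 3-5
    reported = server_a.finish_close("r1")   # step 6
    return master.state["r1"], reported, "r1" in znodes


print(run_scenario())
```

 In this model the master ends with the region marked offline, the server has
 dropped it, and no znode remains to trigger a reassignment.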

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-9480) Regions are unexpectedly made offline in certain failure conditions

2013-09-13 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-9480:
-

Priority: Major  (was: Blocker)

After talking w/ [~jxiang], this is not a blocker; the state comes about because 
of outside intervention, not normal operation. (Yes, we need to make it so that 
outside intervention cannot put HBase into an odd state, but we also need to 
allow administrators to do fixup. TODO: make it so only a privileged user can 
do status-threatening ops.)

 Regions are unexpectedly made offline in certain failure conditions
 ---

 Key: HBASE-9480
 URL: https://issues.apache.org/jira/browse/HBASE-9480
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Jimmy Xiang
 Fix For: 0.96.0

 Attachments: 9480-1.txt, trunk-9480.patch, trunk-9480_v1.1.patch, 
 trunk-9480_v1.2.patch, trunk-9480_v2.patch




[jira] [Updated] (HBASE-9480) Regions are unexpectedly made offline in certain failure conditions

2013-09-12 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-9480:
---

Status: Open  (was: Patch Available)

 Regions are unexpectedly made offline in certain failure conditions
 ---

 Key: HBASE-9480
 URL: https://issues.apache.org/jira/browse/HBASE-9480
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Jimmy Xiang
Priority: Blocker
 Fix For: 0.96.0

 Attachments: 9480-1.txt, trunk-9480.patch, trunk-9480_v1.1.patch, 
 trunk-9480_v1.2.patch




[jira] [Updated] (HBASE-9480) Regions are unexpectedly made offline in certain failure conditions

2013-09-12 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-9480:
---

Attachment: trunk-9480_v2.patch

Attached v2; the test should no longer be flaky.

 Regions are unexpectedly made offline in certain failure conditions
 ---

 Key: HBASE-9480
 URL: https://issues.apache.org/jira/browse/HBASE-9480
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Jimmy Xiang
Priority: Blocker
 Fix For: 0.96.0

 Attachments: 9480-1.txt, trunk-9480.patch, trunk-9480_v1.1.patch, 
 trunk-9480_v1.2.patch, trunk-9480_v2.patch




[jira] [Updated] (HBASE-9480) Regions are unexpectedly made offline in certain failure conditions

2013-09-12 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-9480:
---

Status: Patch Available  (was: Open)

 Regions are unexpectedly made offline in certain failure conditions
 ---

 Key: HBASE-9480
 URL: https://issues.apache.org/jira/browse/HBASE-9480
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Jimmy Xiang
Priority: Blocker
 Fix For: 0.96.0

 Attachments: 9480-1.txt, trunk-9480.patch, trunk-9480_v1.1.patch, 
 trunk-9480_v1.2.patch, trunk-9480_v2.patch




[jira] [Updated] (HBASE-9480) Regions are unexpectedly made offline in certain failure conditions

2013-09-11 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-9480:
---

Attachment: trunk-9480.patch

Attached a patch to make sure the znode is not deleted in this case. The RS 
throws a region-already-in-transition exception. The master moves the region to 
the failed_to_close state once the retries run out. The region can still be 
reassigned if it is eventually closed.

Trying to add a test case now.
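
A rough sketch of the behavior described in this comment, not the patch itself.
The class names, the retry count, and the state strings are assumptions made
for illustration; only the overall flow (keep the znode, retry, fall back to a
failed-to-close state, reassign once the close completes) comes from the
comment above.

```python
# Toy model of the patched flow. Names, the retry limit, and the state
# strings are hypothetical; this is not the HBase implementation.


class RegionAlreadyInTransitionException(Exception):
    """Stand-in for the 'region already in transition' error."""


class RegionServer:
    def __init__(self, znodes):
        self.znodes = znodes      # shared dict standing in for ZooKeeper
        self.online = set()
        self.closing = set()

    def close_region(self, region):
        if region in self.closing:
            # A close is already in progress, so say so instead of
            # claiming the region is not served here.
            raise RegionAlreadyInTransitionException(region)
        self.closing.add(region)

    def finish_close(self, region):
        self.closing.discard(region)
        self.online.discard(region)
        if region not in self.znodes:
            return False
        self.znodes[region] = "CLOSED"
        return True


class Master:
    def __init__(self, znodes, max_attempts=3):
        self.znodes = znodes
        self.max_attempts = max_attempts   # assumed retry limit
        self.state = {}

    def unassign(self, server, region):
        self.znodes.setdefault(region, "CLOSING")
        for _ in range(self.max_attempts):
            try:
                server.close_region(region)
                self.state[region] = "PENDING_CLOSE"
                return
            except RegionAlreadyInTransitionException:
                continue          # keep the znode, the first close is still running
        # Retries ran out: record the failure but leave the znode in place.
        self.state[region] = "FAILED_TO_CLOSE"

    def reassign_if_closed(self, region, target):
        # Once the server reports the close through the znode, move on.
        if self.znodes.get(region) == "CLOSED":
            del self.znodes[region]
            self.state[region] = ("OPEN", target)


def run_fixed_scenario():
    znodes = {}
    server_a = RegionServer(znodes)
    server_a.online.add("r1")
    master = Master(znodes)
    master.unassign(server_a, "r1")          # first close request, deferred
    master.unassign(server_a, "r1")          # second request, retried then given up
    after_retries = master.state["r1"]
    reported = server_a.finish_close("r1")   # deferred close completes
    master.reassign_if_closed("r1", "ServerB")
    return after_retries, reported, master.state["r1"]


print(run_fixed_scenario())
```

Because the znode survives the failed retries, the deferred close can still
report through it and the region is picked up again.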

 Regions are unexpectedly made offline in certain failure conditions
 ---

 Key: HBASE-9480
 URL: https://issues.apache.org/jira/browse/HBASE-9480
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Priority: Blocker
 Fix For: 0.96.0

 Attachments: 9480-1.txt, trunk-9480.patch




[jira] [Updated] (HBASE-9480) Regions are unexpectedly made offline in certain failure conditions

2013-09-11 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-9480:
-

Priority: Blocker  (was: Critical)

 Regions are unexpectedly made offline in certain failure conditions
 ---

 Key: HBASE-9480
 URL: https://issues.apache.org/jira/browse/HBASE-9480
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Priority: Blocker
 Fix For: 0.96.0

 Attachments: 9480-1.txt




[jira] [Updated] (HBASE-9480) Regions are unexpectedly made offline in certain failure conditions

2013-09-11 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-9480:
---

Attachment: trunk-9480_v1.1.patch

 Regions are unexpectedly made offline in certain failure conditions
 ---

 Key: HBASE-9480
 URL: https://issues.apache.org/jira/browse/HBASE-9480
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Jimmy Xiang
Priority: Blocker
 Fix For: 0.96.0

 Attachments: 9480-1.txt, trunk-9480.patch, trunk-9480_v1.1.patch




[jira] [Updated] (HBASE-9480) Regions are unexpectedly made offline in certain failure conditions

2013-09-11 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-9480:
---

Attachment: trunk-9480_v1.2.patch

[~jeffreyz], how about version 1.2? If you still see a problem, can you show 
me how to reproduce it so that I can add a test case to cover it?

 Regions are unexpectedly made offline in certain failure conditions
 ---

 Key: HBASE-9480
 URL: https://issues.apache.org/jira/browse/HBASE-9480
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Jimmy Xiang
Priority: Blocker
 Fix For: 0.96.0

 Attachments: 9480-1.txt, trunk-9480.patch, trunk-9480_v1.1.patch, 
 trunk-9480_v1.2.patch




[jira] [Updated] (HBASE-9480) Regions are unexpectedly made offline in certain failure conditions

2013-09-10 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated HBASE-9480:
---

Attachment: 9480-1.txt

This is the patch that I have been working with.

 Regions are unexpectedly made offline in certain failure conditions
 ---

 Key: HBASE-9480
 URL: https://issues.apache.org/jira/browse/HBASE-9480
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Priority: Critical
 Attachments: 9480-1.txt




[jira] [Updated] (HBASE-9480) Regions are unexpectedly made offline in certain failure conditions

2013-09-10 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated HBASE-9480:
---

Fix Version/s: 0.96.0

 Regions are unexpectedly made offline in certain failure conditions
 ---

 Key: HBASE-9480
 URL: https://issues.apache.org/jira/browse/HBASE-9480
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Priority: Critical
 Fix For: 0.96.0

 Attachments: 9480-1.txt




[jira] [Updated] (HBASE-9480) Regions are unexpectedly made offline in certain failure conditions

2013-09-10 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-9480:
--

Status: Patch Available  (was: Open)

 Regions are unexpectedly made offline in certain failure conditions
 ---

 Key: HBASE-9480
 URL: https://issues.apache.org/jira/browse/HBASE-9480
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Priority: Critical
 Fix For: 0.96.0

 Attachments: 9480-1.txt

