[jira] [Updated] (HBASE-9703) DistributedHBaseCluster should not throw exceptions, but do a best effort restore

2013-10-03 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-9703:
-

Status: Patch Available  (was: Open)

 DistributedHBaseCluster should not throw exceptions, but do a best effort 
 restore
 -

 Key: HBASE-9703
 URL: https://issues.apache.org/jira/browse/HBASE-9703
 Project: HBase
  Issue Type: Improvement
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.98.0, 0.96.1

 Attachments: hbase-9703_v1.patch


 At the end of integration tests, we call
 DistributedCluster.restoreCluster() in case CM has killed nodes, so that we
 leave the cluster in the same state in which we took it over.
 However, if CM is not used in a test (for example ITLoadAndVerify), but some
 region servers die, or an external daemon kills the servers, we still
 try to restore at the end of the test, which may or may not succeed (depending
 on configuration, the region server being inaccessible, etc.).
 We can do one of two things: either do a best-effort cluster restore that does
 not fail the test if there are any errors, or skip running the restore if no
 disruptive actions have taken place.
 I am leaning towards the former, since if an RS goes down, with or without CM,
 due to a bad disk etc., we cannot restore the cluster, but we should not fail
 the test in this case.
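
The best-effort option described above could look roughly like the following Java sketch. It is not the code from the attached patch: the class `BestEffortRestore`, the `RestoreStep` interface, and the `restoreBestEffort` method are hypothetical names used only for illustration, and each step stands in for whatever a real restore would do (for example, restarting one region server). The idea is that a failing step is recorded and skipped instead of propagating an exception that would fail the test.

```java
import java.io.IOException;
import java.util.List;

public class BestEffortRestore {

    /** Hypothetical stand-in for a single restore action, e.g. restarting one server. */
    @FunctionalInterface
    public interface RestoreStep {
        void run() throws IOException;
    }

    /**
     * Runs every restore step. An IOException from one step is recorded in
     * {@code warnings} and does not stop the remaining steps.
     *
     * @return true only if every step completed without an IOException
     */
    public static boolean restoreBestEffort(List<RestoreStep> steps, List<String> warnings) {
        boolean success = true;
        for (RestoreStep step : steps) {
            try {
                step.run();
            } catch (IOException e) {
                // Record the failure and keep going; the caller decides what to do with it.
                warnings.add("Restore step failed: " + e.getMessage());
                success = false;
            }
        }
        return success;
    }
}
```

A test teardown using this shape would log the warnings when the method returns false, rather than failing the test because a server could not be brought back.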



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9703) DistributedHBaseCluster should not throw exceptions, but do a best effort restore

2013-10-03 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-9703:
-

Attachment: hbase-9703_v3.patch

Rebased.

 DistributedHBaseCluster should not throw exceptions, but do a best effort 
 restore
 -

 Key: HBASE-9703
 URL: https://issues.apache.org/jira/browse/HBASE-9703
 Project: HBase
  Issue Type: Improvement
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.98.0, 0.96.1

 Attachments: hbase-9703_v1.patch, hbase-9703_v3.patch




[jira] [Updated] (HBASE-9703) DistributedHBaseCluster should not throw exceptions, but do a best effort restore

2013-10-03 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-9703:
-

   Resolution: Fixed
Fix Version/s: 0.96.0  (was: 0.96.1)
 Hadoop Flags: Reviewed
       Status: Resolved  (was: Patch Available)

I've committed this. Thanks for looking, Sergey.

 DistributedHBaseCluster should not throw exceptions, but do a best effort 
 restore
 -

 Key: HBASE-9703
 URL: https://issues.apache.org/jira/browse/HBASE-9703
 Project: HBase
  Issue Type: Improvement
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.98.0, 0.96.0

 Attachments: hbase-9703_v1.patch, hbase-9703_v3.patch




[jira] [Updated] (HBASE-9703) DistributedHBaseCluster should not throw exceptions, but do a best effort restore

2013-10-02 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-9703:
-

Summary: DistributedHBaseCluster should not throw exceptions, but do a best 
effort restore  (was: DistributedHBaseCluster should not restore the cluster if 
CM is not used)

 DistributedHBaseCluster should not throw exceptions, but do a best effort 
 restore
 -

 Key: HBASE-9703
 URL: https://issues.apache.org/jira/browse/HBASE-9703
 Project: HBase
  Issue Type: Improvement
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.98.0, 0.96.1

 Attachments: hbase-9703_v1.patch

