[jira] [Updated] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)

2012-04-02 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5682:
-

Attachment: 5682-all-v3.txt

Patch that removes the log statement Stack mentioned (had it in there for 
earlier debugging, forgot to remove it).

Also adds a simple test with an HConnection that is created before the 
mini-cluster is started to prove that initialization is indeed lazy.
(Can't test by stopping and restarting the mini-cluster, as new random ports 
are used each time.)
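
As a rough illustration of what "lazy" means here, the following is a minimal sketch and not the code from the patch. LazyConnection, LazyInitDemo and the Supplier-based connector are hypothetical stand-ins for HConnectionImplementation and its ZK setup; the point is only that constructing the object contacts nothing, so it can be created before the cluster exists.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical stand-in for a connection that opens its ZK session lazily. */
final class LazyConnection {
    private final Supplier<String> connector; // stands in for "open a ZK session"
    private volatile String session;          // null until first use

    LazyConnection(Supplier<String> connector) {
        this.connector = connector;           // nothing is contacted here
    }

    /** Opens the session on first use only; later calls reuse it. */
    synchronized String getSession() {
        if (session == null) {
            session = connector.get();
        }
        return session;
    }
}

final class LazyInitDemo {
    /** Returns {connect attempts right after construction, attempts after two uses}. */
    static int[] run() {
        AtomicInteger attempts = new AtomicInteger();
        LazyConnection conn =
            new LazyConnection(() -> "session-" + attempts.incrementAndGet());
        int afterConstruction = attempts.get(); // still 0: the constructor did not connect
        conn.getSession();                      // first use triggers the connect
        conn.getSession();                      // second use reuses the session
        return new int[] { afterConstruction, attempts.get() };
    }
}
```

A test along these lines only has to check that no connect attempt happens at construction time, which is why it does not need a running cluster.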

 Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 
 only)
 --

 Key: HBASE-5682
 URL: https://issues.apache.org/jira/browse/HBASE-5682
 Project: HBase
  Issue Type: Improvement
  Components: client
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.94.0

 Attachments: 5682-all-v2.txt, 5682-all-v3.txt, 5682-all.txt, 
 5682-v2.txt, 5682.txt


 Just realized that without this HBASE-4805 is broken.
 I.e. there's no point keeping a persistent HConnection around if it can be 
 rendered permanently unusable when the ZK connection is lost temporarily.
 Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to 
 backport).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)

2012-04-02 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5682:
-

Attachment: 5682-all-v4.txt

I think this is as good as we can get in 0.94.
# Removed the exception handling from ensureZookeeperTrackers, since none of 
these methods throw.
# Added getZookeeperWatcher to two methods that just need a ZKW.
The key is that an HConnection will never be left in a permanently useless 
state. Can file another jira for better timeouts.

 Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 
 only)
 --

 Key: HBASE-5682
 URL: https://issues.apache.org/jira/browse/HBASE-5682
 Project: HBase
  Issue Type: Improvement
  Components: client
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.94.0

 Attachments: 5682-all-v2.txt, 5682-all-v3.txt, 5682-all-v4.txt, 
 5682-all.txt, 5682-v2.txt, 5682.txt






[jira] [Updated] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)

2012-03-31 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5682:
-

Fix Version/s: (was: 0.94.1)
   0.94.0

 Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 
 only)
 --

 Key: HBASE-5682
 URL: https://issues.apache.org/jira/browse/HBASE-5682
 Project: HBase
  Issue Type: Improvement
  Components: client
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.0

 Attachments: 5682-v2.txt, 5682.txt






[jira] [Updated] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)

2012-03-31 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5682:
-

Attachment: 5682-all.txt

Here's a patch that always attempts reconnecting to ZK when a ZK connection is 
needed.
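
The idea can be sketched roughly as follows. This is an illustration under assumptions, not the attached patch: ReconnectingConnection, ensureWatcher and onConnectionLoss are hypothetical names, and a String stands in for the real ZK handle. Every operation that needs ZK first calls ensureWatcher(), so a failed or lost connection is retried on the next call rather than poisoning the object for good.

```java
import java.util.function.Supplier;

/** Hypothetical connection that re-establishes its ZK handle whenever one is needed. */
final class ReconnectingConnection {
    private final Supplier<String> connector; // may throw while ZK is unreachable
    private String watcher;                   // null means "no usable ZK handle"

    ReconnectingConnection(Supplier<String> connector) {
        this.connector = connector;
    }

    /** Called at the start of every operation that needs ZK. */
    synchronized String ensureWatcher() {
        if (watcher == null) {
            // If this throws, watcher stays null and the next call simply retries.
            watcher = connector.get();
        }
        return watcher;
    }

    /** Called when the session is lost; drops the handle so the next call reconnects. */
    synchronized void onConnectionLoss() {
        watcher = null;
    }
}

final class ReconnectDemo {
    static String run() {
        boolean[] zkUp = { false };
        int[] sessions = { 0 };
        ReconnectingConnection conn = new ReconnectingConnection(() -> {
            if (!zkUp[0]) throw new IllegalStateException("ZK unreachable");
            return "session-" + (++sessions[0]);
        });
        StringBuilder log = new StringBuilder();
        try {
            conn.ensureWatcher();                 // ZK is down: this attempt fails
        } catch (IllegalStateException e) {
            log.append("failed,");
        }
        zkUp[0] = true;                           // ZK comes back
        log.append(conn.ensureWatcher()).append(',');
        conn.onConnectionLoss();                  // session lost later on
        log.append(conn.ensureWatcher());         // a fresh session is created
        return log.toString();                    // "failed,session-1,session-2"
    }
}
```

The same object survives both the initial outage and the later session loss, which is the property a long-lived, shared connection needs.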

 Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 
 only)
 --

 Key: HBASE-5682
 URL: https://issues.apache.org/jira/browse/HBASE-5682
 Project: HBase
  Issue Type: Improvement
  Components: client
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.0

 Attachments: 5682-all.txt, 5682-v2.txt, 5682.txt






[jira] [Updated] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)

2012-03-31 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5682:
-

Attachment: 5682-all-v2.txt

Found the problem.
The clusterId could remain null permanently if 
HConnection.getZookeeperWatcher() was called. That would initialize 
HConnectionImplementation.zooKeeper, and hence clusterId would not be reset in 
ensureZookeeperTrackers.
TestZookeeper.testClientSessionExpired does that.

Also in TestZookeeper.testClientSessionExpired the state might be CONNECTING 
rather than CONNECTED depending on timing.

Upon inspection I also made clusterId, rootRegionTracker, masterAddressTracker, 
and zooKeeper volatile, because they can be modified by a different thread, but 
are not exclusively accessed in a synchronized block (existing problem).
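
To make the failure mode concrete, here is a simplified model of it. It is a sketch under assumptions and not the actual HConnectionImplementation code: TrackerState, its two String fields and the "buggy"/"fixed" method names are hypothetical. It shows how a single null check on the ZK handle can leave clusterId unset once another code path has already created the handle, and it marks the fields volatile because they are also read outside the lock.

```java
/** Hypothetical model of two lazily initialized fields guarded by one check. */
final class TrackerState {
    // volatile: written under the lock, but also read by threads that do not hold it
    private volatile String zooKeeper;
    private volatile String clusterId;

    /** Models the getZookeeperWatcher() path: sets up only the ZK handle. */
    synchronized String getZooKeeperWatcher() {
        if (zooKeeper == null) {
            zooKeeper = "zkw";
        }
        return zooKeeper;
    }

    /** Buggy variant: one null check guards both fields. */
    synchronized void ensureTrackersBuggy() {
        if (zooKeeper == null) {
            zooKeeper = "zkw";
            clusterId = "cluster-1";
        }
    }

    /** Fixed variant: each field is checked on its own. */
    synchronized void ensureTrackersFixed() {
        if (zooKeeper == null) {
            zooKeeper = "zkw";
        }
        if (clusterId == null) {
            clusterId = "cluster-1";
        }
    }

    /** Unsynchronized read; safe to call from any thread because the field is volatile. */
    String clusterId() {
        return clusterId;
    }
}

final class TrackerDemo {
    /** Returns {clusterId with the buggy check, clusterId with the fixed check}. */
    static String[] run() {
        TrackerState buggy = new TrackerState();
        buggy.getZooKeeperWatcher();   // handle created first, as in the failing test
        buggy.ensureTrackersBuggy();   // skipped entirely: clusterId stays null

        TrackerState fixed = new TrackerState();
        fixed.getZooKeeperWatcher();
        fixed.ensureTrackersFixed();   // clusterId is initialized independently

        return new String[] { buggy.clusterId(), fixed.clusterId() };
    }
}
```

In this model the buggy variant returns null for clusterId no matter how often ensureTrackersBuggy() is called afterwards, which matches the "permanently null" symptom described above.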

New patch that fixes the problem, passes all tests.

TestZookeeper seems to have good coverage. If I can think of more tests, I'll 
add them there.

 Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 
 only)
 --

 Key: HBASE-5682
 URL: https://issues.apache.org/jira/browse/HBASE-5682
 Project: HBase
  Issue Type: Improvement
  Components: client
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.0

 Attachments: 5682-all-v2.txt, 5682-all.txt, 5682-v2.txt, 5682.txt






[jira] [Updated] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)

2012-03-31 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5682:
-

Priority: Critical  (was: Major)

Upped to critical. Without this the HBase client is pretty much useless in an 
AppServer setting where the client can outlive the HBase cluster and ZK ensemble.
(Testing within the Salesforce AppServer is how I noticed the problem 
initially.)


 Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 
 only)
 --

 Key: HBASE-5682
 URL: https://issues.apache.org/jira/browse/HBASE-5682
 Project: HBase
  Issue Type: Improvement
  Components: client
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.94.0

 Attachments: 5682-all-v2.txt, 5682-all.txt, 5682-v2.txt, 5682.txt






[jira] [Updated] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)

2012-03-30 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5682:
-

Summary: Allow HConnectionImplementation to recover from ZK connection loss 
(for 0.94 only)  (was: Add retry logic in 
HConnectionImplementation#resetZooKeeperTrackers (for 0.94 only))

 Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 
 only)
 --

 Key: HBASE-5682
 URL: https://issues.apache.org/jira/browse/HBASE-5682
 Project: HBase
  Issue Type: Improvement
  Components: client
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.1

 Attachments: 5682-v2.txt, 5682.txt

