[jira] [Commented] (HBASE-24972) Wait for connection attempt to succeed before performing operations on ZK
[ https://issues.apache.org/jira/browse/HBASE-24972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17266393#comment-17266393 ]

Prathyusha commented on HBASE-24972:

Yes, ConnectionLoss is what we get when we try to use a not-yet-connected ZK.

[ReadOnlyZKClient|https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ReadOnlyZKClient.java] in the HBase client uses the [async APIs|https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ReadOnlyZKClient.java#L263] of ZooKeeper and works with callbacks. It therefore does not need to wait explicitly for the connection to be created; ZooKeeper's asynchronous connection setup handles that.

[RecoverableZooKeeper|https://github.com/apache/hbase/blob/master/hbase-zookeeper/src/main/java/org/apache/hadoop/hbase/zookeeper/RecoverableZooKeeper.java], on the other hand, uses the [sync APIs|https://github.com/apache/hbase/blob/master/hbase-zookeeper/src/main/java/org/apache/hadoop/hbase/zookeeper/RecoverableZooKeeper.java#L323] to fetch data from ZooKeeper and has to wait until the connection is created. It does so with an [exponential retry|https://github.com/apache/hbase/blob/master/hbase-zookeeper/src/main/java/org/apache/hadoop/hbase/zookeeper/RecoverableZooKeeper.java#L319] if the ZK connection is not yet up, and [throws|https://github.com/apache/hbase/blob/master/hbase-zookeeper/src/main/java/org/apache/hadoop/hbase/zookeeper/RecoverableZooKeeper.java#L331] a ConnectionLoss exception once the retries are exhausted. This client is used by region servers.

Thanks.
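To make the retry behaviour described in this comment concrete, here is a minimal sketch of a synchronous call wrapped in an exponential-backoff retry loop. It is not the RecoverableZooKeeper code: `RetryWithBackoff`, `callWithRetries`, and the nested `ConnectionLossException` are hypothetical names used only for illustration (the nested exception stands in for ZooKeeper's real `KeeperException.ConnectionLossException`), and the retry count and sleep values are arbitrary assumptions.

```java
import java.util.concurrent.Callable;

public class RetryWithBackoff {
    /** Hypothetical stand-in for ZooKeeper's KeeperException.ConnectionLossException. */
    public static class ConnectionLossException extends Exception {
        public ConnectionLossException(String message) {
            super(message);
        }
    }

    /**
     * Runs op, retrying on ConnectionLossException and doubling the sleep each time.
     * Once maxRetries retries have been used, the last ConnectionLossException is rethrown.
     */
    public static <T> T callWithRetries(Callable<T> op, int maxRetries, long baseSleepMs)
            throws Exception {
        long sleepMs = baseSleepMs;
        for (int attempt = 0; ; attempt++) {
            try {
                return op.call();
            } catch (ConnectionLossException e) {
                if (attempt >= maxRetries) {
                    throw e; // retries exhausted: surface ConnectionLoss to the caller
                }
                Thread.sleep(sleepMs);
                sleepMs *= 2; // exponential backoff
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated read that fails twice (connection not yet up), then succeeds.
        String data = callWithRetries(() -> {
            if (++calls[0] < 3) {
                throw new ConnectionLossException("ConnectionLoss for /hbase/example");
            }
            return "payload";
        }, 5, 10L);
        System.out.println(data + " after " + calls[0] + " attempts"); // payload after 3 attempts
    }
}
```

With this shape, a read issued before the connection is up costs a few short sleeps rather than an immediate failure, and the caller only sees ConnectionLoss if the connection stays down for the whole retry budget.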
> Wait for connection attempt to succeed before performing operations on ZK
> -------------------------------------------------------------------------
>
>                 Key: HBASE-24972
>                 URL: https://issues.apache.org/jira/browse/HBASE-24972
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Sandeep Guggilam
>            Assignee: Prathyusha
>            Priority: Minor
>
> Creating the connection with ZK is asynchronous, and the successful
> connection event is reported through the passed-in watcher. When we
> attempt any operation, we try to create a connection and then perform a
> read/write
> (https://github.com/apache/hbase/blob/979edfe72046b2075adcc869c65ae820e6f3ec2d/hbase-zookeeper/src/main/java/org/apache/hadoop/hbase/zookeeper/RecoverableZooKeeper.java#L323)
> without actually waiting for the notification event
> (https://github.com/apache/hbase/blob/979edfe72046b2075adcc869c65ae820e6f3ec2d/hbase-zookeeper/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKWatcher.java#L582).
>
> It is possible to get ConnectionLoss errors when we perform
> operations on ZK without waiting for the connection attempt to succeed.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
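The issue description above amounts to "block until the watcher has seen the connected event, then operate". A minimal sketch of that idea, using only the JDK, follows. `ConnectionWaiter`, its `State` enum, and the method names are hypothetical; in a real ZooKeeper client the callback would be `Watcher.process(WatchedEvent)` and the state of interest `KeeperState.SyncConnected`. The timeout values are assumptions, and this is not the patch attached to the issue.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ConnectionWaiter {
    /** Hypothetical stand-in for the session states a ZooKeeper watcher can observe. */
    public enum State { SYNC_CONNECTED, DISCONNECTED }

    private final CountDownLatch connected = new CountDownLatch(1);

    /** Called from the (simulated) event thread, as a watcher callback would be. */
    public void process(State state) {
        if (state == State.SYNC_CONNECTED) {
            connected.countDown();
        }
    }

    /** Blocks until the connected event has arrived; returns false if the timeout elapses first. */
    public boolean awaitConnected(long timeout, TimeUnit unit) throws InterruptedException {
        return connected.await(timeout, unit);
    }

    public static void main(String[] args) throws Exception {
        ConnectionWaiter waiter = new ConnectionWaiter();

        // Simulate the asynchronous connection completing on another thread.
        Thread eventThread = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            waiter.process(State.SYNC_CONNECTED);
        });
        eventThread.start();

        if (waiter.awaitConnected(5, TimeUnit.SECONDS)) {
            System.out.println("connected; safe to issue the read");
        } else {
            System.out.println("timed out waiting for the connection");
        }
        eventThread.join();
    }
}
```

The trade-off is that the first operation becomes effectively synchronous with connection setup, so a bounded timeout matters: without one, an unreachable quorum would hang the caller instead of producing an error.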
[jira] [Commented] (HBASE-24972) Wait for connection attempt to succeed before performing operations on ZK
[ https://issues.apache.org/jira/browse/HBASE-24972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17265718#comment-17265718 ]

Michael Stack commented on HBASE-24972:

It's been a while since I looked in here, but is ConnectionLoss what you get when you prematurely try to use a not-yet-connected ZK?

My main concern is that we have done the async setup of the connection for a long time (that's how ZK does it), and if it were problematic, I'd have thought we'd have heard about it before this...

Whatever the client is, can it wait on the connection being up before it goes and does the getData? (If it is an HBase client, do these wait or not?)

Thanks.
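One way a client could "wait on the connection being up" without blocking the calling thread is to queue the read and run it from the connected callback. The sketch below illustrates that pattern with JDK classes only. `DeferredOps`, `submit`, and `onConnected` are hypothetical names, and this is an illustration of the general technique rather than a description of how any HBase client class is implemented.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

public class DeferredOps {
    private final Queue<Runnable> pending = new ArrayDeque<>();
    private boolean connected = false;

    /** Submits a read; it runs immediately if connected, otherwise once onConnected() fires. */
    public synchronized <T> CompletableFuture<T> submit(Supplier<T> read) {
        CompletableFuture<T> result = new CompletableFuture<>();
        Runnable task = () -> {
            try {
                result.complete(read.get());
            } catch (RuntimeException e) {
                result.completeExceptionally(e);
            }
        };
        if (connected) {
            task.run();
        } else {
            pending.add(task);
        }
        return result;
    }

    /** Called when the (simulated) connected event arrives; drains queued reads in order. */
    public synchronized void onConnected() {
        connected = true;
        Runnable task;
        while ((task = pending.poll()) != null) {
            task.run();
        }
    }

    public static void main(String[] args) {
        DeferredOps ops = new DeferredOps();
        CompletableFuture<String> f = ops.submit(() -> "data for /hbase/example");
        System.out.println("done before connect? " + f.isDone()); // false
        ops.onConnected();
        System.out.println("after connect: " + f.join());         // data for /hbase/example
    }
}
```

Here the caller never observes a ConnectionLoss caused purely by issuing a read too early; a production version would also need a timeout or failure path for the case where the connection never comes up.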
[jira] [Commented] (HBASE-24972) Wait for connection attempt to succeed before performing operations on ZK
[ https://issues.apache.org/jira/browse/HBASE-24972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17265148#comment-17265148 ]

Prathyusha commented on HBASE-24972:

[~stack] Below is the stack trace of a failure incident we have seen:

Cause: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/table/SYSTEM.CATALOG
StackTrace:
org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1337)
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:354)
org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:625)
...
StackTraceId: 429763122

But yes, I see the retries in place wherever we are doing write operations.

[~sandeep.guggilam] These retries should suffice, I guess. Any thoughts?
[jira] [Commented] (HBASE-24972) Wait for connection attempt to succeed before performing operations on ZK
[ https://issues.apache.org/jira/browse/HBASE-24972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264379#comment-17264379 ]

Michael Stack commented on HBASE-24972:

[~prathyu6] Any comment on the above?
[jira] [Commented] (HBASE-24972) Wait for connection attempt to succeed before performing operations on ZK
[ https://issues.apache.org/jira/browse/HBASE-24972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17262829#comment-17262829 ]

Michael Stack commented on HBASE-24972:

This change makes the connection synchronous instead of async. Do you have examples of the failures seen? Is there no provision for retry when the connection is not yet up?

Thanks.
[jira] [Commented] (HBASE-24972) Wait for connection attempt to succeed before performing operations on ZK
[ https://issues.apache.org/jira/browse/HBASE-24972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17221018#comment-17221018 ]

Prathyusha commented on HBASE-24972:

Thanks [~sandeep.guggilam]
[jira] [Commented] (HBASE-24972) Wait for connection attempt to succeed before performing operations on ZK
[ https://issues.apache.org/jira/browse/HBASE-24972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17220891#comment-17220891 ]

Sandeep Guggilam commented on HBASE-24972:

[~pratg] Sure, feel free to pick this one.
[jira] [Commented] (HBASE-24972) Wait for connection attempt to succeed before performing operations on ZK
[ https://issues.apache.org/jira/browse/HBASE-24972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17220549#comment-17220549 ]

Prathyusha commented on HBASE-24972:

[~sandeep.guggilam] If you have not already started working on this, can I pick this one?
[jira] [Commented] (HBASE-24972) Wait for connection attempt to succeed before performing operations on ZK
[ https://issues.apache.org/jira/browse/HBASE-24972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17187923#comment-17187923 ]

Sandeep Guggilam commented on HBASE-24972:

FYI [~apurtell]

> Wait for connection attempt to succeed before performing operations on ZK
> -------------------------------------------------------------------------
>
>                 Key: HBASE-24972
>                 URL: https://issues.apache.org/jira/browse/HBASE-24972
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Sandeep Guggilam
>            Assignee: Sandeep Guggilam
>            Priority: Minor