[jira] [Commented] (HBASE-24972) Wait for connection attempt to succeed before performing operations on ZK

2021-01-15 Thread Prathyusha (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17266393#comment-17266393
 ] 

Prathyusha commented on HBASE-24972:


Yes, ConnectionLoss is what we get when we try to use a not-yet-connected zk.

The HBase client's 
[ReadOnlyZKClient|https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ReadOnlyZKClient.java]
 uses the 
[async APIs|https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ReadOnlyZKClient.java#L263]
 of ZooKeeper and works with callbacks. So it does not need to wait 
explicitly for the connection to be created; that is handled by ZooKeeper's 
asynchronous connection setup.
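
To illustrate why a callback-style client does not have to block on connection setup, here is a minimal, self-contained sketch. It is not how ReadOnlyZKClient is implemented: the class AsyncClientSketch and its methods are hypothetical, and a CompletableFuture stands in for the ZooKeeper session becoming connected. The read is registered immediately, and its callback only runs once the simulated connection is up.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical callback-style client; a future stands in for the ZK session. */
class AsyncClientSketch {
  // Completed when the (simulated) asynchronous connection setup finishes.
  private final CompletableFuture<Void> connected = new CompletableFuture<>();

  /** Stands in for the "connected" notification from the session. */
  void markConnected() {
    connected.complete(null);
  }

  /** Registers a read; the result is produced only after the session is connected. */
  CompletableFuture<String> get(String path) {
    return connected.thenApply(ignored -> "data@" + path);
  }

  public static void main(String[] args) {
    AsyncClientSketch client = new AsyncClientSketch();
    CompletableFuture<String> read = client.get("/hbase/table/SYSTEM.CATALOG");
    read.thenAccept(data -> System.out.println("callback got " + data));
    System.out.println("done before connect? " + read.isDone());
    client.markConnected(); // the callback fires here, not earlier
  }
}
```

The caller never waits: the operation is simply deferred until the connection exists.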

[RecoverableZooKeeper|https://github.com/apache/hbase/blob/master/hbase-zookeeper/src/main/java/org/apache/hadoop/hbase/zookeeper/RecoverableZooKeeper.java],
 on the other hand, uses 
[sync APIs|https://github.com/apache/hbase/blob/master/hbase-zookeeper/src/main/java/org/apache/hadoop/hbase/zookeeper/RecoverableZooKeeper.java#L323]
 to fetch data from ZooKeeper and has to wait until the connection is created. 
It does so with an 
[exponential retry|https://github.com/apache/hbase/blob/master/hbase-zookeeper/src/main/java/org/apache/hadoop/hbase/zookeeper/RecoverableZooKeeper.java#L319]
 if the ZK connection is not yet up, and 
[throws|https://github.com/apache/hbase/blob/master/hbase-zookeeper/src/main/java/org/apache/hadoop/hbase/zookeeper/RecoverableZooKeeper.java#L331]
 a ConnectionLoss exception once the retries are exhausted. This client is used 
by region servers. Thanks.
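
The retry behaviour described above can be sketched roughly as follows. This is a simplified illustration, not the RecoverableZooKeeper code: BackoffRetry, callWithRetry and the local ConnectionLossException (a stand-in for KeeperException.ConnectionLossException) are hypothetical names, and the doubling sleep is an assumed backoff policy.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for KeeperException.ConnectionLossException. */
class ConnectionLossException extends RuntimeException {
  ConnectionLossException(String msg) {
    super(msg);
  }
}

class BackoffRetry {
  /**
   * Runs op, retrying on ConnectionLossException up to maxRetries times.
   * Sleeps baseSleepMs * 2^attempt between attempts and rethrows the last
   * ConnectionLossException once the retries are exhausted.
   */
  static <T> T callWithRetry(Supplier<T> op, int maxRetries, long baseSleepMs) {
    int attempt = 0;
    while (true) {
      try {
        return op.get();
      } catch (ConnectionLossException e) {
        if (attempt >= maxRetries) {
          throw e; // retries exhausted: surface the error to the caller
        }
        sleepQuietly(baseSleepMs << attempt); // 1x, 2x, 4x, ...
        attempt++;
      }
    }
  }

  private static void sleepQuietly(long ms) {
    try {
      Thread.sleep(ms);
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while backing off", ie);
    }
  }

  public static void main(String[] args) {
    int[] calls = {0};
    // Simulated read that fails twice (session not yet connected), then succeeds.
    String data = callWithRetry(() -> {
      if (++calls[0] < 3) {
        throw new ConnectionLossException("ConnectionLoss for /hbase/table/SYSTEM.CATALOG");
      }
      return "table-state";
    }, 3, 10L);
    System.out.println(data + " after " + calls[0] + " attempts");
  }
}
```

With this shape, a connection that comes up within the retry budget is invisible to the caller; only a connection that never comes up surfaces as ConnectionLoss.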

 

> Wait for connection attempt to succeed before performing operations on ZK
> -
>
> Key: HBASE-24972
> URL: https://issues.apache.org/jira/browse/HBASE-24972
> Project: HBase
> Issue Type: Bug
> Reporter: Sandeep Guggilam
> Assignee: Prathyusha
> Priority: Minor
>
> Creating the connection to ZK is asynchronous, and the successful connection 
> event is delivered via the passed-in watcher. When we attempt any operation, 
> we try to create a connection and then perform a read/write 
> ([https://github.com/apache/hbase/blob/979edfe72046b2075adcc869c65ae820e6f3ec2d/hbase-zookeeper/src/main/java/org/apache/hadoop/hbase/zookeeper/RecoverableZooKeeper.java#L323])
> without actually waiting for the notification event 
> ([https://github.com/apache/hbase/blob/979edfe72046b2075adcc869c65ae820e6f3ec2d/hbase-zookeeper/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKWatcher.java#L582]).
>  
> It is possible to get ConnectionLoss errors when we perform operations on ZK 
> without waiting for the connection attempt to succeed.
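
The wait the issue title asks for could look roughly like the sketch below. It is only an illustration under stated assumptions, not the proposed patch: ConnectionWaiter, onConnected and awaitConnected are hypothetical names, and a background thread simulates the watcher callback. In the real ZooKeeper API the equivalent signal is a watcher event whose state is KeeperState.SyncConnected.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper that records when the session reaches the connected state. */
class ConnectionWaiter {
  private final CountDownLatch connected = new CountDownLatch(1);

  /** Called from the client's event thread; stands in for a watcher callback. */
  void onConnected() {
    connected.countDown();
  }

  /** Blocks until onConnected() has fired or the timeout elapses. */
  boolean awaitConnected(long timeoutMs) {
    try {
      return connected.await(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    ConnectionWaiter waiter = new ConnectionWaiter();
    // Simulate the asynchronous handshake completing on another thread.
    Thread eventThread = new Thread(() -> {
      try {
        Thread.sleep(50);
      } catch (InterruptedException ignored) {
        return;
      }
      waiter.onConnected();
    });
    eventThread.start();
    if (waiter.awaitConnected(1000)) {
      System.out.println("connected; safe to issue getData");
    } else {
      System.out.println("timed out waiting for connection");
    }
  }
}
```

The bounded timeout matters: waiting forever would turn an unreachable quorum into a hang instead of an error.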



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24972) Wait for connection attempt to succeed before performing operations on ZK

2021-01-14 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17265718#comment-17265718
 ] 

Michael Stack commented on HBASE-24972:
---

It's been a while since I looked in here, but is ConnectionLoss what you get 
when you prematurely try to use a not-yet-connected ZK?

My main concern is that we have done the async setup of the connection for a 
long time (that's how ZK does it), and if it were problematic, I'd have thought 
we'd have heard about it before this...

Whatever the client is, can it wait for the connection to be up before it goes 
and does the getData? (If it is an HBase client, does it wait or not?) Thanks.



[jira] [Commented] (HBASE-24972) Wait for connection attempt to succeed before performing operations on ZK

2021-01-14 Thread Prathyusha (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17265148#comment-17265148
 ] 

Prathyusha commented on HBASE-24972:


[~stack] Below is the stack trace of a failure incident we have seen:
Cause: org.apache.zookeeper.KeeperException$ConnectionLossException: 
KeeperErrorCode = ConnectionLoss for /hbase/table/SYSTEM.CATALOG
StackTrace: 
org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1337)
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:354)
org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:625)
...
StackTraceId: 429763122
But yes, I see retries in place wherever we are doing write operations. 
[~sandeep.guggilam] These retries should suffice, I guess. Any thoughts?



[jira] [Commented] (HBASE-24972) Wait for connection attempt to succeed before performing operations on ZK

2021-01-13 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264379#comment-17264379
 ] 

Michael Stack commented on HBASE-24972:
---

[~prathyu6] Any comment on above?



[jira] [Commented] (HBASE-24972) Wait for connection attempt to succeed before performing operations on ZK

2021-01-11 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17262829#comment-17262829
 ] 

Michael Stack commented on HBASE-24972:
---

This change makes the connection synchronous instead of async.

Do you have examples of failures seen?

Is there no provision for retrying when the connection is not yet up?

Thanks.



[jira] [Commented] (HBASE-24972) Wait for connection attempt to succeed before performing operations on ZK

2020-10-26 Thread Prathyusha (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17221018#comment-17221018
 ] 

Prathyusha commented on HBASE-24972:


Thanks [~sandeep.guggilam] 



[jira] [Commented] (HBASE-24972) Wait for connection attempt to succeed before performing operations on ZK

2020-10-26 Thread Sandeep Guggilam (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17220891#comment-17220891
 ] 

Sandeep Guggilam commented on HBASE-24972:
--

[~pratg] Sure, feel free to pick this one up.

> Wait for connection attempt to succeed before performing operations on ZK
> -
>
> Key: HBASE-24972
> URL: https://issues.apache.org/jira/browse/HBASE-24972
> Project: HBase
> Issue Type: Bug
> Reporter: Sandeep Guggilam
> Priority: Minor


[jira] [Commented] (HBASE-24972) Wait for connection attempt to succeed before performing operations on ZK

2020-10-26 Thread Prathyusha (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17220549#comment-17220549
 ] 

Prathyusha commented on HBASE-24972:


[~sandeep.guggilam] If you have not already started working on this, can I pick 
this one?

> Wait for connection attempt to succeed before performing operations on ZK
> -
>
> Key: HBASE-24972
> URL: https://issues.apache.org/jira/browse/HBASE-24972
> Project: HBase
> Issue Type: Bug
> Reporter: Sandeep Guggilam
> Priority: Minor


[jira] [Commented] (HBASE-24972) Wait for connection attempt to succeed before performing operations on ZK

2020-08-31 Thread Sandeep Guggilam (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17187923#comment-17187923
 ] 

Sandeep Guggilam commented on HBASE-24972:
--

FYI [~apurtell]

> Wait for connection attempt to succeed before performing operations on ZK
> -
>
> Key: HBASE-24972
> URL: https://issues.apache.org/jira/browse/HBASE-24972
> Project: HBase
> Issue Type: Bug
> Reporter: Sandeep Guggilam
> Assignee: Sandeep Guggilam
> Priority: Minor


