[jira] [Updated] (HBASE-23764) Flaky tests due to ZK client name resolution delays

2020-04-07 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HBASE-23764:
-
Issue Type: Test  (was: Bug)

> Flaky tests due to ZK client name resolution delays
> ---
>
> Key: HBASE-23764
> URL: https://issues.apache.org/jira/browse/HBASE-23764
> Project: HBase
> Issue Type: Test
> Components: test
> Affects Versions: 3.0.0
> Reporter: Bharath Vissapragada
> Assignee: Bharath Vissapragada
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
> Attachments: sample-jstacks.zip
>
>
> [~ndimiduk] and I ran into this issue (separately) and we noticed that there
> are some performance issues with name resolution in the ZooKeeper client.
> Since we use ZK heavily in the unit tests, this often manifests as the
> following issues:
> 1. Test timeouts when starting the mini cluster (Master failed to start)
> 2. InterruptedException (because the tests time out)
> 3. Flaky tests because a subset of the cluster fails to start for whatever
> reason (replication tests especially, because they spawn multiple clusters).
> 4. ConnectionLoss to znode /hbase/xyzz.. JVM pause?
> I have a strong feeling that this is a possible cause for many of our flaky
> tests in Jenkins. Luckily, it looks like the following workaround, switching
> from the hostname to an IP address, makes name resolution much quicker. There
> are some related discussions in the ZK community (ZOOKEEPER-1666 and related
> jiras).
> {code:java}
> --- a/hbase-common/src/main/resources/hbase-default.xml
> +++ b/hbase-common/src/main/resources/hbase-default.xml
> @@ -72,7 +72,7 @@ possible configurations would overwhelm and obscure the important.
>    </property>
>    <property>
>      <name>hbase.zookeeper.quorum</name>
> -    <value>localhost</value>
> +    <value>127.0.0.1</value>
>      <description>Comma separated list of servers in the ZooKeeper ensemble
>      (This config. should have been named hbase.zookeeper.ensemble).
>      For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
> {code}
> Until we figure out the actual root cause and upgrade the dependency (if
> needed), we should consider making this hostname-to-IP switch for more stable
> builds.
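
One way to check whether name resolution is slow on a given build host is to time the JDK resolver directly. The sketch below is only an illustration, not part of the reported patch: the class name `ResolveTiming` and the method `timeResolve` are hypothetical, and it measures `InetAddress.getByName` on its own rather than the ZooKeeper client's code path. An IP literal such as 127.0.0.1 is parsed without a name-service lookup, which is presumably why the workaround helps.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class ResolveTiming {
    /** Returns the wall-clock time, in nanoseconds, spent resolving the given host string. */
    public static long timeResolve(String host) throws UnknownHostException {
        long start = System.nanoTime();
        // For an IP literal this only validates the format; for a name it asks the resolver.
        InetAddress.getByName(host);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws UnknownHostException {
        // Compare the hostname used by default with the IP literal from the workaround.
        for (String host : new String[] {"localhost", "127.0.0.1"}) {
            long nanos = timeResolve(host);
            System.out.printf("%-10s resolved in %.3f ms%n", host, nanos / 1e6);
        }
    }
}
```

The JVM caches successful lookups, so only the first resolution of each name in a fresh JVM is representative; run it several times in separate processes before drawing conclusions.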



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-23764) Flaky tests due to ZK client name resolution delays

2020-01-30 Thread Bharath Vissapragada (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Vissapragada updated HBASE-23764:
-
Attachment: sample-jstacks.zip

> Flaky tests due to ZK client name resolution delays
> ---
>
> Key: HBASE-23764
> URL: https://issues.apache.org/jira/browse/HBASE-23764
> Project: HBase
> Issue Type: Bug
> Components: test
> Affects Versions: 3.0.0
> Reporter: Bharath Vissapragada
> Assignee: Bharath Vissapragada
>Priority: Major
> Attachments: sample-jstacks.zip





[jira] [Updated] (HBASE-23764) Flaky tests due to ZK client name resolution delays

2020-01-29 Thread Bharath Vissapragada (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Vissapragada updated HBASE-23764:
-
Description: 
[~ndimiduk] and I ran into this issue (separately) and we noticed that there
are some performance issues with name resolution in the ZooKeeper client. Since
we use ZK heavily in the unit tests, this often manifests as the following
issues:

1. Test timeouts when starting the mini cluster (Master failed to start)
2. InterruptedException (because the tests time out)
3. Flaky tests because a subset of the cluster fails to start for whatever
reason (replication tests especially, because they spawn multiple clusters).
4. ConnectionLoss to znode /hbase/xyzz.. JVM pause?

I have a strong feeling that this is a possible cause for many of our flaky
tests in Jenkins. Luckily, it looks like the following workaround, switching
from the hostname to an IP address, makes name resolution much quicker. There
are some related discussions in the ZK community (ZOOKEEPER-1666 and related
jiras).


{code:java}
--- a/hbase-common/src/main/resources/hbase-default.xml
+++ b/hbase-common/src/main/resources/hbase-default.xml
@@ -72,7 +72,7 @@ possible configurations would overwhelm and obscure the important.
   </property>
   <property>
     <name>hbase.zookeeper.quorum</name>
-    <value>localhost</value>
+    <value>127.0.0.1</value>
     <description>Comma separated list of servers in the ZooKeeper ensemble
     (This config. should have been named hbase.zookeeper.ensemble).
     For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
{code}


Until we figure out the actual root cause and upgrade the dependency (if
needed), we should consider making this hostname-to-IP switch for more stable
builds.

  was:
[~ndimiduk] and I ran into this issue (separately) and we noticed that there
are some performance issues with name resolution in the ZooKeeper client. Since
we use ZK heavily in the unit tests, this often manifests as the following
issues:

1. Test timeouts when starting the mini cluster (Master failed to start)
2. InterruptedException (because the tests time out)
3. Flaky tests because a subset of the cluster fails to start for whatever
reason (replication tests especially, because they spawn multiple clusters).

I have a strong feeling that this is a possible cause for many of our flaky
tests in Jenkins. Luckily, it looks like the following workaround, switching
from the hostname to an IP address, makes name resolution much quicker. There
are some related discussions in the ZK community (ZOOKEEPER-1666 and related
jiras).

Until we figure out the actual root cause and upgrade the dependency (if
needed), we should consider making this hostname-to-IP switch for more stable
builds.


> Flaky tests due to ZK client name resolution delays
> ---
>
> Key: HBASE-23764
> URL: https://issues.apache.org/jira/browse/HBASE-23764
> Project: HBase
> Issue Type: Bug
> Components: test
> Affects Versions: 3.0.0
> Reporter: Bharath Vissapragada
> Assignee: Bharath Vissapragada
>Priority: Major


