[jira] [Commented] (ZOOKEEPER-2230) Connections fo ZooKeeper server becomes slow over time with native GSSAPI

2022-08-05 Thread Rajkiran Sura (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575889#comment-17575889
 ] 

Rajkiran Sura commented on ZOOKEEPER-2230:
--

Hi [~enixon], [~enis], [~symat], this is still not patched in v3.7.1. Could 
you please merge it soon?

 

Regards,

Rajkiran

> Connections fo ZooKeeper server becomes slow over time with native GSSAPI
> -
>
> Key: ZOOKEEPER-2230
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2230
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6, 3.4.7, 3.4.8, 3.5.0
> Environment: OS: RHEL6
> Java: 1.8.0_40
> Configuration:
> java.env:
> {noformat}
> SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Xmx5120m"
> SERVER_JVMFLAGS="$SERVER_JVMFLAGS 
> -Djava.security.auth.login.config=/local/apps/zookeeper-test1/conf/jaas-server.conf"
> SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Dsun.security.jgss.native=true"
> {noformat}
> jaas-server.conf:
> {noformat}
> Server {
> com.sun.security.auth.module.Krb5LoginModule required
> useKeyTab=true
> isInitiator=false
> principal="zookeeper/@";
> };
> {noformat}
> Process environment:
> {noformat}
> KRB5_KTNAME=/local/apps/zookeeper-test1/conf/keytab
> ZOO_LOG_DIR=/local/apps/zookeeper-test1/log
> ZOOCFGDIR=/local/apps/zookeeper-test1/conf
> {noformat}
>Reporter: Deepesh Reja
>Assignee: Enis Soztutar
>Priority: Major
>  Labels: patch, pull-request-available
> Fix For: 3.4.6, 3.4.7, 3.4.8, 3.5.2
>
> Attachments: ZOOKEEPER-2230.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The ZooKeeper server becomes slow over time when native GSSAPI is used. 
> Connecting to the server starts taking up to 10 seconds.
> This is happening with ZooKeeper-3.4.6 and is fairly reproducible.
> Debug logs:
> {noformat}
> 2015-07-02 00:58:49,318 [myid:] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:42405:NIOServerCnxnFactory@197] - 
> Accepted socket connection from /:47942
> 2015-07-02 00:58:49,318 [myid:] - DEBUG 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:42405:ZooKeeperSaslServer@78] - 
> serviceHostname is ''
> 2015-07-02 00:58:49,318 [myid:] - DEBUG 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:42405:ZooKeeperSaslServer@79] - 
> servicePrincipalName is 'zookeeper'
> 2015-07-02 00:58:49,318 [myid:] - DEBUG 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:42405:ZooKeeperSaslServer@80] - SASL 
> mechanism(mech) is 'GSSAPI'
> 2015-07-02 00:58:49,324 [myid:] - DEBUG 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:42405:ZooKeeperSaslServer@106] - Added 
> private credential to subject: [GSSCredential: 
> zookeeper@ 1.2.840.113554.1.2.2 Accept [class 
> sun.security.jgss.wrapper.GSSCredElement]]
> 2015-07-02 00:58:59,441 [myid:] - DEBUG 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:42405:ZooKeeperServer@810] - Session 
> establishment request from client /:47942 client's lastZxid is 0x0
> 2015-07-02 00:58:59,441 [myid:] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:42405:ZooKeeperServer@868] - Client 
> attempting to establish new session at /:47942
> 2015-07-02 00:58:59,448 [myid:] - DEBUG 
> [SyncThread:0:FinalRequestProcessor@88] - Processing request:: 
> sessionid:0x14e486028785c81 type:createSession cxid:0x0 zxid:0x110e79 
> txntype:-10 reqpath:n/a
> 2015-07-02 00:58:59,448 [myid:] - DEBUG 
> [SyncThread:0:FinalRequestProcessor@160] - sessionid:0x14e486028785c81 
> type:createSession cxid:0x0 zxid:0x110e79 txntype:-10 reqpath:n/a
> 2015-07-02 00:58:59,448 [myid:] - INFO  [SyncThread:0:ZooKeeperServer@617] - 
> Established session 0x14e486028785c81 with negotiated timeout 1 for 
> client /:47942
> 2015-07-02 00:58:59,452 [myid:] - DEBUG 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:42405:ZooKeeperServer@949] - Responding 
> to client SASL token.
> 2015-07-02 00:58:59,452 [myid:] - DEBUG 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:42405:ZooKeeperServer@953] - Size of 
> client SASL token: 706
> 2015-07-02 00:58:59,460 [myid:] - DEBUG 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:42405:ZooKeeperServer@984] - Size of 
> server SASL response: 161
> 2015-07-02 00:58:59,462 [myid:] - DEBUG 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:42405:ZooKeeperServer@949] - Responding 
> to client SASL token.
> 2015-07-02 00:58:59,462 [myid:] - DEBUG 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:42405:ZooKeeperServer@953] - Size of 
> client SASL token: 0
> 2015-07-02 00:58:59,462 [myid:] - DEBUG 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:42405:ZooKeeperServer@984] - Size of 
> server SASL response: 32
> 2015-07-02 00:58:59,463 [myid:] - DEBUG 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:42405:ZooKeeperServer@949] - Responding 
> to client SASL token.
> 2015-07-02 00:58:59,463 [myid:] - DEBUG 
> 
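The pause described in the report can be read directly off the quoted debug log. The following is a minimal sketch, assuming only the timestamps shown above; the function name `largest_gap` and the selection of four log entries are illustrative choices, not part of the original report.

```python
from datetime import datetime

# Timestamps copied from the debug log quoted above.
LOG_TIMESTAMPS = [
    "2015-07-02 00:58:49,318",  # Accepted socket connection
    "2015-07-02 00:58:49,324",  # Added private credential to subject
    "2015-07-02 00:58:59,441",  # Session establishment request
    "2015-07-02 00:58:59,448",  # Established session
]


def largest_gap(stamps):
    """Return (seconds, before, after) for the longest pause between consecutive entries."""
    parsed = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S,%f") for s in stamps]
    gaps = [
        ((later - earlier).total_seconds(), stamps[i], stamps[i + 1])
        for i, (earlier, later) in enumerate(zip(parsed, parsed[1:]))
    ]
    return max(gaps)


seconds, before, after = largest_gap(LOG_TIMESTAMPS)
print(f"largest pause: {seconds:.3f}s between {before} and {after}")
```

On this excerpt the longest pause falls between the "Added private credential to subject" entry and the "Session establishment request" entry, which matches the roughly 10 second connection delay the reporter describes.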

[jira] [Commented] (ZOOKEEPER-2230) Connections fo ZooKeeper server becomes slow over time with native GSSAPI

2021-05-12 Thread Rajkiran Sura (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17343117#comment-17343117
 ] 

Rajkiran Sura commented on ZOOKEEPER-2230:
--

Hi [~enixon], [~enis], just checking whether you have had a chance to look at 
the last few messages in this thread about validating the patch. It would be 
great if the patch could be merged.

 

Thanks again,

Rajkiran


[jira] [Commented] (ZOOKEEPER-2230) Connections fo ZooKeeper server becomes slow over time with native GSSAPI

2020-11-04 Thread Rajkiran Sura (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226552#comment-17226552
 ] 

Rajkiran Sura commented on ZOOKEEPER-2230:
--

Hi [~enixon], just checking whether you have had a chance to look at my last 
three comments confirming that the patch works for us. It would be great if it 
could make it into a future release.

 

Thanks a lot,

Rajkiran


[jira] [Commented] (ZOOKEEPER-3824) ZooKeeper dynamic reconfig doesn't work with GSSAPI/SASL enabled Quorum authn/z

2020-05-28 Thread Rajkiran Sura (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118482#comment-17118482
 ] 

Rajkiran Sura commented on ZOOKEEPER-3824:
--

Hello,

Just checking if anyone got a chance to check this.

Thanks,

Rajkiran

> ZooKeeper dynamic reconfig doesn't work with GSSAPI/SASL enabled Quorum 
> authn/z
> ---
>
> Key: ZOOKEEPER-3824
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3824
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: kerberos, leaderElection, quorum, server
>Affects Versions: 3.5.6
> Environment: O.S. :- RHEL7
>Reporter: Rajkiran Sura
>Priority: Major
>
> With the 'DynamicReconfig' feature in v3.5.6, servers can ideally be added 
> and removed without restarting the ZooKeeper service on any of the nodes.
> But with Kerberos (GSSAPI via SASL) enabled quorum 
> authentication/authorization, this is not possible. When you try to 
> add a new server, it cannot connect to any of the members in the 
> ensemble, so the data is not synced, because all the members reject 
> it based on authorization. To make it work, we need to run 
> 'reconfig', then restart the leader, the new member and the rest of the members.
> Is this the expected behavior with Quorum-auth + DynamicReconfig? Or am I 
> missing something here?
> This is our basic quorum-auth config:
> {quote}quorum.auth.serverRequireSasl=true
>  quorum.auth.kerberos.servicePrincipal=zookeeper/_HOST
>  quorum.auth.enableSasl=true
>  quorum.auth.learner.saslLoginContext=QuorumLearner
>  quorum.auth.learnerRequireSasl=true
>  quorum.cnxn.threads.size=20
>  quorum.auth.server.saslLoginContext=QuorumServer
> {quote}
> FTR: I raised this question in [ZooKeeper-user 
> forum|http://zookeeper-user.578899.n2.nabble.com/ZooKeeper-dynamic-reconfig-issue-when-Quorum-authn-authz-is-enabled-td7584927.html]
>  and both Mate and Enrico suspect this to be a bug.
> Also, this is easily reproducible in a Kerberos (GSSAPI via SASL) enabled 
> quorum-based ensemble.
>  
> Regards,
> Rajkiran
>  
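The symptom described above (a new server is rejected after 'reconfig' until the existing members are restarted) is what one would see if each member's list of authorized peers were built once at startup and not refreshed by a reconfig. The report does not confirm that this is the actual cause, so the following is only a toy model under that assumption. The `Member` class, its methods and the example.com host names are hypothetical and are not ZooKeeper APIs.

```python
class Member:
    """Hypothetical ensemble member used to illustrate the symptom; not a ZooKeeper class."""

    def __init__(self, name, configured_hosts):
        self.name = name
        self.configured_hosts = set(configured_hosts)
        self.authorized_hosts = set()
        self.restart()

    def reconfig(self, new_hosts):
        # Assumption: a reconfig updates the membership but leaves the allow-list untouched.
        self.configured_hosts = set(new_hosts)

    def restart(self):
        # Assumption: the allow-list is rebuilt from the current config only at startup.
        self.authorized_hosts = set(self.configured_hosts)

    def accepts(self, host):
        return host in self.authorized_hosts


old_hosts = ["node1.example.com", "node2.example.com", "node3.example.com"]
new_hosts = old_hosts + ["node4.example.com"]

member = Member("node1.example.com", old_hosts)
member.reconfig(new_hosts)
accepted_after_reconfig = member.accepts("node4.example.com")

member.restart()
accepted_after_restart = member.accepts("node4.example.com")

print("accepted after reconfig:", accepted_after_reconfig)
print("accepted after restart:", accepted_after_restart)
```

In this model the new host is rejected after the reconfig and accepted only once the member has restarted, mirroring the restart sequence the reporter had to follow.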



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ZOOKEEPER-3814) ZooKeeper caching of config

2020-05-19 Thread Rajkiran Sura (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111298#comment-17111298
 ] 

Rajkiran Sura commented on ZOOKEEPER-3814:
--

> I created a PR ([https://github.com/apache/zookeeper/pull/1356]) 

That's great!

> Could you please share the sequence of steps you were executing when you saw 
> the original issue?

I used exactly the sequence of steps that you described in the above 
comment, with one minor correction to the last step: we just restart the 
service on server.6, as it has already been started with the new config.

Regards,

Rajkiran

 

> ZooKeeper caching of config
> ---
>
> Key: ZOOKEEPER-3814
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3814
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection, quorum, server
>Affects Versions: 3.5.6
>Reporter: Rajkiran Sura
>Assignee: Mate Szalay-Beko
>Priority: Major
>
> Hello,
> We recently upgraded our 5-node ZooKeeper ensemble from v3.4.8 to v3.5.6 
> and encountered no issues as such.
> This is what the ZooKeeper config looks like:
> {quote}tickTime=2000
> dataDir=/zookeeper-data/
> initLimit=5
> syncLimit=2
> maxClientCnxns=2048
> autopurge.snapRetainCount=3
> autopurge.purgeInterval=1
> 4lw.commands.whitelist=stat, ruok, conf, isro, mntr
> authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
> requireClientAuthScheme=sasl
> quorum.cnxn.threads.size=20
> quorum.auth.enableSasl=true
> quorum.auth.kerberos.servicePrincipal= zookeeper/_HOST
> quorum.auth.learnerRequireSasl=true
> quorum.auth.learner.saslLoginContext=QuorumLearner
> quorum.auth.serverRequireSasl=true
> quorum.auth.server.saslLoginContext=QuorumServer
> server.17=node1.foo.bar.com:2888:3888;2181
> server.19=node2.foo.bar.com:2888:3888;2181
> server.20=node3.foo.bar.com:2888:3888;2181
> server.21=node4.foo.bar.com:2888:3888;2181
> server.22=node5.bar.com:2888:3888;2181
> {quote}
> After the upgrade, we had to migrate server.22 on the same node, but under the 
> *FOO*.bar.com domain name, due to Kerberos referral issues. We also used a 
> different server identifier, *23*, when we migrated. Here is what the 
> new config looked like:
> {quote}server.17=node1.foo.bar.com:2888:3888;2181
> server.19=node2.foo.bar.com:2888:3888;2181
> server.20=node3.foo.bar.com:2888:3888;2181
> server.21=node4.foo.bar.com:2888:3888;2181
> *server.23=node5.{color:#00875a}foo{color}.bar.com:2888:3888;2181*
> {quote}
> We restarted all the nodes in the ensemble with the above updated config. 
> The migrated node joined the quorum successfully and served all clients 
> directly connected to it without any issues.
> Recently, when a leader election happened, 
> server.*23*=node5.foo.bar.com (the migrated node) was chosen as leader (as it has 
> the highest ID). But then ZooKeeper was unable to serve any clients, and *all* 
> the servers were _somehow still_ trying to establish a channel to 22 (old DNS 
> name: node5.bar.com) and were throwing the error below in a loop:
> {quote}{{2020-05-02 01:43:03,026 [myid:23] - WARN 
> [WorkerSender[myid=23]:QuorumPeer$QuorumServer@196] - Failed to resolve 
> address: node4.bar.com}}
> {{java.net.UnknownHostException: node5.bar.com: Name or service not known}}
> {{ at java.base/java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)}}
> {{ at 
> java.base/java.net.InetAddress$PlatformNameService.lookupAllHostAddr(InetAddress.java:929)}}
> {{ at 
> java.base/java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1515)}}
> {{ at 
> java.base/java.net.InetAddress$NameServiceAddresses.get(InetAddress.java:848)}}
> {{ at java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1505)}}
> {{ at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1364)}}
> {{ at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1298)}}
> {{ at java.base/java.net.InetAddress.getByName(InetAddress.java:1248)}}
> {{ at 
> org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer.recreateSocketAddresses(QuorumPeer.java:194)}}
> {{ at 
> org.apache.zookeeper.server.quorum.QuorumPeer.recreateSocketAddresses(QuorumPeer.java:774)}}
> {{ at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:701)}}
> {{ at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:620)}}
> {{ at 
> org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:477)}}
> {{ at 
> org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:456)}}
> {{ at java.base/java.lang.Thread.run(Thread.java:834)}}
> {{2020-05-02 01:43:03,026 [myid:23] - WARN 
> [WorkerSender[myid=23]:QuorumCnxManager@679] - Cannot open channel to 22 at 
> election address 
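To make the membership change in the report explicit, the two server lists quoted above can be compared by server ID. This is a minimal sketch, assuming the `server.N=host:port:port;clientPort` line format shown in the quoted config; the names `parse_servers`, `OLD_CONFIG` and `NEW_CONFIG` are illustrative only.

```python
import re

# Server lists copied from the quoted configs (Jira colour/bold markup removed).
OLD_CONFIG = """
server.17=node1.foo.bar.com:2888:3888;2181
server.19=node2.foo.bar.com:2888:3888;2181
server.20=node3.foo.bar.com:2888:3888;2181
server.21=node4.foo.bar.com:2888:3888;2181
server.22=node5.bar.com:2888:3888;2181
"""

NEW_CONFIG = """
server.17=node1.foo.bar.com:2888:3888;2181
server.19=node2.foo.bar.com:2888:3888;2181
server.20=node3.foo.bar.com:2888:3888;2181
server.21=node4.foo.bar.com:2888:3888;2181
server.23=node5.foo.bar.com:2888:3888;2181
"""

# server.<id>=<host>:<quorum port>:<election port>;<client port>
SERVER_LINE = re.compile(r"^server\.(\d+)=([^:]+):(\d+):(\d+);(\d+)$")


def parse_servers(text):
    """Map server ID to host name for every server.N line in the text."""
    servers = {}
    for line in text.splitlines():
        match = SERVER_LINE.match(line.strip())
        if match:
            servers[int(match.group(1))] = match.group(2)
    return servers


old, new = parse_servers(OLD_CONFIG), parse_servers(NEW_CONFIG)
removed = {sid: old[sid] for sid in old.keys() - new.keys()}
added = {sid: new[sid] for sid in new.keys() - old.keys()}

print("removed:", removed)
print("added:", added)
```

The only ID missing from the new list is 22 (node5.bar.com), which is the peer the servers in the quoted log are still trying to reach.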

[jira] [Commented] (ZOOKEEPER-3814) ZooKeeper caching of config

2020-05-15 Thread Rajkiran Sura (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108856#comment-17108856
 ] 

Rajkiran Sura commented on ZOOKEEPER-3814:
--

{quote}but in the meanwhile I recommend using dynamic reconfig to change the 
quorum.
{quote}
Yes, we have started to rely on dynamic reconfig. But I would like to note that 
dynamic reconfig isn't really dynamic when quorum auth is enabled with 
GSSAPI via SASL: the config is changed, but the new member doesn't join 
the ensemble until all the members are restarted. So it is no longer dynamic, 
and it looks even scarier.

FTR: I have raised https://issues.apache.org/jira/browse/ZOOKEEPER-3824 for 
this issue.

Thanks Mate.

Regards,

Rajkiran


[jira] [Commented] (ZOOKEEPER-3824) ZooKeeper dynamic reconfig doesn't work with GSSAPI/SASL enabled Quorum authn/z

2020-05-15 Thread Rajkiran Sura (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108266#comment-17108266
 ] 

Rajkiran Sura commented on ZOOKEEPER-3824:
--

Tagging [~symat], [~shralex], [~hanm] and [~eolivelli] in case they have any 
thoughts on this issue.

 

Thanks,

Rajkiran






[jira] [Commented] (ZOOKEEPER-3814) ZooKeeper caching of config

2020-05-15 Thread Rajkiran Sura (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108259#comment-17108259
 ] 

Rajkiran Sura commented on ZOOKEEPER-3814:
--

Many thanks Mate, for looking into this. Glad that you could pin-point the 
problem.

 

Regards,

Rajkiran

> ZooKeeper caching of config
> ---
>
> Key: ZOOKEEPER-3814
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3814
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection, quorum, server
>Affects Versions: 3.5.6
>Reporter: Rajkiran Sura
>Assignee: Mate Szalay-Beko
>Priority: Major
>
> Hello,
> We recently upgraded our 5-node ZooKeeper ensemble from v3.4.8 to v3.5.6 
> and encountered no issues as such.
> This is what the ZooKeeper config looks like:
> {quote}tickTime=2000
> dataDir=/zookeeper-data/
> initLimit=5
> syncLimit=2
> maxClientCnxns=2048
> autopurge.snapRetainCount=3
> autopurge.purgeInterval=1
> 4lw.commands.whitelist=stat, ruok, conf, isro, mntr
> authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
> requireClientAuthScheme=sasl
> quorum.cnxn.threads.size=20
> quorum.auth.enableSasl=true
> quorum.auth.kerberos.servicePrincipal= zookeeper/_HOST
> quorum.auth.learnerRequireSasl=true
> quorum.auth.learner.saslLoginContext=QuorumLearner
> quorum.auth.serverRequireSasl=true
> quorum.auth.server.saslLoginContext=QuorumServer
> server.17=node1.foo.bar.com:2888:3888;2181
> server.19=node2.foo.bar.com:2888:3888;2181
> server.20=node3.foo.bar.com:2888:3888;2181
> server.21=node4.foo.bar.com:2888:3888;2181
> server.22=node5.bar.com:2888:3888;2181
> {quote}
> After the upgrade, we had to migrate server.22 on the same node, but under the 
> *FOO*.bar.com domain name, due to Kerberos referral issues. We also used a 
> different server identifier, *23*, when we migrated. Here is what the 
> new config looked like:
> {quote}server.17=node1.foo.bar.com:2888:3888;2181
> server.19=node2.foo.bar.com:2888:3888;2181
> server.20=node3.foo.bar.com:2888:3888;2181
> server.21=node4.foo.bar.com:2888:3888;2181
> *server.23=node5.{color:#00875a}foo{color}.bar.com:2888:3888;2181*
> {quote}
> We restarted all the nodes in the ensemble with the above updated config. 
> The migrated node joined the quorum successfully and served all clients 
> directly connected to it without any issues.
> Recently, when a leader election happened, 
> server.*23*=node5.foo.bar.com (the migrated node) was chosen as leader (as it has 
> the highest ID). But then ZooKeeper was unable to serve any clients, and *all* 
> the servers were _somehow still_ trying to establish a channel to 22 (old DNS 
> name: node5.bar.com) and were throwing the error below in a loop:
> {quote}{{2020-05-02 01:43:03,026 [myid:23] - WARN 
> [WorkerSender[myid=23]:QuorumPeer$QuorumServer@196] - Failed to resolve 
> address: node4.bar.com}}
> {{java.net.UnknownHostException: node5.bar.com: Name or service not known}}
> {{ at java.base/java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)}}
> {{ at 
> java.base/java.net.InetAddress$PlatformNameService.lookupAllHostAddr(InetAddress.java:929)}}
> {{ at 
> java.base/java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1515)}}
> {{ at 
> java.base/java.net.InetAddress$NameServiceAddresses.get(InetAddress.java:848)}}
> {{ at java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1505)}}
> {{ at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1364)}}
> {{ at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1298)}}
> {{ at java.base/java.net.InetAddress.getByName(InetAddress.java:1248)}}
> {{ at 
> org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer.recreateSocketAddresses(QuorumPeer.java:194)}}
> {{ at 
> org.apache.zookeeper.server.quorum.QuorumPeer.recreateSocketAddresses(QuorumPeer.java:774)}}
> {{ at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:701)}}
> {{ at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:620)}}
> {{ at 
> org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:477)}}
> {{ at 
> org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:456)}}
> {{ at java.base/java.lang.Thread.run(Thread.java:834)}}
> {{2020-05-02 01:43:03,026 [myid:23] - WARN 
> [WorkerSender[myid=23]:QuorumCnxManager@679] - Cannot open channel to 22 at 
> election address node5.bar.com:3888}}
> {{java.net.UnknownHostException: node5.bar.com}}
> {{ at 
> java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:220)}}
> {{ at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:403)}}
> {{ at java.base/java.net.Socket.connect(Socket.java:591)}}
> {{ at 
> 

[jira] [Updated] (ZOOKEEPER-3824) ZooKeeper dynamic reconfig doesn't work with GSSAPI/SASL enabled Quorum authn/z

2020-05-11 Thread Rajkiran Sura (Jira)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajkiran Sura updated ZOOKEEPER-3824:
-
Description: 
With the 'DynamicReconfig' feature in v3.5.6, servers can ideally be added and 
removed without restarting the ZooKeeper service on any of the nodes.

But with Kerberos-enabled (GSSAPI via SASL) quorum 
authentication/authorization, this is not possible. When you try to add a new 
server, it cannot connect to any of the members in the ensemble, and the data 
is not synced, because all the members reject it based on authorization. To 
make it work, we need to do a 'reconfig', then restart the leader, the new 
member, and the rest of the members.
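
To illustrate why a restart could make a difference here, below is a minimal Python sketch. It rests on an assumption that this report does not confirm: that each member authorizes quorum peers against a set of host names captured once at startup. The names `host_of` and `StartupAuthorizer`, the host `node6.foo.bar.com`, and the realm `EXAMPLE.COM` are hypothetical and are not taken from ZooKeeper.

```python
def host_of(principal: str) -> str:
    """Return the host part of a principal such as 'zookeeper/host@REALM'."""
    name = principal.split("@", 1)[0]   # drop the realm
    parts = name.split("/", 1)          # split service from host
    return parts[1] if len(parts) == 2 else ""


class StartupAuthorizer:
    """Hypothetical check: allow only hosts known when the server started."""

    def __init__(self, member_hosts):
        # Snapshot taken once; later changes to the member list are not seen.
        self.allowed_hosts = frozenset(member_hosts)

    def is_authorized(self, principal: str) -> bool:
        return host_of(principal) in self.allowed_hosts


members = ["node1.foo.bar.com", "node2.foo.bar.com", "node3.foo.bar.com"]
authz = StartupAuthorizer(members)

# Simulate a dynamic reconfig that adds a member while the server keeps running.
members.append("node6.foo.bar.com")

print(authz.is_authorized("zookeeper/node1.foo.bar.com@EXAMPLE.COM"))  # True
print(authz.is_authorized("zookeeper/node6.foo.bar.com@EXAMPLE.COM"))  # False

# Rebuilding the snapshot (as a restart would) lets the new member through.
restarted = StartupAuthorizer(members)
print(restarted.is_authorized("zookeeper/node6.foo.bar.com@EXAMPLE.COM"))  # True
```

If the real check behaves like this snapshot, it would be consistent with the observation that the new member is rejected until the existing members are restarted.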

Is this the expected behavior with Quorum-auth + DynamicReconfig, or am I 
missing something here?

This is our basic quorum-auth config:
{quote}quorum.auth.serverRequireSasl=true
 quorum.auth.kerberos.servicePrincipal=zookeeper/_HOST
 quorum.auth.enableSasl=true
 quorum.auth.learner.saslLoginContext=QuorumLearner
 quorum.auth.learnerRequireSasl=true
 quorum.cnxn.threads.size=20
 quorum.auth.server.saslLoginContext=QuorumServer
{quote}
For the record: I raised this question in the [ZooKeeper-user 
forum|http://zookeeper-user.578899.n2.nabble.com/ZooKeeper-dynamic-reconfig-issue-when-Quorum-authn-authz-is-enabled-td7584927.html]
 and both Mate and Enrico suspect this to be a bug.

Also, this is easily reproducible in an ensemble with Kerberos-enabled (GSSAPI 
via SASL) quorum authentication.

 

Regards,

Rajkiran

 



> ZooKeeper dynamic reconfig doesn't work with GSSAPI/SASL enabled Quorum 
> authn/z
> ---
>
> Key: ZOOKEEPER-3824
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3824
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: kerberos, leaderElection, quorum, server
>Affects Versions: 3.5.6
> Environment: O.S. :- RHEL7
>Reporter: Rajkiran Sura
>Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ZOOKEEPER-3824) ZooKeeper dynamic reconfig doesn't work with GSSAPI/SASL enabled Quorum authn/z

2020-05-11 Thread Rajkiran Sura (Jira)
Rajkiran Sura created ZOOKEEPER-3824:


 Summary: ZooKeeper dynamic reconfig doesn't work with GSSAPI/SASL 
enabled Quorum authn/z
 Key: ZOOKEEPER-3824
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3824
 Project: ZooKeeper
  Issue Type: Bug
  Components: kerberos, leaderElection, quorum, server
Affects Versions: 3.5.6
 Environment: O.S. :- RHEL7
Reporter: Rajkiran Sura


With the 'DynamicReconfig' feature in v3.5.6, servers can ideally be added and 
removed without restarting the ZooKeeper service on any of the nodes.

But with Kerberos-enabled (GSSAPI via SASL) quorum 
authentication/authorization, this is not possible. When you try to add a new 
server, it cannot connect to any of the members in the ensemble, and the data 
is not synced, because all the members reject it based on authorization. To 
make it work, we need to do a 'reconfig', then restart the leader, the new 
member, and the rest of the members.

Is this the expected behavior with Quorum-auth + DynamicReconfig, or am I 
missing something here?

This is our basic quorum-auth config:
{quote}quorum.auth.serverRequireSasl=true
quorum.auth.kerberos.servicePrincipal=zookeeper/_HOST
quorum.auth.enableSasl=true
quorum.auth.learner.saslLoginContext=QuorumLearner
quorum.auth.learnerRequireSasl=true
quorum.cnxn.threads.size=20
quorum.auth.server.saslLoginContext=QuorumServer
{quote}
For the record: I raised this question in the [ZooKeeper-user 
forum|http://zookeeper-user.578899.n2.nabble.com/ZooKeeper-dynamic-reconfig-issue-when-Quorum-authn-authz-is-enabled-td7584927.html]
 and both Mate and Enrico suspect this to be a bug.

Also, this is easily reproducible in an ensemble with Kerberos-enabled (GSSAPI 
via SASL) quorum authentication.

 

Regards,

Rajkiran

 





[jira] [Commented] (ZOOKEEPER-3814) ZooKeeper caching of config

2020-05-11 Thread Rajkiran Sura (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17104188#comment-17104188
 ] 

Rajkiran Sura commented on ZOOKEEPER-3814:
--

{quote}The existence of the .next file indicates that a reconfiguration was 
halted in the middle, before completing. 
{quote}
Also, as I mentioned earlier, we had not initially enabled dynamicReconfig, so 
I am not sure why "dynamic.next" came into the picture.
{quote}The standard way of changing the id would be removing the old id from 
the cluster and adding the new one using one or more reconfig commands.
{quote}
For the record: we were trying to achieve this via the legacy rolling-restart 
method, i.e., first remove the old ID and do a rolling restart, then add the 
new ID and do another rolling restart. This worked perfectly fine for us (the 
newly added ID joined the cluster and was serving up-to-date data). But when a 
ZooKeeper failover happened and this newly added ID became leader, we had 
problems (none of the ZooKeeper members were serving clients).
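
As a rough illustration of the two-step rolling-restart procedure described above, the following Python sketch compares the old and new static server lists from this issue and reports which ID to remove in the first round and which to add in the second. The helper names `parse_servers` and `membership_diff` are hypothetical, and the regular expression only covers the `server.N=host:port:port` form that appears in this thread.

```python
import re

# Matches static entries such as "server.22=node5.bar.com:2888:3888;2181".
SERVER_LINE = re.compile(r"^server\.(\d+)=([^:]+):\d+:\d+")


def parse_servers(cfg_text: str) -> dict:
    """Map server ID -> host name for every server.N line in the text."""
    servers = {}
    for line in cfg_text.splitlines():
        match = SERVER_LINE.match(line.strip())
        if match:
            servers[int(match.group(1))] = match.group(2)
    return servers


def membership_diff(old: dict, new: dict):
    """Return (removed, added) server entries between two configs."""
    removed = {sid: old[sid] for sid in old.keys() - new.keys()}
    added = {sid: new[sid] for sid in new.keys() - old.keys()}
    return removed, added


OLD_CFG = """
server.17=node1.foo.bar.com:2888:3888;2181
server.19=node2.foo.bar.com:2888:3888;2181
server.20=node3.foo.bar.com:2888:3888;2181
server.21=node4.foo.bar.com:2888:3888;2181
server.22=node5.bar.com:2888:3888;2181
"""

NEW_CFG = """
server.17=node1.foo.bar.com:2888:3888;2181
server.19=node2.foo.bar.com:2888:3888;2181
server.20=node3.foo.bar.com:2888:3888;2181
server.21=node4.foo.bar.com:2888:3888;2181
server.23=node5.foo.bar.com:2888:3888;2181
"""

removed, added = membership_diff(parse_servers(OLD_CFG), parse_servers(NEW_CFG))
print("round 1, remove:", removed)
print("round 2, add:", added)
```

A diff like this only describes the intended membership change; it says nothing about state a running server may still hold for the removed ID.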

 

Thanks Mate and Alexander for looking into this.

> ZooKeeper caching of config
> ---
>
> Key: ZOOKEEPER-3814
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3814
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection, quorum, server
>Affects Versions: 3.5.6
>Reporter: Rajkiran Sura
>Assignee: Mate Szalay-Beko
>Priority: Major
>

[jira] [Commented] (ZOOKEEPER-3814) ZooKeeper caching of config

2020-05-08 Thread Rajkiran Sura (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17102671#comment-17102671
 ] 

Rajkiran Sura commented on ZOOKEEPER-3814:
--

Hi Mate,

Many thanks for looking further into this.
{quote}If by any chance you still have (and you have the permission to share) 
the full server logs from all the servers during the time when you changed the 
hostname of the last node, I would be happy to take a look.
{quote}
Unfortunately, the logs have already rolled over. Also, when we changed the 
hostname, the node was able to join the cluster without any issues after we 
did a rolling restart, and it served clients without any issues for a week. 
Then, when the next leader election happened, it got elected as leader and we 
had trouble serving the clients.
{quote}I have a docker environment 
([https://github.com/symat/zookeeper-docker-test]) where I tried to create a 
cluster and simulating the config change you did.
{quote}
Just checking: did you simulate the removal and addition of the server via the 
legacy rolling-restart method? Also, we have quorum authn/authz enabled.

FWIW: I also tried simulating this afresh using a 3-node v3.5.6 cluster, but 
wasn't able to reproduce it exactly.

Also, I am not sure if this makes any difference, but the production cluster 
was upgraded from v3.4.8 to v3.5.6, whereas my reproducer/simulation was 
initialized directly with v3.5.6.

 

> ZooKeeper caching of config
> ---
>
> Key: ZOOKEEPER-3814
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3814
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection, quorum, server
>Affects Versions: 3.5.6
>Reporter: Rajkiran Sura
>Assignee: Mate Szalay-Beko
>Priority: Major
>

[jira] [Commented] (ZOOKEEPER-3814) ZooKeeper caching of config

2020-05-08 Thread Rajkiran Sura (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17102321#comment-17102321
 ] 

Rajkiran Sura commented on ZOOKEEPER-3814:
--

Hi Mate,

Yes, as mentioned in the first update, we have kept both the myid and the 
config in sync with the changes.
{quote}server.17=node1.foo.bar.com:2888:3888;2181
server.19=node2.foo.bar.com:2888:3888;2181
server.20=node3.foo.bar.com:2888:3888;2181
server.21=node4.foo.bar.com:2888:3888;2181
*server.{color:#FF}23{color}=node5.{color:#00875a}foo{color}.bar.com:2888:3888;2181*
{quote}
Also, if there were a mismatch between the ID in myid and the config, 
ZooKeeper wouldn't even start up. In our case, it was able to join the quorum 
and sync data, and it was also serving clients. But it had trouble when it 
became leader.
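
The consistency check mentioned above can be sketched in Python as follows. This is only an illustration of the idea that the ID in the myid file must appear among the `server.N` entries; it is not ZooKeeper's actual startup code, and the helper names `configured_ids` and `myid_matches` are hypothetical.

```python
import re

# Captures the numeric ID from lines such as "server.23=node5.foo.bar.com:...".
SERVER_ID = re.compile(r"^server\.(\d+)=")


def configured_ids(cfg_text: str) -> set:
    """Collect every server ID declared in the config text."""
    ids = set()
    for line in cfg_text.splitlines():
        match = SERVER_ID.match(line.strip())
        if match:
            ids.add(int(match.group(1)))
    return ids


def myid_matches(myid_text: str, cfg_text: str) -> bool:
    """True if the ID stored in the myid file is one of the configured IDs."""
    return int(myid_text.strip()) in configured_ids(cfg_text)


CFG = """
server.17=node1.foo.bar.com:2888:3888;2181
server.19=node2.foo.bar.com:2888:3888;2181
server.20=node3.foo.bar.com:2888:3888;2181
server.21=node4.foo.bar.com:2888:3888;2181
server.23=node5.foo.bar.com:2888:3888;2181
"""

print(myid_matches("23\n", CFG))  # True: the migrated node's new ID is configured
print(myid_matches("22\n", CFG))  # False: the old ID is no longer in the config
```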

Thanks,

Rajkiran

> ZooKeeper caching of config
> ---
>
> Key: ZOOKEEPER-3814
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3814
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection, quorum, server
>Affects Versions: 3.5.6
>Reporter: Rajkiran Sura
>Assignee: Mate Szalay-Beko
>Priority: Major
>

[jira] [Commented] (ZOOKEEPER-3814) ZooKeeper caching of config

2020-05-06 Thread Rajkiran Sura (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100922#comment-17100922
 ] 

Rajkiran Sura commented on ZOOKEEPER-3814:
--

Hi Mate,

Thanks for your reply. Yes, I did change the ID in the 'myid' file. As per 
[~eolivelli]'s suggestion 
[here|http://zookeeper-user.578899.n2.nabble.com/ZooKeeper-config-caching-issues-td7584905.html],
 we added/removed nodes the traditional way (as we had kept dynamic-reconfig 
disabled since the beginning), and that did not help: the new node was able to 
join the cluster, but the cluster was unresponsive when it became the leader. 
So we finally had to enable and use dynamic-reconfig to fix the problem.

So this definitely looks like a bug in some corner of the code that is 
hard-coded to look only for dynamicConfig?

Thanks a lot,

Rajkiran

> ZooKeeper caching of config
> ---
>
> Key: ZOOKEEPER-3814
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3814
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection, quorum, server
>Affects Versions: 3.5.6
>Reporter: Rajkiran Sura
>Assignee: Mate Szalay-Beko
>Priority: Major
>

[jira] [Commented] (ZOOKEEPER-3814) ZooKeeper caching of config

2020-05-02 Thread Rajkiran Sura (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097875#comment-17097875
 ] 

Rajkiran Sura commented on ZOOKEEPER-3814:
--

FTR: We haven't enabled dynamic reconfig at all.

> ZooKeeper caching of config
> ---
>
> Key: ZOOKEEPER-3814
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3814
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection, quorum, server
>Affects Versions: 3.5.6
>Reporter: Rajkiran Sura
>Priority: Major
>
> Hello,
> We recently upgraded our 5-node ZooKeeper ensemble from v3.4.8 to v3.5.6 and 
> encountered no issues as such.
> This is what the ZooKeeper config looks like:
> {quote}tickTime=2000
> dataDir=/zookeeper-data/
> initLimit=5
> syncLimit=2
> maxClientCnxns=2048
> autopurge.snapRetainCount=3
> autopurge.purgeInterval=1
> 4lw.commands.whitelist=stat, ruok, conf, isro, mntr
> authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
> requireClientAuthScheme=sasl
> quorum.cnxn.threads.size=20
> quorum.auth.enableSasl=true
> quorum.auth.kerberos.servicePrincipal= zookeeper/_HOST
> quorum.auth.learnerRequireSasl=true
> quorum.auth.learner.saslLoginContext=QuorumLearner
> quorum.auth.serverRequireSasl=true
> quorum.auth.server.saslLoginContext=QuorumServer
> server.17=node1.foo.bar.com:2888:3888;2181
> server.19=node2.foo.bar.com:2888:3888;2181
> server.20=node3.foo.bar.com:2888:3888;2181
> server.21=node4.foo.bar.com:2888:3888;2181
> server.22=node5.bar.com:2888:3888;2181
> {quote}
> After the upgrade, we had to migrate server.22 on the same node to the 
> *FOO*.bar.com domain name because of Kerberos referral issues. We also used a 
> different server identifier, *23*, when we migrated. Here is what the new 
> config looked like:
> {quote}server.17=node1.foo.bar.com:2888:3888;2181
> server.19=node2.foo.bar.com:2888:3888;2181
> server.20=node3.foo.bar.com:2888:3888;2181
> server.21=node4.foo.bar.com:2888:3888;2181
> *server.23=node5.{color:#00875a}foo{color}.bar.com:2888:3888;2181*
> {quote}
> We restarted all the nodes in the ensemble with the updated config above. The 
> migrated node joined the quorum successfully and served all clients directly 
> connected to it without any issues.
> Recently, when a leader election happened, server.*23*=node5.foo.bar.com (the 
> migrated node) was chosen as leader, as it has the highest ID. After that, 
> ZooKeeper was unable to serve any clients, and *all* the servers were 
> _somehow still_ trying to establish a channel to 22 (old DNS name: 
> node5.bar.com) and were throwing the error below in a loop:
> {quote}{{2020-05-02 01:43:03,026 [myid:23] - WARN 
> [WorkerSender[myid=23]:QuorumPeer$QuorumServer@196] - Failed to resolve 
> address: node4.bar.com}}
> {{java.net.UnknownHostException: node5.bar.com: Name or service not known}}
> {{ at java.base/java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)}}
> {{ at 
> java.base/java.net.InetAddress$PlatformNameService.lookupAllHostAddr(InetAddress.java:929)}}
> {{ at 
> java.base/java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1515)}}
> {{ at 
> java.base/java.net.InetAddress$NameServiceAddresses.get(InetAddress.java:848)}}
> {{ at java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1505)}}
> {{ at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1364)}}
> {{ at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1298)}}
> {{ at java.base/java.net.InetAddress.getByName(InetAddress.java:1248)}}
> {{ at 
> org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer.recreateSocketAddresses(QuorumPeer.java:194)}}
> {{ at 
> org.apache.zookeeper.server.quorum.QuorumPeer.recreateSocketAddresses(QuorumPeer.java:774)}}
> {{ at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:701)}}
> {{ at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:620)}}
> {{ at 
> org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:477)}}
> {{ at 
> org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:456)}}
> {{ at java.base/java.lang.Thread.run(Thread.java:834)}}
> {{2020-05-02 01:43:03,026 [myid:23] - WARN 
> [WorkerSender[myid=23]:QuorumCnxManager@679] - Cannot open channel to 22 at 
> election address node5.bar.com:3888}}
> {{java.net.UnknownHostException: node5.bar.com}}
> {{ at 
> java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:220)}}
> {{ at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:403)}}
> {{ at java.base/java.net.Socket.connect(Socket.java:591)}}
> {{ at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:650)}}
> {{ at 
> 

[jira] [Commented] (ZOOKEEPER-3814) ZooKeeper caching of config

2020-05-02 Thread Rajkiran Sura (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097874#comment-17097874
 ] 

Rajkiran Sura commented on ZOOKEEPER-3814:
--

Latest observation: we noticed that ZooKeeper was complaining about the 
dynamic.next file, even though we HAVE NOT ENABLED dynamic reconfiguration.
{quote}2020-05-02 01:43:05,870 [myid:21] - ERROR 
[QuorumPeer[myid=21](plain=/0.0.0.0:2181)(secure=disabled):QuorumPeer@1637] - 
Error writing next dynamic config file to disk:
{quote}
The zookeeper user did not have permissions on that config directory, so we fixed 
that and restarted ZooKeeper. It then dumped the dynamic.next file below, which 
contains the OLD migrated node as a member :O
{quote}$ sudo cat /opt/zookeeper/conf/zoo.cfg.dynamic.next
server.17=node1.foo.bar.com:2888:3888:participant;0.0.0.0:2181
server.19=node2.foo.bar.com:2888:3888:participant;0.0.0.0:2181
server.20=node3.foo.bar.com:2888:3888:participant;0.0.0.0:2181
server.21=node4.foo.bar.com:2888:3888:participant;0.0.0.0:2181
*server.{color:#de350b}22=node5.bar.com{color}*:2888:3888:participant;0.0.0.0:2181
{quote}
So, this looks like a bug. Where is it still fetching this entry from, and how do 
we fix it?
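
To make the mismatch easier to spot, here is a minimal Java sketch that compares the server.N entries of two config texts and reports the IDs that appear only in the second one. The class and method names (ServerEntryDiff, parseServers, unexpectedIds) are hypothetical and not part of ZooKeeper. It is a purely textual comparison and says nothing about where ZooKeeper picked up the stale entry.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not a ZooKeeper class.
class ServerEntryDiff {
    // Matches lines such as "server.22=node5.bar.com:2888:3888;2181".
    // Group 1 is the server id, group 2 is the hostname before the first colon.
    private static final Pattern SERVER_LINE =
            Pattern.compile("^server\\.(\\d+)=([^:]+):.*$");

    // Returns server id -> hostname for every server.N line in the text.
    static Map<Long, String> parseServers(String configText) {
        Map<Long, String> servers = new TreeMap<>();
        for (String line : configText.split("\\R")) {
            Matcher m = SERVER_LINE.matcher(line.trim());
            if (m.matches()) {
                servers.put(Long.parseLong(m.group(1)), m.group(2));
            }
        }
        return servers;
    }

    // Returns the ids that appear in 'other' but not in 'expected'.
    static Set<Long> unexpectedIds(String expected, String other) {
        Set<Long> ids = new TreeSet<>(parseServers(other).keySet());
        ids.removeAll(parseServers(expected).keySet());
        return ids;
    }

    public static void main(String[] args) {
        // Server lines copied from the updated static config.
        String zooCfg = String.join("\n",
                "server.17=node1.foo.bar.com:2888:3888;2181",
                "server.19=node2.foo.bar.com:2888:3888;2181",
                "server.20=node3.foo.bar.com:2888:3888;2181",
                "server.21=node4.foo.bar.com:2888:3888;2181",
                "server.23=node5.foo.bar.com:2888:3888;2181");
        // Server lines copied from the dumped zoo.cfg.dynamic.next.
        String dynamicNext = String.join("\n",
                "server.17=node1.foo.bar.com:2888:3888:participant;0.0.0.0:2181",
                "server.19=node2.foo.bar.com:2888:3888:participant;0.0.0.0:2181",
                "server.20=node3.foo.bar.com:2888:3888:participant;0.0.0.0:2181",
                "server.21=node4.foo.bar.com:2888:3888:participant;0.0.0.0:2181",
                "server.22=node5.bar.com:2888:3888:participant;0.0.0.0:2181");
        System.out.println("Ids only in dynamic.next: "
                + unexpectedIds(zooCfg, dynamicNext));
    }
}
```

Run against the two configs quoted in this issue, the sketch flags server 22 as the entry that exists only in dynamic.next.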

Any lead/help is very much appreciated.

 

Thanks in advance, 

Rajkiran

> ZooKeeper caching of config
> ---
>
> Key: ZOOKEEPER-3814
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3814
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection, quorum, server
>Affects Versions: 3.5.6
>Reporter: Rajkiran Sura
>Priority: Major
>
> Hello,
> We recently upgraded our 5-node ZooKeeper ensemble from v3.4.8 to v3.5.6 and 
> encountered no issues as such.
> This is what the ZooKeeper config looks like:
> {quote}tickTime=2000
> dataDir=/zookeeper-data/
> initLimit=5
> syncLimit=2
> maxClientCnxns=2048
> autopurge.snapRetainCount=3
> autopurge.purgeInterval=1
> 4lw.commands.whitelist=stat, ruok, conf, isro, mntr
> authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
> requireClientAuthScheme=sasl
> quorum.cnxn.threads.size=20
> quorum.auth.enableSasl=true
> quorum.auth.kerberos.servicePrincipal= zookeeper/_HOST
> quorum.auth.learnerRequireSasl=true
> quorum.auth.learner.saslLoginContext=QuorumLearner
> quorum.auth.serverRequireSasl=true
> quorum.auth.server.saslLoginContext=QuorumServer
> server.17=node1.foo.bar.com:2888:3888;2181
> server.19=node2.foo.bar.com:2888:3888;2181
> server.20=node3.foo.bar.com:2888:3888;2181
> server.21=node4.foo.bar.com:2888:3888;2181
> server.22=node5.bar.com:2888:3888;2181
> {quote}
> Post upgrade, we had to migrate server.22 on the same node, but with the 
> *FOO*.bar.com domain name, due to Kerberos referral issues. We also used a 
> different server identifier, i.e., *23*, when we migrated. So, here is what the 
> new config looked like:
> {quote}server.17=node1.foo.bar.com:2888:3888;2181
> server.19=node2.foo.bar.com:2888:3888;2181
> server.20=node3.foo.bar.com:2888:3888;2181
> server.21=node4.foo.bar.com:2888:3888;2181
> *server.23=node5.{color:#00875a}foo{color}.bar.com:2888:3888;2181*
> {quote}
> We restarted all the nodes in the ensemble with the above updated config. And 
> the migrated node joined the quorum successfully and was serving all clients 
> directly connected to it, without any issues.
> Recently, when a leader election happened, 
> server.*23*=node5.foo.bar.com (the migrated node) was chosen as Leader (as it has 
> the highest ID). But then ZooKeeper was unable to serve any clients, and *all* 
> the servers were _somehow still_ trying to establish a channel to 22 (old DNS 
> name: node5.bar.com) and were throwing the error below in a loop:
> {quote}{{2020-05-02 01:43:03,026 [myid:23] - WARN 
> [WorkerSender[myid=23]:QuorumPeer$QuorumServer@196] - Failed to resolve 
> address: node4.bar.com}}
> {{java.net.UnknownHostException: node5.bar.com: Name or service not known}}
> {{ at java.base/java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)}}
> {{ at 
> java.base/java.net.InetAddress$PlatformNameService.lookupAllHostAddr(InetAddress.java:929)}}
> {{ at 
> java.base/java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1515)}}
> {{ at 
> java.base/java.net.InetAddress$NameServiceAddresses.get(InetAddress.java:848)}}
> {{ at java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1505)}}
> {{ at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1364)}}
> {{ at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1298)}}
> {{ at java.base/java.net.InetAddress.getByName(InetAddress.java:1248)}}
> {{ at 
> org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer.recreateSocketAddresses(QuorumPeer.java:194)}}
> {{ at 
> org.apache.zookeeper.server.quorum.QuorumPeer.recreateSocketAddresses(QuorumPeer.java:774)}}
> {{ at 
> 

[jira] [Created] (ZOOKEEPER-3814) ZooKeeper caching of config

2020-05-02 Thread Rajkiran Sura (Jira)
Rajkiran Sura created ZOOKEEPER-3814:


 Summary: ZooKeeper caching of config
 Key: ZOOKEEPER-3814
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3814
 Project: ZooKeeper
  Issue Type: Bug
  Components: leaderElection, quorum, server
Affects Versions: 3.5.6
Reporter: Rajkiran Sura


Hello,

We recently upgraded our 5-node ZooKeeper ensemble from v3.4.8 to v3.5.6 and 
encountered no issues as such.

This is what the ZooKeeper config looks like:
{quote}tickTime=2000
dataDir=/zookeeper-data/
initLimit=5
syncLimit=2
maxClientCnxns=2048
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
4lw.commands.whitelist=stat, ruok, conf, isro, mntr
authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
requireClientAuthScheme=sasl
quorum.cnxn.threads.size=20
quorum.auth.enableSasl=true
quorum.auth.kerberos.servicePrincipal= zookeeper/_HOST
quorum.auth.learnerRequireSasl=true
quorum.auth.learner.saslLoginContext=QuorumLearner
quorum.auth.serverRequireSasl=true
quorum.auth.server.saslLoginContext=QuorumServer
server.17=node1.foo.bar.com:2888:3888;2181
server.19=node2.foo.bar.com:2888:3888;2181
server.20=node3.foo.bar.com:2888:3888;2181
server.21=node4.foo.bar.com:2888:3888;2181
server.22=node5.bar.com:2888:3888;2181
{quote}
Post upgrade, we had to migrate server.22 on the same node, but with the 
*FOO*.bar.com domain name, due to Kerberos referral issues. We also used a 
different server identifier, i.e., *23*, when we migrated. So, here is what the 
new config looked like:
{quote}server.17=node1.foo.bar.com:2888:3888;2181
server.19=node2.foo.bar.com:2888:3888;2181
server.20=node3.foo.bar.com:2888:3888;2181
server.21=node4.foo.bar.com:2888:3888;2181
*server.23=node5.{color:#00875a}foo{color}.bar.com:2888:3888;2181*
{quote}
We restarted all the nodes in the ensemble with the above updated config. And 
the migrated node joined the quorum successfully and was serving all clients 
directly connected to it, without any issues.

Recently, when a leader election happened, 
server.*23*=node5.foo.bar.com (the migrated node) was chosen as Leader (as it has 
the highest ID). But then ZooKeeper was unable to serve any clients, and *all* the 
servers were _somehow still_ trying to establish a channel to 22 (old DNS name: 
node5.bar.com) and were throwing the error below in a loop:
{quote}{{2020-05-02 01:43:03,026 [myid:23] - WARN 
[WorkerSender[myid=23]:QuorumPeer$QuorumServer@196] - Failed to resolve 
address: node4.bar.com}}
{{java.net.UnknownHostException: node5.bar.com: Name or service not known}}
{{ at java.base/java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)}}
{{ at 
java.base/java.net.InetAddress$PlatformNameService.lookupAllHostAddr(InetAddress.java:929)}}
{{ at 
java.base/java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1515)}}
{{ at 
java.base/java.net.InetAddress$NameServiceAddresses.get(InetAddress.java:848)}}
{{ at java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1505)}}
{{ at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1364)}}
{{ at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1298)}}
{{ at java.base/java.net.InetAddress.getByName(InetAddress.java:1248)}}
{{ at 
org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer.recreateSocketAddresses(QuorumPeer.java:194)}}
{{ at 
org.apache.zookeeper.server.quorum.QuorumPeer.recreateSocketAddresses(QuorumPeer.java:774)}}
{{ at 
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:701)}}
{{ at 
org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:620)}}
{{ at 
org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:477)}}
{{ at 
org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:456)}}
{{ at java.base/java.lang.Thread.run(Thread.java:834)}}
{{2020-05-02 01:43:03,026 [myid:23] - WARN 
[WorkerSender[myid=23]:QuorumCnxManager@679] - Cannot open channel to 22 at 
election address node5.bar.com:3888}}
{{java.net.UnknownHostException: node5.bar.com}}
{{ at 
java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:220)}}
{{ at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:403)}}
{{ at java.base/java.net.Socket.connect(Socket.java:591)}}
{{ at 
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:650)}}
{{ at 
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:714)}}
{{ at 
org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:620)}}
{{ at 
org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:477)}}
{{ at 
org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:456)}}
{{ at 

[jira] [Commented] (ZOOKEEPER-1875) NullPointerException in ClientCnxn$EventThread.processEvent

2020-01-02 Thread Rajkiran Sura (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006728#comment-17006728
 ] 

Rajkiran Sura commented on ZOOKEEPER-1875:
--

Hi [~jerryhe],

Could you please at least generate a patch that is likely to be merged in 
future releases?

Thanks,

Rajkiran

> NullPointerException in ClientCnxn$EventThread.processEvent
> ---
>
> Key: ZOOKEEPER-1875
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1875
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.5, 3.4.10
>Reporter: Jerry He
>Assignee: Jerry He
>Priority: Minor
> Fix For: 3.5.7, 3.7.0
>
> Attachments: ZOOKEEPER-1875-trunk.patch, ZOOKEEPER-1875.patch, 
> ZOOKEEPER-1875.patch
>
>
> We've been seeing NullPointerException while working on HBase:
> {code}
> 14/01/30 22:15:25 INFO zookeeper.ZooKeeper: Client 
> environment:user.dir=/home/biadmin/hbase-trunk
> 14/01/30 22:15:25 INFO zookeeper.ZooKeeper: Initiating client connection, 
> connectString=hdtest009:2181 sessionTimeout=9 watcher=null
> 14/01/30 22:15:25 INFO zookeeper.ClientCnxn: Opening socket connection to 
> server hdtest009/9.30.194.18:2181. Will not attempt to authenticate using 
> SASL (Unable to locate a login configuration)
> 14/01/30 22:15:25 INFO zookeeper.ClientCnxn: Socket connection established to 
> hdtest009/9.30.194.18:2181, initiating session
> 14/01/30 22:15:25 INFO zookeeper.ClientCnxn: Session establishment complete 
> on server hdtest009/9.30.194.18:2181, sessionid = 0x143986213e67e48, 
> negotiated timeout = 6
> 14/01/30 22:15:25 ERROR zookeeper.ClientCnxn: Error while calling watcher
> java.lang.NullPointerException
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> The reason is the watcher is null in this part of the code:
> {code}
> private void processEvent(Object event) {
>     try {
>         if (event instanceof WatcherSetEventPair) {
>             // each watcher will process the event
>             WatcherSetEventPair pair = (WatcherSetEventPair) event;
>             for (Watcher watcher : pair.watchers) {
>                 try {
>                     watcher.process(pair.event);
>                 } catch (Throwable t) {
>                     LOG.error("Error while calling watcher ", t);
>                 }
>             }
> {code}
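
The failure mode described above (a null entry in the watcher set) can be sketched with a defensive null check in the dispatch loop. This is only an illustration under stated assumptions: the Watcher interface and the dispatch method below are minimal stand-ins written for this example, not the ZooKeeper classes, and the sketch is not the attached patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Minimal stand-ins for illustration only; these are NOT the ZooKeeper classes.
class NullWatcherGuardSketch {
    // Hypothetical simplified watcher: receives the event as a plain string.
    interface Watcher {
        void process(String event);
    }

    // Delivers the event to every non-null watcher.
    // Returns how many null watchers were skipped.
    static int dispatch(List<Watcher> watchers, String event) {
        int skipped = 0;
        for (Watcher watcher : watchers) {
            if (watcher == null) {
                // Skip the null entry instead of letting process() throw an NPE.
                skipped++;
                continue;
            }
            try {
                watcher.process(event);
            } catch (Throwable t) {
                System.err.println("Error while calling watcher: " + t);
            }
        }
        return skipped;
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        Watcher recorder = seen::add;
        // One null watcher (as with watcher=null in the log) and one real watcher.
        int skipped = dispatch(Arrays.asList(null, recorder), "NodeCreated");
        System.out.println("skipped=" + skipped + ", delivered=" + seen);
    }
}
```

Whether the right fix is to skip null watchers at dispatch time or to reject them when they are registered is a design choice for the maintainers; the sketch only shows the former.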



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ZOOKEEPER-1875) NullPointerException in ClientCnxn$EventThread.processEvent

2019-12-25 Thread Rajkiran Sura (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17003464#comment-17003464
 ] 

Rajkiran Sura commented on ZOOKEEPER-1875:
--

Hi [~jerryhe],

Could you please at least generate a patch that is likely to be merged in 
future releases?

Thanks,

Rajkiran

 

> NullPointerException in ClientCnxn$EventThread.processEvent
> ---
>
> Key: ZOOKEEPER-1875
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1875
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.5, 3.4.10
>Reporter: Jerry He
>Assignee: Jerry He
>Priority: Minor
> Fix For: 3.5.7, 3.7.0
>
> Attachments: ZOOKEEPER-1875-trunk.patch, ZOOKEEPER-1875.patch, 
> ZOOKEEPER-1875.patch
>
>
> We've been seeing NullPointerException while working on HBase:
> {code}
> 14/01/30 22:15:25 INFO zookeeper.ZooKeeper: Client 
> environment:user.dir=/home/biadmin/hbase-trunk
> 14/01/30 22:15:25 INFO zookeeper.ZooKeeper: Initiating client connection, 
> connectString=hdtest009:2181 sessionTimeout=9 watcher=null
> 14/01/30 22:15:25 INFO zookeeper.ClientCnxn: Opening socket connection to 
> server hdtest009/9.30.194.18:2181. Will not attempt to authenticate using 
> SASL (Unable to locate a login configuration)
> 14/01/30 22:15:25 INFO zookeeper.ClientCnxn: Socket connection established to 
> hdtest009/9.30.194.18:2181, initiating session
> 14/01/30 22:15:25 INFO zookeeper.ClientCnxn: Session establishment complete 
> on server hdtest009/9.30.194.18:2181, sessionid = 0x143986213e67e48, 
> negotiated timeout = 6
> 14/01/30 22:15:25 ERROR zookeeper.ClientCnxn: Error while calling watcher
> java.lang.NullPointerException
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> {code}
> The reason is the watcher is null in this part of the code:
> {code}
> private void processEvent(Object event) {
>     try {
>         if (event instanceof WatcherSetEventPair) {
>             // each watcher will process the event
>             WatcherSetEventPair pair = (WatcherSetEventPair) event;
>             for (Watcher watcher : pair.watchers) {
>                 try {
>                     watcher.process(pair.event);
>                 } catch (Throwable t) {
>                     LOG.error("Error while calling watcher ", t);
>                 }
>             }
> {code}





[jira] [Commented] (ZOOKEEPER-2108) Compilation error in ZkAdaptor.cc with GCC 4.7 or later

2019-10-11 Thread Rajkiran Sura (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949529#comment-16949529
 ] 

Rajkiran Sura commented on ZOOKEEPER-2108:
--

Yes, this still exists in the 3.5.5 branch too. Apart from that, I am 
encountering an additional issue, described below.

I have tried multiple versions of GCC, and the issue exists with both of them 
(gcc-4.8.5 and gcc-7.3.1).

I have followed the instructions given in README.txt. Also, due to the 
restructuring in 3.5.5, I had to update ZOOKEEPER_PATH in configure.ac as below:
{code:java}
# Zookeeper C client
ZOOKEEPER_PATH=${BUILD_PATH}/../../zookeeper-client/zookeeper-client-c
{code}
I am encountering these errors during the "make" phase:
{code:java}
$ make
make all-recursive
make[1]: Entering directory 
'/tmp/bar2/zookeeper-3.5.5-SNAPSHOT/zookeeper-contrib/zookeeper-contrib-zktreeutil'
Making all in src
make[2]: Entering directory 
'/tmp/bar2/zookeeper-3.5.5-SNAPSHOT/zookeeper-contrib/zookeeper-contrib-zktreeutil/src'
g++ -DHAVE_CONFIG_H -I. -I.. 
-I/tmp/bar2/zookeeper-3.5.5-SNAPSHOT/zookeeper-contrib/zookeeper-contrib-zktreeutil/../../zookeeper-client/zookeeper-client-c/include
 -I/tmp/bar2/zookeeper-3.5.5-SNAPSHOT/zookeeper-contrib/zookeeper-contrib-zktreeutil/../../zookeeper-client/zookeeper-client-c/generated
 -I../include 
-I/usr/local/include -I/usr/include -I/usr/include/libxml2 -g -O2 -MT 
ZkAdaptor.o -MD -MP -MF .deps/ZkAdaptor.Tpo -c -o ZkAdaptor.o ZkAdaptor.cc
ZkAdaptor.cc: In member function ‘bool 
zktreeutil::ZooKeeperAdapter::createNode(const string&, const string&, int, 
bool)’:
ZkAdaptor.cc:276:18: error: ‘zoo_create’ was not declared in this scope
 rc = zoo_create( mp_zkHandle,
 ^~
ZkAdaptor.cc:276:18: note: suggested alternative: ‘zoo_create’
 rc = zoo_create( mp_zkHandle,
 ^~
 zoo_create
ZkAdaptor.cc: At global scope:
ZkAdaptor.cc:334:26: warning: dynamic exception specifications are deprecated 
in C++11 [-Wdeprecated]
 int version) throw(ZooKeeperException)
 ^
ZkAdaptor.cc: In member function ‘bool 
zktreeutil::ZooKeeperAdapter::deleteNode(const string&, bool, int)’:
ZkAdaptor.cc:344:18: error: ‘zoo_delete’ was not declared in this scope
 rc = zoo_delete( mp_zkHandle, path.c_str(), version );
 ^~
ZkAdaptor.cc:344:18: note: suggested alternative: ‘zoo_delete’
 rc = zoo_delete( mp_zkHandle, path.c_str(), version );
 ^~
 zoo_delete
ZkAdaptor.cc: At global scope:
ZkAdaptor.cc:383:77: warning: dynamic exception specifications are deprecated 
in C++11 [-Wdeprecated]
 vector< string > ZooKeeperAdapter::getNodeChildren (const string ) throw 
(ZooKeeperException)
 ^
ZkAdaptor.cc: In member function ‘std::vector > 
zktreeutil::ZooKeeperAdapter::getNodeChildren(const string&)’:
ZkAdaptor.cc:395:18: error: ‘zoo_get_children’ was not declared in this scope
 rc = zoo_get_children( mp_zkHandle,
 ^~~~
ZkAdaptor.cc:395:18: note: suggested alternative: ‘zoo_get_children’
 rc = zoo_get_children( mp_zkHandle,
 ^~~~
 zoo_get_children
{code}
It's weird, because "zookeeper.h" is accessible and these functions are still 
declared in there.

Am I missing something here? Thanks!

> Compilation error in ZkAdaptor.cc with GCC 4.7 or later
> ---
>
> Key: ZOOKEEPER-2108
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2108
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.6
>Reporter: Emmanuel Bourg
>Priority: Minor
>
> Hi,
> Debian and Fedora have a patch fixing a compilation failure in ZkAdaptor.cc 
> but it doesn't appear to be fixed in the upcoming version 3.5.0. This issue 
> is similar to ZOOKEEPER-470 and ZOOKEEPER-1795.
> The error is :
> {code}
> g++ -DHAVE_CONFIG_H -I. -I..   -D_FORTIFY_SOURCE=2 
> -I/home/ebourg/packaging/zookeeper/src/contrib/zktreeutil/../../c/include 
> -I/home/ebourg/packaging/zookeeper/src/contrib/zktreeutil/../../c/generated 
> -I../include -I/usr/local/include -I/usr/include -I/usr/include/libxml2 -g 
> -O2 -fstack-protector-strong -Wformat -Werror=format-security -MT ZkAdaptor.o 
> -MD -MP -MF .deps/ZkAdaptor.Tpo -c -o ZkAdaptor.o ZkAdaptor.cc
> ZkAdaptor.cc: In member function ‘void 
> zktreeutil::ZooKeeperAdapter::reconnect()’:
> ZkAdaptor.cc:220:21: error: ‘sleep’ was not declared in this scope
>  sleep (1);
> {code}
> This is fixed by including unistd.h in ZkAdaptor.cc or  ZkAdaptor.h
> The Debian patch:
> https://sources.debian.net/src/zookeeper/3.4.5%2Bdfsg-2/debian/patches/ftbfs-gcc-4.7.diff/
> and the Fedora patch:
> http://pkgs.fedoraproject.org/cgit/zookeeper.git/tree/zookeeper-3.4.5-zktreeutil-gcc.patch





[jira] [Commented] (ZOOKEEPER-2230) Connections fo ZooKeeper server becomes slow over time with native GSSAPI

2019-10-10 Thread Rajkiran Sura (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16948319#comment-16948319
 ] 

Rajkiran Sura commented on ZOOKEEPER-2230:
--

Hi [~fittey], Just checking if you got a chance to check my update above. 
Thanks!

> Connections fo ZooKeeper server becomes slow over time with native GSSAPI
> -
>
> Key: ZOOKEEPER-2230
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2230
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6, 3.4.7, 3.4.8, 3.5.0
> Environment: OS: RHEL6
> Java: 1.8.0_40
> Configuration:
> java.env:
> {noformat}
> SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Xmx5120m"
> SERVER_JVMFLAGS="$SERVER_JVMFLAGS 
> -Djava.security.auth.login.config=/local/apps/zookeeper-test1/conf/jaas-server.conf"
> SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Dsun.security.jgss.native=true"
> {noformat}
> jaas-server.conf:
> {noformat}
> Server {
> com.sun.security.auth.module.Krb5LoginModule required
> useKeyTab=true
> isInitiator=false
> principal="zookeeper/@";
> };
> {noformat}
> Process environment:
> {noformat}
> KRB5_KTNAME=/local/apps/zookeeper-test1/conf/keytab
> ZOO_LOG_DIR=/local/apps/zookeeper-test1/log
> ZOOCFGDIR=/local/apps/zookeeper-test1/conf
> {noformat}
>Reporter: Deepesh Reja
>Assignee: Enis Soztutar
>Priority: Major
>  Labels: patch, pull-request-available
> Fix For: 3.4.6, 3.4.7, 3.4.8, 3.5.2
>
> Attachments: ZOOKEEPER-2230.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> ZooKeeper server becomes slow over time when native GSSAPI is used. The 
> connection to the server starts taking up to 10 seconds.
> This is happening with ZooKeeper-3.4.6 and is fairly reproducible.
> Debug logs:
> {noformat}
> 2015-07-02 00:58:49,318 [myid:] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:42405:NIOServerCnxnFactory@197] - 
> Accepted socket connection from /:47942
> 2015-07-02 00:58:49,318 [myid:] - DEBUG 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:42405:ZooKeeperSaslServer@78] - 
> serviceHostname is ''
> 2015-07-02 00:58:49,318 [myid:] - DEBUG 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:42405:ZooKeeperSaslServer@79] - 
> servicePrincipalName is 'zookeeper'
> 2015-07-02 00:58:49,318 [myid:] - DEBUG 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:42405:ZooKeeperSaslServer@80] - SASL 
> mechanism(mech) is 'GSSAPI'
> 2015-07-02 00:58:49,324 [myid:] - DEBUG 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:42405:ZooKeeperSaslServer@106] - Added 
> private credential to subject: [GSSCredential: 
> zookeeper@ 1.2.840.113554.1.2.2 Accept [class 
> sun.security.jgss.wrapper.GSSCredElement]]
> 2015-07-02 00:58:59,441 [myid:] - DEBUG 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:42405:ZooKeeperServer@810] - Session 
> establishment request from client /:47942 client's lastZxid is 0x0
> 2015-07-02 00:58:59,441 [myid:] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:42405:ZooKeeperServer@868] - Client 
> attempting to establish new session at /:47942
> 2015-07-02 00:58:59,448 [myid:] - DEBUG 
> [SyncThread:0:FinalRequestProcessor@88] - Processing request:: 
> sessionid:0x14e486028785c81 type:createSession cxid:0x0 zxid:0x110e79 
> txntype:-10 reqpath:n/a
> 2015-07-02 00:58:59,448 [myid:] - DEBUG 
> [SyncThread:0:FinalRequestProcessor@160] - sessionid:0x14e486028785c81 
> type:createSession cxid:0x0 zxid:0x110e79 txntype:-10 reqpath:n/a
> 2015-07-02 00:58:59,448 [myid:] - INFO  [SyncThread:0:ZooKeeperServer@617] - 
> Established session 0x14e486028785c81 with negotiated timeout 1 for 
> client /:47942
> 2015-07-02 00:58:59,452 [myid:] - DEBUG 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:42405:ZooKeeperServer@949] - Responding 
> to client SASL token.
> 2015-07-02 00:58:59,452 [myid:] - DEBUG 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:42405:ZooKeeperServer@953] - Size of 
> client SASL token: 706
> 2015-07-02 00:58:59,460 [myid:] - DEBUG 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:42405:ZooKeeperServer@984] - Size of 
> server SASL response: 161
> 2015-07-02 00:58:59,462 [myid:] - DEBUG 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:42405:ZooKeeperServer@949] - Responding 
> to client SASL token.
> 2015-07-02 00:58:59,462 [myid:] - DEBUG 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:42405:ZooKeeperServer@953] - Size of 
> client SASL token: 0
> 2015-07-02 00:58:59,462 [myid:] - DEBUG 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:42405:ZooKeeperServer@984] - Size of 
> server SASL response: 32
> 2015-07-02 00:58:59,463 [myid:] - DEBUG 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:42405:ZooKeeperServer@949] - Responding 
> to client SASL token.
> 2015-07-02 00:58:59,463 [myid:] - DEBUG 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:42405:ZooKeeperServer@953] - Size of 
> client SASL token: 32
> 

[jira] [Commented] (ZOOKEEPER-2230) Connections fo ZooKeeper server becomes slow over time with native GSSAPI

2019-09-25 Thread Rajkiran Sura (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937930#comment-16937930
 ] 

Rajkiran Sura commented on ZOOKEEPER-2230:
--

Hi [~fittey], Apologies that Deepesh couldn't test out your patch. I am a 
colleague of Deepesh and am now working on upgrading ZooKeeper to the 3.5.5 
branch. As you said, this bug exists in v3.5.5 too. Could you please provide 
your patch against the v3.5.5 branch? I would be happy to test it out in our 
environment. Thanks!

> Connections fo ZooKeeper server becomes slow over time with native GSSAPI
> -
>
> Key: ZOOKEEPER-2230
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2230
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6, 3.4.7, 3.4.8, 3.5.0
> Environment: OS: RHEL6
> Java: 1.8.0_40
> Configuration:
> java.env:
> {noformat}
> SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Xmx5120m"
> SERVER_JVMFLAGS="$SERVER_JVMFLAGS 
> -Djava.security.auth.login.config=/local/apps/zookeeper-test1/conf/jaas-server.conf"
> SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Dsun.security.jgss.native=true"
> {noformat}
> jaas-server.conf:
> {noformat}
> Server {
> com.sun.security.auth.module.Krb5LoginModule required
> useKeyTab=true
> isInitiator=false
> principal="zookeeper/@";
> };
> {noformat}
> Process environment:
> {noformat}
> KRB5_KTNAME=/local/apps/zookeeper-test1/conf/keytab
> ZOO_LOG_DIR=/local/apps/zookeeper-test1/log
> ZOOCFGDIR=/local/apps/zookeeper-test1/conf
> {noformat}
>Reporter: Deepesh Reja
>Assignee: Enis Soztutar
>Priority: Major
>  Labels: patch, pull-request-available
> Fix For: 3.4.6, 3.4.7, 3.4.8, 3.5.2
>
> Attachments: ZOOKEEPER-2230.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> ZooKeeper server becomes slow over time when native GSSAPI is used. The 
> connection to the server starts taking up to 10 seconds.
> This is happening with ZooKeeper-3.4.6 and is fairly reproducible.
> Debug logs:
> {noformat}
> 2015-07-02 00:58:49,318 [myid:] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:42405:NIOServerCnxnFactory@197] - 
> Accepted socket connection from /:47942
> 2015-07-02 00:58:49,318 [myid:] - DEBUG 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:42405:ZooKeeperSaslServer@78] - 
> serviceHostname is ''
> 2015-07-02 00:58:49,318 [myid:] - DEBUG 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:42405:ZooKeeperSaslServer@79] - 
> servicePrincipalName is 'zookeeper'
> 2015-07-02 00:58:49,318 [myid:] - DEBUG 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:42405:ZooKeeperSaslServer@80] - SASL 
> mechanism(mech) is 'GSSAPI'
> 2015-07-02 00:58:49,324 [myid:] - DEBUG 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:42405:ZooKeeperSaslServer@106] - Added 
> private credential to subject: [GSSCredential: 
> zookeeper@ 1.2.840.113554.1.2.2 Accept [class 
> sun.security.jgss.wrapper.GSSCredElement]]
> 2015-07-02 00:58:59,441 [myid:] - DEBUG 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:42405:ZooKeeperServer@810] - Session 
> establishment request from client /:47942 client's lastZxid is 0x0
> 2015-07-02 00:58:59,441 [myid:] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:42405:ZooKeeperServer@868] - Client 
> attempting to establish new session at /:47942
> 2015-07-02 00:58:59,448 [myid:] - DEBUG 
> [SyncThread:0:FinalRequestProcessor@88] - Processing request:: 
> sessionid:0x14e486028785c81 type:createSession cxid:0x0 zxid:0x110e79 
> txntype:-10 reqpath:n/a
> 2015-07-02 00:58:59,448 [myid:] - DEBUG 
> [SyncThread:0:FinalRequestProcessor@160] - sessionid:0x14e486028785c81 
> type:createSession cxid:0x0 zxid:0x110e79 txntype:-10 reqpath:n/a
> 2015-07-02 00:58:59,448 [myid:] - INFO  [SyncThread:0:ZooKeeperServer@617] - 
> Established session 0x14e486028785c81 with negotiated timeout 1 for 
> client /:47942
> 2015-07-02 00:58:59,452 [myid:] - DEBUG 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:42405:ZooKeeperServer@949] - Responding 
> to client SASL token.
> 2015-07-02 00:58:59,452 [myid:] - DEBUG 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:42405:ZooKeeperServer@953] - Size of 
> client SASL token: 706
> 2015-07-02 00:58:59,460 [myid:] - DEBUG 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:42405:ZooKeeperServer@984] - Size of 
> server SASL response: 161
> 2015-07-02 00:58:59,462 [myid:] - DEBUG 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:42405:ZooKeeperServer@949] - Responding 
> to client SASL token.
> 2015-07-02 00:58:59,462 [myid:] - DEBUG 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:42405:ZooKeeperServer@953] - Size of 
> client SASL token: 0
> 2015-07-02 00:58:59,462 [myid:] - DEBUG 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:42405:ZooKeeperServer@984] - Size of 
> server SASL response: 32
> 2015-07-02 00:58:59,463 [myid:] - DEBUG 
>