from:"Szalay-Bekő Máté"

Re: [ANNOUNCE] Apache ZooKeeper 3.9.2

2024-03-12 Thread Szalay-Bekő Máté

Thank you Damien for driving these releases!!

Cheers,
Máté

On Tue, Mar 12, 2024 at 12:14 PM Damien Diederen wrote:

>
> The Apache ZooKeeper team is proud to announce Apache ZooKeeper version
> 3.9.2
>
> ZooKeeper is a high-performance coordination service for distributed
> applications. It exposes common services - such as naming,
> configuration management, synchronization, and group services - in a
> simple interface so you don't have to write them from scratch. You can
> use it off-the-shelf to implement consensus, group management, leader
> election, and presence protocols. And you can build on it for your
> own, specific needs.
>
> For ZooKeeper release details and downloads, visit:
> https://zookeeper.apache.org/releases.html
>
> ZooKeeper 3.9.2 Release Notes are at:
> https://zookeeper.apache.org/doc/r3.9.2/releasenotes.html
>
> We would like to thank the contributors that made the release possible.
>
> Regards,
>
> The ZooKeeper Team
>

Re: [ANNOUNCE] Apache ZooKeeper 3.9.0

2023-08-05 Thread Szalay-Bekő Máté

Great job indeed, thanks to everyone!! :)
Máté

On Fri, Aug 4, 2023 at 6:24 PM Li Wang  wrote:

> Congrats! Thanks Andor and Enrico for leading this and all the contributors
> that made this possible.
>
> Best,
>
> Li
>
> On Fri, Aug 4, 2023 at 7:28 AM Enrico Olivelli wrote:
>
> > Congratulations !
> >
> > This is a great step forward
> >
> > I hope that people will soon try out the Backup/Restore feature and
> > provide feedback
> >
> > Enrico
> >
> > On Fri, Aug 4, 2023 at 13:24 Andor Molnar wrote:
> > >
> > > The Apache ZooKeeper team is proud to announce Apache ZooKeeper version
> > > 3.9.0
> > >
> > > ZooKeeper is a high-performance coordination service for distributed
> > > applications. It exposes common services - such as naming,
> > > configuration management, synchronization, and group services - in a
> > > simple interface so you don't have to write them from scratch. You can
> > > use it off-the-shelf to implement consensus, group management, leader
> > > election, and presence protocols. And you can build on it for your
> > > own, specific needs.
> > >
> > > For ZooKeeper release details and downloads, visit:
> > > https://zookeeper.apache.org/releases.html
> > >
> > > ZooKeeper 3.9.0 Release Notes are at:
> > > https://zookeeper.apache.org/doc/r3.9.0/releasenotes.html
> > >
> > > We would like to thank the contributors that made the release possible.
> > >
> > > Regards,
> > >
> > > The ZooKeeper Team
> > >
> > >
> >
>

[ANNOUNCE] Apache ZooKeeper 3.8.2

2023-07-18 Thread Szalay-Bekő Máté

The Apache ZooKeeper team is proud to announce Apache ZooKeeper version
3.8.2

ZooKeeper is a high-performance coordination service for distributed
applications. It exposes common services - such as naming,
configuration management, synchronization, and group services - in a
simple interface so you don't have to write them from scratch. You can
use it off-the-shelf to implement consensus, group management, leader
election, and presence protocols. And you can build on it for your
own, specific needs.

Release 3.8.2 is a bugfix release, solving 12 issues, including CVE fixes
and additional test, security and other improvements.

For ZooKeeper release details and downloads, visit:
https://zookeeper.apache.org/releases.html

ZooKeeper 3.8.2 Release Notes are at:
https://zookeeper.apache.org/doc/r3.8.2/releasenotes.html

We would like to thank the contributors who made the release possible.

Regards,
The ZooKeeper Team

Re: Any change from 3.6.3 -> 3.6.4 would cause hostname unresolved issue?

2023-06-14 Thread Szalay-Bekő Máté

Interesting...

I am not familiar with strimzi.io.
Quickly checking the release notes, I don't see anything suspicious:
https://zookeeper.apache.org/doc/r3.6.4/releasenotes.html
Also, QuorumCnxManager has not been changed for 2+ years on branch 3.6.

Are you using the same Java version and ZooKeeper config for 3.6.3 and 3.6.4?
Can you share the ZooKeeper config?

Also: ZooKeeper 3.6 has been deprecated since December 2022. Can you reproduce
the issue on newer ZooKeeper versions?

best regards,
Máté

On Tue, Jun 13, 2023 at 10:16 AM Luke Chen  wrote:

> Hi all,
>
> We're running zookeeper under minikube using strimzi.
> The zookeeper works well while running with ZK v3.6.3. But when we upgraded
> to v3.6.4, we encountered hostname unresolved issue. I'm wondering if this
> is a regression that some changes between v3.6.3 and v3.6.4 cause this
> issue?
>
> Logs:
> 
> 2023-06-12 12:25:38,149 INFO binding to port /127.0.0.1:12181 (org.apache.zookeeper.server.NettyServerCnxnFactory) [main]
> 2023-06-12 12:25:38,194 INFO bound to port 12181 (org.apache.zookeeper.server.NettyServerCnxnFactory) [main]
> 2023-06-12 12:25:38,194 INFO binding to port 0.0.0.0/0.0.0.0:2181 (org.apache.zookeeper.server.NettyServerCnxnFactory) [main]
> 2023-06-12 12:25:38,195 INFO bound to port 2181 (org.apache.zookeeper.server.NettyServerCnxnFactory) [main]
> 2023-06-12 12:25:38,195 INFO Using 4000ms as the quorum cnxn socket timeout (org.apache.zookeeper.server.quorum.QuorumPeer) [main]
> 2023-06-12 12:25:38,199 INFO Election port bind maximum retries is infinite (org.apache.zookeeper.server.quorum.QuorumCnxManager) [main]
> 2023-06-12 12:25:38,201 INFO Creating TLS-only quorum server socket (org.apache.zookeeper.server.quorum.QuorumCnxManager) [ListenerHandler-my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/:3888]
> 2023-06-12 12:25:38,202 INFO ZooKeeper audit is disabled. (org.apache.zookeeper.audit.ZKAuditProvider) [main]
> 2023-06-12 12:25:38,202 ERROR Exception while listening (org.apache.zookeeper.server.quorum.QuorumCnxManager) [ListenerHandler-my-cluster-zookeeper-2.my-cluster-zookeeper-nodes.default.svc/:3888]
> java.net.SocketException: Unresolved address
> at java.base/java.net.ServerSocket.bind(ServerSocket.java:380)
> at java.base/java.net.ServerSocket.bind(ServerSocket.java:342)
> at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener$ListenerHandler.createNewServerSocket(QuorumCnxManager.java:1135)
> at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener$ListenerHandler.acceptConnections(QuorumCnxManager.java:1064)
> at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener$ListenerHandler.run(QuorumCnxManager.java:1033)
> at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
> at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
> at java.base/java.lang.Thread.run(Thread.java:833)
> 
>
> Any thoughts or suggestions are welcomed.
>
> Thank you.
> Luke
>

[ANNOUNCE] Apache ZooKeeper 3.6.4

2022-12-31 Thread Szalay-Bekő Máté

The Apache ZooKeeper team is proud to announce Apache ZooKeeper version
3.6.4

ZooKeeper is a high-performance coordination service for distributed
applications. It exposes common services - such as naming,
configuration management, synchronization, and group services - in a
simple interface so you don't have to write them from scratch. You can
use it off-the-shelf to implement consensus, group management, leader
election, and presence protocols. And you can build on it for your
own, specific needs.

3.6.4 is the last bugfix release for branch 3.6, as 3.6 has been EoL since
30th December 2022. It fixes 42 issues, including CVE fixes, the removal of
log4j1 (reload4j is used from now on) and various other bug fixes (e.g.
snapshotting, SASL and C client related fixes).

For ZooKeeper release details and downloads, visit:
https://zookeeper.apache.org/releases.html

ZooKeeper 3.6.4 Release Notes are at:
https://zookeeper.apache.org/doc/r3.6.4/releasenotes.html

We would like to thank the contributors that made the release possible.

Regards,
The ZooKeeper Team

[ANNOUNCE] Year 2023 :)

2022-12-31 Thread Szalay-Bekő Máté

The Apache ZooKeeper team is proud to announce Year 2023

You may have noticed that the year 2022 also reaches EoL very soon. The new
2023 version was released successfully and, depending on time zones,
people have already started to use it.

Based on the feedback, the upgrades seem to be smooth so far; at least no
incompatibilities have been found. Sadly, we were not able to test this
version before the release. Most likely it will introduce some new issues,
but hopefully it will also solve some of the old ones. Let's enjoy it anyway...

Happy new year! ;)

Re: 2 out of 5 Zookeepers down with "Exception when following the leader"

2022-10-20 Thread Szalay-Bekő Máté

Hello!

Once I saw something similar, where the problem was that the clocks were
not in sync on the ZooKeeper VMs. When the NTP server updated the clock, it
screwed up the Kerberos token renewal threads in the current leader,
causing unhandled SASL-related exceptions during authentication on the
leader election ports, which basically made it impossible for peers to
connect to the leader for some time.

Although I'm not sure if you use quorum authentication. (I see you don't
really use ACLs, at least "skipACL=1" in your zoo.cfg suggests that, so
maybe you have a different problem.)

Anyway, you could check the leader's logs from this time; maybe you'll see
something interesting there.

Also you might look for error messages that would indicate that some
threads died in the leader / in other servers.

E.g.
"ERROR org.apache.zookeeper.server.NIOServerCnxnFactory: Thread Thread[QuorumConnectionThread-[myid=3]-43,5,main] died"

In the leader, there are separate threads responsible for handling leader
election or other quorum messages, with a separate thread for each peer.
Sometimes these threads can die, and then the leader keeps leading but
cannot accept new quorum members. Several bugs related to this were fixed
in the past (but AFAIK all of these fixes are already included
in 3.5.8)
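To search the logs for the "thread died" messages mentioned above, a small scan like the following can help. It is a minimal sketch that assumes the message has the same shape as the example quoted above ("Thread Thread[...] died" on one log line); the exact wording, logger name and layout may differ between ZooKeeper versions and log configurations, and `find_dead_threads` is a hypothetical helper, not part of ZooKeeper.

```python
import re
from typing import Iterable, List, Tuple

# Assumed message shape, modeled on the example above:
#   "... Thread Thread[QuorumConnectionThread-[myid=3]-43,5,main] died"
# The lazy ".*?" keeps expanding until "] died" follows, so nested
# brackets such as "[myid=3]" inside the thread name are handled.
THREAD_DIED = re.compile(r"Thread (Thread\[.*?\]) died")


def find_dead_threads(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, thread description) for each 'Thread ... died' message."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        match = THREAD_DIED.search(line)
        if match:
            hits.append((lineno, match.group(1)))
    return hits
```

Usage: open the leader's log file (the path depends on your installation) and pass the file object, e.g. `find_dead_threads(open(path, encoding="utf-8", errors="replace"))`, then compare the reported line numbers with the time window when the followers dropped out.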

Kind regards,
Máté

On Thu, Oct 20, 2022 at 3:20 PM Onno Zweers  wrote:

> Hi all,
>
> I’m new to this list so please forgive me if I’m asking a question that
> has been discussed before or if I’m asking it in the wrong place.
>
> I sometimes see 1 or 2 out of our 5 zookeepers going down. If it’s two
> going down, it appears to happen at almost the same millisecond. They stay
> down for ~16 minutes. This scares the hell out of me because it means that
> during this time we don’t have any redundancy left.
>
> Our 5 zookeepers are at version 3.5.8, running in libvirt VMs with Centos
> 8 and OpenJDK 1.8.0_322. They are all on a separate VM host. I checked for
> network issues (with ping, iperf3) but found nothing. It’s not always the
> same zookeeper VM(s) that suffers from this problem. It does not happen at
> a regular time so there does not seem to be a cron job affecting this
> behavior.
>
> I have some logging from the last time this happened; I’ll add it below.
>
> I know we’re running an old version of Zookeeper. I have looked through
> the release notes but I couldn’t find a fix for this.
>
> I’d welcome any ideas. How could we prevent this? Based on the
> SocketTimeoutException, I’d think there might be a network hiccup; but why
> is it so difficult for Zookeeper to recover from this? Is it a bug in
> Zookeeper? Should I report it on https://issues.apache.org?
>
> Kind regards,
> Onno
>
> [root@feszoo4 /var/log]# grep 'Oct 11 16:01:4.*zookeeper' messages-20221016
> Oct 11 16:01:43 feszoo4 zookeeper[1199]: 2022-10-11 16:01:43,837 - WARN  [QuorumPeer[myid=4](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):Follower@96] - Exception when following the leader
> Oct 11 16:01:43 feszoo4 zookeeper[1199]: java.net.SocketTimeoutException: Read timed out
> ...
>
>
> [root@feszoo3 /var/log]# grep 'Oct 11 16:01:4.*zookeeper' messages-20221016
> Oct 11 16:01:43 feszoo3 zookeeper[1187]: 2022-10-11 16:01:43,838 - WARN  [QuorumPeer[myid=3](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):Follower@96] - Exception when following the leader
> Oct 11 16:01:43 feszoo3 zookeeper[1187]: java.net.SocketTimeoutException: Read timed out
> Oct 11 16:01:43 feszoo3 zookeeper[1187]: #011at java.net.SocketInputStream.socketRead0(Native Method)
> Oct 11 16:01:43 feszoo3 zookeeper[1187]: #011at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> Oct 11 16:01:43 feszoo3 zookeeper[1187]: #011at java.net.SocketInputStream.read(SocketInputStream.java:171)
> Oct 11 16:01:43 feszoo3 zookeeper[1187]: #011at java.net.SocketInputStream.read(SocketInputStream.java:141)
> Oct 11 16:01:43 feszoo3 zookeeper[1187]: #011at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
> Oct 11 16:01:43 feszoo3 zookeeper[1187]: #011at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
> Oct 11 16:01:43 feszoo3 zookeeper[1187]: #011at java.io.DataInputStream.readInt(DataInputStream.java:387)
> Oct 11 16:01:43 feszoo3 zookeeper[1187]: #011at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:84)
> Oct 11 16:01:43 feszoo3 zookeeper[1187]: #011at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:86)
> Oct 11 16:01:43 feszoo3 zookeeper[1187]: #011at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:118)
> Oct 11 16:01:43 feszoo3 zookeeper[1187]: #011at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:158)
> Oct 11 16:01:43 feszoo3 zookeeper[1187]: #011at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:92)
> Oct 11 16:01:43 feszoo3 zookeeper[1187]: #011at
>

Re: Backup and restore Solr 8.11.2 collections and configsets in Zookeeper version: 3.7.0

2022-09-22 Thread Szalay-Bekő Máté

Hello Kaushal!

One of my Solr colleagues just mentioned a possible solution: what about
using the Solr control script with the recursive option, like "solr zk cp -r"?

e.g.

bin/solr zk cp -r file:/apache/confgs/whatever/conf zk:/configs/myconf -z 111.222.333.444:2181

https://solr.apache.org/guide/8_11/solr-control-script-reference.html#copy-between-local-files-and-zookeeper-znodes

Isn't this what you are looking for?

Best regards,
Máté

On Fri, Sep 16, 2022 at 8:23 PM Shawn Heisey wrote:

> On 9/16/22 09:37, Szalay-Bekő Máté wrote:
> > But actually much better would be to do the backup and restore on Solr
> > level.
>
> Solr doesn't currently have this capability.  We do have functionality
> that can download index configs from ZK to the filesystem, but not all
> the cluster contents in ZK.
>
> If the ZK is dedicated to Solr, I believe you can copy the entire
> "version-2" directory from the ZK datadir and install it in new ZK nodes
> while they are down, then start them up.
>
> Thanks,
> Shawn
>
>

Re: Zookeeper leader election for client read and write requests

2022-09-22 Thread Szalay-Bekő Máté

Hello Kaushal,

> 1. What is the algorithm used to elect the new leader between the
> remaining 2 followers?

There is a very high-level description of our internal ZooKeeper leader
election algorithm here:
https://zookeeper.apache.org/doc/current/zookeeperInternals.html#sc_leaderElection
I don't know if we have more detailed documentation. If you are interested
in the code, best to start here:
https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/FastLeaderElection.java
Also we have many unit tests around leader election that can help to
understand the behaviour.
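To give a rough idea of the ordering rule at the heart of FastLeaderElection, here is a simplified Python sketch (not the actual implementation). When two votes are compared, the one with the higher epoch wins; on a tie, the higher zxid (the more up-to-date server) wins; on a further tie, the higher server id (myid) wins. The real algorithm also exchanges notifications between peers, tracks election rounds, checks for a quorum and handles timeouts, all of which is left out here. The `Vote` class and the function names are made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Vote:
    server_id: int  # myid of the proposed leader
    zxid: int       # last zxid seen by the proposed leader
    epoch: int      # epoch of the proposed leader (simplified)


def supersedes(new: Vote, cur: Vote) -> bool:
    """True if 'new' beats 'cur': higher epoch, then higher zxid, then higher myid."""
    # Tuple comparison is lexicographic, which gives exactly this precedence.
    return (new.epoch, new.zxid, new.server_id) > (cur.epoch, cur.zxid, cur.server_id)


def preferred_leader(votes: List[Vote]) -> Vote:
    """Pick the vote that wins under the ordering above."""
    best = votes[0]
    for vote in votes[1:]:
        if supersedes(vote, best):
            best = vote
    return best
```

In the 3-node scenario from the question, this rule means the two remaining followers would converge on whichever of them has seen the latest zxid, falling back to the higher myid when both are equally up to date.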

> 2. During the leader elections process in place, does the client see
> a 503 service unavailable for all read or write requests?

"503 service unavailable" is an HTTP error code, but the ZooKeeper client
interface doesn't use HTTP; it uses a (jute-based) binary protocol. In
ZooKeeper, we have client sessions, which can be kept alive for some time
even if the client cannot communicate with the server. E.g. if you set the
client session timeout to 30 sec and there is a leader election in the
ZooKeeper server that takes e.g. 10 seconds, then (as far as I remember)
the ZooKeeper client library should keep the session open, so this should
not be visible to the applications using ZooKeeper. Of course no change
can be submitted (and no new session can be created) while the quorum has
no active leader, so I assume these operations will be blocked until the
internal leader election finishes in ZooKeeper. So one can expect longer
response times temporarily during a leader election.

> 3. In an ensemble of 3 nodes with 1 leader and 2 followers. Is there a
> way to see which node is serving read operations and which node is serving
> write operations?

In ZooKeeper, the current leader is responsible for all modifications of
the data, and all the changes made by the leader are synchronized to all
followers. The four-letter-word diagnostic interface (
https://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_4lw) or the
HTTP admin API (
https://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_adminserver)
can be used to find the current leader in the cluster. However, in
ZooKeeper the clients can be connected to any ZooKeeper server in the
quorum (unless the leaderServes config is explicitly disabled, in which
case the leader doesn't accept client connections), and normally all
servers accept both read and write operations. A client session is
handled by one server, and if the client sends a write request, this
server forwards it to the current leader and only sends the answer back
to the client once the change has been committed. The client doesn't need
to know which server is the current leader; it can communicate with any
server. Usually we list all the ZooKeeper servers when we initiate a new
client session, so the client library can fail over and load-balance.
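As an illustration of finding the leader through the four-letter-word interface, here is a minimal Python sketch that sends `srvr` to each server's client port and reads the `Mode:` line of the reply. It assumes that `srvr` is allowed by the `4lw.commands.whitelist` setting on every server (it is normally whitelisted by default, since zkServer.sh relies on it) and that the hostnames you pass in are your own; the function names are hypothetical helpers, not a ZooKeeper API.

```python
import socket
from typing import Dict, Iterable, Optional


def four_letter_word(host: str, port: int, cmd: str = "srvr", timeout: float = 5.0) -> str:
    """Send a four-letter-word command to a server's client port and return the reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_srvr(reply: str) -> Dict[str, str]:
    """Turn the 'Key: value' lines of a srvr reply into a dict (split on the first colon)."""
    fields = {}
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def find_leader(servers: Iterable[str], port: int = 2181) -> Optional[str]:
    """Return the first host that reports 'Mode: leader', or None if none does."""
    for host in servers:
        try:
            mode = parse_srvr(four_letter_word(host, port)).get("Mode")
        except OSError:
            continue  # unreachable server or refused command: try the next one
        if mode == "leader":
            return host
    return None
```

Usage, with placeholder hostnames: `find_leader(["zk1.example.com", "zk2.example.com", "zk3.example.com"])`. Note that a standalone server reports `Mode: standalone`, so this returns None for it.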

In general, you might find it useful to read our documentation:
https://zookeeper.apache.org/doc/current/zookeeperOver.html

Kind regards,
Máté

On Sat, Sep 17, 2022 at 6:27 PM Steph van Schalkwyk wrote:

> Just google leader election site:zookeeper.apache.org
>
>
> On Fri, Sep 16, 2022 at 7:39 PM Kaushal Shriyan wrote:
>
> > Hi,
> >
> > I am running Zookeeper version: 3.7.0 ( 3 nodes -> 1 Leader and 2
> > Followers) on CentOS Linux release 7.9.2009 (Core). In an ensemble of 3
> > nodes with 1 leader and 2 followers, if the leader goes down then two
> > servers can elect a leader among themselves. I have the below questions.
> >
> >    1. What is the algorithm used to elect the new leader between the
> >    remaining 2 followers?
> >    2. During the leader elections process in place, does the client see a
> >    503 service unavailable for all read or write requests?
> >    3. In an ensemble of 3 nodes with 1 leader and 2 followers. Is there a
> >    way to see which node is serving read operations and which node is
> >    serving write operations?
> >
> > Please guide me. Any help will be highly appreciable. Thanks in advance.
> >
> > Best Regards,
> >
> > Kaushal
> >
>

Re: Backup and restore Solr 8.11.2 collections and configsets in Zookeeper version: 3.7.0

2022-09-16 Thread Szalay-Bekő Máté

Hello Kaushal!

I think it would be best to ask this on an Apache Solr mailing list.
Here we mostly know about ZooKeeper :)

I know that Solr stores many configurations (including full config
files) in ZooKeeper. To back up everything, you would need to export all the
content from ZooKeeper under e.g. the "/solr" ZNode. We don't have such a
recursive backup / restore solution in ZooKeeper; we mainly support the
backup and restore of the whole ZooKeeper data tree. One can create a
recursive script to achieve this (e.g. using the official ZooKeeper
clients, or other clients listed here -
https://cwiki.apache.org/confluence/display/zookeeper/zkclientbindings . I
typically use kazoo for custom scripts like this).
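A recursive export of a subtree such as "/solr" could look roughly like the minimal sketch below. The ZooKeeper calls are injected as two callables so the walk itself does not depend on one client library; with kazoo they could be `zk.get_children` and `lambda p: zk.get(p)[0]` (kazoo's `get` returns a `(data, stat)` tuple). `dump_tree` is a hypothetical helper. Caveats: this is not a consistent point-in-time snapshot (the tree may change during the walk), and it does not capture ACLs or ephemeral/sequential flags, so a restore would have to recreate nodes parent-first with those settings chosen separately.

```python
from typing import Callable, Dict, List, Optional


def dump_tree(path: str,
              get_children: Callable[[str], List[str]],
              get_data: Callable[[str], Optional[bytes]]) -> Dict[str, Optional[bytes]]:
    """Recursively collect the data of `path` and of every ZNode below it.

    Keys are full ZNode paths, in depth-first order (parents before children).
    """
    result = {path: get_data(path)}
    for child in sorted(get_children(path)):
        # Avoid a double slash when starting from the root "/".
        child_path = path.rstrip("/") + "/" + child
        result.update(dump_tree(child_path, get_children, get_data))
    return result
```

Usage with kazoo (connection string is a placeholder): create `KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")`, call `start()`, then `dump_tree("/solr", zk.get_children, lambda p: zk.get(p)[0])`, and write the resulting dict to disk in whatever format suits you.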

But it would actually be much better to do the backup and restore at the
Solr level. I'm not exactly sure about the best practices, but here are some
links to the official Solr documentation that might be helpful:
- https://solr.apache.org/guide/8_11/using-zookeeper-to-manage-configuration-files.html
- https://solr.apache.org/guide/8_11/collections-api.html
- https://solr.apache.org/guide/8_11/collection-management.html#backup

Best regards,
Máté

On Fri, Sep 16, 2022 at 5:21 PM Kaushal Shriyan wrote:

> Hi,
>
> I am running Solr 8.11.2 (2 nodes) and Zookeeper version: 3.7.0 ( 3 nodes
> -> 1 Leader and 2 Followers) on Linux release 7.9.2009 (Core).
>
> Is there a way to back up the collections and configsets in Zookeeper
> version: 3.7.0 ( 3 nodes -> 1 Leader and 2 Followers) on Linux release
> 7.9.2009 (Core) as per the details below?
>
> #./zkCli.sh
> /bin/java
> Connecting to localhost:2181
> Welcome to ZooKeeper!
> JLine support is enabled
>
> WATCHER::
>
> WatchedEvent state:SyncConnected type:None path:null
> [zk: localhost:2181(CONNECTED) 0] ls /
> [solr, zookeeper]
> [zk: localhost:2181(CONNECTED) 0] ls /solr/co
> *collections*   *configs*
> [zk: localhost:2181(CONNECTED) 0]
>
> Thanks in advance. Please guide and I look forward to hearing from you.
>
> Best Regards,
>
> Kaushal
>

Re: Read performance of 3.4.6 vs 3.8.0 according to zookeeper-benchmark

2022-08-29 Thread Szalay-Bekő Máté

Interesting, thanks for sharing the results!

It will be hard to figure out what changed since the 3.4.6 release, as many
new features have been added. Still, it would be good to find out which
version / commit was responsible for most of the performance degradation.

You wrote before:
> A while back I performed similar comparisons of 3.4.6 vs 3.6.x and I got
> slow results in 3.6.x initially, but disabling digest.enabled fixed it and
> the two versions were then comparable. In 3.8.0 I am seeing poor results
> with or without digest enabled.

Then later:
> 27,052: 3.4.6
> 15,993:3.6.3
> 15,943:3.7.1
> 16,805: 3.8.0, digest.enabled=true:
> 16,682: 3.8.0: digest.enabled=false
> *16,370: 3.8.0 NullMetricsProvider*
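To put the quoted numbers side by side, a quick calculation of each result relative to the 3.4.6 baseline. This assumes, as the thread implies, that higher numbers mean better read throughput; the unit of the benchmark figures is not stated, so only the ratios are meaningful here.

```python
from typing import Dict

# Figures copied from the benchmark results quoted above (higher is better).
results: Dict[str, int] = {
    "3.4.6": 27052,
    "3.6.3": 15993,
    "3.7.1": 15943,
    "3.8.0 digest.enabled=true": 16805,
    "3.8.0 digest.enabled=false": 16682,
    "3.8.0 NullMetricsProvider": 16370,
}


def relative_to(baseline: str, data: Dict[str, int]) -> Dict[str, float]:
    """Return each result as a fraction of the baseline result."""
    base = data[baseline]
    return {name: value / base for name, value in data.items()}


for name, ratio in relative_to("3.4.6", results).items():
    print(f"{name:28s} {ratio:6.1%} of 3.4.6")
```

Every newer version lands at roughly 59-62% of the 3.4.6 figure, which is why the question of where the drop first appeared (3.5, 3.6, or a specific commit) matters more than the differences between 3.6.3, 3.7.1 and the 3.8.0 variants.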

Was the "15,993:3.6.3" measured with or without digest? It sounds like
earlier you were able to get good results with 3.6.x
(using digest.enabled=false). Is this still reproducible with 3.6.3? If
yes, then I would still look at the digest feature as a potential root
cause.

On the other hand, as far as I understand, the digest feature should not
really affect the read path. But I also don't know how the benchmarking
tool executes the operations (maybe there are writes running in parallel
with reads?).

Máté


On Mon, Aug 29, 2022 at 7:00 AM Will Now  wrote:

> In case it is informative, here is the output from running zookeeper 3.8.0
> in my latest test run with NullMetricsProvider.
>
> ~/servers/zookeeper/apache-zookeeper-3.8.0-bin/bin$ ./zkServer.sh start-foreground
> /usr/bin/java
> ZooKeeper JMX enabled by default
> Using config: /home/will/servers/zookeeper/apache-zookeeper-3.8.0-bin/bin/../conf/zoo.cfg
> 2022-08-28 21:36:13,846 [myid:] - INFO  [main:o.a.z.s.q.QuorumPeerConfig@177] - Reading configuration from: /home/will/servers/zookeeper/apache-zookeeper-3.8.0-bin/bin/../conf/zoo.cfg
> 2022-08-28 21:36:13,850 [myid:] - INFO  [main:o.a.z.s.q.QuorumPeerConfig@440] - clientPortAddress is 0.0.0.0:2181
> 2022-08-28 21:36:13,851 [myid:] - INFO  [main:o.a.z.s.q.QuorumPeerConfig@444] - secureClientPort is not set
> 2022-08-28 21:36:13,851 [myid:] - INFO  [main:o.a.z.s.q.QuorumPeerConfig@460] - observerMasterPort is not set
> 2022-08-28 21:36:13,851 [myid:] - INFO  [main:o.a.z.s.q.QuorumPeerConfig@477] - metricsProvider.className is org.apache.zookeeper.metrics.impl.NullMetricsProvider
> 2022-08-28 21:36:13,852 [myid:] - INFO  [main:o.a.z.s.DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
> 2022-08-28 21:36:13,852 [myid:] - INFO  [main:o.a.z.s.DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
> 2022-08-28 21:36:13,852 [myid:] - INFO  [main:o.a.z.s.DatadirCleanupManager@101] - Purge task is not scheduled.
> 2022-08-28 21:36:13,852 [myid:] - WARN  [main:o.a.z.s.q.QuorumPeerMain@139] - Either no config or no quorum defined in config, running in standalone mode
> 2022-08-28 21:36:13,853 [myid:] - INFO  [main:o.a.z.j.ManagedUtil@46] - Log4j 1.2 jmx support not found; jmx disabled.
> 2022-08-28 21:36:13,853 [myid:] - INFO  [main:o.a.z.s.q.QuorumPeerConfig@177] - Reading configuration from: /home/will/servers/zookeeper/apache-zookeeper-3.8.0-bin/bin/../conf/zoo.cfg
> 2022-08-28 21:36:13,854 [myid:] - INFO  [main:o.a.z.s.q.QuorumPeerConfig@440] - clientPortAddress is 0.0.0.0:2181
> 2022-08-28 21:36:13,854 [myid:] - INFO  [main:o.a.z.s.q.QuorumPeerConfig@444] - secureClientPort is not set
> 2022-08-28 21:36:13,854 [myid:] - INFO  [main:o.a.z.s.q.QuorumPeerConfig@460] - observerMasterPort is not set
> 2022-08-28 21:36:13,854 [myid:] - INFO  [main:o.a.z.s.q.QuorumPeerConfig@477] - metricsProvider.className is org.apache.zookeeper.metrics.impl.NullMetricsProvider
> 2022-08-28 21:36:13,854 [myid:] - INFO  [main:o.a.z.s.ZooKeeperServerMain@123] - Starting server
> 2022-08-28 21:36:13,861 [myid:] - INFO  [main:o.a.z.s.ServerMetrics@64] - ServerMetrics initialized with provider org.apache.zookeeper.metrics.impl.NullMetricsProvider@4d5d943d
> 2022-08-28 21:36:13,862 [myid:] - INFO  [main:o.a.z.s.a.DigestAuthenticationProvider@47] - ACL digest algorithm is: SHA1
> 2022-08-28 21:36:13,862 [myid:] - INFO  [main:o.a.z.s.a.DigestAuthenticationProvider@61] - zookeeper.DigestAuthenticationProvider.enabled = true
> 2022-08-28 21:36:13,863 [myid:] - INFO  [main:o.a.z.s.p.FileTxnSnapLog@124] - zookeeper.snapshot.trust.empty : false
> 2022-08-28 21:36:13,869 [myid:] - INFO  [main:o.a.z.ZookeeperBanner@42] -
> 2022-08-28 21:36:13,869 [myid:] - INFO  [main:o.a.z.ZookeeperBanner@42] -
> __  _
> 2022-08-28 21:36:13,869 [myid:] - INFO  [main:o.a.z.ZookeeperBanner@42] -
>  |___  / | |
> 2022-08-28 21:36:13,869 [myid:] - INFO  [main:o.a.z.ZookeeperBanner@42] -
>   / /___ ___   | | __   ______   _ __ ___   _ __
> 2022-08-28 21:36:13,869 [myid:] - INFO  [main:o.a.z.ZookeeperBanner@42] -
>  / // _ \   / _ \  | |/ /  / _ \  / _ \ | '_ \   / _ \ | '__|
> 2022-08-28 21:36:13,869 [myid:] - INFO

Re: How find if the zxid is reaching the limit (zxid lower 32 bits have rolled over, forcing re-election)

2022-08-24 Thread Szalay-Bekő Máté

Hello Ram,

Sorry, I don't really understand the question. The zxid is a 64-bit long
number. The upper 32 bits encode an election epoch number (a logical
time / counter for leader elections), while the lower 32 bits count /
provide an auto-incremented id for all the changes made (committed) in
ZooKeeper. Write requests are forwarded to the leader, which turns them
into proposals for the followers, and each accepted (committed) proposal
results in an increase of the zxid. The "current" / "latest" zxid is the
same in the whole cluster (of course followers can lag behind a little,
but in theory not much, if they are in sync and part of the quorum).

My understanding is that what you want to catch is the event when the
lower 32 bits of the zxid approach 0xffffffff. When the lower 32 bits of
the zxid reach 0xffffffff, a new leader election is triggered
automatically and ZooKeeper won't be able to serve requests for a short
period of time. And I guess you want to control this event and maybe
restart the leader manually at a time that suits you better?
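To estimate when the rollover will happen, one option is to sample the zxid twice (e.g. via `srvr`) and extrapolate linearly. This is a minimal sketch that assumes a roughly constant write rate, so a bursty workload can make the estimate far off. It also reflects that a new leader starts a new epoch with the counter reset, so if the epoch changed between the two samples there is nothing meaningful to extrapolate. `seconds_until_rollover` is a hypothetical helper.

```python
from typing import Optional

COUNTER_MASK = 0xFFFFFFFF  # lower 32 bits of a zxid; the rollover point


def seconds_until_rollover(zxid_then: int, zxid_now: int, elapsed_s: float) -> Optional[float]:
    """Linear estimate of the seconds left until the zxid counter reaches 0xffffffff.

    Returns None when no estimate makes sense: the epoch changed between the
    samples (a leader election reset the counter) or no writes were committed.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    if (zxid_then >> 32) != (zxid_now >> 32):
        return None
    counter_then = zxid_then & COUNTER_MASK
    counter_now = zxid_now & COUNTER_MASK
    rate = (counter_now - counter_then) / elapsed_s  # committed transactions per second
    if rate <= 0:
        return None
    return (COUNTER_MASK - counter_now) / rate
```

Usage: take two `Zxid:` readings some time apart, parse them with `int(value, 16)`, and call e.g. `seconds_until_rollover(first, second, 3600.0)` for samples one hour apart; dividing the result by 86400 gives days, which is usually the useful scale for planning a controlled leader restart.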

But maybe I misunderstood your question.

Máté

On Tue, Aug 23, 2022 at 11:00 PM rammohan ganapavarapu <
rammohanga...@gmail.com> wrote:

> Máté,
>
> Thanks for the quick reply. Yes, I did see that the srvr command can give
> the current zxid. I also see a metric in mntr, "proposal_count", which gives
> the total proposals, and when we hit the zxid limit it matched the
> proposal_count metric (2^32 = *4,294,967,296*). So I am trying to understand
> how this zxid gets incremented. I don't see the zxid in logs for normal
> events, other than at leader election time.
>
> Ram
>
>
>
> On Tue, Aug 23, 2022 at 10:10 AM Szalay-Bekő Máté <
> szalay.beko.m...@gmail.com> wrote:
>
> > Hello!
> >
> > I think the "srvr" 4-letter-word diagnostic command should print you the
> > current zxid. Also the similar command works on the Admin Rest API (if it
> > is enabled).
> >
> > See:
> > https://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_zkCommands
> >
> > An example:
> >
> >
> > echo srvr | nc localhost 2181
> >
> > Zookeeper version: 3.5.5-136-69648f116c849ccd757e97c26d3450022d4b1dae, built on 08/08/2022 11:04 GMT
> > Latency min/avg/max: 0/0/1808
> > Received: 9599434
> > Sent: 9673689
> > Connections: 41
> > Outstanding: 0
> > Zxid: 0x2000afcbf <- this line
> > Mode: leader
> > Node count: 1384
> > Proposal sizes last/min/max: 32/32/4226
> >
> >
> >
> >
> > Also the zxid is added to the name of the snapshots / transaction log
> > files, which are flushed to the file system. Like: log.<zxid> or
> > snapshot.<zxid>
> >
> > e.g.:
> >
> > ls -la -R /var/lib/zookeeper/version-2/
> >
> > /var/lib/zookeeper/version-2/:
> > total 57328
> > drwxr-xr-x 2 zookeeper zookeeper 4096 Aug 23 10:42 .
> > drwxr-x--- 3 zookeeper zookeeper 4096 Aug  9 10:41 ..
> > -rw-r--r-- 1 zookeeper zookeeper        1 Aug 10 17:55 acceptedEpoch
> > -rw-r--r-- 1 zookeeper zookeeper        1 Aug 10 17:55 currentEpoch
> > -rw-r--r-- 1 zookeeper zookeeper 67108880 Aug 17 10:09 log.20004c9fc
> > -rw-r--r-- 1 zookeeper zookeeper 67108880 Aug 19 00:37 log.20005a541
> > -rw-r--r-- 1 zookeeper zookeeper 67108880 Aug 20 18:43 log.20006fc19
> > -rw-r--r-- 1 zookeeper zookeeper 67108880 Aug 21 21:40 log.200087550
> > -rw-r--r-- 1 zookeeper zookeeper 67108880 Aug 23 06:30 log.200096ed6
> > -rw-r--r-- 1 zookeeper zookeeper 67108880 Aug 23 17:05 log.2000a9c57
> > -rw-r--r-- 1 zookeeper zookeeper  1372956 Aug 17 10:09 snapshot.20005a540
> > -rw-r--r-- 1 zookeeper zookeeper  1370403 Aug 19 00:37 snapshot.20006fc18
> > -rw-r--r-- 1 zookeeper zookeeper  1369122 Aug 20 18:43 snapshot.20008754f
> > -rw-r--r-- 1 zookeeper zookeeper  1369034 Aug 21 21:40 snapshot.200096ed4
> > -rw-r--r-- 1 zookeeper zookeeper  1379613 Aug 23 06:30 snapshot.2000a9c56
> >
> >
> >
> > Best regards,
> > Máté
> >
> > On Tue, Aug 23, 2022 at 6:55 PM rammohan ganapavarapu <
> > rammohanga...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > We recently had a leader election due to "*zxid lower 32 bits have
> > > rolled over, forcing re-election*". This is the first time we are seeing
> > > this and trying to understand how to find if the ensemble is reaching
> > > that limit. Are there any metrics available in zk to track this? How can
> > > I estimate when my zk cluster will reach this limit?
> > >
> > > Thanks,
> > > Ram
> > >
> >
>

Re: Rolling upgrade from 3.4.6 to 3.8.0

2022-08-24 Thread Szalay-Bekő Máté

> I see now that the latest 3.5.x version is 3.5.10.  Is there some reason
> you quoted 3.5.9 in your original response to me, or should I use the
> latest 3.5.x, aka 3.5.10?

Sorry, my mistake... I forgot about that new release. I would advise using
the latest 3.5.x (3.5.10) version.

Best regards,
Máté

On Tue, Aug 23, 2022 at 11:37 PM Will Now  wrote:

> Thanks for the info. I see in some zk mailing list threads that you've
> given similar advice in the past such as a March 24 2020 response with
> details on the underlying cause for concern.
>
> I see now that the latest 3.5.x version is 3.5.10.  Is there some reason
> you quoted 3.5.9 in your original response to me, or should I use the
> latest 3.5.x, aka 3.5.10?
>
> Thanks!
> You wrote:
> | ... I would recommend the
> | following upgrade path:
> | - 3.4.6 -> 3.4.14 (latest 3.4)
> | - 3.4.14 -> 3.5.9 (latest 3.5)
> | - 3.5.9 -> 3.8.0 (currently the latest 3.8)
>
> On Tue, Aug 23, 2022 at 6:32 AM Szalay-Bekő Máté <
> szalay.beko.m...@gmail.com>
> wrote:
>
> > yeah, I remember something...
> > I think that was about the leader election protocol (between ZooKeeper
> > servers), which changed somewhat around the introduction of dynamic
> > reconfig, and you need a relatively late ZooKeeper 3.4 version to be able
> > to communicate (and form a quorum) together with newer ZooKeeper servers. I
> > am not sure which 3.4 version is safe to use, but I would recommend the
> > following upgrade path:
> > - 3.4.6 -> 3.4.14 (latest 3.4)
> > - 3.4.14 -> 3.5.9 (latest 3.5)
> > - 3.5.9 -> 3.8.0 (currently the latest 3.8)
> >
> > I think I have already tested the rolling upgrades above, but I would
> > definitely advise you to try everything out on a test cluster (with live
> > test traffic) before doing it in production.
> >
> > I am not sure if 3.4.14 -> 3.8.0 works (I don't remember testing it), but
> > it might work as well.
> >
> > Also some upgrade related info you might find interesting:
> > https://cwiki.apache.org/confluence/display/ZOOKEEPER/Upgrade+FAQ
> >
> > Best regards,
> > Máté
> >
> > On Tue, Aug 23, 2022 at 4:09 AM Will Now  wrote:
> >
> > > I'm planning to do a rolling upgrade from 3.4.6 to 3.8.0.  While
> > > researching zookeeper recently I could swear that I stumbled across some
> > > documentation with some warnings about 3.4.6.  Specifically it sounded like
> > > you might encounter failures during the upgrade if you jumped straight from
> > > 3.4.6 to 3.8.0 (and other newer versions also perhaps).
> > >
> > > I recall that it was recommended to first upgrade to 3.4.10 and THEN jump
> > > to the 3.8.0 version.
> > >
> > > I am now feeling like I dreamt all of this because I cannot locate that
> > > information for the life of me.  Does this ring a bell with anyone? Perhaps
> > > you can send me a link or tell me I'm losing it (or both)? Thanks!
> > >
> >
>

Re: How find if the zxid is reaching the limit (zxid lower 32 bits have rolled over, forcing re-election)

2022-08-23 Thread Szalay-Bekő Máté

Hello!

I think the "srvr" four-letter-word diagnostic command should print the
current zxid. A similar command also works via the AdminServer REST API (if
it is enabled).

See:
https://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_zkCommands

An example:


echo srvr | nc localhost 2181

Zookeeper version: 3.5.5-136-69648f116c849ccd757e97c26d3450022d4b1dae, built on 08/08/2022 11:04 GMT
Latency min/avg/max: 0/0/1808
Received: 9599434
Sent: 9673689
Connections: 41
Outstanding: 0
Zxid: 0x2000afcbf <- this line
Mode: leader
Node count: 1384
Proposal sizes last/min/max: 32/32/4226
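
A small script can turn the Zxid line into an early-warning figure. A zxid is a 64-bit number. The upper 32 bits hold the leader epoch and the lower 32 bits hold a per-epoch transaction counter. The "lower 32 bits have rolled over" election is forced when that counter runs out. The sketch below is not part of ZooKeeper, and the function names are hypothetical. It only parses text you have already captured, e.g. with the "echo srvr | nc" command above.

```python
import re
from typing import Tuple

# A zxid is 64 bits: upper 32 bits = leader epoch, lower 32 bits = a counter
# that restarts at 0 in every new epoch.
COUNTER_MASK = 0xFFFFFFFF


def parse_zxid(srvr_output: str) -> int:
    """Extract the value of the 'Zxid: 0x...' line from srvr (or stat) output."""
    match = re.search(r"^Zxid:\s*(0x[0-9a-fA-F]+)", srvr_output, re.MULTILINE)
    if match is None:
        raise ValueError("no Zxid line found in srvr output")
    return int(match.group(1), 16)


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Return (epoch, counter) for a zxid."""
    return zxid >> 32, zxid & COUNTER_MASK


def rollover_headroom(zxid: int) -> int:
    """Approximate number of transactions left before the counter is exhausted."""
    return COUNTER_MASK - split_zxid(zxid)[1]


def used_fraction(zxid: int) -> float:
    """Share of the 32-bit counter space already consumed in the current epoch."""
    return split_zxid(zxid)[1] / COUNTER_MASK
```

For the example output above, split_zxid(parse_zxid(text)) gives epoch 2 and counter 0xafcbf (720063). That is only a tiny fraction of the 2^32 counter space.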




The zxid is also part of the names of the snapshot / transaction log files
that are flushed to the file system, like log.<zxid> or snapshot.<zxid>.

e.g.:

ls -la -R /var/lib/zookeeper/version-2/

/var/lib/zookeeper/version-2/:
total 57328
drwxr-xr-x 2 zookeeper zookeeper 4096 Aug 23 10:42 .
drwxr-x--- 3 zookeeper zookeeper 4096 Aug  9 10:41 ..
-rw-r--r-- 1 zookeeper zookeeper 1 Aug 10 17:55 acceptedEpoch
-rw-r--r-- 1 zookeeper zookeeper 1 Aug 10 17:55 currentEpoch
-rw-r--r-- 1 zookeeper zookeeper 67108880 Aug 17 10:09 log.20004c9fc
-rw-r--r-- 1 zookeeper zookeeper 67108880 Aug 19 00:37 log.20005a541
-rw-r--r-- 1 zookeeper zookeeper 67108880 Aug 20 18:43 log.20006fc19
-rw-r--r-- 1 zookeeper zookeeper 67108880 Aug 21 21:40 log.200087550
-rw-r--r-- 1 zookeeper zookeeper 67108880 Aug 23 06:30 log.200096ed6
-rw-r--r-- 1 zookeeper zookeeper 67108880 Aug 23 17:05 log.2000a9c57
-rw-r--r-- 1 zookeeper zookeeper  1372956 Aug 17 10:09 snapshot.20005a540
-rw-r--r-- 1 zookeeper zookeeper  1370403 Aug 19 00:37 snapshot.20006fc18
-rw-r--r-- 1 zookeeper zookeeper  1369122 Aug 20 18:43 snapshot.20008754f
-rw-r--r-- 1 zookeeper zookeeper  1369034 Aug 21 21:40 snapshot.200096ed4
-rw-r--r-- 1 zookeeper zookeeper  1379613 Aug 23 06:30 snapshot.2000a9c56
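
You can also estimate *when* the limit will be reached by measuring the write rate between two snapshots. A snapshot file is named after the last zxid applied when the snapshot was started. A log file is named after the first zxid it contains. So two snapshot names plus their modification times give a rough transactions-per-second figure. This is a minimal sketch under those assumptions. The function names are hypothetical, and the linear extrapolation assumes a steady write rate. Only snapshots from the newest epoch are used, because every leader election starts a new epoch and resets the counter to 0.

```python
import os
import re
from typing import Iterable, List, Optional, Tuple

COUNTER_MASK = 0xFFFFFFFF  # lower 32 bits of a zxid: the per-epoch counter
SNAPSHOT_RE = re.compile(r"^snapshot\.([0-9a-fA-F]+)$")


def snapshot_samples(entries: Iterable[Tuple[str, float]]) -> List[Tuple[float, int]]:
    """Turn (file name, mtime) pairs into (mtime, zxid) samples, ordered by zxid."""
    samples = []
    for name, mtime in entries:
        match = SNAPSHOT_RE.match(name)
        if match:  # log.*, acceptedEpoch, currentEpoch, ... are skipped
            samples.append((mtime, int(match.group(1), 16)))
    return sorted(samples, key=lambda sample: sample[1])


def seconds_until_rollover(samples: List[Tuple[float, int]]) -> Optional[float]:
    """Extrapolate linearly from the oldest and newest snapshot of the newest epoch."""
    if not samples:
        return None
    newest_epoch = samples[-1][1] >> 32
    same_epoch = [s for s in samples if s[1] >> 32 == newest_epoch]
    (t_old, z_old), (t_new, z_new) = same_epoch[0], same_epoch[-1]
    if t_new <= t_old or z_new <= z_old:
        return None  # fewer than two usable samples: no measurable write rate
    rate = (z_new - z_old) / (t_new - t_old)  # transactions per second
    return (COUNTER_MASK - (z_new & COUNTER_MASK)) / rate


def scan_data_dir(path: str) -> List[Tuple[float, int]]:
    """Collect samples from a dataDir such as /var/lib/zookeeper/version-2."""
    return snapshot_samples(
        (name, os.path.getmtime(os.path.join(path, name))) for name in os.listdir(path)
    )
```

When seconds_until_rollover(scan_data_dir("/var/lib/zookeeper/version-2")) is not None, dividing it by 86400 gives a rough figure in days.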



Best regards,
Máté

On Tue, Aug 23, 2022 at 6:55 PM rammohan ganapavarapu <
rammohanga...@gmail.com> wrote:

> Hi,
>
> We recently had a leader election due to "*zxid lower 32 bits have rolled
> over, forcing re-election*". This is the first time we are seeing this and
> trying to understand how to find if the ensemble is reaching that limit.
> Are there any metrics available in zk to track this? How can I estimate
> when my zk cluster will reach this limit?
>
> Thanks,
> Ram
>

Re: Rolling upgrade from 3.4.6 to 3.8.0

2022-08-23 Thread Szalay-Bekő Máté

yeah, I remember something...
I think that was about the leader election protocol (between ZooKeeper
servers), which changed somewhat around the introduction of dynamic
reconfig, and you need a relatively late ZooKeeper 3.4 version to be able
to communicate (and form a quorum) together with newer ZooKeeper servers. I
am not sure which 3.4 version is safe to use, but I would recommend the
following upgrade path:
- 3.4.6 -> 3.4.14 (latest 3.4)
- 3.4.14 -> 3.5.9 (latest 3.5)
- 3.5.9 -> 3.8.0 (currently the latest 3.8)

I think I have already tested the rolling upgrades above, but I would
definitely advise you to try everything out on a test cluster (with live
test traffic) before doing it in production.

I am not sure if 3.4.14 -> 3.8.0 works (I don't remember testing it), but
it might work as well.

Also some upgrade related info you might find interesting:
https://cwiki.apache.org/confluence/display/ZOOKEEPER/Upgrade+FAQ
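
When trying this out on a test cluster, upgrade one server at a time. Only move on once the restarted server is back in the quorum on the new version. This is a minimal sketch, not an official upgrade tool. It assumes the servers answer the "srvr" four-letter word on the client port; srvr is the command enabled by default in 4lw.commands.whitelist since 3.5.3. The function names, port and timeouts are hypothetical.

```python
import re
import socket
import time
from typing import Dict


def srvr(host: str, port: int = 2181, timeout: float = 5.0) -> str:
    """Send the 'srvr' four-letter word and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after answering
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_srvr(text: str) -> Dict[str, str]:
    """Parse the 'Key: value' lines of srvr output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def upgraded_and_serving(text: str, expected_version: str) -> bool:
    """True if srvr reports expected_version (e.g. "3.8.0") and a quorum role."""
    fields = parse_srvr(text)
    match = re.match(r"(\d+\.\d+\.\d+)", fields.get("Zookeeper version", ""))
    in_quorum = fields.get("Mode") in ("leader", "follower", "observer")
    return match is not None and match.group(1) == expected_version and in_quorum


def wait_until_upgraded(host: str, expected_version: str, port: int = 2181,
                        deadline_s: float = 300.0, poll_s: float = 5.0) -> bool:
    """Poll one restarted server until it is back in the quorum on the new version."""
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        try:
            if upgraded_and_serving(srvr(host, port), expected_version):
                return True
        except OSError:
            pass  # still restarting: connection refused or timed out
        time.sleep(poll_s)
    return False
```

For example, after restarting the first server on 3.4.14, wait_until_upgraded("zk1.example.com", "3.4.14") should return True before you touch the next server. The host name here is hypothetical. A server that is not in a quorum answers srvr with a line that has no Mode field, so it is correctly treated as not ready.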

Best regards,
Máté

On Tue, Aug 23, 2022 at 4:09 AM Will Now  wrote:

> I'm planning to do a rolling upgrade from 3.4.6 to 3.8.0.  While
> researching zookeeper recently I could swear that I stumbled across some
> documentation with some warnings about 3.4.6.  Specifically it sounded like
> you might encounter failures during the upgrade if you jumped straight from
> 3.4.6 to 3.8.0 (and other newer versions also perhaps).
>
> I recall that it was recommended to first upgrade to 3.4.10 and THEN jump
> to the 3.8.0 version.
>
> I am now feeling like I dreamt all of this because I cannot locate that
> information for the life of me.  Does this ring a bell with anyone? Perhaps
> you can send me a link or tell me I'm losing it (or both)? Thanks!
>

Re: Can the leader of a Zookeeper be specifically selected at startup?

2022-06-20 Thread Szalay-Bekő Máté

I also don't really see why you would need a single host to be the
"preferred" leader. I think the safest approach (and the best practice) is to
make sure all your ZooKeeper servers are equivalent in terms of networking,
performance, etc.

Without knowing your goals, the Observer feature might also be worth a look:
https://zookeeper.apache.org/doc/r3.6.3/zookeeperObservers.html

Best regards,
Mate

On Mon, Jun 20, 2022 at 9:57 AM Enrico Olivelli  wrote:

> George,
> really, it should not matter which server is the leader. It is
> chosen automatically.
> Ideally, each node should be as powerful as its peers.
>
> Why do you need this "preferred leader"?
> I am afraid that there is some flaw in your design.
>
> Enrico
>
> On Mon, Jun 20, 2022 at 05:39 Kezhu Wang
> wrote:
> >
> > Hi,
> >
> > I think this could be achieved with help from `reconfig`[1]:
> > * Configure all nodes with `standaloneEnabled=false` and `reconfigEnabled=true`.
> > * Start node-2 as the sole quorum participant.
> > * Now node-2 is the leader. You will see "No server failure will be
> > tolerated. You need at least 3 servers".
> > * Start node-1 and node-3 with the whole quorum in their configuration.
> > * `zkCli.sh config` shows only node-2 for now.
> > * `zkCli.sh reconfig -add node-1,node-3` will add both node-1 and node-3 to
> > the quorum.
> > * According to `Leader.tryToCommit`[2], node-2 will stay the leader because
> > it was the leader of the old quorum and is a voter in the new quorum.
> >
> > node-2 remains the leader during the whole process.
> >
> > [1]: https://zookeeper.apache.org/doc/current/zookeeperReconfig.html
> > [2]:
> >
> https://github.com/apache/zookeeper/blob/b4f9aab099880ba8ef08eaff697debe6cdeae057/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Leader.java#L950
> >
> > Best,
> > Kezhu Wang
> >
> > On June 19, 2022 at 23:00:59, Heller, George A III CTR (USA) (
> > george.a.heller2@mail.mil.invalid) wrote:
> >
> > We have 3 Zookeeper nodes and would like node 2 to always be the leader
> > unless node 2 goes down. If node 2 goes down, then either node 1 or node 3
> > would be the leader.
> >
> >
> >
> > Can this be done? If so, how would this be done?
>

Re: Client connection patterns

2022-06-10 Thread Szalay-Bekő Máté

> the person who experienced a failure and loss of a collection evidently
> had a "not synced" zk server node that also didn't know it wasn't synced

This should indeed be either a configuration issue or a bug. When ZK servers
join a quorum, they sync their state, and afterwards only the leader changes
the state. That said, I know of a few bugs in older ZK versions that can lead
to out-of-sync states in some rare cases.

In ZooKeeper, each committed state change made by the leader is marked by a
'zxid', a monotonically increasing 'logical timestamp' value. Two different
changes should never be marked with the same zxid. Also, if a client got a
response that a change was successful, then that change has already been
persisted and should be restored even in the case of a server failure. But
the theoretical configuration problem you described (multiple ZK quorums
running, with clients given servers from more than one quorum) can lead to
all sorts of problems: the states will diverge and even the session IDs can
be duplicated.

However, you mentioned that ZooKeeper is used differently in multiple
places in Solr. If Solr uses multiple ZK client sessions in a single JVM,
then these sessions may be handled by different ZK servers, and server A may
be a bit behind server B (the last known zxid is smaller on ZK server A than
on ZK server B). The state observed by the client in session A will
eventually catch up (within a given time bound), and session A will see
exactly the same sequence of changes as session B. But ZK doesn't guarantee
that different client sessions (whether running on different hosts or in
the same host / JVM) will always see the same state at the same time. For
the same session (even if it moves to a different ZK server) this is
guaranteed, though: the client sends the server the last zxid it has seen,
so the server knows if it is not up-to-date and can handle the situation.
https://zookeeper.apache.org/doc/r3.8.0/zookeeperOver.html#Guarantees
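
A toy model can illustrate the same-session guarantee. It is not ZooKeeper's actual code, and the class and function names are hypothetical. A reconnecting session carries the last zxid it has seen. A server whose own last zxid is smaller refuses the session, so the client tries another server. Two separate sessions share no such watermark. That is why each can be served by a different, slightly lagging server and observe a different state for a while.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    name: str
    last_zxid: int  # last zxid this server has applied


@dataclass
class Session:
    session_id: int
    last_zxid_seen: int = 0  # highest zxid this session has observed so far


def read(session: Session, server: Server) -> None:
    """A response from `server` lets the session observe that server's state."""
    session.last_zxid_seen = max(session.last_zxid_seen, server.last_zxid)


def accepts(server: Server, session: Session) -> bool:
    """A server refuses a session that has already seen newer state than it has."""
    return server.last_zxid >= session.last_zxid_seen


def reconnect(session: Session, servers: List[Server]) -> Optional[Server]:
    """Move the session to the first server in the list that may serve it."""
    for server in servers:
        if accepts(server, session):
            return server
    return None
```

In this model, a session that has read from the up-to-date server B is not accepted by a lagging server A. A different session that only ever talked to A can still use A.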

When one designs any distributed synchronization using ZooKeeper, then it
is important to take these limitations into account. Relying on Curator can
help a lot here, as it provides a higher level of abstraction.

> Based on your description, a load balancer would still potentially cause
> interference if (for example) it didn't allow long running connections

Yes, definitely. ZooKeeper client connections will frequently break if the
load balancer doesn't support sticky TCP sessions. And (re-)connecting can be
an expensive operation, especially if e.g. Kerberos is used.

Best regards,
Mate

On Fri, Jun 10, 2022, 7:53 PM Gus Heck  wrote:

> Thanks for the response. This is helpful, and jibes with what I am seeing
> in the code. WRT solr, currently there are a couple isolated places where
> curator is used, and there is a desire to move to using it extensively but
> that has not happened yet <
> https://issues.apache.org/jira/browse/SOLR-16116>.
> (PR <https://github.com/apache/solr/pull/760>)
>
> To date, Solr manages its own clients directly. Exactly what it does varies
> a bit depending on who wrote the code and when, and I think part of the
> impetus for Curator is to achieve consistency.
>
> In the ticket I linked, the person who experienced a failure and loss of a
> collection evidently had a "not synced" zk server node that also didn't
> know it wasn't synced. Based on what you say above I guess this means that
> node wasn't configured properly and maybe didn't have a list of the other
> servers? I'm assuming nodes start up as not synced... Obviously if true,
> that's a serious user error in configuration, but conceivable if deployment
> infrastructure is meant to generate the config, and succeeds incorrectly
> (imagining something doing something conceptually like $OLDSERVERS + $NEW
> where $OLDSERVERS came out blank).
>
> IIUC you rely on the target server *knowing* that it is out of sync. Adding
> a server that has a false sense of confidence to the client's connection
> string doesn't (yet) seem any different than adding it to the load balancer
> here. In either case the client might select the "rogue" server that thinks
> it's synced but isn't, with identical results. Based on your description, a
> load balancer would still potentially cause interference if (for example)
> it didn't allow long running connections, but this would just be spurious
> disconnect/reconnect cycles and inconsistent load on individual zk servers,
> suboptimal but non-catastrophic unless near machine capacity already.
>
> -Gus
>
>
> On Fri, Jun 10, 2022 at 12:35 PM Szalay-Bekő Máté <
> szalay.beko.m...@gmail.com> wrote:
>
> > Hello Gus,
> >
> > I think you shouldn't use a load-balancer for ZooKeeper. Clients do the
> > load balancing and also they won't connect to any '

Re: Client connection patterns

2022-06-10 Thread Szalay-Bekő Máté

Hello Gus,

I think you shouldn't use a load-balancer for ZooKeeper. Clients do the
load balancing and also they won't connect to any 'out-of-sync' servers.
The way it works normally:

- You have ZK servers A, B and C. You list all these servers in all your
ZooKeeper client configs, and in all server configs.
- ZK servers form a quorum when a majority of the servers have joined.
There is always one quorum member elected as leader (this is the only one
that can change the state stored in ZK, and it will only commit a change if
a majority of the servers approved it).
- Each client is connected to a single ZK server at a time.
- If any ZK server goes out of sync (e.g. it loses the connection to the
leader), then it stops serving requests. So if your client is connected to
such a server, the client loses the connection immediately when the server
leaves the quorum, and no client will be able to connect to such an "out of
sync" server.
- The ZK client first connects to a random server from the server list. If
the connection fails (the server is unreachable or not serving client
requests), the client moves on to the next server in the list in a
round-robin fashion, until it finds a working / in-sync server.
- All ZK clients talk with the server in "sessions". Each client
session has an ID, unique in the cluster. If the client loses the
connection to a server, it will automatically try to connect to a different
server using ('renewing') the same session id. "A client will see the same
view of the service regardless of the server that it connects to. i.e., a
client will never see an older view of the system even if the client fails
over to a different server with the same session." This is (among other
things) guaranteed by ZooKeeper.
- Of course, sessions can terminate. There is a pre-configured and
negotiated session timeout. The session will be deleted from the ZK cluster
if the client has no connection to any server for longer than this timeout
(e.g. no heartbeat for 30 seconds). After this time, the session can not be
renewed. In this case the application needs to decide what to do. It can
start a new session, but then it may e.g. miss some watched events, and it
will also lose its ephemeral ZNodes. (For example, I know HBase servers abort
themselves when their ZK session times out, as they can no longer guarantee
consistency.) Now, as far as I remember, Solr is using Curator to handle
ZooKeeper connections. I'm not entirely sure how Solr is using ZooKeeper
through Curator. Maybe Solr reconnects automatically with a new session if
the old
one terminates. Maybe Solr handles this case in a different way.
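
The client-side connection behaviour in the list above can be modelled in a few lines: a random starting server, then round-robin, skipping servers that are unreachable or not serving. This is a simplified sketch, loosely inspired by the Java client's StaticHostProvider rather than a copy of it. The class and function names are hypothetical. The is_serving callback stands in for a real connection attempt.

```python
import random
from typing import Callable, List, Optional, Sequence


class RoundRobinHosts:
    """Shuffle the configured server list once, then hand out servers in a cycle."""

    def __init__(self, servers: Sequence[str], rng: Optional[random.Random] = None):
        if not servers:
            raise ValueError("server list must not be empty")
        self._servers: List[str] = list(servers)
        (rng or random.Random()).shuffle(self._servers)  # random first server
        self._index = -1

    def next(self) -> str:
        self._index = (self._index + 1) % len(self._servers)
        return self._servers[self._index]


def connect(hosts: RoundRobinHosts, is_serving: Callable[[str], bool],
            attempts: int) -> Optional[str]:
    """Try up to `attempts` servers; return the first one currently serving clients."""
    for _ in range(attempts):
        server = hosts.next()
        if is_serving(server):  # unreachable or out-of-quorum servers refuse clients
            return server
    return None
```

Shuffling the list once per client spreads clients across the ensemble. Walking the list in order guarantees that every configured server is tried before any is retried.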

See our overview docs:
https://zookeeper.apache.org/doc/r3.8.0/zookeeperOver.html

By default, the list of servers is a static config. If you want to add a
new server (or remove an old one), then you need to do a rolling restart of
all the ZK servers and also restart the clients with the new config. The
dynamic reconfig feature (if enabled) allows you to do this in a smarter
way, storing the list of ZK servers inside a system znode, which can be
changed dynamically:
https://zookeeper.apache.org/doc/r3.8.0/zookeeperReconfig.html (this is
available since ZooKeeper 3.5)

Best regards,
Mate


On Thu, Jun 9, 2022 at 10:37 PM Gus Heck  wrote:

> Hi ZK Folks,
>
> Some prior information including discussion on SOLR-13396
> <
> https://issues.apache.org/jira/browse/SOLR-13396?focusedCommentId=16822748=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16822748
> >
> had
> led me to believe that the zookeeper client established connections to all
> members of the cluster. This then seemed to be the logic for saying that
> having a load balancer in front of zookeeper was dangerous, allowing the
> possibility that the client might decide to talk to a zk that had not
> synced up. In the solr case this could lead to data loss. (see discussion
> in the above ticket).
>
> However, I've now been reading code pursuing an issue for a client and
> unless the multiple connections are hidden deep inside the handling of
> channels in the ClientCnxnSocketNIO class (or its close relatives) it
> looks a lot to me like only one actual connection is held at one time by an
> instance of ZooKeeper.java.
>
> If that's true, then while the ZooKeeper codebase certainly has logic to
> reconnect and to balance across the cluster etc, it's becoming murky to me
> how listing all zk servers directly vs through a load balancer would be
> protection against connecting to an as-yet unsynced zookeeper if it existed
> in the configured server list.
>
> Does such a protection exist? or is it the user's responsibility not to add
> the server to the list (or load balancer) until it's clear that it has
> successfully joined the cluster and synced its data?
>
> -Gus
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>

[ANNOUNCE] Apache ZooKeeper 3.5.10

2022-06-05 Thread Szalay-Bekő Máté

The Apache ZooKeeper team is proud to announce Apache ZooKeeper version
3.5.10

ZooKeeper is a high-performance coordination service for distributed
applications. It exposes common services - such as naming,
configuration management, synchronization, and group services - in a
simple interface so you don't have to write them from scratch. You can
use it off-the-shelf to implement consensus, group management, leader
election, and presence protocols. And you can build on it for your
own, specific needs.

3.5.10 is the last bugfix release for branch 3.5, as 3.5 has been EoL since
June 1st, 2022.
It fixes 44 issues, including CVE fixes, the removal of log4j1 (reload4j is
used from now on) and various other bug fixes (thread leaks, data corruption,
snapshotting and SASL related fixes).

For ZooKeeper release details and downloads, visit:
https://zookeeper.apache.org/releases.html

ZooKeeper 3.5.10 Release Notes are at:
https://zookeeper.apache.org/doc/r3.5.10/releasenotes.html

We would like to thank the contributors that made the release possible.

Regards,
The ZooKeeper Team

Re: [ANNOUNCE] Apache ZooKeeper 3.7.1

2022-05-13 Thread Szalay-Bekő Máté

hurray! :)
thank you for coordinating this release!!

On Thu, May 12, 2022 at 6:12 AM Mohammad Arshad  wrote:

> The Apache ZooKeeper team is proud to announce Apache ZooKeeper version
> 3.7.1
>
> ZooKeeper is a high-performance coordination service for distributed
> applications. It exposes common services - such as naming,
> configuration management, synchronization, and group services - in a
> simple interface so you don't have to write them from scratch. You can
> use it off-the-shelf to implement consensus, group management, leader
> election, and presence protocols. And you can build on it for your
> own, specific needs.
>
> For ZooKeeper release details and downloads, visit:
> https://zookeeper.apache.org/releases.html
>
> ZooKeeper 3.7.1 Release Notes are at:
> https://zookeeper.apache.org/doc/r3.7.1/releasenotes.html
>
> We would like to thank the contributors that made the release possible.
>
> Regards,
>
> The ZooKeeper Team
>

Re: Need to restart after editing the SSL keystore or truststore?

2022-03-28 Thread Szalay-Bekő Máté

Hi Sam,

I have never tested this, but I know about a feature, present since 3.5.5 /
3.6.0, that refreshes the keystore file content automatically. See:
https://issues.apache.org/jira/browse/ZOOKEEPER-3174,
https://github.com/apache/zookeeper/pull/680

This needs to be enabled with the "sslQuorumReloadCertFiles" option. I'm not
exactly sure if this also affects SSL encryption on the server-client
communication. (Also, I usually use Kerberos for authentication, so I avoid
SSL client authentication by configuring ssl.clientAuth=none; reloading the
truststore for the client SSL would probably be less important in my case.)

Regards,
Mate

On Fri, Mar 25, 2022 at 7:40 PM Sam Lee  wrote:

> In my zoo.cfg file, I have enabled SSL both for quorum communication and
> client connections:
>
> sslQuorum=true
> serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
> ssl.quorum.keyStore.location=/path/to/keystore.jks
> ssl.quorum.keyStore.password=mypassword
> ssl.quorum.trustStore.location=/path/to/truststore.jks
> ssl.quorum.trustStore.password=mypassword
>
> ssl.keyStore.location=/path/to/keystore.jks
> ssl.keyStore.password=mypassword
> ssl.trustStore.location=/path/to/truststore.jks
> ssl.trustStore.password=mypassword
>
> If I subsequently edit the contents of the keystore or the truststore
> file, do I need to restart ZooKeeper for the change to take effect?
>
> (Apache ZooKeeper version 3.6.3)
>

Re: Help With zookeeper follower connections

2022-02-24 Thread Szalay-Bekő Máté

I think the "stat" command should list all ZooKeeper client connections
that are handled by the given ZooKeeper server.

I'm not sure if the ZooKeeper server-server connections are listed there.
(It is based on the result of the ServerCnxnFactory.getAllConnectionInfo()
method, which should return client connection info.)

But I didn't dig deep, so I may have missed something.
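
To compare the "stat" output with netstat, you can group the client lines by remote host. This is a small parsing sketch that assumes the line format shown in the quoted outputs of this thread (/host:port[interest](queued=...,recved=...,sent=...)). The function names are hypothetical. The 127.0.0.1 entry with recved=1 is most likely the nc connection that issued the stat command itself.

```python
import re
from collections import Counter
from typing import Dict, List, Union

# One client line of "stat" output, e.g.
#   /host02:48610[1](queued=0,recved=1455831,sent=1455833)
CLIENT_RE = re.compile(
    r"^\s*/(?P<host>[^\s\[]+):(?P<port>\d+)\[(?P<interest>\d+)\]\((?P<stats>[^)]*)\)"
)


def parse_stat_clients(stat_output: str) -> List[Dict[str, object]]:
    """Return host, port and counters for every client line of stat output."""
    clients = []
    for line in stat_output.splitlines():
        match = CLIENT_RE.match(line)
        if not match:
            continue  # header, blank or summary line
        stats: Dict[str, Union[int, str]] = {}
        for pair in match.group("stats").split(","):
            key, _, value = pair.partition("=")
            stats[key] = int(value) if value.isdigit() else value
        clients.append({"host": match.group("host"),
                        "port": int(match.group("port")),
                        "stats": stats})
    return clients


def connections_per_host(stat_output: str) -> Counter:
    """Count client connections grouped by remote host."""
    return Counter(client["host"] for client in parse_stat_clients(stat_output))
```

The client ports can then be matched against the local/remote ports netstat reports for the ZooKeeper process.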

On Wed, Feb 23, 2022 at 3:23 PM Jason Grammenos
 wrote:

> Hello,
>
> I did some more digging after Enrico's last reply (netstat) and it appears
> I may be misinterpreting what the "stat" command output is telling me.
> Does the stat command list all connections or only connections from
> followers/leaders?
>
>
> Jason Grammenos | Operations and Infrastructure Analyst
> Pronouns: he/him
> P: 613.232.7797 x1131
> Toll-free: 866.545.3745 x1131
> jason.gramme...@agilitypr.com
> agilitypr.com
> Learn new PR tips from our free resources.
>
> -----Original Message-----
> From: Enrico Olivelli 
> Sent: February 23, 2022 8:30 AM
> To: UserZooKeeper 
> Subject: Re: Help With zookeeper follower connections
>
> Are you able to verify, using netstat, if those connections are from the
> ZooKeeper process?
> You can compare the TCP port
>
> Enrico
>
> On Wed, 23 Feb 2022, 14:15 Jason Grammenos
>  wrote:
>
> > I do have tools that could be opening connections. I have akhq
> > (akhq.io) running on two of the hosts [host01, host04 (two nodes for
> > redundancy reasons)], connecting to the kafka cluster. I also have
> > Telegraf running the jolokia2 agent plugin to pull monitoring/stats
> > data from the hosts' JMX (for both kafka and zookeeper)
> >
> >
> > -----Original Message-----
> > From: Enrico Olivelli 
> > Sent: February 23, 2022 8:08 AM
> > To: jason.gramme...@agilitypr.com.invalid
> > Cc: UserZooKeeper 
> > Subject: Re: Help With zookeeper follower connections
> >
> > Jason,
> > Do you have other tools on those machines that could open client
> > connections?
> >
> > Enrico
> >
> > On Wed, 23 Feb 2022, 13:30 Jason Grammenos
> >  wrote:
> >
> > > Hello,
> > >
> > > I have a 5-node zookeeper (+ kafka) cluster. I am trying to find out
> > > if the inter-node connection behaviour is normal.
> > >
> > > I should have 1 zookeeper leader and 4 followers, and I expect that
> > > all 4 followers would open connections exclusively to the leader.
> > >
> > > So "echo stat | nc localhost 2181" on the leader should show me 6
> > > connections: 4 from the followers and 2 from itself (one via its proper
> > > IP, one from localhost)
> > >
> > > Zookeeper version: 3.5.8-f439ca583e70862c3068a1f2a7d4d068eec33315, built on 05/04/2020 15:53 GMT
> > > Clients:
> > > /host02:48610[1](queued=0,recved=1455831,sent=1455833)
> > > /host03:36244[1](queued=0,recved=1459769,sent=1459793)
> > > /host05:53680[1](queued=0,recved=484,sent=484)
> > > /host01:48978[1](queued=0,recved=226,sent=226)
> > > /host04:44810[1](queued=0,recved=52,sent=52)
> > > /127.0.0.1:41434[0](queued=0,recved=1,sent=0)
> > >
> > > But instead I have ended up with a few connections on each node,
> > > including the followers (as seen in the output below). Some followers
> > > are even connecting to themselves? The cluster looks stable, with
> > > only 1 leader and 4 followers, but I just do not understand this
> > > connection behaviour and would like to understand whether this is normal
> > > behaviour, a misconfiguration, a bug or something else.
> > >
> > > user@host01:~$ echo stat | nc localhost 2181
> > > Zookeeper version: 3.5.8-f439ca583e70862c3068a1f2a7d4d068eec33315, built on 05/04/2020 15:53 GMT
> > > Clients:
> > > /host01:50768[1](queued=0,recved=6156,sent=6156)
> > > /127.0.0.1:35508[0](queued=0,recved=1,sent=0)
> > >
> > > Latency min/avg/max: 0/0/3
> > > Received: 9855
> > > Sent: 9854
> > > Connections: 2
> > > Outstanding: 0
> > > Zxid: 0xb0068
> > > Mode: follower
> > > Node count: 226
> > >
> > > user@host02:~$ echo stat | nc localhost 2181
> > > Zookeeper version: 3.5.8-f439ca583e70862c3068a1f2a7d4d068eec33315, built on 05/04/2020 15:53 GMT
> > > Clients:
> > > /127.0.0.1:49624[0](queued=0,recved=1,sent=0)
> > >
> > > Latency min/avg/max: 0/0/0
> > > Received: 3699
> > > Sent: 3698
> > > Connections: 1
> > > Outstanding: 0
> > > Zxid: 0xb0068
> > > Mode: follower
> > > Node count: 226
> > >
> > > user@host03:~$ echo stat | nc localhost 2181
> > > Zookeeper version: 3.5.8-f439ca583e70862c3068a1f2a7d4d068eec33315,
> > > built on

Re: zookeeper digest authentication

2021-12-15 Thread Szalay-Bekő Máté

Maybe try using "sessionRequireClientSASLAuth" instead of
"requireClientAuthScheme"?

I don't see any config named "requireClientAuthScheme" in the
documentation.

Also, I think "zookeeper.allowSaslFailedClients" needs to be specified
as a system property and not as a zoo.cfg parameter. But according to the
documentation, "When enforce.auth.enabled=true and
enforce.auth.schemes=sasl then zookeeper.allowSaslFailedClients
configuration is overruled", and also: "sessionRequireClientSASLAuth: (...)
This configuration is short hand for enforce.auth.enabled=true and
enforce.auth.scheme=sasl", so I think you don't need to
specify zookeeper.allowSaslFailedClients if you
set sessionRequireClientSASLAuth=true in the zoo.cfg.

I hope sessionRequireClientSASLAuth=true will do the trick, but I'm not
sure. These configs are not very intuitive to follow; they evolved rather
than being designed :)

On Wed, Dec 15, 2021 at 11:49 AM Andrzej Trzeciak <
andrzej.trzec...@exelaonline.com> wrote:

> Hi,
> first of all thank you Máté and Chris for coming back to me with support.
> I wanted to inform you that I did use the documentation from the link
> provided by Máté and I did use the option 'enforce.auth.enabled=true', yet
> I was still being authenticated. After Chris wrote about
> 'zookeeper.allowSaslFailedClients' I found a Jira issue on that subject
> https://issues.apache.org/jira/browse/ZOOKEEPER-1736
> However I copied the configuration as described in that issue and I am
> still successfully authenticating with the wrong credentials.
> The config I am now using is (copied from Jira issue)
> zoo.cfg:
> requireClientAuthScheme=sasl
> authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
> jaasLoginRenew=360
> zookeeper.allowSaslFailedClients=false
>
> jaasFile.conf
>
> Server {
> org.apache.zookeeper.server.auth.DigestLoginModule required
> user_admin="admin";
> };
> Client {
> org.apache.zookeeper.server.auth.DigestLoginModule required
> username="admin"
> password="admin";
> };
>
> Do you maybe have an example config for that handy?
> Kind regards,
> Andrzej
>
> -----Original Message-----
> From: Chris T. 
> Sent: Wednesday, December 15, 2021 8:19 AM
> To: user@zookeeper.apache.org
> Subject: Re: zookeeper digest authentication
>
>
> Hi,
> I think you are referring to
>  zookeeper.allowSaslFailedClients
> This is casually mentioned in the link you provided but not explained as a
> standalone option.
> Regards
> Chris
>
>
>
> On 15 December 2021 08:14:19 Szalay-Bekő Máté 
> wrote:
>
> > Hello Andrzej,
> >
> > In ZooKeeper, authentication is not enforced by default, meaning
> > that even if you fail to authenticate (or don't even provide any
> > credentials) you can still connect to ZooKeeper, but your session
> > won't have any user attached to it. So you will be able to see/modify
> > only the ZNodes that are granting permission to the "world" user.
> > There are several server-side options to change this behaviour. I
> > think you are looking for the "enforce.auth.enabled=true" option, see
> > here:
> > https://zookeeper.apache.org/doc/r3.7.0/zookeeperAdmin.html#sc_authOptions
> >
> > I remember there is another option, which disables the
> > "fallback to world user" behaviour (i.e. it terminates the session if you
> > connect with wrong credentials, but still lets you connect without
> > providing any credentials). I remember seeing this in the code, but I
> > don't see it in the documentation. If you need this one, I can dig deeper.
> >
> > Kind regards,
> > Máté
> >
> > On Tue, Dec 14, 2021 at 2:20 PM Andrzej Trzeciak <
> > andrzej.trzec...@exelaonline.com> wrote:
> >
> >> Hi,
> >>
> >> I’m having trouble implementing the simplest zookeeper (v 3.7.0)
> >> authentication using just username and password and the ‘digest’ mechanism.
> >>
> >> I tried various config properties, but none of them worked.
> >>
> >> The problem is, that when I connect giving the wrong credentials I am
> >> still being successfully authenticated instead of being

Re: Unable to load database on disk

2021-12-14 Thread Szalay-Bekő Máté

I think the "unreasonable length" in your case means that the integer
describing the size of the next transaction event contains an unreasonably
large value (2186882 bytes), so I would assume this is a corrupted
transaction log file.

If this is a distributed cluster, then you should be able to restore the
failing node by copying the data from another server (e.g. from the
current leader). Make sure you copy both the latest snapshot and all the
transaction logs written since the last snapshot creation started. (It
never hurts to copy more files; ZooKeeper will only use the files that
are necessary.)
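
The file names tell you which files count as "the latest snapshot plus the logs written since". This sketch follows, as far as I can tell, how ZooKeeper's FileTxnLog.getLogFiles picks logs on startup. A log file is named after the first zxid it contains. So the log that starts at or just before the snapshot's zxid is needed, along with every later log. The function name is hypothetical, and the sketch only selects snapshot.* and log.* files.

```python
import re
from typing import List, Optional, Pattern

SNAPSHOT_RE = re.compile(r"^snapshot\.([0-9a-fA-F]+)$")
LOG_RE = re.compile(r"^log\.([0-9a-fA-F]+)$")


def _zxid(name: str, pattern: Pattern[str]) -> Optional[int]:
    """Hex zxid suffix of a snapshot/log file name, or None if it does not match."""
    match = pattern.match(name)
    return int(match.group(1), 16) if match else None


def files_to_copy(names: List[str]) -> List[str]:
    """Pick the newest snapshot plus every txn log that may hold later transactions."""
    snapshots = sorted((_zxid(n, SNAPSHOT_RE), n) for n in names if SNAPSHOT_RE.match(n))
    logs = sorted((_zxid(n, LOG_RE), n) for n in names if LOG_RE.match(n))
    if not snapshots:
        return [n for _, n in logs]  # no snapshot: every log is needed
    snap_zxid, snap_name = snapshots[-1]
    # The log starting at or just before the snapshot zxid may still contain
    # transactions newer than the snapshot; every later log certainly does.
    start = max((z for z, _ in logs if z <= snap_zxid), default=0)
    return [snap_name] + [n for z, n in logs if z >= start]
```

Applied to the version-2 listing shown earlier in this archive, it would select snapshot.2000a9c56, log.200096ed6 and log.2000a9c57.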

Kind regards,
Máté

On Tue, Dec 14, 2021 at 5:15 PM Joe (Joseph) Marrero 
wrote:

> Hi.
>
> Yesterday, our ZooKeeper instances failed to recover and we saw this error
> in the logs:
>
>
> 2021-12-14 03:01:43,296 [myid:2] - ERROR [main:QuorumPeer@1139] - Unable
> to load database on disk
>
> java.io.IOException: Unreasonable length = 2186882
>
> at
> org.apache.jute.BinaryInputArchive.checkLength(BinaryInputArchive.java:166)
>
> at
> org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:127)
>
> at
> org.apache.zookeeper.server.persistence.Util.readTxnBytes(Util.java:159)
>
> at
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:768)
>
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.fastForwardFromEdits(FileTxnSnapLog.java:352)
>
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.lambda$restore$0(FileTxnSnapLog.java:258)
>
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:303)
>
> at
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:285)
>
> at
> org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:1093)
>
> at
> org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:1078)
>
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:227)
>
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:136)
>
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:90)
>
> 2021-12-14 03:01:43,304 [myid:2] - INFO [main:AbstractConnector@380] -
> Stopped ServerConnector@7526515b{HTTP/1.1,[http/1.1]}{0.0.0.0:9141}
>
> 2021-12-14 03:01:43,305 [myid:2] - INFO [main:ContextHandler@1016] -
> Stopped o.e.j.s.ServletContextHandler@2f953efd{/,null,UNAVAILABLE}
>
> 2021-12-14 03:01:43,306 [myid:2] - ERROR [main:QuorumPeerMain@113] -
> Unexpected exception, exiting abnormally
>
> java.lang.RuntimeException: Unable to run quorum server
>
> at
> org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:1140)
>
> at
> org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:1078)
>
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:227)
>
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:136)
>
> at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:90)
>
> Caused by: java.io.IOException: Unreasonable length = 2186882
>
> at
> org.apache.jute.BinaryInputArchive.checkLength(BinaryInputArchive.java:166)
>
> at
> org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:127)
>
> at
> org.apache.zookeeper.server.persistence.Util.readTxnBytes(Util.java:159)
>
> at
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:768)
>
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.fastForwardFromEdits(FileTxnSnapLog.java:352)
>
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.lambda$restore$0(FileTxnSnapLog.java:258)
>
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:303)
>
> at
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:285)
>
> at
> org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:1093)
>
> ... 4 more
>
>
> The hard disk was at 50% capacity when this failure occurred. Any ideas
> what would cause this kind of unrecoverable error?
>
>
>
>
> *Joe Marrero*
>
> Senior Software Engineer
>
> jmarr...@drfirst.com 
>

Re: zookeeper digest authentication

2021-12-14 Thread Szalay-Bekő Máté

Hello Andrzej,

In ZooKeeper, authentication is not enforced by default, meaning that
even if you fail to authenticate (or don't provide any credentials at all)
you can still connect to ZooKeeper, but your session won't have any user
attached to it. So you will be able to see/modify only the ZNodes that
grant permission to the "world" user. There are several server-side
options to change this behaviour. I think you are looking for the
"enforce.auth.enabled=true" option, see here:
https://zookeeper.apache.org/doc/r3.7.0/zookeeperAdmin.html#sc_authOptions
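One detail that may explain the "auth success for scheme digest" line in the quoted log: as far as I know, the built-in digest scheme does not check the password against a list of users (the DigestLoginModule users in the JAAS Server section configure SASL authentication, which is a separate mechanism). Instead, it turns user:password into the identity user:base64(sha1(user:password)), so a wrong password simply yields an identity that no ACL grants access to. The sketch below models that, assuming the default SHA-1 digest; the class name and the simplified ACL check (permission bits ignored) are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.List;
import java.util.Set;

class DigestAclSketch {
    // "user:password" -> "user:base64(sha1(user:password))", assuming the default SHA-1 digest.
    static String digestId(String userColonPassword) throws NoSuchAlgorithmException {
        String user = userColonPassword.split(":", 2)[0];
        byte[] hash = MessageDigest.getInstance("SHA-1")
                .digest(userColonPassword.getBytes(StandardCharsets.UTF_8));
        return user + ":" + Base64.getEncoder().encodeToString(hash);
    }

    // Simplified ACL check (permission bits ignored): access is granted if the node's
    // ACL contains world:anyone or any identity attached to the session.
    static boolean allowed(List<String> acl, Set<String> sessionIds) {
        if (acl.contains("world:anyone")) {
            return true;
        }
        for (String entry : acl) {
            if (sessionIds.contains(entry)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        List<String> protectedNode = List.of("digest:" + digestId("bob:bobsecret"));
        List<String> openNode = List.of("world:anyone");

        // The "authentication" succeeds in both cases; only the resulting identity differs.
        Set<String> wrongPassword = Set.of("digest:" + digestId("bob:wrongsecret"));
        Set<String> rightPassword = Set.of("digest:" + digestId("bob:bobsecret"));

        System.out.println(allowed(protectedNode, wrongPassword)); // false
        System.out.println(allowed(protectedNode, rightPassword)); // true
        System.out.println(allowed(openNode, Set.of()));           // true, no credentials needed
    }
}
```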

I remember there is also another option which disables the "fallback to
world user" behaviour (terminating the session if you connect with wrong
credentials, but still letting you connect without providing any
credentials). I have seen it in the code, but I can't find it in the
documentation. If you need this one, I can dig deeper.

Kind regards,
Máté

On Tue, Dec 14, 2021 at 2:20 PM Andrzej Trzeciak <
andrzej.trzec...@exelaonline.com> wrote:

> Hi,
>
> I’m having trouble implementing the simplest zookeeper (v 3.7.0)
> authentication using just username and password and the ‘digest’ mechanism.
>
> I tried various config properties, but none of them worked.
>
> The problem is, that when I connect giving the wrong credentials I am
> still being successfully authenticated instead of being rejected.
>
> My setup is below (including the options I have tried that didn’t work, so I
> commented them out):
>
> *Zoo.cfg:*
>
>
> #SASL
>
>
>
>
> #authProvider.sasl=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
>
> #authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
>
> #requireClientAuthScheme=sasl
>
> #sessionRequireClientSASLAuth=true
>
> #set.acl=true
>
> DigestAuthenticationProvider.enabled=true
>
> enforce.auth.enabled=true
>
> enforce.auth.schemes=digest
>
> #SASL END--
>
>
>
> *Jaas_config:*
>
> Server {
>
>org.apache.zookeeper.server.auth.DigestLoginModule required
>
>user_super="adminsecret"
>
>user_bob="bobsecret";
>
> };
>
> *Client code:*
>
> CuratorFrameworkFactory.Builder builder = CuratorFrameworkFactory.builder()
>         .connectString(connectUris(zookeeper, "zookeeper:2181"))
>         .connectionStateErrorPolicy(connectionStateErrorPolicy)
>         .retryPolicy(retryPolicy)
>         .aclProvider(aclProvider)
>         .connectionTimeoutMs(1)
>         .sessionTimeoutMs(sessionTimeout);
> if (zookeeperAuthEnabled) {
>     builder.authorization("digest", "kuku:adminsecret4".getBytes());
> }
> curatorClient = builder.build();
> curatorClient.getConnectionStateListenable().addListener((c, s) -> {
>     connectionState = s;
>     log.info(MessageFormat.format("CuratorState [State={0},Connected={1}]", s.name(), s.isConnected()));
> });
> curatorClient.start();
> try {
>     curatorClient.blockUntilConnected();
>     leaderLatch = initLeadership();
> } catch (InterruptedException e) {
>     log.info(e);
> }
>
>
>
> As a result, when the application starts I get a successful authentication and
> a message in the zookeeper console:
>
> 2021-12-14 14:08:45,854 [myid:] - INFO
> [NIOWorkerThread-13:ZooKeeperServer@1623] - got auth packet /
> 192.168.43.169:49753
>
> 2021-12-14 14:08:45,854 [myid:] - INFO
> [NIOWorkerThread-13:ZooKeeperServer@1642] - Session 0x1004d2f28d1:
> auth success for scheme digest and address /192.168.43.169:49753
>
>
>
>
>
> *Andrzej Trzeciak*
> Senior System Engineer
>
> Grudziądzka 46-48 • 87-100 Toruń • Poland
>
> Tel. +48 573 251 507
> exelatech.com
>

Re: Any known issues in java client with zookeeper version 3.4.14

2021-11-28 Thread Szalay-Bekő Máté

Hello Ram,

Please note that ZooKeeper 3.4 is EOL now; it is highly recommended to
upgrade to the latest 3.5/3.6 version.

I am not aware of the issue you are describing, but it might be worth
checking Jira:
https://issues.apache.org/jira/projects/ZOOKEEPER/issues/ZOOKEEPER-4415?filter=allopenissues
I found this one, but it was fixed in 3.4.6:
https://issues.apache.org/jira/browse/ZOOKEEPER-1702

Kind regards,
Máté

On Thu, Nov 25, 2021 at 6:56 PM rammohan ganapavarapu <
rammohanga...@gmail.com> wrote:

> Hi,
>
> We recently observed that java clients (zkCli) in some of the environments
> (RHEL 7 and 8) are sending connection RST even before session/connection
> timeout, so I am wondering if there are any known issues.
>
> Thanks,
> Ram
>

Re: Configure zookeeper to accept plaintext connections and connections with kerberos authentication

2021-07-08 Thread Szalay-Bekő Máté

Just one note: I think Kerberos authentication alone still leaves the
traffic in plain text. If you need wire encryption, then you also need to
enable SSL. (You can do that in a dual mode too, so you can run a ZooKeeper
server with both plain-text and SSL ports open. You can even use the same
port for both plain-text and SSL communication; see the config parameter
"zookeeper.client.portUnification".)
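To illustrate how one port can serve both kinds of clients: a TLS connection starts with a handshake record whose first byte is 0x16 followed by a major version byte of 0x03, while a plain ZooKeeper request starts with a 4-byte length prefix. The minimal sketch below peeks at the first two bytes to tell them apart. It shows the general technique only and is not ZooKeeper's implementation; the class and method names are hypothetical, and it assumes plain requests are far smaller than the hundreds of megabytes a length prefix beginning with 0x16 0x03 would imply.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;

class PortUnificationSketch {
    // Peeks at the first two bytes without consuming them: a TLS record starts with
    // content type 0x16 (handshake) and protocol major version 0x03.
    static boolean looksLikeTls(BufferedInputStream in) throws IOException {
        in.mark(2);
        int first = in.read();
        int second = in.read();
        in.reset();  // leave the bytes for whichever handler (TLS or plain) takes over
        return first == 0x16 && second == 0x03;
    }

    public static void main(String[] args) throws IOException {
        byte[] tlsStart = {0x16, 0x03, 0x01, 0x00, 0x2e};   // beginning of a TLS handshake record
        byte[] plainStart = {0x00, 0x00, 0x00, 0x2d, 0x00}; // a small 4-byte length prefix (example value)
        System.out.println(looksLikeTls(new BufferedInputStream(new ByteArrayInputStream(tlsStart))));   // true
        System.out.println(looksLikeTls(new BufferedInputStream(new ByteArrayInputStream(plainStart)))); // false
    }
}
```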

On Thu, Jul 8, 2021 at 6:24 PM Szalay-Bekő Máté 
wrote:

> Hello Dene,
>
> Currently if you enable Kerberos authentication, then the clients are
> still able to connect to ZooKeeper without any authentication. Of course
> they won't be able to access / change any ZNodes protected by ACLs, but
> they can join and will be authenticated automatically as "world:anyone" and
> will be able to read / modify any ZNode where you haven't configured any
> ACL.
>
> You can enforce authentication (maybe using this zoo.cfg property? 
> zookeeper.sessionRequireClientSASLAuth
> see here:
> https://zookeeper.apache.org/doc/current/zookeeperAdmin.html#ch_administration
> ) but AFAIK this feature is disabled by default. So you should be good to
> make this transition iteratively. But it is worth testing :)
>
> Kind regards,
> Mate
>
> On Thu, Jul 8, 2021 at 5:07 PM Hamado Dene 
> wrote:
>
>> Hi Everyone,
>> We will need to switch our existing system using zookeeper (without sasl
>> auth) to kerberos authentication. Since our system is quite large, is it
>> possible to configure zookeeper to accept both plaintext connections and
>> connections with kerberos authentication?
>> If this is possible, it would allow us to program a plan to restart our
>> application, without causing major disservices to customers.
>>  Thanks for your help,
>>
>> Hamado Dene
>
>

Re: Configure zookeeper to accept plaintext connections and connections with kerberos authentication

2021-07-08 Thread Szalay-Bekő Máté

Hello Dene,

Currently, if you enable Kerberos authentication, the clients are still
able to connect to ZooKeeper without any authentication. Of course they
won't be able to access / change any ZNodes protected by ACLs, but they can
join, will be authenticated automatically as "world:anyone", and will be
able to read / modify any ZNode where you haven't configured any ACL.

You can enforce authentication (maybe using the zoo.cfg property
sessionRequireClientSASLAuth, Java system property
zookeeper.sessionRequireClientSASLAuth; see here:
https://zookeeper.apache.org/doc/current/zookeeperAdmin.html#ch_administration
), but AFAIK this feature is disabled by default. So you should be able to
make this transition iteratively. But it is worth testing :)
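For the gradual rollout, each client opts into Kerberos through a JAAS configuration containing a Client section (the default section name the ZooKeeper client looks for), typically referenced via the java.security.auth.login.config system property. Clients without it keep connecting unauthenticated as long as enforcement is off. The sketch below only generates such a section using the JDK's Krb5LoginModule and points the system property at it; the class name, keytab path and principal are placeholders, and the property has to be set before the ZooKeeper client is created.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class KerberosClientJaas {
    // Builds a JAAS "Client" section that logs in from a keytab via the JDK's Krb5LoginModule.
    static String clientSection(String keytabPath, String principal) {
        return "Client {\n"
            + "  com.sun.security.auth.module.Krb5LoginModule required\n"
            + "  useKeyTab=true\n"
            + "  keyTab=\"" + keytabPath + "\"\n"
            + "  storeKey=true\n"
            + "  useTicketCache=false\n"
            + "  principal=\"" + principal + "\";\n"
            + "};\n";
    }

    public static void main(String[] args) throws IOException {
        // Placeholder keytab and principal: replace with the values for your realm.
        String jaas = clientSection("/etc/security/keytabs/app.keytab", "app@EXAMPLE.COM");
        Path file = Files.createTempFile("zk-client-jaas", ".conf");
        Files.write(file, jaas.getBytes(StandardCharsets.UTF_8));
        // Must be set before the ZooKeeper client is created in this JVM
        // (usually passed as -Djava.security.auth.login.config=... instead).
        System.setProperty("java.security.auth.login.config", file.toString());
        System.out.print(jaas);
    }
}
```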

Kind regards,
Mate

On Thu, Jul 8, 2021 at 5:07 PM Hamado Dene 
wrote:

> Hi Everyone,
> We will need to switch our existing system using zookeeper (without sasl
> auth) to kerberos authentication. Since our system is quite large, is it
> possible to configure zookeeper to accept both plaintext connections and
> connections with kerberos authentication?
> If this is possible, it would allow us to program a plan to restart our
> application, without causing major disservices to customers.
>  Thanks for your help,
>
> Hamado Dene

Re: [ANNOUNCE] Apache ZooKeeper 3.6.3 Release

2021-04-13 Thread Szalay-Bekő Máté

This is great!! :)
Thank you Arshad for driving this release and thanks to the whole community
for the contribution!

On Tue, Apr 13, 2021 at 4:29 PM Mohammad Arshad  wrote:

> The Apache ZooKeeper team is proud to announce Apache ZooKeeper version
> 3.6.3
>
> ZooKeeper is a high-performance coordination service for distributed
> applications. It exposes common services - such as naming,
> configuration management, synchronization, and group services - in a
> simple interface so you don't have to write them from scratch. You can
> use it off-the-shelf to implement consensus, group management, leader
> election, and presence protocols. And you can build on it for your
> own, specific needs.
>
> For ZooKeeper release details and downloads, visit:
> https://zookeeper.apache.org/releases.html
>
> ZooKeeper 3.6.3 Release Notes are at:
> https://zookeeper.apache.org/doc/r3.6.3/releasenotes.html
>
> We would like to thank the contributors that made the release possible.
>
> Regards,
>
> The ZooKeeper Team
>

Re: [ANNOUNCE] Apache ZooKeeper 3.7.0 released

2021-03-28 Thread Szalay-Bekő Máté

Thanks for all your work, Damien, and thanks also to all the contributors
and the whole community.
It's great to see 3.7.0 out! :)

Cheers,
Mate

On Sun, Mar 28, 2021 at 9:39 AM Damien Diederen 
wrote:

>
> The Apache ZooKeeper team is proud to announce Apache ZooKeeper version
> 3.7.0.
>
> ZooKeeper is a high-performance coordination service for distributed
> applications. It exposes common services - such as naming,
> configuration management, synchronization, and group services - in a
> simple interface so you don't have to write them from scratch. You can
> use it off-the-shelf to implement consensus, group management, leader
> election, and presence protocols. And you can build on it for your
> own, specific needs.
>
> For ZooKeeper release details and downloads, visit:
> https://zookeeper.apache.org/releases.html
>
> ZooKeeper 3.7.0 Release Notes are at:
> https://zookeeper.apache.org/doc/r3.7.0/releasenotes.html
>
> We would like to thank the contributors that made the release possible.
>
> Regards,
>
> The ZooKeeper Team
>

Re: Zookeeper ensemble not getting reestablished after short network outage

2021-03-04 Thread Szalay-Bekő Máté

> That's one of our main problem with zookeeper also, the election
> precedence.

Well, there is not much you can do here at the moment. All the voting
members of the quorum have a chance to get elected as leader. You cannot
define weights or priorities among the servers. There is some precedence
for the higher SID in the protocol, but it is easy to end up with a leader
that is not the one with the highest SID. As far as I understand, it really
depends only on the timing of the leader election messages and restarts.
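To illustrate how limited the SID's role is: when a server compares two votes during leader election, it prefers the higher epoch, then the higher zxid (more recent data), and only on a full tie the higher SID. The following is a simplified Java sketch of that ordering, modeled on the comparison in FastLeaderElection (its totalOrderPredicate method); the real method has additional checks, and the class and field names here are hypothetical.

```java
class VoteOrderSketch {
    // Simplified vote: proposed leader's SID, its last zxid and its epoch.
    static final class Vote {
        final long sid;
        final long zxid;
        final long epoch;

        Vote(long sid, long zxid, long epoch) {
            this.sid = sid;
            this.zxid = zxid;
            this.epoch = epoch;
        }
    }

    // True if 'candidate' should replace 'current': higher epoch first, then the higher
    // zxid (more recent data), and only on a full tie the higher SID.
    static boolean beats(Vote candidate, Vote current) {
        if (candidate.epoch != current.epoch) {
            return candidate.epoch > current.epoch;
        }
        if (candidate.zxid != current.zxid) {
            return candidate.zxid > current.zxid;
        }
        return candidate.sid > current.sid;
    }

    public static void main(String[] args) {
        Vote s1 = new Vote(1, 0x500000010L, 5);
        Vote s2 = new Vote(2, 0x500000010L, 5);
        Vote s3 = new Vote(3, 0x50000000fL, 5);  // highest SID, but one transaction behind
        System.out.println(beats(s3, s1)); // false: s1 has the newer zxid
        System.out.println(beats(s2, s1)); // true: same epoch and zxid, so the higher SID wins
    }
}
```

So a high SID only decides ties; a server that is even one transaction behind loses to a lower SID.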

I'm not sure why you want a given node not to be elected. But one thing
that you might check is the Observer feature. Observers are special nodes
which will never get elected as leader, but they still get notified about
all the committed changes (so they have the full in-memory state of all
ZNodes) and they can handle client sessions. Observers are mainly used for
scalability, enabling larger ensembles in a more efficient way. See:
https://zookeeper.apache.org/doc/r3.6.2/zookeeperObservers.html

Regarding the comment on the ticket, I'm not entirely sure that the part
about "increase the chances the node with higher SID to be the leader" is
fully correct:

> https://issues.apache.org/jira/browse/ZOOKEEPER-2164
> The peer uses this identification to connect back to the ZK server on
> 0.0.0.0, if its own SID is greater than the SID of the originating node.
> And before it establishes a new connection it, closes the existing
> connection to the originating  node if there is any.
> (I believe it does that to increase the chances the node with higher SID
> to be the leader.)

So why is it always the server with the higher SID that initiates the
connection during leader election? I don't think it is because we want to
prioritize nodes with higher SIDs to be elected. I always assumed that the
simple reason is to avoid two connections being opened concurrently between
the same pair of servers. We don't need n*(n-1) channels between the
servers, we only need n*(n-1)/2. When a server with a low ID comes online,
it "pings" the servers with higher IDs to let them know that it wants to
join the quorum. Still, the server with the higher ID is expected to be the
one who opens the long-term channel.
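A small sketch of the counting argument: if, for every pair of servers, only the one with the higher SID keeps the channel, each pair ends up with exactly one connection, n*(n-1)/2 in total. This models the rule as described above, not QuorumCnxManager's actual code; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ElectionChannelsSketch {
    // Simplified rule: between two servers, the one with the higher SID owns the channel.
    static long initiator(long a, long b) {
        return Math.max(a, b);
    }

    // Lists one "from->to" channel per unordered pair of distinct SIDs.
    static List<String> channels(long[] sids) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < sids.length; i++) {
            for (int j = i + 1; j < sids.length; j++) {
                long from = initiator(sids[i], sids[j]);
                long to = (from == sids[i]) ? sids[j] : sids[i];
                result.add(from + "->" + to);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> ch = channels(new long[] {1, 2, 3, 4, 5});
        System.out.println(ch.size()); // 10, i.e. 5*4/2 instead of 5*4
        System.out.println(ch);
    }
}
```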

Best regards,
Mate

On Thu, Mar 4, 2021 at 6:30 PM Jhanssen Fávaro 
wrote:

> Mate, looking at the bugs/jira links we saw a curious comment. Would this
> make any sense at all ?
> That's one of our main problem with zookeeper also, the election
> precedence.
>
> https://issues.apache.org/jira/browse/ZOOKEEPER-2164
> The peer uses this identification to connect back to the ZK server on
> 0.0.0.0, if its own SID is greater than the SID of the originating node.
> And before it establishes a new connection it, closes the existing
> connection to the originating node if there is any. (I believe it does that
> to increase the chances the node with higher SID to be the leader.)
>
> We're needing to avoid one of our broker to get leadership... The only
> "smart" way was keeping rebooting that node, every time it gets the
> leadership.
>
> Best Regards!
> Jhanssen
>
>
> On 2021/03/03 18:33:13, Szalay-Bekő Máté 
> wrote:
> > Ok, happy to hear if the information helped.
> > Good luck and enjoy using ZooKeeper ;)
> >
> > Mate
> >
> > On Wed, Mar 3, 2021, 18:18 Jhanssen Fávaro 
> wrote:
> >
> > > Hi Mate, thank you again!
> > > These bug links cleared a lot to us and  will help a lot!!!
> > >
> > > Yeah looks like that we're probably being caught by one of these BUGs.
> > > Anyway, we'll take a closer look in each of the BUGs related by.
> > >
> > > About kubernetes, we're not using. Only clients(application connecting
> > > to zookeeper/kafka). Our ZK are running on VMWare VMs.
> > >
> > > And yes, this is very rare to replicated issue, normally only during
> > > VMWare patches/host reboots we face it, and not always.
> > >
> > > We have a SDN defined SNX(VMWare), so a lot of variables on the table.
> > >
> > > Anyway, by now, we just would like to say thank you. And we'll decide
> > > whether open the jira or not, but you'll be notified for sure as soon
> > > as we open.
> > >
> > > Thank you!
> > >
> > > On 2021/03/03 08:38:43, Szalay-Bekő Máté 
> > > wrote:
> > > > Thanks for the logs!
> > > >
> > > > I'm far for digging deep enough (unfortunately I don't have much time
> > > > right now). A few observations:
> > > >
> > > > (1)
> > > > Looks like we have a problem with the order of the leader election
> > > > notifications. Here for

Re: Zookeeper 3.4.5 with client 3.6.2

2021-03-04 Thread Szalay-Bekő Máté

Hi Subhajit,

We (in my company) did some compatibility / smoke tests around 3.4 and 3.5
client-server compatibility, and we haven't found any issues. However, the
community hasn't tested all the possible use cases and version
combinations.
Still, in general, I think if you don't use any 3.5-specific feature on
your client side, then your 3.5.9 client should work with any 3.4 or 3.5
server.

Best Regards,
Mate

On Wed, Mar 3, 2021 at 9:28 PM Subhajit Das  wrote:

> Hi Mate,
>
> Thanks for reply. I don’t have the option to switch server right now.
> Is client 3.5.9, compatible with server 3.4.5 and 3.5.5?
>
> From: Szalay-Bekő Máté<mailto:szalay.beko.m...@gmail.com>
> Sent: 02 March 2021 10:58 AM
> To: UserZooKeeper<mailto:user@zookeeper.apache.org>
> Subject: Re: Zookeeper 3.4.5 with client 3.6.2
>
> Hello,
>
> the ZooKeeper 3.4.5 is not supported anymore by the community, please
> upgrade to a more recent ZooKeeper version, like the latest 3.5.x or 3.6.x
> versions (see zookeeper.apache.org).
>
> by the way... I think the EndOfStream issue you mention is harmless. The
> ZooKeeper server is printing out the warnings, but will close the session
> anyway.
>
> Regards,
> Mate
>
> On Mon, Mar 1, 2021 at 4:43 PM Subhajit Das 
> wrote:
>
> >
> > Hi There,
> >
> > I am trying to connect to Zookeeper 3.4.5 with client 3.6.2 (internally
> > with Solr).
> > There seems to be an issue. EndOfStream issue is coming, saying client must
> > have closed the connection.
> >
> > Please help on how to resolve the issue.
> >
> > Thanks in advance.
> >
>
>

Re: Zookeeper ensemble not getting reestablished after short network outage

2021-03-03 Thread Szalay-Bekő Máté

Ok, happy to hear if the information helped.
Good luck and enjoy using ZooKeeper ;)

Mate

On Wed, Mar 3, 2021, 18:18 Jhanssen Fávaro  wrote:

> Hi Mate, thank you again!
> These bug links cleared a lot to us and  will help a lot!!!
>
> Yeah looks like that we're probably being caught by one of these BUGs.
> Anyway, we'll take a closer look in each of the BUGs related by.
>
> About kubernetes, we're not using. Only clients(application connecting to
> zookeeper/kafka). Our ZK are running on VMWare VMs.
>
> And yes, this is very rare to replicated issue, normally only during
> VMWare patches/host reboots we face it, and not always.
>
> We have a SDN defined SNX(VMWare), so a lot of variables on the table.
>
> Anyway, by now, we just would like to say thank you. And we'll decide
> whether open the jira or not, but you'll be notified for sure as soon as we
> open.
>
> Thank you!
>
> On 2021/03/03 08:38:43, Szalay-Bekő Máté 
> wrote:
> > Thanks for the logs!
> >
> > I'm far for digging deep enough (unfortunately I don't have much time
> > right now). A few observations:
> >
> > (1)
> > Looks like we have a problem with the order of the leader election
> > notifications. Here for example we get the message from round 0xf8 before
> > the one from the previous round (0xf7):
> > I would assume the leader election protocol should handle this (although
> > I haven't checked this).
> >
> > [2021-02-27 11:42:39,432] INFO Notification: 2 (message format version),
> > 102 (n.leader), 0x280866 (n.zxid), 0xf8 (n.round), LOOKING (n.state), 102
> > (n.sid), 0x28 (n.peerEPoch), FOLLOWING (my state)0 (n.config
> > version) (org.apache.zookeeper.server.quorum.FastLeaderElection)
> > [2021-02-27 11:42:39,434] INFO Notification: 2 (message format
> > version), 103 (n.leader), 0x270611 (n.zxid), 0xf7 (n.round), FOLLOWING
> > (n.state), 102 (n.sid), 0x28 (n.peerEPoch), FOLLOWING (my state)0
> > (n.config version) (org.apache.zookeeper.server.quorum.FastLeaderElection)
> >
> > We saw cases when the kubernetes networking mesh was slow to propagate a
> > socket close between quorum members and this caused problems. I wonder if
> > this would be some similar issue...
> > If this is indeed some Kubernetes specific issue, then it will be hard to
> > reproduce. But we can give a try.
> >
> > (2)
> > In the leader election started on 2021-02-28 09:11:38,133, we see that
> > the server (id=111) sent out notifications, but haven't received any from
> > the other servers. We should check the logs of the other two servers,
> > check if they even get the notifications.
> >
> > We saw cases (e.g. ZOOKEEPER-3769 fixed in 3.5.8) when the other server
> > hasn't received the notification due to some receiver thread dying. E.g.
> > we could check the other servers for log message:
> > 03/24/20 11:16:16,297 [WorkerReceiver[myid=1]] ERROR
> > [org.apache.zookeeper.server.NIOServerCnxnFactory]
> > (NIOServerCnxnFactory.java:92) - Thread
> > Thread[WorkerReceiver[myid=1],5,main] died
> >
> > There were some other related changes in more recent ZooKeeper versions
> > (ZOOKEEPER-3756, ZOOKEEPER-2164). So it is possible that if you upgrade
> > to a more recent ZooKeeper version, then you won't see this problem
> > again. But it is hard to tell, until we found out what exactly happened.
> >
> > I think it would be good to create a new Jira ticket for this. Could you
> > maybe create it, and also attach all the logs from the three ZooKeeper
> > servers since the last restart? Also, can you attach the ZooKeeper
> > configuration?
> >
> > Please ping me on the ticket or assign it to me and I'm happy to check
> > this further. (although I will be slow due to other projects)
> >
> > Best Regards,
> > Mate
> >
> > On Tue, Mar 2, 2021 at 12:51 PM Jhanssen Fávaro <
> > jhanssenfav...@gmail.com>
> > wrote:
> >
> > > Hi, Mate! Thanks by the reply!
> > >
> > > > Which ZooKeeper version are you using?
> > > This is the current ZK Version we're using, "*INFO Server
> > > environment:zookeeper.version=3.5.7*". We're still planning the
> > > migration to 3.6 in a few weeks.
> > >
> > > > 2021-02-28 09:11:38,119 and after the previous successful leader
> > > > election?
> > > Sure, follow the logs, but after our last restart (  [2021-02-18
> > > 11:38:34,316]  ) there was only one exception/wan(Sorry by the flooding
> > > logs):
> > >
> > >
>

Re: Zookeeper ensemble not getting reestablished after short network outage

2021-03-03 Thread Szalay-Bekő Máté

Thanks for the logs!

I'm far from having dug deep enough (unfortunately I don't have much time
right now). A few observations:

(1)
It looks like we have a problem with the order of the leader election
notifications. For example, here we get the message from round 0xf8 before
the one from the previous round (0xf7). I would assume the leader election
protocol handles this (although I haven't checked).

[2021-02-27 11:42:39,432] INFO Notification: 2 (message format version), 102
(n.leader), 0x280866 (n.zxid), 0xf8 (n.round), LOOKING (n.state), 102
(n.sid), 0x28 (n.peerEPoch), FOLLOWING (my state)0 (n.config
version) (org.apache.zookeeper.server.quorum.FastLeaderElection)
[2021-02-27 11:42:39,434] INFO Notification: 2 (message format
version), 103 (n.leader), 0x270611 (n.zxid), 0xf7 (n.round), FOLLOWING
(n.state), 102 (n.sid), 0x28 (n.peerEPoch), FOLLOWING (my state)0 (n.config
version) (org.apache.zookeeper.server.quorum.FastLeaderElection)

We have seen cases where the Kubernetes networking mesh was slow to
propagate a socket close between quorum members, and this caused problems.
I wonder if this is a similar issue...
If this is indeed a Kubernetes-specific issue, then it will be hard to
reproduce. But we can give it a try.
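Regarding (1), the usual way for such a protocol to tolerate reordered notifications is to track the current round (a logical clock) and ignore messages from older rounds. The following is a simplified sketch of that idea, loosely modeled on how FastLeaderElection treats notification rounds while a server is LOOKING; the real code also handles notifications differently depending on the receiver's state (the log above shows FOLLOWING), and the names below are hypothetical.

```java
class RoundFilterSketch {
    static final class Notification {
        final long sid;
        final long round;

        Notification(long sid, long round) {
            this.sid = sid;
            this.round = round;
        }
    }

    // The election round this (hypothetical) server currently participates in.
    private long logicalClock;

    RoundFilterSketch(long startRound) {
        this.logicalClock = startRound;
    }

    // Returns true if the notification is used, false if it is dropped as stale.
    boolean accept(Notification n) {
        if (n.round > logicalClock) {
            logicalClock = n.round;  // a newer round has started: move to it
            return true;
        }
        if (n.round < logicalClock) {
            return false;            // left over from an older round: ignore it
        }
        return true;                 // same round: the vote counts
    }

    public static void main(String[] args) {
        RoundFilterSketch server = new RoundFilterSketch(0xf7);
        // Same arrival order as in the log: round 0xf8 first, then the older 0xf7.
        System.out.println(server.accept(new Notification(102, 0xf8))); // true
        System.out.println(server.accept(new Notification(102, 0xf7))); // false
    }
}
```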

(2)
In the leader election that started at 2021-02-28 09:11:38,133, we see that
the server (id=111) sent out notifications, but didn't receive any from the
other servers. We should check the logs of the other two servers to see
whether they received the notifications at all.

We have seen cases (e.g. ZOOKEEPER-3769, fixed in 3.5.8) where the other
server didn't receive the notification because a receiver thread had died.
For example, we could check the other servers for a log message like this:
03/24/20 11:16:16,297 [WorkerReceiver[myid=1]] ERROR
[org.apache.zookeeper.server.NIOServerCnxnFactory]
(NIOServerCnxnFactory.java:92) - Thread
Thread[WorkerReceiver[myid=1],5,main] died
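This check on the other servers can also be scripted. A minimal sketch that scans log lines for a dying election worker thread; the regular expression is an assumption based on the single sample above (in the actual log file the message sits on one line, and a custom log4j layout may change the prefix), and it also matches WorkerSender threads, assuming they are named the same way. The class name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DeadThreadScan {
    // Assumed shape, based on one sample line: "... Thread Thread[WorkerReceiver[myid=1],5,main] died"
    private static final Pattern DIED =
        Pattern.compile("Thread\\[(Worker(?:Receiver|Sender)\\[myid=\\d+\\]),\\d+,[^\\]]*\\] died");

    // Returns the names of election worker threads reported as dead, in log order.
    static List<String> deadElectionThreads(List<String> logLines) {
        List<String> found = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = DIED.matcher(line);
            if (m.find()) {
                found.add(m.group(1));
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Scans the log file given as the first argument, or a built-in sample line.
        List<String> lines = args.length > 0
            ? Files.readAllLines(Paths.get(args[0]))
            : List.of("03/24/20 11:16:16,297 [WorkerReceiver[myid=1]] ERROR "
                + "[org.apache.zookeeper.server.NIOServerCnxnFactory] "
                + "(NIOServerCnxnFactory.java:92) - Thread Thread[WorkerReceiver[myid=1],5,main] died");
        System.out.println(deadElectionThreads(lines)); // sample prints [WorkerReceiver[myid=1]]
    }
}
```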

There were some other related changes in more recent ZooKeeper versions
(ZOOKEEPER-3756, ZOOKEEPER-2164), so it is possible that you won't see this
problem again after upgrading to a more recent ZooKeeper version. But it is
hard to tell until we find out exactly what happened.

I think it would be good to create a new Jira ticket for this. Could you
create it and attach all the logs from the three ZooKeeper servers since
the last restart? Could you also attach the ZooKeeper configuration?

Please ping me on the ticket or assign it to me, and I'm happy to look into
this further (although I will be slow due to other projects).

Best Regards,
Mate

On Tue, Mar 2, 2021 at 12:51 PM Jhanssen Fávaro 
wrote:

> Hi, Mate! Thanks by the reply!
>
> > Which ZooKeeper version are you using?
> This is the current ZK Version we're using, "*INFO Server
> environment:zookeeper.version=3.5.7*". We're still planning the migration
> to 3.6 in a few weeks.
>
> > 2021-02-28 09:11:38,119 and after the previous successful leader
> > election?
> Sure, follow the logs, but after our last restart (  [2021-02-18
> 11:38:34,316]  ) there was only one exception/wan(Sorry by the flooding
> logs):
>
>
>
> *#All messages
> before these were related to Purge task started / Purge task completed.
> #*
> [2021-02-27 08:38:33,007] INFO Purge task completed.
> (org.apache.zookeeper.server.DatadirCleanupManager)
> [2021-02-27 09:38:33,005] INFO Purge task started.
> (org.apache.zookeeper.server.DatadirCleanupManager)
> [2021-02-27 09:38:33,006] INFO zookeeper.snapshot.trust.empty : false
> (org.apache.zookeeper.server.persistence.FileTxnSnapLog)
> [2021-02-27 09:38:33,006] INFO Purge task completed.
> (org.apache.zookeeper.server.DatadirCleanupManager)
> [2021-02-27 10:38:33,006] INFO Purge task started.
> (org.apache.zookeeper.server.DatadirCleanupManager)
> [2021-02-27 10:38:33,006] INFO zookeeper.snapshot.trust.empty : false
> (org.apache.zookeeper.server.persistence.FileTxnSnapLog)
> [2021-02-27 10:38:33,007] INFO Purge task completed.
> (org.apache.zookeeper.server.DatadirCleanupManager)
> [2021-02-27 11:38:33,005] INFO Purge task started.
> (org.apache.zookeeper.server.DatadirCleanupManager)
> [2021-02-27 11:38:33,006] INFO zookeeper.snapshot.trust.empty : false
> (org.apache.zookeeper.server.persistence.FileTxnSnapLog)
> [2021-02-27 11:38:33,006] INFO Purge task completed.
> (org.apache.zookeeper.server.DatadirCleanupManager)
> [2021-02-27 11:42:39,139] INFO Received connection request from /
> 10.1.1.93:43572 (org.apache.zookeeper.server.quorum.QuorumCnxManager)
> [2021-02-27 11:42:39,293] INFO Accepted TLS connection from
> zk-xpto-b7.production.com/10.1.1.93:43572 - TLSv1.2 -
> TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256
> (org.apache.zookeeper.server.quorum.UnifiedServerSocket)
> [2021-02-27 11:42:39,298] INFO Successfully authenticated learner:
> authenticationID=quorum-zkxpto-prod;  authorizationID=quorum-zkxpto-prod.
>

Re: Zookeeper 3.4.5 with client 3.6.2

2021-03-01 Thread Szalay-Bekő Máté

Hello,

ZooKeeper 3.4.5 is not supported by the community anymore; please upgrade
to a more recent ZooKeeper version, like the latest 3.5.x or 3.6.x
(see zookeeper.apache.org).

By the way, I think the EndOfStream issue you mention is harmless. The
ZooKeeper server prints the warnings, but it will close the session anyway.

Regards,
Mate

On Mon, Mar 1, 2021 at 4:43 PM Subhajit Das  wrote:

>
> Hi There,
>
> I am trying to connect to Zookeeper 3.4.5 with client 3.6.2 (internally
> with Solr).
> There seems to be an issue. EndOfStream issue is coming, saying client must
> have closed the connection.
>
> Please help on how to resolve the issue.
>
> Thanks in advance.
>

Re: Zookeeper ensemble not getting reestablished after short network outage

2021-03-01 Thread Szalay-Bekő Máté

>  it was supposed to get back "automatically" right ?
Absolutely.

I haven't seen this exact problem before. Which ZooKeeper version are you
using?

Can you check the logs for any earlier exceptions / errors before
2021-02-28 09:11:38,119 and after the previous successful leader election?
One possibility is that one of the threads responsible for the leader
election communication (e.g. sending out notifications or receiving the
answers) was killed and never restarted in this ZooKeeper server. If this
is the case, we should see a reason / stack trace earlier explaining the
unhandled exception, and we could fix this in the code.
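For background on why such a failure can go unnoticed: in Java, an exception that escapes a thread's run method ends that thread, and nothing restarts it automatically. ZooKeeper can log such a death itself (a "Thread ... died" line is quoted elsewhere in this archive); the minimal sketch below only shows the general JDK mechanism, an UncaughtExceptionHandler that records the failure. The class name and the thread name are hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;

class ThreadDeathSketch {
    // Runs 'work' on a new thread and returns the exception that killed it, or null.
    static Throwable runAndCapture(Runnable work) throws InterruptedException {
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread worker = new Thread(work, "election-worker-sketch");  // hypothetical thread name
        // Without a handler (and without a default one), the JVM just prints the stack
        // trace to stderr; either way the thread ends and nothing restarts it.
        worker.setUncaughtExceptionHandler((t, e) -> {
            System.err.println("Thread " + t.getName() + " died: " + e);
            failure.set(e);
        });
        worker.start();
        worker.join();  // the handler runs on the dying thread before it terminates
        return failure.get();
    }

    public static void main(String[] args) throws InterruptedException {
        Throwable t = runAndCapture(() -> {
            throw new IllegalStateException("simulated bug");
        });
        System.out.println(t.getMessage()); // simulated bug
    }
}
```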

I have seen somewhat similar cases where the current leader had the same
issue with the threads receiving the leader election notifications. But in
those cases no one was able to rejoin the quorum, and the leader needed to
be restarted. Your case is different, as restarting a single follower
solved the issue.

Regards,
Mate

On Mon, Mar 1, 2021 at 8:26 PM Jhanssen Fávaro 
wrote:

> Hi all, yesterday during a short VMWare Patching apply, there was a very
> short network outage, something about 3/5 seconds, but for one of our
> ZooKeeper node it didn't come back automatically.
> Our ensemble cluster has 5 members and only one got this kind of behavior.
> I needed to manually restart it to get it back.  In this case this
> member/zk was not receiving connections from kafka, it's only a repository
> backup, but anyway it was supposed to get back "automatically" right ?
> It was trying to call the leader, but only this got this kind of behavior,
> all the others nodes got the ellection process and reestablished the
> ensemble.
> Bellow there is a log's chunk:
>
> [2021-02-28 09:11:38,119] WARN Exception when following the leader (org.apache.zookeeper.server.quorum.Learner)
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:392)
> at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:84)
> at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:86)
> at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:118)
> at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:158)
> at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:92)
> at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1253)
> [2021-02-28 09:11:38,119] WARN Closing connection to leader, exception during packet send (org.apache.zookeeper.server.quorum.SendAckRequestProcessor)
> javax.net.ssl.SSLException: Connection or outbound has been closed
> at sun.security.ssl.Alert.createSSLException(Alert.java:127)
> at sun.security.ssl.TransportContext.fatal(TransportContext.java:324)
> at sun.security.ssl.TransportContext.fatal(TransportContext.java:267)
> at sun.security.ssl.TransportContext.fatal(TransportContext.java:262)
> at sun.security.ssl.SSLSocketImpl$AppOutputStream.write(SSLSocketImpl.java:979)
> at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
> at org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:144)
> at org.apache.zookeeper.server.quorum.SendAckRequestProcessor.flush(SendAckRequestProcessor.java:62)
> at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:186)
> at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113)
> Caused by: java.net.SocketException: Connection or outbound has been closed
> at sun.security.ssl.SSLSocketOutputRecord.deliver(SSLSocketOutputRecord.java:267)
> at sun.security.ssl.SSLSocketImpl$AppOutputStream.write(SSLSocketImpl.java:974)
> ... 6 more
> [2021-02-28 09:11:38,120] INFO shutdown called (org.apache.zookeeper.server.quorum.Learner)
> java.lang.Exception: shutdown Follower
> at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:201)
> at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1257)
> [2021-02-28 09:11:38,120] INFO Shutting down (org.apache.zookeeper.server.ZooKeeperServer)
> [2021-02-28 09:11:38,120] INFO shutting down (org.apache.zookeeper.server.ZooKeeperServer)
> [2021-02-28 09:11:38,120] INFO Shutting down (org.apache.zookeeper.server.quorum.FollowerRequestProcessor)
> [2021-02-28 09:11:38,121] INFO Shutting down (org.apache.zookeeper.server.quorum.CommitProcessor)
> [2021-02-28 09:11:38,121] INFO FollowerRequestProcessor exited loop! (org.apache.zookeeper.server.quorum.FollowerRequestProcessor)
> [2021-02-28 09:11:38,121] INFO CommitProcessor exited loop! (org.apache.zookeeper.server.quorum.CommitProcessor)
> [2021-02-28 09:11:38,123] INFO shutdown of request processor

Re: 3.4.6: Unable to read additional data from client sessionid 0xABC, likely client has closed socket

2021-01-18 Thread Szalay-Bekő Máté

Resending my answer (it was originally sent to jlindw...@yahoo.com.invalid,
I wonder why).

On Mon, Jan 18, 2021 at 9:12 AM Szalay-Bekő Máté 
wrote:

> Hi John,
>
> > Could an excessive number/size of znodes be a factor?
> I don't think this is a likely cause... it looks more like a client-side
> issue to me.
>
> The stack trace suggests that the client closed the TCP session
> unexpectedly. Maybe the clients are missing the zookeeper.close() calls?
>
> Also I wonder if it is caused by
> https://issues.apache.org/jira/browse/ZOOKEEPER-1105 - are you using the C
> (or Python) client?
> (There was a bug where the C client didn't wait for the session close to
> finish properly.)
>
> Btw: please consider upgrading your cluster; 3.4 is end-of-life now and it
> is not supported by the community anymore.
>
> Cheers,
> Mate
>
> On Thu, Jan 14, 2021 at 7:55 PM John Lindwall 
> wrote:
>
>> We're seeing thousands of these a day in our zookeeper logs (zookeeper
>> 3.4.6):
>> WARN  [NIOServerCxn.Factory:X.X.X.X/X.X.X.X:Y:NIOServerCnxn@357] - caught end of stream exception
>> EndOfStreamException: Unable to read additional data from client sessionid 0xABC, likely client has closed socket
>> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
>> at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
>> at java.lang.Thread.run(Thread.java:745)
>>
>> Any ideas of the cause of this?  Could an excessive number/size of znodes
>> be a factor? We're not seeing any obvious client-side issues. We're not
>> sure but we believe that these were not happening earlier.
>>
>> -- John
>
>

Re: [ANNOUNCE] Apache ZooKeeper 3.5.9

2021-01-18 Thread Szalay-Bekő Máté

Thank you Norbert for driving this! :)

Regards,
Mate

On Fri, Jan 15, 2021 at 4:04 PM Norbert Kalmar  wrote:

> The Apache ZooKeeper team is proud to announce Apache ZooKeeper version
> 3.5.9
>
> ZooKeeper is a high-performance coordination service for distributed
> applications. It exposes common services - such as naming,
> configuration management, synchronization, and group services - in a
> simple interface so you don't have to write them from scratch. You can
> use it off-the-shelf to implement consensus, group management, leader
> election, and presence protocols. And you can build on it for your
> own, specific needs.
>
> For ZooKeeper release details and downloads, visit:
> https://zookeeper.apache.org/releases.html
>
> ZooKeeper 3.5.9 Release Notes are at:
> https://zookeeper.apache.org/doc/r3.5.9/releasenotes.html
>
> We would like to thank the contributors that made the release possible.
>
> Regards,
> The ZooKeeper Team
>

Re: 3.4.6: Unable to read additional data from client sessionid 0xABC, likely client has closed socket

2021-01-18 Thread Szalay-Bekő Máté

Hi John,

> Could an excessive number/size of znodes be a factor?
I don't think this is a likely cause... it looks more like a client-side
issue to me.

The stack trace suggests that the client closed the TCP session
unexpectedly. Maybe the clients are missing the zookeeper.close() calls?

Also I wonder if it is caused by
https://issues.apache.org/jira/browse/ZOOKEEPER-1105 - are you using the C (or
Python) client?
(There was a bug where the C client didn't wait for the session close to
finish properly.)

Btw: please consider upgrading your cluster; 3.4 is end-of-life now and it
is not supported by the community anymore.

Cheers,
Mate

On Thu, Jan 14, 2021 at 7:55 PM John Lindwall 
wrote:

> We're seeing thousands of these a day in our zookeeper logs (zookeeper
> 3.4.6):
> WARN  [NIOServerCxn.Factory:X.X.X.X/X.X.X.X:Y:NIOServerCnxn@357] - caught end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid 0xABC, likely client has closed socket
> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
> at java.lang.Thread.run(Thread.java:745)
>
> Any ideas of the cause of this?  Could an excessive number/size of znodes
> be a factor? We're not seeing any obvious client-side issues. We're not
> sure but we believe that these were not happening earlier.
>
> -- John

Re: TLS certificate error does not return a error to client

2021-01-05 Thread Szalay-Bekő Máté

This sounds like a bug indeed... I think you should create a Jira ticket
for this.
I agree with Benjamin Reed that you should start by adding a new test
case for the C client. We already have tests for the C client connecting
to the server over SSL (
https://github.com/apache/zookeeper/blob/701e134dfba721356deac1a20aa80e94ec80484a/zookeeper-client/zookeeper-client-c/tests/TestClient.cc#L882-L890);
you can start from that one.

These tests use some dummy certificate / key files; you could
modify this script to generate some invalid certificates:
https://github.com/apache/zookeeper/blob/master/zookeeper-client/zookeeper-client-c/ssl/gencerts.sh

Also, make sure the following parameters in zoo.cfg are set to (or left at)
their default values: client.portUnification=false and ssl.clientAuth=need
(see https://zookeeper.apache.org/doc/r3.6.2/zookeeperAdmin.html )
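Since zoo.cfg is a plain key=value file, one quick way to confirm what these two settings resolve to is to load it with `java.util.Properties` and fall back to the defaults named above (`false` and `need`) when a key is missing. This is a minimal sketch under assumptions: the class and method names are hypothetical, and it only inspects the file, not values passed to the JVM as system properties.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper (not part of ZooKeeper): reads a zoo.cfg and checks the two
// settings mentioned above, falling back to their documented defaults when absent.
// It ignores values passed as JVM system properties (-Dzookeeper.*).
class ZooCfgTlsCheck {
    static Properties load(Reader cfg) {
        Properties p = new Properties();
        try {
            p.load(cfg); // zoo.cfg is a key=value file, which Properties can parse
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    /** True when port unification is off and client certificates are required. */
    static boolean strictTls(Properties p) {
        String portUnification = p.getProperty("client.portUnification", "false").trim();
        String clientAuth = p.getProperty("ssl.clientAuth", "need").trim();
        return portUnification.equalsIgnoreCase("false") && clientAuth.equalsIgnoreCase("need");
    }

    public static void main(String[] args) {
        Properties p = load(new StringReader("secureClientPort=2281\nssl.clientAuth=want\n"));
        System.out.println(strictTls(p)); // prints false: ssl.clientAuth was relaxed to "want"
    }
}
```

For a real file, something like `load(new java.io.FileReader("conf/zoo.cfg"))` would do; the path is only an example.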

Best Regards,
Mate


On Tue, Jan 5, 2021 at 10:46 AM Martin Gainty  wrote:

> Unfortunately I can't help you... good luck
>
> 
> From: Dipti Mulay 
> Sent: Sunday, January 3, 2021 7:29 PM
> To: user@zookeeper.apache.org 
> Subject: Re: TLS certificate error does not return a error to client
>
> Hi Martin,
>
> I am using the c-client and not Java.
>
> Thanks
> -Dipti
>
> On 1/4/21, 5:27 AM, "Martin Gainty"  wrote:
>
> you will need ssl debugging turned on at jvm invocation
>
> 
> From: Benjamin Reed 
> Sent: Sunday, January 3, 2021 1:30 PM
> To: user@zookeeper.apache.org 
> Subject: Re: TLS certificate error does not return a error to client
>
> it sounds like we might be missing a test case. do we not have test
> case coverage for this one?
>
> ben
>
> On Fri, Jan 1, 2021 at 8:32 PM Dipti Mulay  wrote:
> >
> > Hi All,
> >
> > I have been using the ZooKeeper C client library to communicate with
> > the ZooKeeper cluster (ensemble).
> > The communication is set up to use mTLS.
> >
> > While running some tests I had an incorrect certificate installed on
> > the client side. I was expecting the library to return an error
> > indicating an AUTH failure, or a session event callback indicating a failure.
> > But it seems no error or callback is returned in this case. I
> > set the log level to DEBUG in the client and I don't see any logs coming out
> > either.
> >
> > I intend to write some retry code and raise some alarms based on the
> > events returned by the library.
> >
> > Any suggestions?
> >
> > Thanks
> > -Parag
>

Re: The YCSB benchmark tool for zookeeper is now available

2020-12-14 Thread Szalay-Bekő Máté

This is a great contribution!
Thank you for both the implementation of the new YCSB binding and for
documenting it in the ZooKeeper project!

Best regards,
Mate

On Mon, Dec 14, 2020 at 8:58 AM Enrico Olivelli  wrote:

> Great
>
> Thank you very much Justin
>
>
> Enrico
>
> On Mon, Dec 14, 2020 at 05:36 Justin Ling Mao wrote:
>
> > Now users can benchmark their ZooKeeper ensemble with YCSB. Here is the
> > ZK PR (https://github.com/apache/zookeeper/pull/1558). Have fun with it :)
>

Re: zookeeper session issue with 3.5.x version

2020-11-09 Thread Szalay-Bekő Máté

Hello Vik,

This issue reminds me of
https://issues.apache.org/jira/browse/ZOOKEEPER-3940
Can you double-check whether you are seeing the same issue? I think
ZOOKEEPER-3940 is Docker-related. Are you using a dockerized ZooKeeper?

If you have a different problem, then I recommend filing a Jira
ticket and attaching debug logs from all 3 ZooKeeper server processes.

Kind regards,
Mate

On Sat, Nov 7, 2020 at 9:28 PM vikramark s 
wrote:

> Hi,
>
> I am relatively new to zookeeper and I am struggling to resolve an issue we
> are experiencing. We have recently upgraded our zookeeper version from
> 3.4.x to 3.5.8. We are experiencing some issues which we think are related
> to session sharing among nodes.
>
> I was able to recreate the issue with a sample ZooKeeper setup. I am not
> able to set up a new session after taking down the leader in a 3-node
> cluster. The same flow works with ZooKeeper 3.4.14 but not with 3.5.8. I am
> hoping there is some setting I am overlooking here, as I can't find
> anyone complaining about this online.
>
> Below are the details:
>
> 3 node cluster. After starting all the zoo nodes:
>
> Zookeeper version (all nodes): 3.5.8-f439ca583e70862c3068a1f2a7d4d068eec33315, built on 05/04/2020 15:07 GMT
>
>                              Zoo1        Zoo2        Zoo3
> Latency min/avg/max:         0/0/0       0/0/0       0/0/0
> Received:                    3           3           2
> Sent:                        2           2           1
> Connections:                 1           1           1
> Outstanding:                 0           0           0
> Zxid:                        0x0         0x1         0x1
> Mode:                        follower    leader      follower
> Node count:                  5           5           5
> Proposal sizes last/min/max: -           -1/-1/-1    -
>
> After starting one session using zkCli.sh on Zoo1 node:
>
> Zookeeper version (all nodes): 3.5.8-f439ca583e70862c3068a1f2a7d4d068eec33315, built on 05/04/2020 15:07 GMT
>
>                              Zoo1        Zoo2        Zoo3
> Latency min/avg/max:         1/9/23      0/0/0       0/0/0
> Received:                    7           4           3
> Sent:                        6           3           2
> Connections:                 2           1           1
> Outstanding:                 0           0           0
> Zxid:                        0x10001     0x10001     0x10001
> Mode:                        follower    leader      follower
> Node count:                  5           5           5
> Proposal sizes last/min/max: -           36/36/36    -
>
> *Note: We can see that the Zxid is now consistent across all nodes.*
>
> I then shut down the leader node zoo2. I can see that zoo3 became the leader,
> but for some reason the Zxid is not the same on zoo1 and zoo3.
>
> Then I closed the existing zkCli and started a new zkCli.sh session on the
> same node (zoo1). The session was not established; the CLI client just
> keeps retrying and has created many outstanding requests on zoo1. The only way
> to recover now is to shut down all nodes and restart them.
> (Currently, if the leader node goes down, our Kafka cluster stops working.)
>
> Zookeeper version (Zoo1, Zoo3): 3.5.8-f439ca583e70862c3068a1f2a7d4d068eec33315, built on 05/04/2020 15:07 GMT
>
>                              Zoo1        Zoo2 (down) Zoo3
> Latency min/avg/max:         0/0/2       -           0/0/0
> Received:                    50          -           1
> Sent:                        43          -           0
> Connections:                 2           -           1
> Outstanding:                 6           -           0
> Zxid:                        0x10001     -           0x2
> Mode:                        follower    -           leader
> Node count:                  5           -           5
> Proposal sizes last/min/max: -           -           -1/-1/-1
>
> *Question: Why is the client not able to establish the session on Zoo1?*
>
> But a similar flow with ZooKeeper 3.4.14 works fine. Below are the details:
>
> First, the initial setup:
>
> Zookeeper version (all nodes): 3.4.14-4c25d480e66aadd371de8bd2fd8da255ac140bcf, built on 03/06/2019 16:18 GMT
>
>                              Zoo1        Zoo2        Zoo3
> Latency min/avg/max:         0/0/0       0/0/0       0/0/0
> Received:                    1           1
> Sent:                        0           0
> Connections:                 1           1
> Outstanding:                 0           0
> Zxid:                        0x0         0x1
> Mode:                        follower    leader
> Node count:                  4           4
> Proposal sizes last/min/max: -           -1/-1/-1
>

Re: upgrade from 3.4.5 to 3.5.6

2020-10-14 Thread Szalay-Bekő Máté

The config looks OK in general...

- are you sure the same configs are used on all ZK servers?
- do the truststores accept all the keys in the keystores? (If the
truststores of the old servers had to be modified, did you restart the
old servers with the updated truststores?)
- did the 3-node ZK cluster work with SSL? (Were you able to connect to it
with the client using SSL?)
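A small extra diagnostic for the `not an SSL/TLS record: 737276720a` error in the logs quoted further down: as far as I can tell, Netty appends a hex dump of the bytes it actually received to this message, so decoding them shows what the plaintext peer sent. Here they decode to `srvr` followed by a newline, i.e. the `srvr` four-letter-word command, which suggests (an inference from the bytes, not something confirmed in this thread) a plaintext health-check probe such as `echo srvr | nc` hitting the secure port, rather than a ZooKeeper client or server. A minimal JDK-only sketch; the class name is hypothetical:

```java
// Hypothetical helper (not part of ZooKeeper or Netty): turns the hex dump from a
// "not an SSL/TLS record: ..." message back into the bytes the peer actually sent.
class NotSslRecordDecoder {
    static byte[] fromHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd-length hex string");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    /** Renders the bytes as ASCII, escaping non-printable ones so newlines stay visible. */
    static String describe(String hex) {
        StringBuilder sb = new StringBuilder();
        for (byte b : fromHex(hex)) {
            int c = b & 0xff;
            if (c >= 0x20 && c < 0x7f) {
                sb.append((char) c);
            } else {
                sb.append(String.format("\\x%02x", c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("737276720a")); // prints: srvr\x0a
    }
}
```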

Also: do you really need client authentication with SSL? (I see you are
using SASL too.)
If you only need SSL for wire encryption, then you can try
ssl.clientAuth=none (see the admin guide). Note, though, that according to the
docs this setting was broken in 3.5.6 and got fixed in 3.5.7.

best regards,
Mate

On Wed, Oct 14, 2020 at 1:10 PM kuldeep singh 
wrote:

> Sorry,
> secureClientPort=2182
>
> Thanks,
> -
> Kuldeep Singh Budania
>
>
>
> On Wed, Oct 14, 2020 at 4:18 PM kuldeep singh 
> wrote:
>
> > Thanks for reply
> >
> > zoo.cfg
> > ---
> > secureClientPort=2181
> > serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
> > initLimit=10
> > syncLimit=5
> > dataDir= data directory (not mentioning exact path here)
> > tickTime=2000
> > autopurge.snapRetainCount=3
> > autopurge.purgeInterval=1
> > admin.enableServer=false
> > standaloneEnabled=false
> > jute.maxbuffer=2147483648
> > server.1=host1_priv:10288:10388
> > server.2=host2_priv:10288:10388
> > server.3=host3_priv:10288:10388
> > server.4=host4_priv:10288:10388
> > server.5=host5_priv:10288:10388
> > quorum.auth.enableSasl=true
> > quorum.auth.learnerRequireSasl=true
> > quorum.auth.serverRequireSasl=true
> > quorum.auth.learner.loginContext=QuorumLearner
> > quorum.auth.server.loginContext=QuorumServer
> > quorum.cnxn.threads.size=10
> > -
> > java.env
> >
> > export SERVER_JVMFLAGS="-Dzookeeper.serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
> > -Dzookeeper.ssl.keyStore.location=keystore.jks
> > -Dzookeeper.ssl.keyStore.password=
> > -Dzookeeper.ssl.trustStore.location= keystore.jks
> > -Dzookeeper.ssl.trustStore.password= 
> > -Djava.security.auth.login.config=zookeeper-jaas.conf"
> >
> > export CLIENT_JVMFLAGS="-Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty
> > -Dzookeeper.client.secure=true -Dzookeeper.ssl.keyStore.location=
> > keystore.jks -Dzookeeper.ssl.keyStore.password= 
> > -Dzookeeper.ssl.trustStore.location=keystore.jks
> > -Dzookeeper.ssl.trustStore.password= 
> > -Dzookeeper.ssl.hostnameVerification=false"
> >
> > Thanks,
> > -
> > Kuldeep Singh Budania
> >
> >
> >
> > On Wed, Oct 14, 2020 at 4:12 PM Szalay-Bekő Máté <
> > szalay.beko.m...@gmail.com> wrote:
> >
> >> These log messages indicate that a client (or another ZooKeeper server)
> >> is trying to connect without SSL to a ZooKeeper process that expects SSL.
> >> I assume this will be a configuration issue then.
> >>
> >> Best regards,
> >> Mate
> >>
> >> On Wed, Oct 14, 2020 at 12:30 PM kuldeep singh <kuldeep.sing...@gmail.com>
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> > more logs
> >> >
> >> > 2020-10-14 12:25:05,106 - ERROR [nioEventLoopGroup-7-4:NettyServerCnxnFactory$CnxnChannelHandler$CertificateVerifier@257] - Unsuccessful handshake with session 0x0
> >> >
> >> > 2020-10-14 12:25:05,107 - WARN [nioEventLoopGroup-7-4:NettyServerCnxnFactory$CnxnChannelHandler@138] - Exception caught
> >> >
> >> > io.netty.handler.codec.DecoderException: io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 737276720a
> >> >
> >> > at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:475)
> >> > at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:283)
> >> > at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractCha

Re: upgrade from 3.4.5 to 3.5.6

2020-10-14 Thread Szalay-Bekő Máté

.invokeChannelRead(AbstractChannelHandlerContext.java:374)
> > at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
> > at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:931)
> > at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
> > at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:700)
> > at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:635)
> > at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:552)
> > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:514)
> > at io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1044)
> > at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> > at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> > at java.lang.Thread.run(Thread.java:748)
> >
> > Thanks,
> > -
> > Kuldeep Singh Budania
> >
> >
> >
> > On Thu, Oct 8, 2020 at 6:33 PM Szalay-Bekő Máté <
> > szalay.beko.m...@gmail.com> wrote:
> >
> >> Sounds like a bug or a configuration issue...
> >> Can you share the configs (before and after the scale-up) and the logs?
> >> Also: do the truststores recognise all the keys used on all 5
> >> nodes? (e.g. do the truststores on the old nodes accept the new keys?)
> >>
> >> Best Regards,
> >> Mate
> >>
> >> On Thu, Oct 8, 2020 at 2:31 PM kuldeep singh
> >> wrote:
> >> >
> >> > Hi,
> >> >
> >> > Yes, My client and server both are using certificate and have added in ZK
> >> > and client as well.
> >> >
> >> > Thanks,
> >> > -
> >> > Kuldeep Singh Budania
> >> >
> >> >
> >> >
> >> > On Thu, Oct 8, 2020 at 5:56 PM Enrico Olivelli 
> >> wrote:
> >> >
> >> > > On Thu, Oct 8, 2020 at 14:17 kuldeep singh <
> >> > > kuldeep.sing...@gmail.com> wrote:
> >> > >
> >> > > > Hi Team,
> >> > > >
> >> > > > I am facing one issue in SSL communication between client and zookeeper
> >> > > > server.
> >> > > >
> >> > > > ZK 3.5.6 version
> >> > > >
> >> > > > 1. Mi on 3 node
> >> > > > 2. Applying SSL and 3 nodes cluster is working fine
> >> > > > 3. Scaled my cluster with 2 nodes and now my cluster have 5 nodes over SSL
> >> > > >
> >> > > > but after scaling my SSL is not working between client and ZK server and
> >> > > > even not able to login using zkCli as well.
> >> > > >
> >> > > > Can someone provide the details please why it is happening?
> >> > > >
> >> > >
> >> > > Is your client configured to use SSL ?
> >> > >
> >> > > Enrico
> >> > >
> >> > >
> >> > >
> >> > > >
> >> > > > Thanks,
> >> > > > -
> >> > > > Kuldeep Singh Budania
> >> > > > Software Architect
> >> > > >
> >> > > >
> >> > > >
> >> > > > On Mon, Jul 13, 2020 at 2:19 PM Enrico Olivelli - Diennea
> >> > > >  wrote:
> >> > > >
> >> > > > > It looks like we ported it to 3.5.
> >> > > > >
> >> > > > > See the subtask
> >> > > > > https://issues.apache.org/jira/browse/ZOOKEEPER-2792
> >> > > > >
> >> > > > > Enrico
> >> > > > >
> >> > > > > On 13/07/20, 10:37, "kuldeep singh" <kuldeep.sing...@gmail.com>
> >> > > > > wrote:
> >> > > > >
> >> > > > > Hi Team,
> >> > > > >
> >> > > > > I a

Re: upgrade from 3.4.5 to 3.5.6

2020-10-08 Thread Szalay-Bekő Máté

Sounds like a bug or a configuration issue...
Can you share the configs (before and after the scale-up) and the logs?
Also: do the truststores recognise all the keys used on all 5
nodes? (e.g. do the truststores on the old nodes accept the new keys?)

Best Regards,
Mate

On Thu, Oct 8, 2020 at 2:31 PM kuldeep singh  wrote:
>
> Hi,
>
> Yes, My client and server both are using certificate and have added in ZK
> and client as well.
>
> Thanks,
> -
> Kuldeep Singh Budania
>
>
>
> On Thu, Oct 8, 2020 at 5:56 PM Enrico Olivelli  wrote:
>
> > On Thu, Oct 8, 2020 at 14:17 kuldeep singh <
> > kuldeep.sing...@gmail.com> wrote:
> >
> > > Hi Team,
> > >
> > > I am facing one issue in SSL communication between client and zookeeper
> > > server.
> > >
> > > ZK 3.5.6 version
> > >
> > > 1. Mi on 3 node
> > > 2. Applying SSL and 3 nodes cluster is working fine
> > > 3. Scaled my cluster with 2 nodes and now my cluster have 5 nodes over SSL
> > >
> > > but after scaling my SSL is not working between client and ZK server and
> > > even not able to login using zkCli as well.
> > >
> > > Can someone provide the details please why it is happening?
> > >
> >
> > Is your client configured to use SSL ?
> >
> > Enrico
> >
> >
> >
> > >
> > > Thanks,
> > > -
> > > Kuldeep Singh Budania
> > > Software Architect
> > >
> > >
> > >
> > > On Mon, Jul 13, 2020 at 2:19 PM Enrico Olivelli - Diennea
> > >  wrote:
> > >
> > > > It looks like we ported it to 3.5.
> > > >
> > > > See the subtask
> > > > https://issues.apache.org/jira/browse/ZOOKEEPER-2792
> > > >
> > > > Enrico
> > > >
> > > > On 13/07/20, 10:37, "kuldeep singh" wrote:
> > > >
> > > > Hi Team,
> > > >
> > > > I appreciate it if I will get a response as soon as possible, as I am
> > > > stuck at this point.
> > > >
> > > > Thanks,
> > > > -
> > > > Kuldeep Singh Budania
> > > >
> > > >
> > > >
> > > > On Mon, Jul 13, 2020 at 11:10 AM kuldeep singh <
> > > > kuldeep.sing...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Team,
> > > > >
> > > > > Server to Server communication is not supported in 3.5.6 version as
> > > > > per below JIRA issue?
> > > > >
> > > > > https://issues.apache.org/jira/browse/ZOOKEEPER-2639
> > > > >
> > > > > Thanks,
> > > > > -
> > > > > Kuldeep Singh Budania
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Jul 2, 2020 at 4:24 PM kuldeep singh <
> > > > kuldeep.sing...@gmail.com>
> > > > > wrote:
> > > > >
> > > > >> Thanks for the reply.
> > > > >>
> > > > >> Now my ZKCli cmd is working fine as we use some our customized
> > > > >> authentication and we resolve the issue.
> > > > >>
> > > > >> Now I am going to implement Server to Server communication.
> > > > >>
> > > > >> Thanks,
> > > > >> -
> > > > >> Kuldeep Singh Budania
> > > > >>
> > > > >>
> > > > >>
> > > > >> On Thu, Jul 2, 2020 at 3:53 PM Szalay-Bekő Máté <
> > > > >> szalay.beko.m...@gmail.com> wrote:
> > > > >>
> > > > >>> I think SSL is working for you already... If you managed to start
> > > > >>> zkCli.sh, connect to ZooKeeper on the secure port and issue
> > > > >>> any kind of command (like: " ls / "), then the wire encryption is
> > > > >>> working and your server/client communication is secu

Re: ZooKeeper Cluster Health Checking

2020-09-23 Thread Szalay-Bekő Máté

Hi Adrien,

I noticed you are setting "dataLogDir" to /var/log/zookeeper. Please
note that ZooKeeper stores transaction logs in the dataLogDir, which is
real data needed for ZooKeeper recovery. These are not regular
application log text files, which is what you usually want to put into
/var/log.

Otherwise, as far as I can tell, your config seems to be OK. ZooKeeper
should trigger the autopurge job every 48 hours, keeping only the 3
most recent snapshots (plus some transaction logs from the same time
period). However, this ZooKeeper version (3.4.10) is an old one and is
not even officially supported by the community anymore. You should consider
upgrading your ZooKeeper cluster regardless of the autopurge
problems... Also, there might be some fixes around autopurge in more
recent versions.
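To make the retention rule concrete: with autopurge.snapRetainCount=3, the purge keeps the three most recent snapshots. The sketch below only illustrates that selection step, under assumptions: snapshot files are named `snapshot.<zxid in hex>`, the class name is hypothetical, and this is not ZooKeeper's purge code, which also decides which transaction log files to keep (not modelled here).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Simplified illustration, not ZooKeeper's purge code: picks which snapshot files a
// retain count of N keeps, assuming names of the form "snapshot.<zxid in hex>".
// Real purging also decides which transaction log ("log.<zxid>") files to keep.
class SnapshotRetention {
    static long zxidOf(String fileName) {
        String hex = fileName.substring(fileName.lastIndexOf('.') + 1);
        return Long.parseUnsignedLong(hex, 16);
    }

    /** Returns the retainCount snapshots with the highest zxid, newest first. */
    static List<String> toKeep(List<String> snapshotNames, int retainCount) {
        Comparator<String> byZxid = Comparator.comparingLong(SnapshotRetention::zxidOf);
        List<String> sorted = new ArrayList<>(snapshotNames);
        sorted.sort(byZxid.reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(retainCount, sorted.size())));
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("snapshot.9", "snapshot.100000000", "snapshot.a", "snapshot.200000001");
        System.out.println(toKeep(names, 3)); // [snapshot.200000001, snapshot.100000000, snapshot.a]
    }
}
```

Comparing the parsed zxid rather than the file name matters: hex names of different lengths do not sort correctly as strings (`snapshot.9` sorts after `snapshot.100000000`).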

You can also try to kick off the purge job manually (and look for
errors in the log). I have never done this myself, but there is an
example command in the documentation:
java -cp zookeeper.jar:lib/slf4j-api-1.7.5.jar:lib/slf4j-log4j12-1.7.5.jar:lib/log4j-1.2.17.jar:conf org.apache.zookeeper.server.PurgeTxnLog <dataDir> <snapDir> -n <count>

see: https://zookeeper.apache.org/doc/r3.4.14/zookeeperAdmin.html

Best regards,
Mate


On Wed, Sep 23, 2020 at 11:04 AM Enrico Olivelli  wrote:
>
> Adrien
>
> On Wed, Sep 23, 2020 at 10:59 adrien ruffie <
> adriennolar...@hotmail.fr> wrote:
>
> > Hello all,
> >
> > I have a problem in production ...
> >
> > We have the following zoo configuration file:
> >
> > tickTime=4000
> > dataDir=/var/lib/zookeeper
> >
> > dataLogDir=/var/log/zookeeper
> >
> > initLimit=30
> > syncLimit=15
> >
> > autopurge.snapRetainCount=3
> > autopurge.purgeInterval=48
> >
> > clientPort=2181
> > maxClientCnxns=60
> >
> > server.1=ZOO1:2888:3888
> > server.2=ZOO2:2888:3888
> > server.3=ZOO3:2888:3888
> > server.4=ZOO4:2888:3888
> > server.5=ZOO5:2888:3888
> >
> > We are on zookeeper-3.4.10, but we recently noticed that the logs and
> > snapshots aren't purged...
> > Do you know this issue? Is it a bug, or a bad configuration?
> >
>
> Do you see errors in logs ?
>
> Are you using standard Apache distributions?
>
> Enrico
>
>
> >
> > Thank you very much and best regards
> >
> > Adrien Ruffié
> > 
> > From: adrien ruffie
> > Sent: Wednesday, July 18, 2018 09:01
> > To: user@zookeeper.apache.org
> > Subject: RE: ZooKeeper Cluster Health Checking
> >
> > OK, thanks Harish,
> >
> > I'll keep the idea!
> >
> >
> > Best regards,
> >
> >
> > Adrien
> >
> > 
> > From: harish lohar
> > Sent: Tuesday, July 17, 2018 23:13:28
> > To: user@zookeeper.apache.org
> > Subject: Re: ZooKeeper Cluster Health Checking
> >
> > We did it via a Java monitoring app, using the ZooKeeper Java API, which
> > sends four-letter-word (4lw) commands to ZooKeeper and returns the output.
> >
> >
> > Thanks
> > Harish
> >
> > On Tue, Jul 17, 2018 at 2:00 AM adrien ruffie 
> > wrote:
> >
> > > Hi Harish,
> > >
> > >
> > > thank you very much for this advice and explanation!
> > >
> > > Do you think a simple shell script checking all these metrics
> > > is enough? Or would it be better to do it in Java with a simple monitoring
> > > application?
> > >
> > >
> > > Thank again,
> > >
> > >
> > > Best regards,
> > >
> > >
> > > Adrien
> > >
> > > 
> > > From: harish lohar
> > > Sent: Tuesday, July 17, 2018 04:13:51
> > > To: user@zookeeper.apache.org
> > > Subject: Re: ZooKeeper Cluster Health Checking
> > >
> > > Hi Adrien,
> > > The ZooKeeper commands below are generally used to check the health of a
> > > ZooKeeper cluster:
> > > stat
> > >
> > > Lists brief details for the server and connected clients.
> > >
> > > usage: echo stat | nc server port
> > >
> > > This shows whether the cluster is up or down. If it is down, it will report
> > > that the ZooKeeper instance is currently not serving requests, which means
> > > either the leader election is failing or at least half of the ZooKeeper
> > > nodes in the cluster are down (so no majority quorum can be formed).
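The `echo stat | nc server port` check above (or a small Java monitoring app, as Harish describes) comes down to opening a TCP connection to the client port, writing the four ASCII letters and reading until the server closes the socket. The JDK-only sketch below does exactly that; the class name and timeout handling are illustrative assumptions, not a ZooKeeper API, and it only works against a plaintext (non-TLS) client port.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not a ZooKeeper API): JDK-only equivalent of `echo stat | nc host port`.
// Assumes a plaintext (non-TLS) client port.
class FourLetterWord {
    static String send(String host, int port, String cmd, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            socket.setSoTimeout(timeoutMillis); // do not hang forever on a stuck server
            OutputStream out = socket.getOutputStream();
            out.write(cmd.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            socket.shutdownOutput(); // like nc: signal that no more input follows
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) { // the server closes the connection when done
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, `FourLetterWord.send("zk1.example.com", 2181, "stat", 5000)` (the hostname is hypothetical). Note that on ZooKeeper 3.5.3 and later, commands other than `srvr` must be enabled via `4lw.commands.whitelist` before the server answers them.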
> > >
> > >
> > > mntr
> > >
> > > *New in 3.4.0:* Outputs a list of variables that could be used for
> > > monitoring the health of the cluster.
> > >
> > > $ echo mntr | nc localhost 2185
> > >
> > > zk_version  3.4.0
> > > zk_avg_latency  0
> > > zk_max_latency  0
> > > zk_min_latency  0
> > > zk_packets_received 70
> > > zk_packets_sent 69
> > > zk_outstanding_requests 0
> > > zk_server_state leader
> > > zk_znode_count   4
> > > zk_watch_count  0
> > > zk_ephemerals_count 0
> > > zk_approximate_data_size 27
> > > zk_followers 4 - only exposed by the Leader
> > > zk_synced_followers 4 - only exposed by the Leader
> > > zk_pending_syncs 0 - only exposed by the Leader
> > > zk_open_file_descriptor_count 23 - only available on Unix platforms
> > > zk_max_file_descriptor_count 1024 - only available on Unix platforms
> > >
> > > The output is compatible with java properties format and the content may
> > > change over time (new keys added). Your
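Because the `mntr` output is one `key<TAB>value` pair per line (the note above calls it compatible with the Java properties format), `java.util.Properties` can parse it directly. A minimal sketch; the class name, the `looksHealthy` rule and its backlog threshold are illustrative assumptions, not an official health definition. One caveat: `Properties` treats backslashes as escapes and lines starting with `#` or `!` as comments, which should not matter for normal `mntr` output.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper (not part of ZooKeeper): parses `mntr` output, one
// "key<TAB>value" pair per line, using java.util.Properties.
class MntrParser {
    static Properties parse(String mntrOutput) {
        Properties p = new Properties();
        try {
            // Properties treats the first run of whitespace (including TAB) as the key/value separator.
            p.load(new StringReader(mntrOutput));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    /**
     * A crude, illustrative health rule: the node is a leader or follower and its request
     * backlog does not exceed maxOutstanding. Tune or replace it for real monitoring.
     */
    static boolean looksHealthy(Properties m, long maxOutstanding) {
        String state = m.getProperty("zk_server_state", "").trim();
        long outstanding = Long.parseLong(m.getProperty("zk_outstanding_requests", "0").trim());
        boolean inQuorum = state.equals("leader") || state.equals("follower");
        return inQuorum && outstanding <= maxOutstanding;
    }
}
```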

Re: OK to use 3.6.1 (or 3.5.8) client against a 3.4.6 server?

2020-08-31 Thread Szalay-Bekő Máté

Hello!

I think if you don't use any new ZooKeeper features, then it is safe to use
the 3.5.8 or 3.6.1 clients with the old (3.4.6) server. But as far as I
know, the other approach should work too: using 3.4.6 clients with 3.5.8
or 3.6.1 servers. I don't really know which order would be better from
ZooKeeper's point of view.

When we were planning our upgrade in 2019, we did some compatibility tests
with some small sample applications on a matrix of different server-client
combinations to make sure we wouldn't have any problems, and we found no
incompatibilities back then. (These tests have not been executed on 3.5.8
and 3.6.1 yet, but I expect the same results with the more recent versions.)

Before your upgrade, make sure you check:
https://cwiki.apache.org/confluence/display/ZOOKEEPER/Upgrade+FAQ

Kind regards,
Mate


On Fri, Aug 28, 2020 at 10:31 PM  wrote:

> It should work either way, but I would suggest upgrading the ZooKeeper
> server version first.
>
> On 8/28/20, 1:29 PM, "John Lindwall"  wrote:
>
> [External]
>
>
> We are considering upgrading our zookeeper servers from 3.4.6 to
> either 3.5.8 or 3.6.1. I wonder if it will work if we proactively upgrade our
> client versions from 3.4.6 to the new, more current version (3.5.8 or
> 3.6.1) and run that newer client code against the older 3.4.6 servers?
> Later, when we update the servers, the client version and the server
> versions will match.
> Obviously we will not make use of any new API methods added to
> zookeeper since 3.4.6; in fact we will simply be recompiling our existing
> client code against the new libs (which works as I just tested it).
> Thanks!
> -- John
>
>
> This e-mail and any files transmitted with it are for the sole use of the
> intended recipient(s) and may contain confidential and privileged
> information. If you are not the intended recipient(s), please reply to the
> sender and destroy all copies of the original message. Any unauthorized
> review, use, disclosure, dissemination, forwarding, printing or copying of
> this email, and/or any action taken in reliance on the contents of this
> e-mail is strictly prohibited and may be unlawful. Where permitted by
> applicable law, this e-mail and other e-mail communications sent to and
> from Cognizant e-mail addresses may be monitored.
>

Re: WARN [NIOWorkerThread-2:NIOServerCnxn@373] - Close of session 0x0

2020-08-25 Thread Szalay-Bekő Máté

There is probably some process or script sending malformed packets to
the ZooKeeper client port.

The first 4 bytes sent by the client are interpreted as the length of the
message (or as a four-letter-word command), but ZooKeeper is unable to parse
them. (Most probably it is not a real ZooKeeper message, unless the client
really wants to send ~370MB in a single message.) Sometimes this happens
because security scanners periodically probe the ZooKeeper client
port. Or maybe some misconfigured client is trying to connect using TLS/SSL
to a non-secure ZooKeeper port that doesn't expect an SSL handshake?
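The number in the log can be checked by hand. Below is a minimal Python sketch, assuming (as the NIO connection code appears to do) that the first four bytes are read as a big-endian signed 32-bit integer; the constant and helper names are illustrative only. 369295618 is 0x16030102, and the leading bytes 0x16 0x03 0x01 are how a TLS handshake record begins, which would fit the misconfigured-TLS-client explanation (a scanner speaking TLS would produce the same pattern).

```python
import struct

# Bytes that typically open a TLS ClientHello record:
# content type 0x16 (handshake), record-layer version 0x03 0x01,
# then the high byte of the record length (0x02 in this example).
TLS_RECORD_PREFIX = bytes([0x16, 0x03, 0x01, 0x02])


def read_length_prefix(first_four: bytes) -> int:
    """Interpret 4 bytes as a big-endian signed 32-bit message length."""
    if len(first_four) != 4:
        raise ValueError("need exactly 4 bytes")
    (length,) = struct.unpack(">i", first_four)
    return length


def looks_like_tls_handshake(first_four: bytes) -> bool:
    """Heuristic: TLS handshake record type 0x16 followed by major version 0x03."""
    return len(first_four) >= 2 and first_four[0] == 0x16 and first_four[1] == 0x03


if __name__ == "__main__":
    n = read_length_prefix(TLS_RECORD_PREFIX)
    print(n, hex(n), f"~{n / 1_000_000:.0f} MB")  # 369295618 0x16030102 ~369 MB
    print("TLS handshake?", looks_like_tls_handshake(TLS_RECORD_PREFIX))
```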

Best Regards,
Mate

On Tue, Aug 25, 2020 at 8:25 AM Dhirendra Singh 
wrote:

> I have set up a ZooKeeper ensemble with a quorum size of 3 in Kubernetes.
> The ZooKeeper version I am using is 3.6.1.
> I see the following warning message logged approximately every 4
> seconds in the logs of all ZooKeeper servers. What is the cause of these
> warning messages?
>
> 2020-08-25 05:17:40,265 [myid:1] - WARN  [NIOWorkerThread-2:NIOServerCnxn@373] - Close of session 0x0
> java.io.IOException: Len error 369295618
> at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:541)
> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:332)
> at org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:522)
> at org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
> at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834)
>
> -dsingh
>

Re: Strange zoo.cfg.dynamic.next generated via zookeeper Docker image

2020-07-28 Thread Szalay-Bekő Máté

> It seems Zookeeper is rebinding the client port to the announced IP
> during the startup sequence.

This is strange... According to the documentation (
https://zookeeper.apache.org/doc/r3.6.1/zookeeperReconfig.html):

A client port of a server is the port on which the server accepts client
connection requests. Starting with 3.5.0 the clientPort and
clientPortAddress configuration parameters should no longer be used.
Instead, this information is now part of the server keyword specification,
which becomes as follows:
server.<positive id> = <address1>:<port1>:<port2>[:role];[<client port address>:]<client port>

So I would expect this to work:
ZOO_CFG_EXTRA="quorumListenOnAllIPs=true"
ZOO_SERVERS=server.1=x.x.x.1:2888:3888:participant;0.0.0.0:2181 \
server.2=x.x.x.2:2888:3888:participant;0.0.0.0:2181 \
server.3=x.x.x.3:2888:3888:participant;0.0.0.0:2181

Although the client port address should default to 0.0.0.0 anyway.
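To make the server line format above concrete, here is a minimal Python sketch that splits such a line into its parts, applying the defaults described in the reconfig documentation (role defaults to participant, client port address to 0.0.0.0). It is a simplified, hypothetical parser for illustration only: it ignores IPv6 literals and multiple '|'-separated addresses, and it is not how ZooKeeper itself parses its config.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServerSpec:
    server_id: int
    address: str
    quorum_port: int
    election_port: int
    role: str
    client_address: str
    client_port: Optional[int]


def parse_server_line(line: str) -> ServerSpec:
    """Parse 'server.<id>=<addr>:<p1>:<p2>[:role][;[<client addr>:]<client port>]'."""
    key, value = (part.strip() for part in line.split("=", 1))
    server_id = int(key.split(".", 1)[1])

    # Left of ';' is the quorum/election part, right of it the client part.
    quorum_part, _, client_part = value.partition(";")
    fields = quorum_part.split(":")
    address, quorum_port, election_port = fields[0], int(fields[1]), int(fields[2])
    role = fields[3] if len(fields) > 3 else "participant"

    # The client port address is optional and defaults to 0.0.0.0.
    client_address, client_port = "0.0.0.0", None
    if client_part:
        if ":" in client_part:
            client_address, port = client_part.rsplit(":", 1)
        else:
            port = client_part
        client_port = int(port)

    return ServerSpec(server_id, address, quorum_port, election_port,
                      role, client_address, client_port)


if __name__ == "__main__":
    print(parse_server_line("server.1=x.x.x.1:2888:3888:participant;0.0.0.0:2181"))
    print(parse_server_line("server.2=x.x.x.2:2888:3888:participant;2181"))
```

Both example lines end up with the same client binding (0.0.0.0:2181), which is why omitting the address should be equivalent to spelling out 0.0.0.0.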

But based on your logs I think you are right. A reconfig of the
clientAddress is happening in the code here:
https://github.com/apache/zookeeper/blob/1c41e127537f66842515ccb21fb48f1670003454/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java#L2194

One thing you can try is to switch to Netty instead of NIO, as the Netty
reconfig code contains some extra 0.0.0.0 related checks.
You can do that e.g. by providing the following environment variable for
docker:
JVMFLAGS="-Dzookeeper.serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory"

Also, AFAICS the clientAddress is reconfigured only while processing a
dynamic reconfig. Are you actually using the dynamic reconfig feature? If
not, then disabling it may fix the issue.
Dynamic reconfig should be disabled by default in ZooKeeper, although I'm
not sure about the Docker image config.
Maybe try this?
ZOO_CFG_EXTRA="quorumListenOnAllIPs=true reconfigEnabled=false"

If these don't help, then can you share debug logs from one of your
containers?

Kind regards,
Mate

On Fri, Jul 24, 2020 at 6:15 PM Thilo-Alexander Ginkel 
wrote:

> On Mon, Jul 20, 2020 at 2:29 PM Szalay-Bekő Máté
>  wrote:
> > Can you try changing your configs so they don't use 0.0.0.0 in
> > ZOO_SERVERS? Using 0.0.0.0 is not a recommended config since 3.5. If the
> > Java process cannot bind (due to some virtual network issue) to the host
> > provided in its config, then you can use the quorumListenOnAllIPs parameter.
> >
> > So you should have the very same configuration for all nodes in your
> > cluster, like:
> >
> > ZOO_CFG_EXTRA="quorumListenOnAllIPs=true"
> > ZOO_SERVERS=server.1=x.x.x.1:2888:3888:participant;2181 \
> >   server.2=x.x.x.2:2888:3888:participant;2181 \
> >   server.3=x.x.x.3:2888:3888:participant;2181
>
> That works, except that I cannot get the client port (2181) to listen
> on 0.0.0.0 so it can be mapped to the outside. Any idea how to achieve
> that?
>
> -- 8< --
> 2020-07-24 16:04:59,243 [myid:] - INFO  [main:QuorumPeerConfig@456] - clientPortAddress is 0.0.0.0:2181
> 2020-07-24 16:04:59,415 [myid:2] - INFO  [main:NIOServerCnxnFactory@674] - binding to port /0.0.0.0:2181
> 2020-07-24 16:04:59,483 [myid:2] - INFO  [QuorumPeer[myid=2](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):QuorumPeer@1371] - LOOKING
> 2020-07-24 16:04:59,483 [myid:2] - INFO  [QuorumPeer[myid=2](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):FastLeaderElection@944] - New election. My id = 2, proposed zxid=0xb
> 2020-07-24 16:04:59,531 [myid:2] - INFO  [NIOServerCxnFactory.AcceptThread:/0.0.0.0:2181:NIOServerCnxnFactory$AcceptThread@209] - accept thread exitted run method
> 2020-07-24 16:04:59,531 [myid:2] - INFO  [WorkerReceiver[myid=2]:NIOServerCnxnFactory@707] - binding to port /10.147.254.2:2181
> 2020-07-24 16:04:59,531 [myid:2] - ERROR [WorkerReceiver[myid=2]:NIOServerCnxnFactory@713] - Error reconfiguring client port to /10.147.254.2:2181
> -- 8< --
>
> It seems Zookeeper is rebinding the client port to the announced IP
> during the startup sequence.
>
> I also tried specifying the bind address in ZOO_SERVER as well as
> through clientPortAddress=0.0.0.0 - without any luck.
>
> > BTW, unfortunately there is no such thing as an "official ZooKeeper Docker
> > image", at least not one maintained by the Apache ZooKeeper community. (I
> > don't know who is maintaining the image on Docker Hub
> > https://hub.docker.com/_/zookeeper - it would be nice to ask them to update
> > their examples / documentation)
>
> I'll open a PR once I get this sorted out. ;-)
>
> Thanks,
> Thilo
>

Re: Strange zoo.cfg.dynamic.next generated via zookeeper Docker image

2020-07-20 Thread Szalay-Bekő Máté

Hello,

Can you try changing your configs so they don't use 0.0.0.0 in ZOO_SERVERS?
Using 0.0.0.0 is not a recommended config since 3.5. If the Java process
cannot bind (due to some virtual network issue) to the host provided in
its config, then you can use the quorumListenOnAllIPs parameter.

So you should have the very same configuration for all nodes in your
cluster, like:

ZOO_CFG_EXTRA="quorumListenOnAllIPs=true"
ZOO_SERVERS=server.1=x.x.x.1:2888:3888:participant;2181 \
  server.2=x.x.x.2:2888:3888:participant;2181 \
  server.3=x.x.x.3:2888:3888:participant;2181

This should have the effect that every server is binding on 0.0.0.0
locally, yet still having a consistent view of the server hostnames.

BTW, unfortunately there is no such thing as an "official ZooKeeper Docker
image", at least not one maintained by the Apache ZooKeeper community. (I
don't know who is maintaining the image on Docker Hub
https://hub.docker.com/_/zookeeper - it would be nice to ask them to update
their examples / documentation)

Kind regards,
Mate

On Thu, Jul 16, 2020 at 9:27 AM Thilo-Alexander Ginkel 
wrote:

> Hello again,
>
> just figured out that my rolling restart problems may be caused by
> ZOOKEEPER-3829 (cf. https://github.com/apache/zookeeper/pull/1356),
> so I tried to set reconfigEnabled=true as a workaround, but that fails
> as Zookeeper attempts to bind to x.x.x.1 instead of 0.0.0.0 (config
> still lists 0.0.0.0 for the local node, respectively) during startup
> in that case, so that's apparently not feasible in a Docker
> environment:
>
> 2020-07-16 07:22:20,141 [myid:1] - ERROR [ListenerHandler-/x.x.x.1:3888:QuorumCnxManager$Listener$ListenerHandler@1093] - Exception while listening
> java.net.BindException: Cannot assign requested address (Bind failed)
> at java.base/java.net.PlainSocketImpl.socketBind(Native Method)
> at java.base/java.net.AbstractPlainSocketImpl.bind(Unknown Source)
> at java.base/java.net.ServerSocket.bind(Unknown Source)
> at java.base/java.net.ServerSocket.bind(Unknown Source)
> at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener$ListenerHandler.createNewServerSocket(QuorumCnxManager.java:1134)
> at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener$ListenerHandler.acceptConnections(QuorumCnxManager.java:1064)
> at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener$ListenerHandler.run(QuorumCnxManager.java:1033)
> at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
> at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
> at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.base/java.lang.Thread.run(Unknown Source)
> 2020-07-16 07:22:21,143 [myid:1] - ERROR [ListenerHandler-/x.x.x.1:3888:QuorumCnxManager$Listener$ListenerHandler@1112] - Leaving listener thread for address 10.147.254.1:3888 after 3 errors. Use zookeeper.electionPortBindRetry property to increase retry count.
>
> Are there any plans to release 3.6.2 including the above fix?
>
> Regards,
> Thilo
>

Re: Upgrading existing non-TLS cluster with no downtime

2020-07-20 Thread Szalay-Bekő Máté

Hi,

I guess this is the part you are referring to:
https://zookeeper.apache.org/doc/r3.5.8/zookeeperAdmin.html#Upgrading+existing+nonTLS+cluster
(your link pointed to the 3.3.2 admin guide, where this chapter is
missing)

> 1) When I set sslQuorum=true and portUnification=true on the first
> server, does it go out of the quorum? And when these properties are set in the
> second server, a new quorum of first and second server is formed and now
> the third server is out of quorum. When the 3rd server follows suit, it is
> added back to the quorum.

The "sslQuorum=true and portUnification=true" settings are needed in step 4
(although the numbering is broken in the markdown...). After step 3 you
already have a 3-server quorum up with portUnification=true, meaning the
cluster can handle both TLS/SSL and regular (non-secure) connections. So when
you restart server 1 with sslQuorum=true, it will be able to rejoin
the quorum, as servers 2 and 3 are capable of handling SSL connections
(even if they are not using SSL to initiate connections). So, ideally,
between restarting each server with sslQuorum=true, you should always have
a full 3-node quorum.

> 2) The guideline says to check after restarting every broker that the
> quorum is healthy, is there any metric to track that?

I send the "stat" command to all nodes to see if each of them is connected
to the quorum, e.g.: echo "stat" | nc localhost 2181
I usually use the four-letter-word commands, but the REST AdminServer API
works as well; that is actually the officially recommended way, as the
four-letter words are (or will be) deprecated at some point.
For the admin server see:
https://zookeeper.apache.org/doc/r3.5.8/zookeeperAdmin.html#sc_adminserver
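The same check can be scripted. A minimal Python sketch, assuming the stat command is permitted by 4lw.commands.whitelist on each server (since 3.5 most four-letter words have to be whitelisted explicitly) and using a hypothetical host list; it reads the Mode: line from the response. In a healthy 3-node ensemble you would expect one leader and two followers. With the AdminServer enabled, equivalent output should be available over HTTP (by default at http://<host>:8080/commands/stat), which avoids the whitelist question.

```python
import socket


def four_letter_word(host: str, port: int, cmd: str = "stat", timeout: float = 5.0) -> str:
    """Send a four-letter-word command and return the server's text response."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def server_mode(stat_output: str) -> str:
    """Extract the server role ('leader', 'follower', ...) from 'stat' output.

    Returns a short diagnostic string when no Mode: line is present.
    """
    if "not currently serving requests" in stat_output:
        return "not serving"
    if "not in the whitelist" in stat_output:  # assumed wording of the refusal
        return "stat not whitelisted"
    for line in stat_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return "unknown"


if __name__ == "__main__":
    # Hypothetical ensemble members; replace with your own hosts and client port.
    for host in ["zk1.example", "zk2.example", "zk3.example"]:
        try:
            print(host, server_mode(four_letter_word(host, 2181)))
        except OSError as exc:
            print(host, "unreachable:", exc)
```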

Kind regards,
Mate

On Tue, Jul 14, 2020 at 10:52 PM Sankalp Bhatia 
wrote:

> +users
>
> On Tue, 14 Jul 2020 at 21:51, Sankalp Bhatia 
> wrote:
>
> > Hi All,
> >
> > I am trying to follow the section "Upgrading existing non-TLS cluster with
> > no downtime" in the zookeeper guide:
> > https://zookeeper.apache.org/doc/r3.3.2/zookeeperAdmin.html
> >
> > I have an ensemble of 3 servers. I have a couple of questions:
> >
> > 1) When I set sslQuorum=true and portUnification=true on the first
> > server, does it go out of the quorum? And when these properties are set
> > in the second server, a new quorum of first and second server is formed and
> > now the third server is out of quorum. When the 3rd server follows suit, it
> > is added back to the quorum.
> >
> > If this is the case, what is the use of the port-unification feature
> > here?
> >
> > 2) The guideline says to check after restarting every broker that the
> > quorum is healthy, is there any metric to track that?
> >
> > Thanks,
> > Sankalp
> >
> >
> >
> >
>

Re: Zookeeper session expiration

2020-07-20 Thread Szalay-Bekő Máté

Hello,

Can you reproduce the problem with the latest 3.5 version, i.e. 3.5.8?
There were a few recent bugfixes that may help, e.g.:
https://issues.apache.org/jira/browse/ZOOKEEPER-3756
You can also try to increase some timeout parameters, see
https://zookeeper.apache.org/doc/r3.5.8/zookeeperAdmin.html#sc_configuration
(like minSessionTimeout, maxSessionTimeout, syncLimit)
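When raising these, keep in mind that the server clamps the session timeout requested by the client into the range [minSessionTimeout, maxSessionTimeout], which default to 2 and 20 times tickTime. A minimal Python sketch of that negotiation (the function name is illustrative, not a ZooKeeper API):

```python
from typing import Optional


def negotiated_session_timeout(requested_ms: int,
                               tick_time_ms: int = 2000,
                               min_session_timeout_ms: Optional[int] = None,
                               max_session_timeout_ms: Optional[int] = None) -> int:
    """Clamp a client's requested session timeout into the server's allowed range.

    Unset bounds fall back to the documented defaults of
    2 x tickTime (minimum) and 20 x tickTime (maximum).
    """
    lower = min_session_timeout_ms if min_session_timeout_ms is not None else 2 * tick_time_ms
    upper = max_session_timeout_ms if max_session_timeout_ms is not None else 20 * tick_time_ms
    return max(lower, min(requested_ms, upper))


if __name__ == "__main__":
    print(negotiated_session_timeout(60_000))  # 40000: capped at 20 x tickTime
    print(negotiated_session_timeout(1_000))   # 4000: raised to 2 x tickTime
    print(negotiated_session_timeout(60_000, max_session_timeout_ms=120_000))  # 60000
```

So with tickTime=2000, a client asking for 60 seconds silently gets 40 seconds unless maxSessionTimeout is raised on the server.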

Kind regards,
Mate

On Mon, Jul 13, 2020 at 5:19 PM Srikant Kalani 
wrote:

> I am facing a similar issue in my application.
>
> Zookeeper Server Version 3.5.5
>
> I implemented SSL (server to server) in quorum communication.
>
> After that, the ZK client frequently receives session timeouts.
>
> When I turned off SSL, the application behaves normally and there are
> no timeouts.
>
> Any thoughts ?
>
> Thanks
> Srikant Kalani
>
>
>
> --
> Sent from: http://zookeeper-user.578899.n2.nabble.com/
>

Re: upgrade from 3.4.5 to 3.5.6

2020-07-02 Thread Szalay-Bekő Máté

I think SSL is already working for you... If you managed to start
zkCli.sh, connect to ZooKeeper on the secure port and issue
any kind of command (like: " ls / "), then wire encryption is working
and your client-server communication is secured by ZooKeeper.

Why do you want to run the following command?
addauth ztpasswd zooadmin:

Do you also want to configure a superDigest user in ZooKeeper? Please note
that this command is independent of SSL. If you want to authenticate your
session with a username and password (digest authentication), then please
use the command in the following way:
addauth digest zooadmin:yourSuperSecretPassword
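For reference, the value expected by superDigest is, to our understanding, the user name followed by the Base64-encoded SHA-1 hash of the full "user:password" string (mirroring DigestAuthenticationProvider.generateDigest in the 3.5 line). A minimal Python sketch under that assumption; the credentials are the hypothetical ones from the example above, and UTF-8 is assumed where Java would use the platform default charset:

```python
import base64
import hashlib


def zk_digest(id_password: str) -> str:
    """Return '<user>:<base64(sha1('<user>:<password>'))>' (assumed 3.5.x scheme)."""
    user = id_password.split(":", 1)[0]
    sha1 = hashlib.sha1(id_password.encode("utf-8")).digest()
    return user + ":" + base64.b64encode(sha1).decode("ascii")


if __name__ == "__main__":
    digest = zk_digest("zooadmin:yourSuperSecretPassword")
    print("-Dzookeeper.DigestAuthenticationProvider.superDigest=" + digest)
```

If the printed property is added to the server JVM flags, a session that runs addauth digest zooadmin:yourSuperSecretPassword should then be treated as the super user.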

Kind regards,
Mate

On Thu, Jul 2, 2020 at 6:59 AM kuldeep singh 
wrote:

> 1. sh zkCli.sh --config /etc/zookeeper -server localhost:2281
>
> 2. addauth ztpasswd zooadmin:
>
>
> Thanks,
> -
> Kuldeep Singh Budania
>
>
>
> On Thu, Jul 2, 2020 at 9:56 AM kuldeep singh 
> wrote:
>
> > Hi Team,
> >
> > Any update on this?
> >
> > Thanks,
> > -
> > Kuldeep Singh Budania
> >
> >
> >
> > On Wed, Jul 1, 2020 at 6:43 PM kuldeep singh 
> > wrote:
> >
> >> Sorry, my mistake: the server settings are actually as follows:
> >>
> >> export SERVER_JVMFLAGS="
> >> -Dzookeeper.serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
> >> -Dzookeeper.ssl.keyStore.location=/root/zookeeper/ssl/testKeyStore.jks
> >> -Dzookeeper.ssl.keyStore.password=testpass
> >> -Dzookeeper.ssl.trustStore.location=/root/zookeeper/ssl/testTrustStore.jks
> >> -Dzookeeper.ssl.trustStore.password=testpass"
> >>
> >> export CLIENT_JVMFLAGS="
> >> -Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty
> >> -Dzookeeper.client.secure=true
> >> -Dzookeeper.ssl.keyStore.location=/root/zookeeper/ssl/testKeyStore.jks
> >> -Dzookeeper.ssl.keyStore.password=testpass
> >> -Dzookeeper.ssl.trustStore.location=/root/zookeeper/ssl/testTrustStore.jks
> >> -Dzookeeper.ssl.trustStore.password=testpass"
> >>
> >> I want to have SSL for client-to-server communication.
> >>
> >> I am already following the same link which you have shared with me but
> >> that is not working.
> >>
> >> Zoo.cfg
> >>
> >> secureClientPort=2281
> >> initLimit=10
> >> syncLimit=5
> >> dataDir=/var/lib/zookeeper/data
> >> tickTime=2000
> >> autopurge.snapRetainCount=3
> >> autopurge.purgeInterval=1
> >> admin.enableServer=false
> >> standaloneEnabled=false
> >> jute.maxbuffer=2147483648
> >> serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
> >> server.1=host1_priv:10288:10388
> >> server.2=host2_priv:10288:10388
> >> server.3=host3_priv:10288:10388
> >>
> >>
> >> command to connect using zkcli
> >>
> >> 1. zkcli zoo.cfg localhost:2281
> >> 2. addauth ztpasswd username:password
> >>
> >> after second step we are getting below error
> >>
> >> WatchedEvent state:AuthFailed type:None path:null
> >>
> >>
> >> Zookeeper logs: *2020-07-01 07:38:09,342 - WARN
> >> [nioEventLoopGroup-4-2:ZooKeeperServer@1119] - No authentication provider
> >> for scheme: ztpasswd has x509 ip digest*
> >>
> >> Thanks,
> >> -
> >> Kuldeep Singh Budania
> >>
> >>
> >>
> >> On Wed, Jul 1, 2020 at 6:25 PM Szalay-Bekő Máté <
> >> szalay.beko.m...@gmail.com> wrote:
> >>
> >>> >  No authentication provider for scheme: ztpasswd has x509 ip digest*
> >>>
> >>> This suggests you have a configuration error... Where did you use the
> >>> "ztpasswd" string in your configs / commands?
> >>>
> >>> On Wed, Jul 1, 2020 at 2:53 PM Szalay-Bekő Máté <
> >>> szalay.beko.m...@gmail.com>
> >>> wrote:
> >>>
> >>> > > My ZK server  is up and running in secure mode
> >>> >
> >>> > What is your goal? Do you want to set up a client-server SSL connection?
> >>> >
> >>> > see:
> >>> > https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZooKeeper+SSL+User+Guide
> >>> >
> >>> > (or you want to have both SSL and SASL enabled

Re: upgrade from 3.4.5 to 3.5.6

2020-07-01 Thread Szalay-Bekő Máté

>  No authentication provider for scheme: ztpasswd has x509 ip digest*

This suggests you have a configuration error... Where did you use the
"ztpasswd" string in your configs / commands?

On Wed, Jul 1, 2020 at 2:53 PM Szalay-Bekő Máté 
wrote:

> > My ZK server  is up and running in secure mode
>
> What is your goal? Do you want to set up a client-server SSL connection?
>
> see:
> https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZooKeeper+SSL+User+Guide
>
> (or you want to have both SSL and SASL enabled?)
>
> Anyway, please remove the following line from the SERVER_JVMFLAGS:
> -Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty
> This is a configuration that makes sense only for the ZooKeeper client,
> not for the server. For the server, use the following:
>
> -Dzookeeper.serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
>
> If it doesn't solve the issue, then can you please send your zoo.cfg file?
> Also can you please send the zkCli command you execute? (you need to
> connect to the secure ZooKeeper port, unless portUnification is enabled)
>
> Kind regards,
> Mate
>
> On Wed, Jul 1, 2020 at 9:48 AM kuldeep singh 
> wrote:
>
>> Hi,
>>
>> we have made the changes below in the java.env file:
>>
>> export SERVER_JVMFLAGS="
>>
>> -Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty
>> -Dzookeeper.ssl.keyStore.location=/root/zookeeper/ssl/testKeyStore.jks
>> -Dzookeeper.ssl.keyStore.password=testpass
>> -Dzookeeper.ssl.trustStore.location=/root/zookeeper/ssl/testTrustStore.jks
>> -Dzookeeper.ssl.trustStore.password=testpass"
>>
>>
>>
>> export CLIENT_JVMFLAGS="
>>
>> -Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty
>> -Dzookeeper.client.secure=true
>> -Dzookeeper.ssl.keyStore.location=/root/zookeeper/ssl/testKeyStore.jks
>> -Dzookeeper.ssl.keyStore.password=testpass
>> -Dzookeeper.ssl.trustStore.location=/root/zookeeper/ssl/testTrustStore.jks
>> -Dzookeeper.ssl.trustStore.password=testpass"
>>
>> I have started the ZK server and it is up without any issue.
>>
>> But now when I log in with ZkCli, it gives the error below.
>>
>> WatchedEvent state:AuthFailed type:None path:null
>>
>>
>> Zookeeper logs :- *2020-07-01 07:38:09,342 - WARN
>> [nioEventLoopGroup-4-2:ZooKeeperServer@1119] - No authentication provider
>> for scheme: ztpasswd has x509 ip digest*
>>
>> Please help me with this issue.
>>
>> Thanks,
>> -
>> Kuldeep Singh Budania
>> Software Architect
>>
>>
>> On Wed, Jul 1, 2020 at 12:05 PM kuldeep singh 
>> wrote:
>>
>> > Hi,
>> >
>> > My ZK server is up and running in secure mode, but when I try to
>> > connect to the ZK server using ZkCli, it gives the error below.
>> >
>> > WatchedEvent state:AuthFailed type:None path:null
>> >
>> >
>> > Zookeeper logs: *2020-07-01 07:38:09,342 - WARN
>> > [nioEventLoopGroup-4-2:ZooKeeperServer@1119] - No authentication provider
>> > for scheme: ztpasswd has x509 ip digest*
>> >
>> > Can someone please help me with this issue? We are using version 3.5.6.
>> >
>> > I would appreciate a response as soon as possible, as I am stuck
>> > at this point.
>> >
>> > Thanks,
>> > -
>> > Kuldeep Singh Budania
>> > Software Architect
>> >
>> >
>> >
>> > On Thu, Jun 25, 2020 at 11:54 AM Enrico Olivelli - Diennea
>> >  wrote:
>> >
>> >> I mean in zoo.cfg
>> >> Not as a system property
>> >>
>> >> Enrico
>> >>
>> >> On 25/06/20, 08:19, "Enrico Olivelli - Diennea" <
>> >> enrico.olive...@diennea.com.INVALID> wrote:
>> >>
>> >> Hi
>> >> You have to enable Netty on the server side
>> >>
>> >> Something like:
>> >>
>> >> serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
>> >>
>> >> Hope that helps
>> >> Enrico
>> >>
>> >> On 24/06/20, 19:17, "kuldeep singh" <kuldeep.sing...@gmail.com>
>> >> wrote:
>> >>
>> >> Hi,
>> >>
>> >> I got the error below while setting SSL properties in zkEnv.sh:
>> >>
>> >>

Re: upgrade from 3.4.5 to 3.5.6

2020-07-01 Thread Szalay-Bekő Máté

> >> jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated
> >> TLS renegotiation
> >>
> >> 2020-06-24 15:49:35,897 - INFO  [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
> >> 2020-06-24 15:49:35,897 - INFO  [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 1
> >> 2020-06-24 15:49:35,898 - INFO  [PurgeTask:DatadirCleanupManager$PurgeTask@138] - Purge task started.
> >> 2020-06-24 15:49:35,899 - INFO  [main:ManagedUtil@46] - Log4j found with jmx enabled.
> >> 2020-06-24 15:49:35,903 - INFO  [PurgeTask:FileTxnSnapLog@103] - zookeeper.snapshot.trust.empty : false
> >> 2020-06-24 15:49:35,910 - INFO  [PurgeTask:DatadirCleanupManager$PurgeTask@144] - Purge task completed.
> >> 2020-06-24 15:49:35,975 - INFO  [main:QuorumPeerMain@141] - Starting quorum peer
> >> 2020-06-24 15:49:35,983 - INFO  [main:ServerCnxnFactory@135] - Using org.apache.zookeeper.server.NIOServerCnxnFactory as server connection factory
> >> 2020-06-24 15:49:35,986 - INFO  [main:NIOServerCnxnFactory@673] - Configuring NIO connection handler with 10s sessionless connection timeout, 2 selector thread(s), 16 worker threads, and 64 kB direct buffers.
> >> 2020-06-24 15:49:35,992 - INFO  [main:NIOServerCnxnFactory@686] - binding to port 0.0.0.0/0.0.0.0:10181
> >> 2020-06-24 15:49:35,994 - INFO  [main:ServerCnxnFactory@135] - Using org.apache.zookeeper.server.NIOServerCnxnFactory as server connection factory
> >> 2020-06-24 15:49:35,995 - ERROR [main:QuorumPeerMain@101] - Unexpected exception, exiting abnormally
> >> java.lang.UnsupportedOperationException: SSL isn't supported in NIOServerCnxn
> >>     at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:644)
> >>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:155)
> >>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:123)
> >>     at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)
> >>
> >> I have set the following properties in SERVER_JVMFLAGS in the
> >> zkEnv.sh file:
> >>
> >> "-Dzookeeper.serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
> >> -Dzookeeper.ssl.keyStore.location=/var/opt/vs/SecureInterface/keystore/CassSpkkeystore.p12
> >> -Dzookeeper.ssl.keyStore.password=EvaiKiO1@123456
> >> -Dzookeeper.ssl.trustStore.location=/var/opt/vs/SecureInterface/keystore/CassSpkTrustStore.jks
> >> -Dzookeeper.ssl.trustStore.password=EvaiKiO1@123456"
> >>
> >> Thanks,
> >> -
> >> Kuldeep Singh Budania
> >>
> >>
> >>
> >> On Mon, Jun 22, 2020 at 8:08 PM Jordan Zimmerman <
> >> jor...@jordanzimmerman.com>
> >> wrote:
> >>
> >> > It's the same as the normal ZooKeeper client:
> >> > https://zookeeper.apache.org/doc/r3.6.1/zookeeperAdmin.html#sc_authOptions
> >> >
> >> > -Jordan
> >> >
> >> > > On Jun 22, 2020, at 5:50 AM, kuldeep singh <
> >> kuldeep.sing...@gmail.com>
> >> > wrote:
> >> > >
> >> > > Hi Team,

Re: How to deliberately cause a split brain?

2020-06-18 Thread Szalay-Bekő Máté

Even when using different server configs, no vote from an "unknown" server
should be accepted by the others (at least if dynamic reconfig is
disabled).

However, I never really tested with intentionally bad configs. There might
be some bugs we are unaware of and haven't fixed yet. There was an
important bugfix recently which hasn't been released yet:
https://github.com/apache/zookeeper/pull/1356 - it is always possible that
we still have holes somewhere. But still, I think any scenario that
leads to a split-brain should be considered a bug.

On Thu, Jun 18, 2020 at 10:43 AM Szalay-Bekő Máté <
szalay.beko.m...@gmail.com> wrote:

> > You can just use iptables and simulate that some network paths are not
> > working properly
> I would assume ZooKeeper would handle this, and if no partition has
> quorum, then no writes will be enabled / no leader will be present.
> (readonly mode can still work, if configured properly)
>
> Honestly, I cannot really think of any way to get a split-brain. If someone
> can, then let us know so that we can fix it ;)
>
> Cheers,
> Mate
>
> On Thu, Jun 18, 2020 at 10:34 AM Enrico Olivelli 
> wrote:
>
>> Tim
>> You can just use iptables and simulate that some network paths are not
>> working properly
>>
>> Enrico
>>
>> On Thu, 18 Jun 2020 at 10:26, Tim Ward wrote:
>>
>> > I have been tasked with writing some monitoring code to detect a Zookeeper
>> > split brain using the monitoring system.
>> >
>> > All well and good, I think I can see how to do that, but what about
>> > testing? How can I deliberately provoke a Zookeeper ensemble into going
>> > into a split brain state so that I can test the detection code?
>> >
>> > Scenarios might be along the lines of: the ops people want to increase an
>> > ensemble size, but get something wrong in the (manual) writing of the
>> > configuration files or in the (manual) restarting of instances. What
>> > "something" is likely to work here?
>> >
>> >
>> > *Tim Ward*
>> > Principal engineer
>> > *t: *+44 (0) 1223 345 940
>> > *Broers Building, Hauser Forum*
>> > *21 JJ Thomson Avenue, Cambridge*
>> > *CB3 0FA United Kingdom*
>> > featurespace.com <https://www.featurespace.com/>* | *Twitter
>> > <https://twitter.com/FeaturespaceLtd>* | *LinkedIn
>> > <https://www.linkedin.com/company/featurespace>
>> >
>> > This message, and any files/attachments transmitted together with it, is
>> > intended for the use only of the person (or persons) to whom it is
>> > addressed. It may contain information which is confidential and/or
>> > protected by legal privilege. Accordingly, any dissemination,
>> distribution,
>> > copying or use of this message, or any part of it or anything sent
>> together
>> > with it, other than by intended recipients, may constitute a breach of
>> > civil or criminal law and is hereby prohibited. Unless otherwise stated,
>> > any views expressed in this message are those of the person sending it
>> and
>> > not the sender's employer. No responsibility, legal or otherwise, of
>> > whatever nature, is accepted as to the accuracy of the contents of this
>> > message or for the completeness of the message as received. Anyone who
>> is
>> > not the intended recipient of this message is advised to make no use of
>> it
>> > and is requested to contact Featurespace Limited as soon as possible.
>> Any
>> > recipient of this message who has knowledge or suspects that it may have
>> > been the subject of unauthorised interception or alteration is also
>> > requested to contact Featurespace Limited.
>> >
>>
>

Re: How to deliberately cause a split brain?

2020-06-18 Thread Szalay-Bekő Máté

> You can just use iptables and simulate that some network paths are not
> working properly
I would assume ZooKeeper would handle this, and if no partition has quorum,
then no writes will be enabled / no leader will be present. (readonly mode
can still work, if configured properly)
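The reasoning behind this is majority arithmetic: a leader needs votes from a strict majority of the configured ensemble, and two disjoint groups of servers cannot both be strict majorities, so a network partition leads to unavailability on the minority side rather than to a split-brain. A minimal Python sketch, assuming every server belongs to exactly one partition and all servers share the same static configuration (no weights, hierarchical quorums or dynamic reconfig); the function names are illustrative:

```python
from typing import Iterable, List


def has_quorum(group_size: int, ensemble_size: int) -> bool:
    """True if group_size servers form a strict majority of the ensemble."""
    return group_size > ensemble_size // 2


def partitions_with_quorum(partition_sizes: Iterable[int]) -> List[int]:
    """Return the indexes of the partitions that could elect a leader.

    Assumes every server of the ensemble sits in exactly one partition.
    """
    sizes = list(partition_sizes)
    ensemble_size = sum(sizes)
    return [i for i, size in enumerate(sizes) if has_quorum(size, ensemble_size)]


if __name__ == "__main__":
    print(partitions_with_quorum([3, 2]))     # [0]: only the 3-node side
    print(partitions_with_quorum([2, 2]))     # []: no leader anywhere
    print(partitions_with_quorum([1, 1, 1]))  # []
```

For testing detection code, this suggests that iptables-based partitions will at most produce a leaderless minority, never two leaders, as long as every server shares the same configuration.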

Honestly, I cannot really think of any way to get a split-brain. If someone
can, then let us know so that we can fix it ;)

Cheers,
Mate

On Thu, Jun 18, 2020 at 10:34 AM Enrico Olivelli 
wrote:

> Tim
> You can just use iptables and simulate that some network paths are not
> working properly
>
> Enrico
>
> On Thu, Jun 18, 2020, 10:26 Tim Ward wrote:
>
> > I have been tasked with writing some monitoring code to detect a
> > Zookeeper split brain using the monitoring system.
> >
> > All well and good, I think I can see how to do that, but what about
> > testing? How can I deliberately provoke a Zookeeper ensemble into going
> > into a split brain state so that I can test the detection code?
> >
> > Scenarios might be along the lines of: the ops people want to increase an
> > ensemble size, but get something wrong in the (manual) writing of the
> > configuration files or in the (manual) restarting of instances. What
> > "something" is likely to work here?
> >
> > *Tim Ward*
> > Principal engineer
> >
> > t: +44 (0) 1223 345 940
> >
> > Broers Building, Hauser Forum
> > 21 JJ Thomson Avenue, Cambridge
> > CB3 0FA United Kingdom
> >
> > featurespace.com | Twitter | LinkedIn
> >
>

Re: Defining IPv6 ACL in ZooKeeper 3.6.1

2020-06-18 Thread Szalay-Bekő Máté

Hi Arvanitis,

I don't think it is supported. I haven't found any test for it (only IPv4
tests in ACLTest.java), and I think IPv6 support is still missing from
IPAuthenticationProvider:
see
https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/auth/IPAuthenticationProvider.java

If it is really missing, then I think it would be a nice feature to add, if
you or someone else in the community has the time. (I haven't checked whether
there is an existing Jira for it.)
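For reference, the `ip` ACL scheme takes an id of the form `addr/bits` and compares the most significant `bits` of `addr` with the client's address. Below is a minimal C sketch of that kind of prefix check, restricted to IPv4 to mirror the limitation discussed above. It is not the Java code of IPAuthenticationProvider; `ipv4_acl_matches` is a hypothetical helper, and it relies on the POSIX `inet_pton()` call, which simply rejects an IPv6 literal when asked to parse `AF_INET`.

```c
#define _POSIX_C_SOURCE 200112L

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: does the dotted-quad `client` fall inside `acl_id`,
 * given as "a.b.c.d" or "a.b.c.d/bits"? IPv4 only: an IPv6 id or client
 * address fails to parse and therefore never matches. */
static bool ipv4_acl_matches(const char *acl_id, const char *client) {
    char addr_part[INET_ADDRSTRLEN];
    int bits = 32;                          /* no "/bits" means exact match */
    const char *slash = strchr(acl_id, '/');
    size_t len = slash != NULL ? (size_t)(slash - acl_id) : strlen(acl_id);

    if (len == 0 || len >= sizeof addr_part) {
        return false;
    }
    memcpy(addr_part, acl_id, len);
    addr_part[len] = '\0';

    if (slash != NULL) {
        char *end = NULL;
        long parsed = strtol(slash + 1, &end, 10);
        if (end == slash + 1 || *end != '\0' || parsed < 0 || parsed > 32) {
            return false;
        }
        bits = (int)parsed;
    }

    struct in_addr acl_addr;
    struct in_addr client_addr;
    if (inet_pton(AF_INET, addr_part, &acl_addr) != 1 ||
        inet_pton(AF_INET, client, &client_addr) != 1) {
        return false;
    }

    /* Build the prefix mask in network byte order; avoid shifting by 32. */
    uint32_t mask = bits == 0 ? 0u : htonl(0xFFFFFFFFu << (32 - bits));
    return (acl_addr.s_addr & mask) == (client_addr.s_addr & mask);
}
```

With this sketch, `ipv4_acl_matches("10.0.0.0/8", "10.1.2.3")` is true, while an IPv6 id such as `"2001:db8::/32"` never matches; supporting IPv6 would mean parsing with `AF_INET6` and masking 128 bits instead of 32.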

Cheers,
Mate

On Tue, Jun 16, 2020 at 3:14 PM Arvanitis Christos <
christos.arvani...@cern.ch> wrote:

> Is there any way to define an ACL with `ip` schema for an IPv6 address
> in ZooKeeper 3.6.1?
>
> If yes, which notation should be used?
>
> Thank you in advance,
>
> Arvanitis Christos
>
>

Re: Side affects of setting quorumListenOnAllIPs to true

2020-06-16 Thread Szalay-Bekő Máté

:)

just some info from https://zookeeper.apache.org/security.html

" If you have any concern or believe you have uncovered a vulnerability, we
suggest that you get in touch via the e-mail address
secur...@zookeeper.apache.org. In the message, try to provide a description
of the issue and ideally a way of reproducing it. (...) Please report any
security problems to the project security address before disclosing it
publicly. "

Kind regards,
Mate

On Tue, Jun 16, 2020 at 1:36 PM ashish soni 
wrote:

> Good suggestions Mate. We are in progress to implement both (SSL AND SASL).
> Will try to pan out some destructive cases to test it out :)
>

Re: Side affects of setting quorumListenOnAllIPs to true

2020-06-16 Thread Szalay-Bekő Máté

Also, the best option is to use QuorumSASL or QuorumSSL to make sure the
ZooKeeper server-to-server communication is secure and that no one who is not
trusted can connect and gain access to the quorum.

However, even if one is using QuorumSASL or QuorumSSL, it is still possible
for a DoS attack to hit the ZooKeeper ports and cause problems. But that can
again be solved by firewalls, I think.

On Tue, Jun 16, 2020 at 12:49 PM Szalay-Bekő Máté <
szalay.beko.m...@gmail.com> wrote:

> > Mate, suppose we do set quorumListenOnAllIPs to true. Will the zookeeper
> > still connect and form a quorum with only the static or dynamic server
> > connection strings or it can connect and form a quorum with any IP address
> > outside the server connection strings as it is allowed to bind with a
> > 0.0.0.0 interface?
>
> This is a good question. I think there is a chance that one can "intrude"
> this way. Although I wouldn't give more tips on the mailing list. :)
> The best is to protect the ZooKeeper internal network using firewalls. The
> election port and leader port should be reachable only by other ZooKeeper
> server hosts.
>
> Regards,
> Mate

Re: Side affects of setting quorumListenOnAllIPs to true

2020-06-16 Thread Szalay-Bekő Máté

> Mate, suppose we do set quorumListenOnAllIPs to true. Will the zookeeper
> still connect and form a quorum with only the static or dynamic server
> connection strings or it can connect and form a quorum with any IP address
> outside the server connection strings as it is allowed to bind with a
> 0.0.0.0 interface?

This is a good question. I think there is a chance that someone could
"intrude" this way, although I wouldn't give more details on the mailing
list. :) The best approach is to protect the ZooKeeper internal network with
firewalls: the election port and the leader port should be reachable only by
the other ZooKeeper server hosts.

Regards,
Mate

On Tue, Jun 16, 2020 at 12:24 PM ashish soni 
wrote:

> Hi,
>
> Mate, suppose we do set quorumListenOnAllIPs to true. Will the zookeeper
> still connect and form a quorum with only the static or dynamic server
> connection strings or it can connect and form a quorum with any IP address
> outside the server connection strings as it is allowed to bind with a
> 0.0.0.0 interface?
>
> Ram, I think you don't need to add this if you have a static IP config or
> using 3.6+. If you feel it is a security issue for the organization, try ZK
> 3.6.1 without setting that config.
>
> Regards,
> Aishwarya Soni
>

Re: Side affects of setting quorumListenOnAllIPs to true

2020-06-16 Thread Szalay-Bekő Máté

Hi Ram,

> all i want to know is by enabling this property there are no side effects
or security risks.

Well, this is something for you (or your security team) to evaluate.
E.g. if your hosts have multiple network interfaces with both "private" and
"public" networks attached, then I would consider setting
quorumListenOnAllIPs=true to be a security risk. Of course, you can block
public access with proper firewall rules.

But usually ZooKeeper is deployed in some secure / core infrastructure, well
protected from DoS / other attacks, in which case quorumListenOnAllIPs=true
is not a real security risk.

This is something we (the ZooKeeper community) cannot judge, as it depends
on your network topology and your security protocols. We can only help by
explaining what this config does.

Kind regards,
Mate

On Mon, Jun 15, 2020 at 7:12 PM rammohan ganapavarapu <
rammohanga...@gmail.com> wrote:

> Mate,
>
> Thanks for explaining, all i want to know is by enabling this property
> there are no side effects or security risks.
>
> Ram
>

Re: Side affects of setting quorumListenOnAllIPs to true

2020-06-15 Thread Szalay-Bekő Máté

Hi Ram,

I am not sure I understand your question. The config quorumListenOnAllIPs
specifies whether the ports ZooKeeper uses for server-to-server
communication should bind on the specified address/IP
(quorumListenOnAllIPs=false) or on 0.0.0.0 (quorumListenOnAllIPs=true).

An example: you configure your server list using either static or dynamic
configuration like:
server.1=a.foo.com:2888:3888
server.2=b.foo.com:2888:3888
...

In this case, when server.2 starts, it reads the config and then initiates a
connection (for the ZK internal leader election protocol) to server.1 by
connecting to a.foo.com:3888 and sending its own address (b.foo.com:3888),
enabling server.1 to connect back. However, if server.2 is behind a proxy /
using Kubernetes / whatever, then it is possible that you can reach
server.2 as b.foo.com, but the ZK process on server.2 cannot actually bind
on b.foo.com:3888. In this case the easiest solution is to bind on
0.0.0.0:3888. However, you cannot set 0.0.0.0:3888 in the config file of
server.2, since then server.2 would send 0.0.0.0:3888 in the initial
message to server.1, and server.1 would try to connect back to server.2
using 0.0.0.0:3888, which is a bad idea. So in this case it comes in handy
to set quorumListenOnAllIPs=true, which causes ZooKeeper to bind on
0.0.0.0:3888 and still send a 'valid' address in the initial message: an
address where the other servers can reach it.
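The distinction above, between where the election listener binds and which address is advertised to peers, can be modelled in a few lines of C. This is a simplified sketch of the behaviour described, not ZooKeeper's actual Java implementation; `struct quorum_server`, `parse_server_line`, `bind_host` and `advertised_endpoint` are hypothetical names, and the parser only handles plain `host:port:port` entries (no bracketed IPv6 literals).

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of one "server.N=host:quorumPort:electionPort" line. */
struct quorum_server {
    int id;
    char host[256];
    int quorum_port;
    int election_port;
};

/* Parse a line such as "server.2=b.foo.com:2888:3888". Anything after the
 * election port (e.g. a role or client-port suffix) is ignored. */
static bool parse_server_line(const char *line, struct quorum_server *out) {
    return sscanf(line, "server.%d=%255[^:]:%d:%d", &out->id, out->host,
                  &out->quorum_port, &out->election_port) == 4;
}

/* Address the local election listener binds to: the wildcard 0.0.0.0 when
 * quorumListenOnAllIPs=true, otherwise the configured host. */
static const char *bind_host(const struct quorum_server *self,
                             bool listen_on_all_ips) {
    return listen_on_all_ips ? "0.0.0.0" : self->host;
}

/* Address sent to peers in the initial message: always the configured
 * host, so peers can connect back whatever the bind address is. */
static void advertised_endpoint(const struct quorum_server *self,
                                char *buf, size_t len) {
    snprintf(buf, len, "%s:%d", self->host, self->election_port);
}
```

For the `server.2=b.foo.com:2888:3888` line above, `bind_host()` yields `0.0.0.0` with the flag on and `b.foo.com` with it off, while `advertised_endpoint()` yields `b.foo.com:3888` either way, which is the address server.1 uses to connect back.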

I hope the explanation made it more (and not less) clear :p

Kind regards,
Mate

On Fri, Jun 12, 2020 at 7:42 PM rammohan ganapavarapu <
rammohanga...@gmail.com> wrote:

> Hi,
>
> I am trying to see what are the pros and cons of setting
> quorumListenOnAllIPs to true. Running zookeeper cluster in mtls or local
> proxy environments is not working by keeping default value (false). So can
> someone please explain?
>
> Any way zookeeper will form quorum with the servers list from the zoo.conf
> static file right? so by enabling this property can any server or IP out of
> the zoo.conf can join the quorum?
>
> Ram
>

Re: Zookeeper client fails during SASL authentication

2020-06-11 Thread Szalay-Bekő Máté

Hello Aparajita,

After a quick glance at your configs and logs, I haven't found any problem
with your ZooKeeper configs. I am not sure if you know this page; following
these steps worked for me to set up a kerberized ZooKeeper:
https://github.com/ekoontz/zookeeper/wiki
I guess you are also familiar with our wiki:
https://cwiki.apache.org/confluence/display/ZOOKEEPER/Client-Server+mutual+authentication

Based on your logs, the problem is here:
> 2020-06-10 17:09:27,007 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@969] - Client failed to SASL
> authenticate: javax.security.sasl.SaslException: GSS initiate failed
> [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism
> level: Invalid argument (400) - Cannot find key of appropriate type to
> decrypt AP REP - AES256 CTS mode with HMAC SHA1-96)]

This is a Kerberos / JAAS related issue; I don't think it is related to
ZooKeeper. A few things you might wish to check:
- make sure you have the "Java Cryptography Extension (JCE) Unlimited Strength
Jurisdiction Policy Files" installed (I think you need them for AES256?)
and that your Java security configs are OK
- run "klist -e -k /etc/krb5.keytab" to see which encryption types you have
in the keytabs
- check whether you have full export support in JCE with "java KeyLengthDetector"
- maybe try different encryption types in the Kerberos configs / during
keytab generation
- try a different Java version (the latest JDK patches have some known
Kerberos backward incompatibilities)

Unfortunately I am not a Kerberos expert, so I don't know much about these
issues; I just used Google to find some hints :)
Maybe someone else in the community with deeper Kerberos knowledge can help
you more.

Kind regards,
Mate

On Thu, Jun 11, 2020 at 9:47 AM Aparajita Singh 
wrote:

> gentle reminder
> (unquoting the previous email)
>
> --
>
> Hi,
>
> I am trying to migrate an unauthenticated zookeeper cluster to a kerberos
> authenticated one. For the time being SSL is disabled. I have configured
> the server and client as described below but when SASL is enabled I am
> unable to retrieve data using zookeeper shell client from the zookeeper
> server. Could I get some help in understanding why this is failing?
>
>
> *server.log snippet*
>
> 2020-06-10 17:09:01,263 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:44994
> 2020-06-10 17:09:01,264 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@827] - Processing mntr command from /127.0.0.1:44994
> 2020-06-10 17:09:01,265 - INFO  [Thread-5:NIOServerCnxn@1007] - Closed socket connection for client /127.0.0.1:44994 (no session established for client)
> 2020-06-10 17:09:26,647 - INFO  [main:Environment@100] - Client environment:zookeeper.version=3.4.6-169--1, built on 02/10/2016 05:49 GMT
> 2020-06-10 17:09:26,649 - INFO  [main:Environment@100] - Client environment:host.name=stage-kdc-zk-ivy
> 2020-06-10 17:09:26,649 - INFO  [main:Environment@100] - Client environment:java.version=1.8.0_172
> 2020-06-10 17:09:26,651 - INFO  [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
> 2020-06-10 17:09:26,651 - INFO  [main:Environment@100] - Client environment:java.home=/usr/lib/jvm/oracle-java8-jdk-amd64/jre
> 2020-06-10 17:09:26,651 - INFO  [main:Environment@100] - Client
> environment:java.class.path=/usr/hdp/2.4.0.0-169/zookeeper/bin/../build/classes:/usr/hdp/2.4.0.0-169/zookeeper/bin/../build/lib/*.jar:/usr/hdp/2.4.0.0-169/zookeeper/bin/../lib/xercesMinimal-1.9.6.2.jar:/usr/hdp/2.4.0.0-169/zookeeper/bin/../lib/wagon-provider-api-2.4.jar:/usr/hdp/2.4.0.0-169/zookeeper/bin/../lib/wagon-http-shared4-2.4.jar:/usr/hdp/2.4.0.0-169/zookeeper/bin/../lib/wagon-http-shared-1.0-beta-6.jar:/usr/hdp/2.4.0.0-169/zookeeper/bin/../lib/wagon-http-lightweight-1.0-beta-6.jar:/usr/hdp/2.4.0.0-169/zookeeper/bin/../lib/wagon-http-2.4.jar:/usr/hdp/2.4.0.0-169/zookeeper/bin/../lib/wagon-file-1.0-beta-6.jar:/usr/hdp/2.4.0.0-169/zookeeper/bin/../lib/slf4j-log4j12-1.6.1.jar:/usr/hdp/2.4.0.0-169/zookeeper/bin/../lib/slf4j-api-1.6.1.jar:/usr/hdp/2.4.0.0-169/zookeeper/bin/../lib/plexus-utils-3.0.8.jar:/usr/hdp/2.4.0.0-169/zookeeper/bin/../lib/plexus-interpolation-1.11.jar:/usr/hdp/2.4.0.0-169/zookeeper/bin/../lib/plexus-container-default-1.0-alpha-9-stable-1.jar:/usr/hdp/2.4.0.0-169/zookeeper/bin/../lib/netty-3.7.0.Final.jar:/usr/hdp/
> 2.4.0.
>

Re: How to use -DTHREADED compile option while compiling

2020-06-10 Thread Szalay-Bekő Máté

Hi Pankaj,

Sorry, I am not very experienced in C++... maybe this is why I am still not
sure what your problem is.

Is this a compile-time error or a link error? Are you trying to develop
your own new C / C++ application using the ZooKeeper client? Or do you want
to build some existing application that uses ZooKeeper, which was built
successfully before but doesn't work anymore with newer versions of
ZooKeeper?

If you need API functions defined in zookeeper.h that are guarded
by THREADED or HAVE_OPENSSL_H, then why not simply define these before
including zookeeper.h?
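A minimal, self-contained C sketch of that idea follows. The guarded section is a hypothetical stand-in for the relevant part of zookeeper.h: `demo_sync_get` and `demo_init_ssl` are invented names standing in for calls such as `zoo_get()` and `zookeeper_init_ssl()`. Defining the macros before the header is read, or passing `-DTHREADED` / `-DHAVE_OPENSSL_H` on the compiler command line, is what makes the declarations visible; without them, a call has no declaration, which in C++ is exactly the "was not declared in this scope" error seen in this thread.

```c
/* Same effect as passing -DTHREADED -DHAVE_OPENSSL_H to the compiler:
 * the macros must be defined before the guarded header is processed. */
#define THREADED
#define HAVE_OPENSSL_H

#include <stddef.h>

/* ---- hypothetical stand-in for the guarded parts of zookeeper.h ---- */
#ifdef THREADED
/* Stands in for a sync call such as zoo_get(); returns 0 for a non-NULL path. */
static int demo_sync_get(const char *path) { return path != NULL ? 0 : -1; }
#endif

#ifdef HAVE_OPENSSL_H
/* Stands in for zookeeper_init_ssl(); returns a dummy non-NULL "handle"
 * when both arguments are present, NULL otherwise. */
static const void *demo_init_ssl(const char *host, const char *cert) {
    return (host != NULL && cert != NULL) ? (const void *)host : NULL;
}
#endif
/* ---- end of stand-in header ---- */

/* Compiles only because both macros are defined above; remove either
 * #define and the corresponding call loses its declaration. */
static int demo_client(void) {
    if (demo_init_ssl("zk1.example.com:2281", "demo-cert-spec") == NULL) {
        return -1;
    }
    return demo_sync_get("/demo");
}
```

Note that the macros only expose declarations: the application must still be linked against a library that actually contains the functions (zookeeper_mt for the sync API, and a build with OpenSSL support for zookeeper_init_ssl()), otherwise the compile error turns into an undefined-reference link error.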

Kind regards,
Mate


On Tue, Jun 9, 2020 at 10:58 AM Pankaj Kumar 
wrote:

> Hi Szalay,
>
> I have tried the same as you mentioned, linked code against zookeeper_mt
> library but still I am not able to call the APIs present under #ifdef
> THREADED. When I comment this ifdef then I am able to call the APIs.
> Getting the same error as mentioned below.
>
> Also I have to use ssl support which is defined like this in zookeeper.h
>
> #ifdef HAVE_OPENSSL_H
> ZOOAPI zhandle_t *zookeeper_init_ssl(const char *host, const char *cert,
>     watcher_fn fn, int recv_timeout, const clientid_t *clientid,
>     void *context, int flags);
> #endif
>
> While building library with maven, it was mentioned that
> -Dc-client-openssl has default value "yes". But still I am not able to
> call zookeeper_init_ssl API.
>
> Looks like the APIs defined under ifdef I am not able to call.
>
> I am attaching zookeeper.spec which I use to make libzookeeper and
> libzookeeper-devel libraries.
>
> Please help me in defining those ifdef THREADED and HAVE_OPENSSL_H
> variables.
>
> Thanks.
>
> Pankaj
>
> Juniper Business Use Only
>

Re: How to use -DTHREADED compile option while compiling

2020-06-08 Thread Szalay-Bekő Máté

Hello Pankaj,

The ZooKeeper C client comes in two libraries: zookeeper_mt (multithreaded),
which provides the sync API as well as the async one, and zookeeper_st
(single threaded), which provides only the async API.
If you want to use the sync API, then make sure to link your application
code against the zookeeper_mt library.

When compiling the ZooKeeper C client code, you can choose whether to also
build the zookeeper_mt library. Depending on your preferred build tool:
- cmake: use the -DTHREADED cmake option to enable the sync API build
- make: use "./configure --without-syncapi" to disable the sync API build

Normally I just use the "mvn clean install -DskipTests -Pfull-build" command
to build both the Java and C code, which builds both the sync and async
ZooKeeper libraries.

This readme file should help, although I am not 100% sure it is totally
up-to-date :)
https://github.com/apache/zookeeper/tree/master/zookeeper-client/zookeeper-client-c

Kind regards,
Mate


On Fri, Jun 5, 2020 at 6:04 PM Pankaj Kumar 
wrote:

> Hi,
> I was making libzookeeper and libzookeeper-devel for latest zookeeper
> release 3.6.1.
> In our software we are making some api calls, however some api calls are
> giving error:-
>
> error: ‘zoo_create’ was not declared in this scope
>  path_buffer_len);
> ^
> error: ‘zoo_delete’ was not declared in this scope
>  return zoo_delete(zh, path, version);
> error: ‘zoo_get’ was not declared in this scope
>  return zoo_get(zh, path, watch, buffer, buffer_len, stat);
>  ^
> error: ‘zoo_exists’ was not declared in this scope
>
> Then after looking into latest library code and found that these calls are
> defined under #define THREADED
> And for this thing to work I have to compile zookeeper C client with
> -DTHREADED option.
>
> What I want to ask that How can I enable this compile -DTHREADED option. I
> have tried this with “make” command and “./configure” command, but couldn’t
> proceed further.
>
> Can someone please help me on how to use -DTHREADED option?
>
> Thanks,
> Pankaj
>
>
> Juniper Business Use Only
>

Re: [ANNOUNCE] Apache Curator 5.0.0 released

2020-05-29 Thread Szalay-Bekő Máté

Congratulations to the Curator community, this seems to be a nice release! :)

On Fri, May 29, 2020 at 1:48 AM Cameron McKenzie 
wrote:

> Hello,
>
> The Apache Curator team is pleased to announce the  release of version
> 5.0.0. Apache  Curator is a Java/JVM client library for Apache
> ZooKeeper[1], a distributed coordination service. Apache Curator includes a
> high-level API framework and utilities to make using Apache ZooKeeper much
> easier and more reliable. It also includes recipes for common use cases and
>  extensions such as service discovery and a Java 8 asynchronous DSL. For
> more details, please visit the project website: http://curator.apache.org/
>
> The download page for Apache Curator is here:
> https://cwiki.apache.org/confluence/display/CURATOR/Releases
>
> The binary artifacts for Curator are available from Maven Central and its
> mirrors.
>
> For general information on Apache Curator, please visit the project
> website:
> http://curator.apache.org
>
> Release Notes - Apache Curator - Version 5.0.0
>
> ** Bug
> * [CURATOR-440] - curator-framework is unable to load in OSGi
> * [CURATOR-464] - Unable to instantiate client in OSGi
> * [CURATOR-525] - There is a race condition in Curator which might lead
> to fake SUSPENDED event and ruin CuratorFrameworkImpl inner state
> * [CURATOR-559] - Inconsistent ZK timeouts
>
> ** New Feature
> * [CURATOR-544] - Implement SessionFailedRetryPolicy
>
> ** Improvement
> * [CURATOR-549] - ZooKeeper 3.6 will add support for Persistent
> Recursive Watchers - Add Curator support
> * [CURATOR-558] - ZooKeeper 3.6.0 has many API changes - bring Curator
> up to date
> * [CURATOR-562] - Remove ConnectionHandlingPolicy
> * [CURATOR-564] - Changes to retry failed TestingServer starts should
> be applied to TestingCluster
> * [CURATOR-568] - New option allowing CuratorFramework skip ZK ensemble
> tracking
>
> Regards,
>
> The Curator Team
>
> [1] Apache ZooKeeper https://zookeeper.apache.org/
>

Re: Need Help with Maven Build

2020-05-20 Thread Szalay-Bekő Máté

I saw this problem a few times (usually when I built from the IntelliJ console
after switching to a different git branch).

My solution is usually:
git clean -xdf
git reset --hard
mvn clean

After these steps, "mvn clean install -DskipTests" usually works just fine.

On Wed, May 20, 2020 at 1:27 PM Jun Wang  wrote:

> I am using latest maven, build failed with zookeeper source code checked
> out from github. but build is fine with downloaded source code
> apache-zookeeper-3.6.1.tar.gz
>
> $ mvn --version
> Apache Maven 3.6.3 (cecedd343002696d0abb50b32b541b8a6ba2883f)
> Maven home: /home/jun/programs/apache-maven-3.6.3
> Java version: 1.8.0_251, vendor: Oracle Corporation, runtime:
> /home/jun/programs/jdk1.8.0_251/jre
> Default locale: en_US, platform encoding: UTF-8
> OS name: "linux", version: "4.10.0-38-generic", arch: "amd64", family:
> "unix"
>
>
> 
> From: Michael Han 
> Sent: Wednesday, May 20, 2020 1:08 AM
> To: user 
> Cc: d...@zookeeper.apache.org 
> Subject: Re: Need Help with Maven Build
>
> hi jun - which maven version you are using?
>
> If it's 3.5.x, try upgrade to 3.6.x. I had the exact same issue a while
> back and upgrade maven fixed this, so I didn't bother to debug. That said,
> it's interesting to understand why we failed under specific version of
> maven / env, so cc dev list where we have a few maven experts who might be
> able to help debug.
>
>
> On Tue, May 19, 2020 at 8:34 AM Jun Wang  wrote:
>
> > Hi
> >
> > I got following build error with latest code from github.  But build is
> > fine with downloaded source code.   Any suggestion is appreciated.
> >
> > [ERROR] Failed to execute goal
> > org.apache.maven.plugins:maven-compiler-plugin:3.8.1:compile
> > (default-compile) on project zookeeper: Fatal error compiling:
> > java.lang.NullPointerException -> [Help 1]
> >
> >
> >
> https://gist.githubusercontent.com/wj1918/b1bcea0473b9ff2096ffa22e3c387e8f/raw/8c2ccfb7919470e0e874abdec5633976720e3dca/zookeeper.build.error.txt
> >
> > Thanks
> > Jun
>

Re: ZK not starting during upgrade to use 3.6.1 with SSL communication

2020-05-18 Thread Szalay-Bekő Máté

I am not exactly sure where we are now...

Did you manage to set up what you wanted?
Am I right that you need quorum SSL and client SSL, while you want to
disable the insecure client connection?


In this case, I think the following config should work with ZooKeeper 3.6.1:

 ---  zoo.cfg starts here -
# generic
dataDir=/data
dataLogDir=/datalog
tickTime=2000
initLimit=10
syncLimit=5
maxClientCnxns=0
leaderServes=yes
autopurge.snapRetainCount=10
autopurge.purgeInterval=24
standaloneEnabled=false
admin.enableServer=false
reconfigEnabled=true
audit.enable=true
quorumListenOnAllIPs=true
4lw.commands.whitelist=*
dynamicConfigFile=/conf/zoo.cfg.dynamic

# only after upgrade, until you have at least one snapshot on each ZK server
snapshot.trust.empty=true

# quorum SSL
sslQuorum=true
serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
ssl.quorum.keyStore.location=
ssl.quorum.keyStore.password=
ssl.quorum.trustStore.location=
ssl.quorum.trustStore.password=

# client SSL
secureClientPort=2181
clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty
ssl.keyStore.location=
ssl.keyStore.password=
ssl.trustStore.location=
ssl.trustStore.password=
 ---  end of zoo.cfg -

and:

 ---  zoo.cfg.dynamic starts here -
server.1=zoo1:2888:3888:participant
server.2=zoo2:2888:3888:participant
server.3=zoo3:2888:3888:participant
 ---  end of zoo.cfg.dynamic -
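If you want to double-check a dynamic config file like the one above before starting the servers, here is a small Java sketch. The class and method names are hypothetical (this is not a ZooKeeper API). It assumes plain host names (no IPv6 literals), accepts the server.<id>=<host>:<quorumPort>:<electionPort>[:<role>] form, ignores any optional ";<clientPort>" suffix, and treats a missing role as participant.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ZooKeeper: a simplified parser for
// "server.<id>=<host>:<quorumPort>:<electionPort>[:<role>]" lines.
final class DynamicConfigSketch {

    static final class Server {
        final long id;
        final String host;
        final int quorumPort;
        final int electionPort;
        final String role;

        Server(long id, String host, int quorumPort, int electionPort, String role) {
            this.id = id;
            this.host = host;
            this.quorumPort = quorumPort;
            this.electionPort = electionPort;
            this.role = role;
        }
    }

    static Server parseLine(String line) {
        String[] kv = line.trim().split("=", 2);
        if (kv.length != 2 || !kv[0].startsWith("server.")) {
            throw new IllegalArgumentException("not a server line: " + line);
        }
        long id = Long.parseLong(kv[0].substring("server.".length()));
        // Drop the optional ";[clientAddress:]clientPort" part, then split the address.
        String[] parts = kv[1].split(";", 2)[0].split(":");
        if (parts.length < 3) {
            throw new IllegalArgumentException("expected host:port:port in: " + line);
        }
        // The role defaults to participant when it is omitted.
        String role = parts.length > 3 ? parts[3] : "participant";
        return new Server(id, parts[0], Integer.parseInt(parts[1]),
                Integer.parseInt(parts[2]), role);
    }

    static List<Server> parse(String text) {
        List<Server> servers = new ArrayList<>();
        for (String line : text.split("\\R")) {
            if (line.trim().startsWith("server.")) {
                servers.add(parseLine(line));
            }
        }
        return servers;
    }

    /** Only participants vote; observers do not count towards the quorum. */
    static long votingMembers(List<Server> servers) {
        return servers.stream().filter(s -> s.role.equals("participant")).count();
    }
}
```

For the file above, parse(...) should return three servers, all participants, so votingMembers(...) is 3.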


If the above config doesn't work and the cluster cannot come up, then I
would assume the problem is related to your keystore / truststore files, or
to something else. Debug logs may help to figure out what the
problem is.

1) Please try again with some extra debug logging by setting the following
environment variables before starting zkServer.sh:
export ZOO_LOG4J_PROP="DEBUG,CONSOLE,ROLLINGFILE"
export SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Dzookeeper.log.threshold=DEBUG -Dzookeeper.console.threshold=DEBUG"
(Optionally, if you want to specify where ZooKeeper should write its
logs: export ZOO_LOG_DIR="/var/logs/zookeeper")

2) Please create a Jira ticket
(https://issues.apache.org/jira/projects/ZOOKEEPER/) where you attach:
- zoo.cfg
- zoo.cfg.dynamic
- the debug logs for all your ZooKeeper servers

3) Please ping me (@symat) on the ticket and I will check the debug logs.

Kind regards,
Mate

On Fri, May 15, 2020 at 2:44 AM blb.dev  wrote:

> Ashish, thank you for detailing why you chose that parameter! You're right
> we
> wouldn't need that in our config.
>
> Anyone else have any ideas why my zookeeper quorum is not starting up with
> this configuration? I am unfortunately still blocked as it will not start
> up.
>
> I need to configure encrypted quorum and client communication (and also
> accept non ssl client communications while clients update) - guidance on
> how
> to change my config params to help with the startup?
>
>
>
> --
> Sent from: http://zookeeper-user.578899.n2.nabble.com/
>

Re: Error in zookeeper 3.6.1 logs with Apache Storm 1.22: Unable to read additional data from client, it probably closed the socket

2020-05-15 Thread Szalay-Bekő Máté

In theory, the 3.4 client can connect to and work with the 3.6 server without
any problem. I am not sure how Curator complicates this issue, but based on
the logs I don't see any ZooKeeper issue just yet.

The logs you pasted here show that the Storm/Curator node closed its
ZooKeeper client connection, maybe due to some timeout. The ZooKeeper client
drops the connection if it has not heard from the server for 2/3 of the
session timeout, and then it tries to reconnect (I guess to another ZooKeeper
node). The "Revalidating client" messages appear because of the reconnection.
Maybe the ZooKeeper server is overloaded / slow to respond? Or maybe the
negotiated session timeout is too low? Can you see the session timeout for
these sessions (e.g. for 0x200866346830367) in the Storm logs? Or can you see
anything else suspicious in the ZooKeeper client logs (i.e. the Storm/Curator
logs) around this time?
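To make the timing above a bit more concrete, here is a rough Java sketch of how I understand the rules; the names are hypothetical, not ZooKeeper APIs. The server clamps the requested session timeout into [minSessionTimeout, maxSessionTimeout], which by default are 2 and 20 times tickTime. The client then drops the connection after 2/3 of the negotiated timeout without hearing from the server, and allows sessionTimeout divided by the number of servers for each connection attempt.

```java
// Hypothetical helper (not a ZooKeeper API) modelling the timeout rules described above.
final class SessionTimeouts {

    private SessionTimeouts() {
    }

    /** Server side: clamp the requested timeout into the default [2, 20] x tickTime range. */
    static int negotiate(int requestedMs, int tickTimeMs) {
        int minMs = 2 * tickTimeMs;   // assumed default minSessionTimeout
        int maxMs = 20 * tickTimeMs;  // assumed default maxSessionTimeout
        return Math.max(minMs, Math.min(maxMs, requestedMs));
    }

    /** Client side: silence after which the client drops the connection and reconnects. */
    static int readTimeout(int negotiatedMs) {
        return negotiatedMs * 2 / 3;
    }

    /** Client side: time allowed for each connection attempt. */
    static int connectTimeout(int negotiatedMs, int serverCount) {
        return negotiatedMs / serverCount;
    }
}
```

For example, with tickTime=2000 from the zoo.cfg below, any timeout requested by Storm would end up between 4 and 40 seconds, and a negotiated 30 second session would give a 20 second read timeout.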

On Fri, May 15, 2020 at 1:22 AM ashish soni 
wrote:

> An old issue for reference:
> https://issues.apache.org/jira/browse/ZOOKEEPER-1582
>
> On Thu, May 14, 2020 at 4:20 PM ashish soni 
> wrote:
>
> > Hi,
> >
> > I am running a containerized zookeeper 3.6.1 version and it is getting
> > filled with below WARN logs,
> >
> > [myid:2] - WARN [NIOWorkerThread-2:NIOServerCnxn@364] - Unexpected
> > exception EndOfStreamException: Unable to read additional data from
> client,
> > it probably closed the socket: address = /10.162.53.62:48988, session =
> > 0x200866346830367 at
> >
> org.apache.zookeeper.server.NIOServerCnxn.handleFailedRead(NIOServerCnxn.java:163)
> > at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:326)
> > at
> >
> org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:522)
> > at
> >
> org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> > at java.lang.Thread.run(Thread.java:748)
> > [myid:2] - INFO [NIOWorkerThread-1:Learner@137] - Revalidating client:
> > 0x3008663464f03ac [myid:2] - INFO [NIOWorkerThread-1:Learner@137] -
> > Revalidating client: 0x100866346320382 INFO
> [NIOWorkerThread-1:Learner@137]
> > - Revalidating client: 0x1008663463202d9 INFO
> [NIOWorkerThread-1:Learner@137]
> > - Revalidating client: 0x1008663463203b2 INFO
> [NIOWorkerThread-1:Learner@137]
> > - Revalidating client: 0x2008663468302d5 INFO
> [NIOWorkerThread-1:Learner@137]
> > - Revalidating client: 0x4008663438f03ca INFO
> [NIOWorkerThread-1:Learner@137]
> > - Revalidating client: 0x1008663463203bf and so on
> >
> > I have multiple instances of Apache Storm 1.22 running as containerized
> > service which is running on 10.162.53.62 (IP address in the above WARN
> > logs). The storm is using a 3.4.14 zookeeper version and the curator is
> at
> > 4.0.1 (which is compatible with 3.5+ versions). Here is the pom file
> from
> > the storm 1.1.x branch
> > https://github.com/apache/storm/blob/1.1.x-branch/pom.xml for reference.
> > Even the latest storm is using the same configs as far as the zookeeper
> is
> > concerned.
> >
> > So, is something wrong with the compatibility of the storm's zookeeper
> > version and mine? Or is anything missing from my zookeeper configs point
> of
> > view? Any details on what might be happening would really be helpful.
> >
> > Here is my zoo.cfg,
> >
> > tickTime=2000
> > # The number of ticks that the initial
> > # synchronization phase can take
> > initLimit=20
> > # The number of ticks that can pass between
> > # sending a request and getting an acknowledgement
> > syncLimit=15
> > # HA (multi-node) setup
> > standaloneEnabled=false
> > dynamicConfigFile=/opt/zookeeper/conf/zoo.cfg.dynamic
> >
> > # location of in-memory database snapshots
> > dataDir=/data
> > # location of the transaction-log
> > dataLogDir=/data/xnlogs
> >
> > # enable regular purging of old data
> > autopurge.purgeInterval=24
> > autopurge.snapRetainCount=5
> >
> >
>

[ANNOUNCE] Apache ZooKeeper 3.5.8

2020-05-12 Thread Szalay-Bekő Máté

The Apache ZooKeeper team is proud to announce Apache ZooKeeper version
3.5.8

ZooKeeper is a high-performance coordination service for distributed
applications. It exposes common services - such as naming,
configuration management, synchronization, and group services - in a
simple interface so you don't have to write them from scratch. You can
use it off-the-shelf to implement consensus, group management, leader
election, and presence protocols. And you can build on it for your
own, specific needs.

For ZooKeeper release details and downloads, visit:
https://zookeeper.apache.org/releases.html

ZooKeeper 3.5.8 Release Notes are at:
https://zookeeper.apache.org/doc/r3.5.8/releasenotes.html


We would like to thank the contributors that made the release possible.

Regards,

The ZooKeeper Team

Re: ZooKeeper dynamic reconfig issue when Quorum authn/authz is enabled

2020-05-11 Thread Szalay-Bekő Máté

Hi Rajkiran,

FYI: we set kerberos.removeHostFromPrincipal=true
and kerberos.removeRealmFromPrincipal=true in our production configs,
although I am not sure whether they also affect quorum SASL, or only
client SASL.
Also, we don't use dynamic reconfig in production yet.
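For reference, here is a small Java sketch of what I understand these two options do to a principal of the form service/host@REALM before it is used as the authenticated identity. It is a simplified model with hypothetical names, not the actual ZooKeeper code, and as noted above I am not sure it applies to quorum SASL.

```java
// Hypothetical, simplified model of principal shortening; not the ZooKeeper implementation.
final class PrincipalSketch {

    static String shorten(String principal, boolean removeHost, boolean removeRealm) {
        String rest = principal;
        String realm = null;
        int at = rest.lastIndexOf('@');
        if (at >= 0) {
            realm = rest.substring(at + 1);   // e.g. EXAMPLE.COM
            rest = rest.substring(0, at);
        }
        String host = null;
        int slash = rest.indexOf('/');
        if (slash >= 0) {
            host = rest.substring(slash + 1); // e.g. zoo1.example.com
            rest = rest.substring(0, slash);
        }
        // Start from the service name and re-append only the parts that are kept.
        StringBuilder sb = new StringBuilder(rest);
        if (!removeHost && host != null) {
            sb.append('/').append(host);
        }
        if (!removeRealm && realm != null) {
            sb.append('@').append(realm);
        }
        return sb.toString();
    }
}
```

With both options enabled, zookeeper/zoo1.example.com@EXAMPLE.COM and zookeeper/zoo4.example.com@EXAMPLE.COM (hypothetical host names) both shorten to zookeeper.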

But I agree with Enrico, this smells like a bug. If the principals for the
new hosts are properly configured in Kerberos, then quorum authentication
should work, I think.

Kind regards,
Mate

On Sat, May 9, 2020 at 7:24 AM rajsura  wrote:

> Hi Enrico,
>
> Thanks again for your reply.
>
> Yes, I have this problem in both production and test environments.
>
> For now, after reconfig, we are rolling restart the members. It would be
> great if you can loop in some users of reconfig and quorum authn/authz.
>
> Regards,
> Rajkiran
>
>
>
> --
> Sent from: http://zookeeper-user.578899.n2.nabble.com/
>

Re: Questions about network segmentation problems

2020-04-28 Thread Szalay-Bekő Máté

Also, if you want, you can enable read-only mode in the ZooKeeper
server. Read-only mode allows client sessions to connect to the server
even when the server might be partitioned from the quorum. In this mode the
clients can still read values from the ZK service, but they cannot write
values or see changes made by other clients. Besides enabling this on the
server side, you have to allow it on the client side as well when you
initiate the connection. By default, read-only mode is disabled and
ZooKeeper behaves as Chris described.
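A rough sketch of the behaviour described above, as a Java decision table with hypothetical names. As far as I know, the real switches are the readonlymode.enabled Java system property on the server and the canBeReadOnly argument of the ZooKeeper client constructor; the sketch ignores timing details such as how quickly an isolated server switches into read-only mode.

```java
// Hypothetical decision table, simplified from the description above.
final class ReadOnlySketch {

    enum Outcome { SERVED, WRITE_REJECTED, NO_CONNECTION }

    static Outcome handle(boolean serverHasQuorum,
                          boolean serverReadOnlyEnabled,
                          boolean clientCanBeReadOnly,
                          boolean isWrite) {
        if (serverHasQuorum) {
            return Outcome.SERVED; // normal operation, reads and writes
        }
        if (!serverReadOnlyEnabled || !clientCanBeReadOnly) {
            // Default behaviour: the isolated server does not serve this client.
            return Outcome.NO_CONNECTION;
        }
        // Read-only mode: reads are served (possibly stale), writes fail.
        return isWrite ? Outcome.WRITE_REJECTED : Outcome.SERVED;
    }
}
```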

Kind regards,
Mate

On Tue, Apr 28, 2020 at 6:27 PM Chris T.  wrote:

> 1: It will close the client port and will remain unavailable for clients
> until it can form or join a quorum (majority).
> 2: No, see above.
> 3+4: They will keep trying to connect to the Zookeeper servers in the
> connection string until they find one that works. The exact messages you
> get depend on the client application or framework you are using. For
> example Apache Curator framework or the internal client implementations of
> SOLR or KAFKA all have different behaviour and messages messages. Something
> like Connection State Lost, Client Connection timed out, Attempting
> reconnect etc...
>
> Regards,
>
> Chris
>
>
> On Tue, Apr 28, 2020 at 5:47 PM Vincent Ngan 
> wrote:
>
> > Hi,
> >
> > I would like to know what will happen to ZooKeeper servers and the
> clients
> > connected to them when a network segmentation occurs.
> >
> > Supposing a network segmentation happens. One of the ZK servers
> > loses contact with all the other ZK servers. This ZK server is still running but
> > it should know that it is not among the majority of a quorum. Then,
> >
> >1. What will happen to this isolated ZK server?
> >2. Will it still function and serve client requests?
> >3. If there are clients also located in the same isolated segment and
> >are currently connected to this ZK server, what will happen to these
> >clients?
> >4. What errors code and messages will these clients detect?
> >
> > Best regards,
> > VN
> >
>

Re: ZK connection broken to one server shutting down entire quorum

2020-04-22 Thread Szalay-Bekő Máté

In ZOOKEEPER-3769 and ZOOKEEPER-3016, the following type of log line was
showing the problem:

03/24/20 11:16:16,297 [WorkerReceiver[myid=1]] ERROR [org.apache.zookeeper.server.NIOServerCnxnFactory] (NIOServerCnxnFactory.java:92) - Thread Thread[WorkerReceiver[myid=1],5,main] died

But I am not sure about the logs in 3.4.14. Also, there are multiple threads
that can die here, not only the WorkerReceiver, but the death of the thread
will always be logged by the NIOServerCnxnFactory. Try to grep for
"NIOServerCnxnFactory" and for "died". If you find anything, then look for
errors / exceptions around this log line to see what happened.
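If the logs are large, a small Java sketch like the one below (a hypothetical helper, not part of ZooKeeper) can do the search described above: it returns every line that mentions both NIOServerCnxnFactory and died, plus a few surrounding lines so the preceding errors / exceptions are visible. It assumes each log entry is on a single line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: find "thread died" lines logged by NIOServerCnxnFactory,
// with `context` lines before and after each hit (overlapping windows are merged).
final class DiedThreadScan {

    static List<String> findWithContext(List<String> lines, int context) {
        List<String> result = new ArrayList<>();
        int nextUnemitted = 0; // avoids emitting the same line twice
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.contains("NIOServerCnxnFactory") && line.contains("died")) {
                int from = Math.max(nextUnemitted, i - context);
                int to = Math.min(lines.size(), i + context + 1);
                result.addAll(lines.subList(from, to));
                nextUnemitted = to;
            }
        }
        return result;
    }
}
```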

On Wed, Apr 22, 2020 at 6:01 PM blb.dev  wrote:

> Hi, thank you for the response!
>
> When you say maybe "some internal listener thread in the leader (zoo3)
> died"
> is there a particular string I could search in the logs to look for that?
>
> We plan on upgrading - waiting for 3.6.1 as we've had some issues moving to
> 3.6.0.
>
>
>
> --
> Sent from: http://zookeeper-user.578899.n2.nabble.com/
>

Re: ZK connection broken to one server shutting down entire quorum

2020-04-22 Thread Szalay-Bekő Máté

Hello!

As far as I can tell, the provided logs are not enough to determine the
exact root cause of the problem. Maybe someone else will have a better
idea, but my best guess would be that some internal listener thread in the
leader (zoo3) had died earlier, so it was not able to parse the leader election
messages from zoo1 and/or zoo2. When you restarted the leader, the
listener threads were re-initialized, so everything went back to normal.

There were a couple of issues like this reported already:
- https://issues.apache.org/jira/browse/ZOOKEEPER-2938
- https://issues.apache.org/jira/browse/ZOOKEEPER-2186
- https://issues.apache.org/jira/browse/ZOOKEEPER-3016
...
ZooKeeper 3.4.14 should already contain the fixes for the issues above. However,
we recently fixed a similar issue:
https://issues.apache.org/jira/browse/ZOOKEEPER-3769
This fix will be part of 3.6.1 and 3.5.8.

Of course, it is possible that you were hitting an independent / unknown
issue... We would need all the logs to verify that (the logs from each
ZooKeeper server since its last restart before you started the rolling
upgrade).

Anyway, I strongly suggest upgrading your ZooKeeper cluster, as 3.4
will reach end of life soon; see the announcement:
https://mail-archives.apache.org/mod_mbox/zookeeper-user/202004.mbox/browser

Kind regards,
Mate

On Wed, Apr 22, 2020 at 2:14 AM blb.dev  wrote:

> Hi team,
>
> During a recent patching for our ZK quorum, we experienced an unrecoverable
> outage. We have performed patches like this many times previously and is
> working fine in other environments of ours. The goal was to shut down each
> server one by one and provide patch updates then restart. However, this
> time, when zoo1 (follower) was shut down, the leader (zoo3) shutdown
> connection with remaining follower (zoo2) as well.
>
> What would cause the entire quorum to shutdown and not recover due to
> stopping only zoo1?
>
> Running zookeeper 3.4.14 in docker containers.
> zk_version  3.4.14-4c25d480e66aadd371de8bd2fd8da255ac140bcf, built on
> 03/06/2019 16:18 GMT
>
>
> *Config for master nodes:*
> tickTime=2000
> maxClientCnxns=0
> dataDir=/data
> dataLogDir=/datalog
> clientPort=2181
> secureClientPort=2281
> initLimit=10
> syncLimit=5
> autopurge.snapRetainCount=10
> autopurge.purgeInterval=24
>
> server.1=zoo1:2888:3888
> server.2=zoo2:2888:3888
> server.3=zoo3:2888:3888
> server.4=zoo4:2888:3888:observer
> server.5=zoo5:2888:3888:observer
> server.6=zoo6:2888:3888:observer
>
> *Config for observer nodes:*
> tickTime=2000
> maxClientCnxns=0
> dataDir=/data
> dataLogDir=/datalog
> clientPort=2181
> secureClientPort=2281
> initLimit=10
> syncLimit=5
> autopurge.snapRetainCount=10
> autopurge.purgeInterval=24
> peerType=observer
> server.1=zoo1:2888:3888
> server.2=zoo2:2888:3888
> server.3=zoo3:2888:3888
> server.4=zoo4:2888:3888:observer
> server.5=zoo5:2888:3888:observer
> server.6=zoo6:2888:3888:observer
>
> zoo1-zoo6 are the FQDNs of each server.
>
> Shutdown of zoo1 and quorum outage at 05:02 UTC
>
>
> *Logs on zoo3 (leader):*
> 2020-04-21 05:02:00,120 [myid:3] - WARN
> [RecvWorker:1:QuorumCnxManager$RecvWorker@1028] - Interrupting SendWorker
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:392)
> at
>
> org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1010)
> 2020-04-21 05:02:00,096 [myid:3] - WARN
> [RecvWorker:1:QuorumCnxManager$RecvWorker@1025] - Connection broken for id
> 1, my id = 3, error =
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:392)
> at
>
> org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1010)
> 2020-04-21 05:02:00,120 [myid:3] - WARN
> [RecvWorker:1:QuorumCnxManager$RecvWorker@1028] - Interrupting SendWorker
> 2020-04-21 05:02:00,143 [myid:3] - WARN
> [SendWorker:1:QuorumCnxManager$SendWorker@941] - Interrupted while waiting
> for message on queue
> java.lang.InterruptedException
> at
>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
> at
>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
> at
> java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
> at
>
> org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1094)
> at
>
> org.apache.zookeeper.server.quorum.QuorumCnxManager.access$700(QuorumCnxManager.java:74)
> at
>
> org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:929)
>
> 2020-04-21 05:02:00,143 [myid:3] - WARN
> [SendWorker:1:QuorumCnxManager$SendWorker@951] - Send worker leaving
> thread
> java.lang.InterruptedException
> at
>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
> at
>
>

Re: upgrade from 3.4.5 to 3.5.6

2020-04-17 Thread Szalay-Bekő Máté

> com.sun.tools.doclint.DocLint$DeclScanner.visitClass(DocLint.java:359)
> at
> com.sun.tools.doclint.DocLint$DeclScanner.visitClass(DocLint.java:346)
> at
> com.sun.tools.javac.tree.JCTree$JCClassDecl.accept(JCTree.java:720)
> at
> com.sun.source.util.TreePathScanner.scan(TreePathScanner.java:68)
> at
> com.sun.source.util.TreeScanner.scanAndReduce(TreeScanner.java:81)
> at
> com.sun.source.util.TreeScanner.visitNewClass(TreeScanner.java:280)
> at
> com.sun.tools.javac.tree.JCTree$JCNewClass.accept(JCTree.java:1532)
> at
> com.sun.source.util.TreePathScanner.scan(TreePathScanner.java:68)
> at
> com.sun.source.util.TreeScanner.scanAndReduce(TreeScanner.java:81)
> at
> com.sun.source.util.TreeScanner.visitVariable(TreeScanner.java:153)
> at
> com.sun.tools.doclint.DocLint$DeclScanner.visitVariable(DocLint.java:373)
> at
> com.sun.tools.doclint.DocLint$DeclScanner.visitVariable(DocLint.java:346)
> at
> com.sun.tools.javac.tree.JCTree$JCVariableDecl.accept(JCTree.java:864)
> at
> com.sun.source.util.TreePathScanner.scan(TreePathScanner.java:68)
> at
> com.sun.source.util.TreeScanner.scanAndReduce(TreeScanner.java:81)
> at com.sun.source.util.TreeScanner.scan(TreeScanner.java:91)
> at
> com.sun.source.util.TreeScanner.scanAndReduce(TreeScanner.java:99)
> at com.sun.source.util.TreeScanner.visitClass(TreeScanner.java:133)
> at
> com.sun.tools.doclint.DocLint$DeclScanner.visitClass(DocLint.java:360)
> at
> com.sun.tools.doclint.DocLint$DeclScanner.visitClass(DocLint.java:346)
> at
> com.sun.tools.javac.tree.JCTree$JCClassDecl.accept(JCTree.java:720)
> at
> com.sun.source.util.TreePathScanner.scan(TreePathScanner.java:68)
> at com.sun.source.util.TreeScanner.scan(TreeScanner.java:91)
> at
> com.sun.source.util.TreeScanner.scanAndReduce(TreeScanner.java:99)
> at
> com.sun.source.util.TreeScanner.visitCompilationUnit(TreeScanner.java:120)
> at
>
> com.sun.tools.doclint.DocLint$DeclScanner.visitCompilationUnit(DocLint.java:354)
> at
>
> com.sun.tools.doclint.DocLint$DeclScanner.visitCompilationUnit(DocLint.java:346)
> at
> com.sun.tools.javac.tree.JCTree$JCCompilationUnit.accept(JCTree.java:550)
> at
> com.sun.source.util.TreePathScanner.scan(TreePathScanner.java:68)
> at com.sun.tools.doclint.DocLint$3.started(DocLint.java:296)
> at
>
> com.sun.tools.javac.api.ClientCodeWrapper$WrappedTaskListener.started(ClientCodeWrapper.java:668)
> at
>
> com.sun.tools.javac.api.MultiTaskListener.started(MultiTaskListener.java:103)
> at
> com.sun.tools.javac.main.JavaCompiler.attribute(JavaCompiler.java:1240)
> at
> com.sun.tools.javac.main.JavaCompiler.compile2(JavaCompiler.java:901)
> at
> com.sun.tools.javac.main.JavaCompiler.compile(JavaCompiler.java:860)
> at com.sun.tools.javac.main.Main.compile(Main.java:523)
> ... 27 more
> [ERROR]
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions,
> please read the following articles:
> [ERROR] [Help 1]
> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
>
> Can someone help me on this issue?
>
> Thanks,
> -
> Kuldeep Singh Budania
>
>
>
> On Sat, Apr 4, 2020 at 5:57 PM Szalay-Bekő Máté <
> szalay.beko.m...@gmail.com>
> wrote:
>
> > these exceptions can mean many things... I think this can be even normal
> > during rolling restart (as some connections get broken in this case)
> >
> > However, I saw cases already when exceptions like these killed receiver
> or
> > sender threads in QuorumCnxManager / Leader Election in such a way that
> > they were not able to recover, so the node was unable to connect to any
> > quorum until restart. I remember seeing this in 3.4 too.
> >
> > Do you see these exceptions in the second server (the one which you just
> > upgraded in step 3)?
> > Is this issue reproducible?
> >
> > What is the tickTime and initLimit you use? Maybe the server just requires
> > more time to sync?
> >
> > I would need more logs to really see what happened. Can you create a Jira
> > issue and upload the logs and also the ZooKeeper configs? I am happy to
> > take a closer look.
> > (if you need to re-run the test to collect the logs, then enabling DEBUG
> > logs would be great. The INFO level

Re: upgrade from 3.4.5 to 3.5.6

2020-04-04 Thread Szalay-Bekő Máté

These exceptions can mean many things... I think this can even be normal
during a rolling restart (as some connections get broken in this case).

However, I have already seen cases where exceptions like these killed the
receiver or sender threads in QuorumCnxManager / leader election in such a
way that they were not able to recover, so the node was unable to connect
to any quorum until it was restarted. I remember seeing this in 3.4 too.

Do you see these exceptions on the second server (the one you just
upgraded in step 3)?
Is this issue reproducible?

What tickTime and initLimit do you use? Maybe the server just requires
more time to sync?
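Both limits are counted in ticks, so a quick way to see how much time the servers actually get is to multiply them by tickTime. A trivial Java sketch with hypothetical names:

```java
// Hypothetical helper: converts the tick-based limits from zoo.cfg into milliseconds.
final class QuorumLimits {

    /** Time a follower may take to connect to the leader and sync its data. */
    static long initWindowMs(int tickTimeMs, int initLimit) {
        return (long) tickTimeMs * initLimit;
    }

    /** Time a follower may lag behind the leader before it is dropped. */
    static long syncWindowMs(int tickTimeMs, int syncLimit) {
        return (long) tickTimeMs * syncLimit;
    }
}
```

For example, with tickTime=2000 and initLimit=10 (values used in other configs in this archive), a follower has 20 seconds to connect to the leader and sync its data; a large data tree may need a higher initLimit.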

I would need more logs to really see what happened. Can you create a Jira
issue and upload the logs and also the ZooKeeper configs? I am happy to
take a closer look.
(If you need to re-run the test to collect the logs, then enabling DEBUG
logs would be great. The INFO level logs are usually enough for these
problems, but one can never know...)

Kind regards,
Mate


On Fri, Apr 3, 2020 at 10:05 AM kuldeep singh 
wrote:

> Hi Team,
>
> I have done some POC on rolling upgrade and found below result.
>
>
>1. On 1st node upgrade zookeeper . Traffic was running fine because 2
>nodes are already on old zookeeper.
>2. On 1st node upgrade our application and didn’t find any issue
>3. On 2nd node upgrade zookeeper but got below error and zookeeper is
>not taking any requests
>4.
>
> java.io.EOFException
>
> at java.io.DataInputStream.readInt(DataInputStream.java:392)
>
> at
>
> org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:747)
>
> 2020-03-30 14:19:55,587 - WARN
> [RecvWorker:1:QuorumCnxManager$RecvWorker@765] - Interrupting SendWorker
>
> 2020-03-30 14:19:55,588 - ERROR [LearnerHandler-/192.168.44.73:33754
> :LearnerHandler@562] - Unexpected exception causing shutdown while sock
> still open
>
> java.io.EOFException
>
> at java.io.DataInputStream.readInt(DataInputStream.java:392)
>
> at
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>
> at
>
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>
> at
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
>
> at
>
> org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:476)
>
> 2020-03-30 14:19:55,588 - WARN
> [SendWorker:1:QuorumCnxManager$SendWorker@679] - Interrupted while waiting
> for message on queue
>
> Please let me know is this the known issue or this is different issue which
> is mention in Apache zookeeper documentation when upgrading from 3.4.5 to
> 3.5.6
>
> Thanks,
> -
> Kuldeep Singh Budania
> Software Architect
>
>
>
> On Sun, Mar 29, 2020 at 9:06 AM Alexander Shraer 
> wrote:
>
> > +1 to what Mate said (I wrote the quoted instructions).
> >
> >
> >
> > On Tue, Mar 24, 2020 at 7:03 AM Szalay-Bekő Máté <
> > szalay.beko.m...@gmail.com>
> > wrote:
> >
> > > Hi Kuldeep,
> > >
> > > I just want to provide you some background info about our
> documentation.
> > > The reason to upgrade to 3.4.6 first is to avoid the following error:
> > >
> > > > 2013-01-30 11:32:10,663 [myid:2] - WARN [localhost/127.0.0.1:2784
> > > :QuorumCnxManager@349] - Invalid server id: -65536
> > >
> > > This error comes because of the protocol changes between ZooKeeper
> server
> > > nodes during connection initiation for leader election. In ZooKeeper
> 3.5
> > a
> > > protocol version was introduced (see ZOOKEEPER-107) and since that time
> > the
> > > first long value sent in the initial message is not the server ID but
> the
> > > protocol version (-65536). In ZooKeeper 3.4.6 we made the old 3.4
> > > ZooKeepers backward compatible, so they are able to parse both the old
> > and
> > > the new protocol format (see ZOOKEEPER-1633). This issue happens only
> > when
> > > you need to use old (3.4.0 - 3.4.5) and new (3.5.0+) ZooKeeper servers
> > > together in the same cluster. During a rolling upgrade, this is usually
> > the
> > > case to have old and new ZooKeepers present together.
> > >
> > > The fact that you haven't seen any issues might be caused by the order
> of
> > > the servers. In ZooKeeper the connection initiation between the servers
> > > during the leader election follows a specific rule. As far as I
> remember
> > > always the server with the larger ID 'wins the challenge', so it is
> > > possible,

Re: [ANNOUNCE] New ZooKeeper committer: Mate Szalay-Beko

2020-04-03 Thread Szalay-Bekő Máté

Thank you all! :)
And thanks for the trust!

Mate

On Fri, Apr 3, 2020 at 10:58 AM Enrico Olivelli  wrote:

> Kudos Mate!
>
> Welcome aboard
>
> Enrico
>
> Il Ven 3 Apr 2020, 10:46 Norbert Kalmar  ha
> scritto:
>
> > Congratulations Máté, well deserved! :)
> >
> > - Norbert
> >
> > On Fri, Apr 3, 2020 at 10:42 AM Andor Molnar  wrote:
> >
> > > The Apache ZooKeeper PMC recently extended committer karma to Mate and
> he
> > > has accepted.
> > > Mate has made some great contributions (including C client!) and we are
> > > looking forward to even more. :)
> > >
> > > Congratulations and welcome aboard, Mate!
> > >
> > >
> > >
> >
>

Re: upgrade from 3.4.5 to 3.5.6

2020-03-24 Thread Szalay-Bekő Máté

Hi Kuldeep,

I just want to give you some background on our documentation.
The reason to upgrade to 3.4.6 first is to avoid the following error:

> 2013-01-30 11:32:10,663 [myid:2] - WARN [localhost/127.0.0.1:2784:QuorumCnxManager@349] - Invalid server id: -65536

This error occurs because of the protocol changes in the connection
initiation between ZooKeeper server nodes during leader election. In
ZooKeeper 3.5 a protocol version was introduced (see ZOOKEEPER-107), and
since then the first long value sent in the initial message is not the
server ID but the protocol version (-65536). In ZooKeeper 3.4.6 we made the
old 3.4 ZooKeepers backward compatible, so they are able to parse both the
old and the new protocol format (see ZOOKEEPER-1633). This issue happens
only when you need to use old (3.4.0 - 3.4.5) and new (3.5.0+) ZooKeeper
servers together in the same cluster. During a rolling upgrade, old and new
ZooKeepers are usually present together.
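To illustrate the mismatch, here is a minimal sketch that models the initial message as plain longs only. This is an assumption for illustration: the real 3.5 message carries more fields (for example the sender's address), and the class and method names below are hypothetical, not the actual QuorumCnxManager code. The "naive" reader behaves like 3.4.0 - 3.4.5 and takes the first long as the server ID; the "tolerant" reader approximates the 3.4.6 behaviour by treating a negative first value as a protocol version, assuming valid server IDs are never negative.

```java
import java.nio.ByteBuffer;

// Hypothetical, simplified model of the first bytes sent when one server opens
// a leader election connection to another. Not the real ZooKeeper wire format.
public class InitialMessageSketch {
    // Protocol version value mentioned above; 3.5+ servers send it before their ID.
    static final long PROTOCOL_VERSION = -65536L;

    // Old (3.4.x) format: the message starts directly with the sender's server ID.
    static byte[] encodeOld(long sid) {
        return ByteBuffer.allocate(Long.BYTES).putLong(sid).array();
    }

    // New (3.5+) format, simplified: protocol version first, then the server ID.
    static byte[] encodeNew(long sid) {
        return ByteBuffer.allocate(2 * Long.BYTES)
                .putLong(PROTOCOL_VERSION)
                .putLong(sid)
                .array();
    }

    // 3.4.0 - 3.4.5 style: whatever comes first is taken as the server ID.
    static long readSidNaive(byte[] msg) {
        return ByteBuffer.wrap(msg).getLong();
    }

    // 3.4.6+ style (approximated): a negative first value is a protocol
    // version, so the real server ID is the next long.
    static long readSidTolerant(byte[] msg) {
        ByteBuffer buf = ByteBuffer.wrap(msg);
        long first = buf.getLong();
        return first >= 0 ? first : buf.getLong();
    }

    public static void main(String[] args) {
        byte[] fromNewServer = encodeNew(2L);
        System.out.println("3.4.5-style reader: " + readSidNaive(fromNewServer));    // -65536
        System.out.println("3.4.6-style reader: " + readSidTolerant(fromNewServer)); // 2
        System.out.println("old format still ok: " + readSidTolerant(encodeOld(3L))); // 3
    }
}
```

The naive reader is where the "Invalid server id: -65536" warning comes from: the protocol version ends up being interpreted as a server ID.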

The fact that you haven't seen any issues might be caused by the order of
the servers. In ZooKeeper, the connection initiation between the servers
during leader election follows a specific rule. As far as I remember, the
server with the larger ID always 'wins the challenge', so it is possible
that the old server never needed to parse an initial message (if it had the
largest ID), and this is why you haven't seen the issue. Also, a 3-node
cluster still works with only 2 nodes up (so you should also check that all
the servers are part of the quorum).
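To make that last check concrete: ZooKeeper needs a majority of the voting servers, so a 3-node ensemble keeps serving with one server down, which can hide a node that never joined. The sketch below assumes you collect the output of the 'stat' (or 'srvr') four-letter word from every server yourself (in 3.5 these may need to be enabled via 4lw.commands.whitelist). The class and method names are made up for illustration, and "not serving" is just a placeholder this sketch uses when no "Mode:" line is present.

```java
import java.util.List;

// Hypothetical helper for checking a small ensemble by hand, based on the
// "Mode:" line that the 'stat' / 'srvr' four-letter words print.
public class QuorumMembershipSketch {
    // Smallest majority of an ensemble with n voting servers.
    static int majority(int n) {
        return n / 2 + 1;
    }

    // Returns the Mode value ("leader", "follower", ...) or "not serving" when
    // the output has no Mode line (e.g. the server is not part of a quorum).
    static String mode(String statOutput) {
        for (String line : statOutput.split("\\R")) {
            if (line.startsWith("Mode:")) {
                return line.substring("Mode:".length()).trim();
            }
        }
        return "not serving";
    }

    // True only if every voting server reports itself as leader or follower.
    static boolean allInQuorum(List<String> statOutputs) {
        return statOutputs.stream()
                .map(QuorumMembershipSketch::mode)
                .allMatch(m -> m.equals("leader") || m.equals("follower"));
    }

    public static void main(String[] args) {
        List<String> outputs = List.of(
                "Zookeeper version: 3.5.6\nMode: follower",
                "Zookeeper version: 3.5.6\nMode: leader",
                "This ZooKeeper instance is not currently serving requests");
        System.out.println("majority of 3: " + majority(3));          // 2
        System.out.println("all in quorum: " + allInQuorum(outputs)); // false
    }
}
```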

I agree with Enrico and Norbert: the safest and most stable way is to
upgrade first to 3.4.latest, then go to 3.5.latest. Still, if you don't see
any sign of this specific issue (e.g. no "Invalid server id" in the log
files), and all three servers can handle traffic, then maybe you don't need
to upgrade to 3.4.latest first; it is your decision. You should definitely
test it first, as suggested by the others.
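The rule of thumb from this thread can be written down as a tiny decision helper. This is only a sketch, assuming plain numeric version strings; the "3.4.latest" release is a parameter you fill in yourself (3.4.14 in Kuldeep's plan), and, as the 3.5.0 upgrade notes quoted in this thread say, the intermediate hop only matters for rolling upgrades. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper encoding the advice above; not an official tool.
public class UpgradePathSketch {
    // Compares dotted numeric versions such as "3.4.5" and "3.4.14" component by component.
    static int compareVersions(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    // Rolling upgrades from 3.4.0 - 3.4.5 to 3.5+ go through a newer 3.4 release
    // first (3.4.6+ understands both handshake formats). A full-shutdown upgrade
    // can skip the intermediate hop.
    static List<String> upgradePath(String from, String latest34, String target, boolean rolling) {
        List<String> path = new ArrayList<>();
        path.add(from);
        if (rolling && compareVersions(from, "3.4.6") < 0 && compareVersions(target, "3.5.0") >= 0) {
            path.add(latest34);
        }
        path.add(target);
        return path;
    }

    public static void main(String[] args) {
        System.out.println(upgradePath("3.4.5", "3.4.14", "3.5.6", true));  // [3.4.5, 3.4.14, 3.5.6]
        System.out.println(upgradePath("3.4.5", "3.4.14", "3.5.6", false)); // [3.4.5, 3.5.6]
    }
}
```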

Kind regards,
Mate

On Tue, Mar 24, 2020 at 12:29 PM Norbert Kalmar
 wrote:

> Hi,
>
> That guide is for upgrading to 3.5.0, which was an alpha version. A lot
> changed before the first stable release, 3.5.5, and a few more things,
> including rolling upgrade issues, have been fixed in 3.5.6.
> This is a more up-to-date guide:
> https://cwiki.apache.org/confluence/display/ZOOKEEPER/Upgrade+FAQ
>
> If you have done your testing (with a prod snapshot!), then you can skip
> the 3.4.latest upgrade, but keep in mind we make our recommendations for a
> reason. There were issues reported and/or found during testing. Some are
> fixed in 3.5.6, some only happen under certain conditions (IOException: No
> snapshot found - mentioned in the guide, fixed in 3.5.6).
>
> So it is up to you, but I would still recommend doing a 3.4 upgrade first,
> if it's feasible.
>
> Regards,
> Norbert
>
> On Tue, Mar 24, 2020 at 11:45 AM kuldeep singh 
> wrote:
>
> > Hi,
> >
> > Current Zookeeper version :- 3.4.5
> > Upgraded version:- 3.5.6
> >
> > We are not going with 3.5.7. Our final decision is ZooKeeper version
> > 3.5.6. As per your reply, we first need to move to the latest 3.4.x
> > version, like below:
> >
> > 3.4.5 -> 3.4.14 -> 3.5.6 (Correct me if I am wrong here)
> >
> > But we are not facing any problems: as I shared, we have set up a 3-node
> > cluster where 2 nodes are on version 3.5.6 and 1 node is on 3.4.5.
> > Everything is running fine and we didn't get any issues, so what other
> > problems can we face if we move directly to 3.5.6?
> >
> > Thanks,
> > -
> > Kuldeep Singh Budania
> > Software Architect
> >
> >
> > On Tue, Mar 24, 2020 at 3:58 PM Enrico Olivelli 
> > wrote:
> >
> > > Hi
> > > You have to upgrade to latest 3.4.x Zookeeper then you will upgrade to
> > > 3.5.7.
> > > All should run well without issues
> > >
> > >
> > > Enrico
> > >
> > > On Tue, 24 Mar 2020, 10:18 kuldeep singh wrote:
> > >
> > > > Hi Team,
> > > >
> > > > We are upgrading ZooKeeper from 3.4.5 to 3.5.6. I have set up a 3-node
> > > > cluster where 2 nodes are on version 3.5.6 and 1 node is on 3.4.5.
> > > >
> > > > Everything is running fine and I didn't get any issues on my system.
> > > >
> > > > But I found something on the Apache site saying that we first need to
> > > > upgrade to 3.4.6 and then we can upgrade to 3.5.6. So is it mandatory
> > > > to go to 3.4.6 first?
> > > >
> > > > *Upgrading to 3.5.0*
> > > >
> > > > Upgrading a running ZooKeeper ensemble to 3.5.0 should be done only after
> > > > upgrading your ensemble to the 3.4.6 release. Note that this is only
> > > > necessary for rolling upgrades (if you're fine with shutting down the
> > > > system completely, you don't have to go through 3.4.6). If you attempt a
> > > > rolling upgrade without going through 3.4.6 (for example from 3.4.5), you
> > > > may get the following error:
> > > >
> > > > 2013-01-30 11:32:10,663 [myid:2] - INFO [localhost/127.0.0.1:2784
> > > >

Re: Zookeeper not listening on 2888 and appears nodes are not connecting to each other.

2020-03-24 Thread Szalay-Bekő Máté

There are multiple issues around your setup which got fixed recently.

The IllegalArgumentException at java.util.concurrent.ThreadPoolExecutor
suggests you were hitting this issue:
https://issues.apache.org/jira/browse/ZOOKEEPER-3758
This affects ZooKeeper 3.6.0 and we have already fixed it, as Enrico
mentioned. Version 3.6.1 will solve this particular issue. Alternatively,
you can set the following config as a workaround:
multiAddress.reachabilityCheckEnabled=false (setting this won't be needed
in 3.6.1, but in your case it will most probably help in 3.6.0). This is a
3.6-specific issue and a 3.6-specific configuration parameter; you won't
see this problem in 3.5.

However, you are also using 0.0.0.0 in your server config, which is
actually not recommended since 3.5. This leads to another error when peers
try to rejoin the quorum (see
https://issues.apache.org/jira/browse/ZOOKEEPER-2164). This was also fixed,
and the fix will be released in 3.5.8 and 3.6.1. As a workaround (and this
is actually not only a workaround but the more consistent and dynamic
reconfig-compatible way), you can use the following config in all three
servers:

quorumListenOnAllIPs=true
server.1=< fqdn of server 1 >:2888:3888
server.2=< fqdn of server 2 >:2888:3888
server.3=< fqdn of server 3 >:2888:3888

The 'quorumListenOnAllIPs' config above has the same effect: all the
servers will listen on 0.0.0.0 locally. The benefit is that all the
members still have the same view of the cluster, and the rejoin problem
should not happen here.
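If you generate zoo.cfg files automatically, a small pre-flight check can catch the 0.0.0.0 case before deployment. A minimal sketch, assuming zoo.cfg is read as a Java properties file (which is how ZooKeeper parses it, as far as I know); the class name and messages are hypothetical and this is not part of ZooKeeper.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeSet;

// Hypothetical pre-flight check, not part of ZooKeeper. It reads zoo.cfg as a
// Java properties file and flags server.N entries that advertise 0.0.0.0.
public class WildcardServerAddressCheck {
    static List<String> problems(String zooCfg) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(zooCfg));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> problems = new ArrayList<>();
        // TreeSet gives a stable (lexicographic) order of the keys
        for (String key : new TreeSet<>(props.stringPropertyNames())) {
            String value = props.getProperty(key).trim();
            if (key.startsWith("server.") && value.startsWith("0.0.0.0:")) {
                problems.add(key + " advertises 0.0.0.0; use an address the other servers can reach");
            }
        }
        boolean listenOnAll = Boolean.parseBoolean(props.getProperty("quorumListenOnAllIPs", "false").trim());
        if (!problems.isEmpty() && !listenOnAll) {
            problems.add("to listen on all local IPs, set quorumListenOnAllIPs=true instead");
        }
        return problems;
    }

    public static void main(String[] args) {
        String cfg = "server.1=0.0.0.0:2888:3888\n"
                + "server.2=zookeeper2:2888:3888\n"
                + "server.3=zookeeper3:2888:3888\n";
        problems(cfg).forEach(System.out::println);
    }
}
```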

I hope these changes will help.

Kind regards,
Mate

On Tue, Mar 24, 2020 at 2:06 AM rld244  wrote:

> Thanks for getting back to me Enrico.
>
> I'm working within an AWS VPC with all nodes on a private subnet and I
> think
> Ping isn't enabled, but I can use nc and specific ports to verify that the
> servers can talk to each other.
>
> Interestingly I just noticed that on the node with myid 3 zookeeper is
> listening on 2888 and can be reached by the other nodes on 2181, 2888 and
> 3888.
>
> I don't know if that helps.
>
> Maybe I'll try deploying 3.5.x instead.
>
>
>
>

Re: systemd failed to stop zookeeper-server.

2020-03-05 Thread Szalay-Bekő Máté

Hi Tasos,

Thanks for reporting the issue!

Taking a quick look at it, I don't think it is related to ZooKeeper itself;
rather, it is related to systemd and to the init scripts / service
descriptors on CentOS. We don't maintain those parts. I have actually never
used systemd in production to run ZooKeeper as a service; I use custom
in-house management scripts to start / stop / monitor the ZooKeeper
process. But who knows, maybe someone in the ZooKeeper community has some
experience with this...

If no one replies to you on this list, then maybe someone in the CentOS
community or in BigTop can help you fix the issue.

Kind regards,
Mate

On Wed, Mar 4, 2020 at 3:44 PM Anastasios Lisgaras
 wrote:

> Hello zookeeper community,
>
> After the last system update I had a problem with the zookeeper-server.
> The problem was about the same as this:
> https://issues.apache.org/jira/browse/BIGTOP-3302
> So, I went to the file :
> /etc/init.d/zookeeper-server
>
> and I replaced "su" with "runuser". If you want to see it :
>
> # cat /etc/init.d/zookeeper-server : https://termbin.com/gupi
> # cat /usr/lib/zookeeper/bin/zkServer.sh : https://termbin.com/3xpb
>
> After this change the server runs flawlessly, but when i try to stop it
> via "systemctl" the system (systemd) complains.
>
> ```
> # systemctl status zookeeper-server
> ● zookeeper-server.service - LSB: ZooKeeper is a centralized service for
> maintaining configuration information, naming, providing distributed
> synchronization, and providing group services.
>Loaded: loaded (/etc/rc.d/init.d/zookeeper-server; bad; vendor
> preset: disabled)
>Active: active (running) since Wed 2020-03-04 16:16:51 EET; 2s ago
>  Docs: man:systemd-sysv-generator(8)
>   Process: 30509 ExecStart=/etc/rc.d/init.d/zookeeper-server start
> (code=exited, status=0/SUCCESS)
>  Main PID: 30557 (java)
>CGroup: /system.slice/zookeeper-server.service
>└─30557 /usr/lib/jvm/jre-openjdk/bin/java
> -Dzookeeper.datadir.autocreate=false
> -Dzookeeper.log.dir=/var/log/zookeeper -Dzookee...
>
> Mar 04 16:16:50 myserver systemd[1]: Starting LSB: ZooKeeper is a
> centralized service for maintaining configuration information,...ices
> Mar 04 16:16:50 myserver runuser[30541]: pam_unix(runuser:session):
> session opened for user zookeeper by (uid=0)
> Mar 04 16:16:50 myserver zookeeper-server[30509]: JMX enabled by default
> Mar 04 16:16:50 myserver zookeeper-server[30509]: Using config:
> /etc/zookeeper/conf/zoo.cfg
> Mar 04 16:16:51 myserver zookeeper-server[30509]: Starting zookeeper ...
> STARTED
> Mar 04 16:16:51 myserver runuser[30541]: pam_unix(runuser:session):
> session closed for user zookeeper
> Mar 04 16:16:51 myserver systemd[1]: Started LSB: ZooKeeper is a
> centralized service for maintaining configuration information, ...rvices..
> Hint: Some lines were ellipsized, use -l to show in full.
>
>
> # systemctl stop zookeeper-server
>
>
> # systemctl status zookeeper-server -l
> ● zookeeper-server.service - LSB: ZooKeeper is a centralized service for
> maintaining configuration information, naming, providing distributed
> synchronization, and providing group services.
>Loaded: loaded (/etc/rc.d/init.d/zookeeper-server; bad; vendor
> preset: disabled)
>Active: failed (Result: signal) since Wed 2020-03-04 16:17:00 EET; 4s
> ago
>  Docs: man:systemd-sysv-generator(8)
>   Process: 30605 ExecStop=/etc/rc.d/init.d/zookeeper-server stop
> (code=exited, status=0/SUCCESS)
>   Process: 30509 ExecStart=/etc/rc.d/init.d/zookeeper-server start
> (code=exited, status=0/SUCCESS)
>  Main PID: 30557 (code=killed, signal=KILL)
>
> Mar 04 16:16:51 myserver systemd[1]: Started LSB: ZooKeeper is a
> centralized service for maintaining configuration information, naming,
> providing distributed synchronization, and providing group services..
> Mar 04 16:17:00 myserver systemd[1]: Stopping LSB: ZooKeeper is a
> centralized service for maintaining configuration information, naming,
> providing distributed synchronization, and providing group services
> Mar 04 16:17:00 myserver runuser[30642]: pam_unix(runuser:session):
> session opened for user zookeeper by (uid=0)
> Mar 04 16:17:00 myserver zookeeper-server[30605]: JMX enabled by default
> Mar 04 16:17:00 myserver zookeeper-server[30605]: Using config:
> /etc/zookeeper/conf/zoo.cfg
> Mar 04 16:17:00 myserver zookeeper-server[30605]: Stopping zookeeper ...
> STOPPED
> Mar 04 16:17:00 myserver systemd[1]: zookeeper-server.service: main
> process exited, code=killed, status=9/KILL
> Mar 04 16:17:00 myserver systemd[1]: Stopped LSB: ZooKeeper is a
> centralized service for maintaining configuration information, naming,
> providing distributed synchronization, and providing group services..
> Mar 04 16:17:00 myserver systemd[1]: Unit zookeeper-server.service
> entered failed state.
> Mar 04 16:17:00 myserver systemd[1]: zookeeper-server.service failed.
>
>
> # systemctl is-active zookeeper-server
> failed
>
>
> #

Re: [ANNOUNCE] Apache Curator 4.3.0 released

2020-02-29 Thread Szalay-Bekő Máté

Great news! :)

On Sat, Feb 29, 2020, 14:30 Jordan Zimmerman 
wrote:

> Thanks Cameron!
>
> -Jordan
>
> > On Feb 29, 2020, at 3:59 AM, Cameron McKenzie 
> wrote:
> >
> > Hello,
> >
> > The Apache Curator team is pleased to announce the release of version
> > 4.3.0. Apache Curator is a Java/JVM client library for Apache
> > ZooKeeper[1], a distributed coordination service. Apache Curator includes
> > a high-level API framework and utilities to make using Apache ZooKeeper
> > much easier and more reliable. It also includes recipes for common use
> > cases and extensions such as service discovery and a Java 8 asynchronous
> > DSL. For more details, please visit the project website:
> > http://curator.apache.org/
> >
> > The download page for Apache Curator is here:
> > https://cwiki.apache.org/confluence/display/CURATOR/Releases
> >
> > The binary artifacts for Curator are available from Maven Central and its
> > mirrors.
> >
> > For general information on Apache Curator, please visit the project
> > website:
> > http://curator.apache.org
> >
> > Release Notes - Apache Curator - Version 4.3.0
> >
> > ** Bug
> >* [CURATOR-530] - Documentation on InterProcessSemaphoreMutex is
> > misleading
> >* [CURATOR-541] - Infinite loop/repeat in BaseClassForTests
> >* [CURATOR-543] - ZOOKEEPER-1392 broke TestLockACLs
> >* [CURATOR-546] - currentData in ModeledCacheImpl removes ZPath from
> > cache entries
> >* [CURATOR-551] - Curator client always call updateServerList with
> > original serverList value, not the ones updated by EnsembleTracker
> >* [CURATOR-559] - Inconsistent ZK timeouts
> >
> > ** Improvement
> >* [CURATOR-511] - Add toString to ZKPaths PathAndNode
> >* [CURATOR-517] - Remove unused class AdvancedTracerDriver
> >* [CURATOR-533] - Improve CURATOR-505 by making the CircuitBreaker
> > instance shared
> >* [CURATOR-537] - Expose ourPath in LeaderLatch readonly
> >* [CURATOR-547] - Make JAX-RS MessageBodyReader/-Writer impl
> > (JsonServiceInstanceMarshaller) reuse Jackson ObjectMapper
> >* [CURATOR-548] - Bump zookeeper version to 3.5.7
> >* [CURATOR-556] - Fix typo
> >* [CURATOR-560] - Make sure tickTime and minSessionTimeout are set
> >
> > ** Wish
> >* [CURATOR-519] - Curator 4.0.x/4.1.x release using Zookeeper 3.5.5
> >
> > ** Task
> >* [CURATOR-539] - Remove link to obsolete Stack Overflow tag
> >
> > Regards,
> >
> > The Curator Team
> >
> > [1] Apache ZooKeeper https://zookeeper.apache.org/
>
>

Re: Zookeeper won't form a quorum...

2020-02-20 Thread Szalay-Bekő Máté

Hi Steve,

This is great news!
I found the same during my tests; it is good to see that it works in
your environment as well.
Thanks for taking the time to test and for sharing your findings!

Still, I think it is worth pushing the PRs in ZOOKEEPER-2164 for
backward-compatibility reasons, so that not everyone affected by this needs
to change their configs when upgrading from 3.4 to 3.5.

Kind regards,
Mate

On Thu, Feb 20, 2020 at 4:40 PM Steve Jerman  wrote:

> Hi Folks,
>
> OK, the following works. It will start up and if any of the three are
> restarted they will rejoin the quorum.
>
> 1) Versions:
> openjdk version "1.8.0_111-internal"
> OpenJDK Runtime Environment (build 1.8.0_111-internal-alpine-r0-b14)
> OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
> Zookeeper version: 3.5.7-f0fdd52973d373ffd9c86b81d99842dc2c7f660e, built
> on 02/10/2020 11:30 GMT
>
> 2) Add following to zoo.cfg:
> quorumListenOnAllIPs=true
>
> 3) Use the following docker-compose config:
> environment:
>   ZK_ID: 1
>   ZK_CLUSTER: server.1=zookeeper1:2888:3888 server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888
> --
> environment:
>   ZK_ID: 2
>   ZK_CLUSTER: server.1=zookeeper1:2888:3888 server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888
> --
> environment:
>   ZK_ID: 3
>   ZK_CLUSTER: server.1=zookeeper1:2888:3888 server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888
>
>
> Sorry, I misunderstood yesterday, I did 2) but not 3)
>
> Thanks.
>
> Steve
>
>
> On 2/20/20, 8:16 AM, "Steve Jerman"  wrote:
>
> Few points:
>
> 1) The containers all are running:
>
> bash-4.3# java -version
> openjdk version "1.8.0_111-internal"
> OpenJDK Runtime Environment (build 1.8.0_111-internal-alpine-r0-b14)
> OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
>
> 2) The containers are configured like this:
> environment:
>   ZK_ID: 1
>   ZK_CLUSTER: server.1=0.0.0.0:2888:3888 server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888
> --
> environment:
>   ZK_ID: 2
>   ZK_CLUSTER: server.1=zookeeper1:2888:3888 server.2=0.0.0.0:2888:3888 server.3=zookeeper3:2888:3888
> --
> environment:
>   ZK_ID: 3
>   ZK_CLUSTER: server.1=zookeeper1:2888:3888 server.2=zookeeper2:2888:3888 server.3=0.0.0.0:2888:3888
>
> Reading below, I see that you suggest I should do the following:
>
> environment:
>   ZK_ID: 1
>   quorumListenOnAllIPs: true
>   ZK_CLUSTER: server.1=zookeeper1:2888:3888 server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888
> --
> environment:
>   ZK_ID: 2
>   quorumListenOnAllIPs: true
>   ZK_CLUSTER: server.1=zookeeper1:2888:3888 server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888
> --
> environment:
>   ZK_ID: 3
>   quorumListenOnAllIPs: true
>   ZK_CLUSTER: server.1=zookeeper1:2888:3888 server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888
>
> Will try...
>
> Steve
>
> On 2/20/20, 2:56 AM, "Jörn Franke"  wrote:
>
> Thanks. It is strange that JDK 11.0.6 has a backwards-incompatible
> change. However, it would be sad if we were stuck with JDK 11.0.5 all
> the time.
>
>
> > On 20.02.2020 at 10:53, Szalay-Bekő Máté <szalay.beko.m...@gmail.com> wrote:
> >
> > Hi Guys,
> >
> > I think the 'reverse order startup failure' actually has the very same root
> > cause than the 0.0.0.0 issue discussed in ZOOKEEPER-2164.
> >
> > Downgrading to 3.4 for now should solve these problems I think.
> >
> > Still I am a bit confused... I just want to understand if we really miss
> > something in the ZooKeeper configuration model.
> >
> > Assuming that myid=1 (we are talking about the zoo.cfg in server 1), to the
> > 'server.1=...' line you can put address which can be used by other servers
> > to talk back to server 1. This will be the 'advertised address' used by
> > ZooKeeper in 3.5. Putting here 0.0.0.0 will not work with 3.5 (until we fix
> > it with ZOOKEEPER-2164), as server 2 will not be able to use 0.0.0.0 to
> > talk back to server 1. But if you put a valid address to the 'server.1=...'
> > config line while having quorumListenOnAllIPs=true set, you should s

Re: Zookeeper won't form a quorum...

2020-02-20 Thread Szalay-Bekő Máté

Hi Guys,

I think the 'reverse order startup failure' actually has the very same root
cause as the 0.0.0.0 issue discussed in ZOOKEEPER-2164.

Downgrading to 3.4 for now should solve these problems, I think.

Still, I am a bit confused... I just want to understand whether we are
really missing something in the ZooKeeper configuration model.

Assuming that myid=1 (we are talking about the zoo.cfg on server 1), in the
'server.1=...' line you can put an address that the other servers can use
to talk back to server 1. This will be the 'advertised address' used by
ZooKeeper in 3.5. Putting 0.0.0.0 here will not work with 3.5 (until we fix
it with ZOOKEEPER-2164), as server 2 will not be able to use 0.0.0.0 to
talk back to server 1. But if you put a valid address in the 'server.1=...'
config line while having quorumListenOnAllIPs=true set, you should still be
able to tell ZooKeeper to bind on 0.0.0.0, no matter what IP/hostname you
put in the 'server.1=...' configs.
(There is a similar config, clientPortAddress, to set which IP the client
port binds to, if you need to bind the client port to 0.0.0.0 too.)
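The distinction between the advertised address and the bind address can be sketched with plain java.net types. This is a hypothetical illustration of the idea described above, not ZooKeeper's actual implementation: the advertised address is taken from the server.N line and is what peers dial, while the bind address is the wildcard address when quorumListenOnAllIPs=true and the configured host otherwise.

```java
import java.net.InetSocketAddress;

// Hypothetical illustration of the two addresses discussed above.
// Not the actual ZooKeeper code.
public class ListenAddressSketch {
    // What other servers connect to: the host from "server.1=host:2888:3888".
    // Left unresolved, because it only has to make sense to the peers.
    static InetSocketAddress advertised(String host, int electionPort) {
        return InetSocketAddress.createUnresolved(host, electionPort);
    }

    // What this server listens on: the wildcard address (0.0.0.0) when
    // quorumListenOnAllIPs=true, otherwise the advertised host itself.
    static InetSocketAddress bindAddress(String host, int electionPort, boolean quorumListenOnAllIPs) {
        return quorumListenOnAllIPs
                ? new InetSocketAddress(electionPort)         // wildcard address
                : new InetSocketAddress(host, electionPort);  // resolves host via DNS
    }

    public static void main(String[] args) {
        InetSocketAddress adv = advertised("zookeeper1", 3888);
        InetSocketAddress bind = bindAddress("zookeeper1", 3888, true);
        System.out.println("peers dial " + adv.getHostString() + ":" + adv.getPort());
        System.out.println("listening on wildcard: " + bind.getAddress().isAnyLocalAddress());
    }
}
```

Keeping the two apart is what lets every zoo.cfg carry identical server.N lines while each server still listens on all of its local interfaces.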

@Jorn:
> This might be a wide shot and I did not see exactly the same error, but
> with corretto jdk 11.0.6 I also had an issue that ZK could not form a
> quorum. I downgraded to 11.0.5 and it did not have any issues. This was on
> ZK 3.5.5 with Kerberos authentication and authorization.

In the recent JDK versions (8u242 or 11.0.6) there are some
backward-incompatible Kerberos-related changes affecting basically the
whole Hadoop stack, not only ZooKeeper. I don't think it is recommended to
use these JDK versions with Hadoop. I am not deeply involved in this (maybe
there is already some workaround, but I am not aware of it).

Kind regards,
Mate

On Wed, Feb 19, 2020 at 11:49 PM Steve Jerman  wrote:

> Ok,
>
> Just to confirm. Rolling back to 3.4.14 fixes the issue.  The quorum
> starts up and restarting any of the instances works
>
> Are there any issues with using the 3.5 client with 3.4 server?
>
> Steve
>
> On 2/19/20, 9:02 AM, "Steve Jerman"  wrote:
>
> OK, that explains it.  I will see if 3.4.14 fixes the issue for the
> time being...
>
> Thanks
> Steve
>
> On 2/19/20, 8:57 AM, "Jan Kosecki"  wrote:
>
> Hi Steve,
>
> it's possible that the quorum state depends on the order your
> nodes start.
> In my kubernetes environment I've had a similar issue and I've
> noticed that
> starting brokers 1 by 1, following the order from configuration
> file allows
> all 3 to join the quorum but a reverse order would keep server
> started as
> the last outside of the quorum. I was also using 0.0.0.0 in the
> configuration and didn't try a full address due to readiness check
> configuration.
>
> Unfortunately I didn't have time to debug it any further so I've
> downgraded
> back to 3.4 for the time being.
>
> Hope you manage to find a solution,
>
> Best,
> Jan
>
> On Wed, 19 Feb 2020, 15:47 Steve Jerman, 
> wrote:
>
> > Hi,
> >
> > I've just been testing restarts ... I restarted one of the
> instances (id
> > 1) ... and it doesn't join the quorum ... same error.
> >
> > Odd that the system started fine but can't handle a restart
> >
> > Steve
> >
> > On 2/19/20, 7:45 AM, "Steve Jerman"  wrote:
> >
> > Thank You Mate,
> >
> > That fixed it. Unfortunately I can't easily avoid using
> 0.0.0.0
> >
> > My configuration is built using Docker Storm and it doesn't
> let you
> > bind to a host name...
> >
> > Steve
> >
> > On 2/19/20, 5:27 AM, "Szalay-Bekő Máté" <
> szalay.beko.m...@gmail.com>
> > wrote:
> >
> > Hi Steve!
> >
> > If you are using ZooKeeper newer than 3.5.0, then this
> might be
> > the issue
> > we are just discussing / trying to fix in ZOOKEEPER-2164.
> > Can you test your setup with a config where you don't
> use 0.0.0.0
> > in the
> > server addresses?
> >
> > If you need to bind to the 0.0.0.0 locally, then please
> set the
> > 'quorumListenOnAllIPs' config property to true.
> >
> > like:
> >
> > # usually you don't really need this, unless if you
> actually need
> > to

Re: Zookeeper won't form a quorum...

2020-02-19 Thread Szalay-Bekő Máté

Hi Steve!

If you are using ZooKeeper newer than 3.5.0, then this might be the issue
we are currently discussing / trying to fix in ZOOKEEPER-2164.
Can you test your setup with a config where you don't use 0.0.0.0 in the
server addresses?

If you need to bind to 0.0.0.0 locally, then please set the
'quorumListenOnAllIPs' config property to true.

like:

# usually you don't really need this, unless you actually need to bind
# to multiple IPs
quorumListenOnAllIPs=true

# it is best if all the zoo.cfg files contain the same address settings,
# and we don't use 0.0.0.0 here
server.1=zookeeper1:2888:3888
server.2=zookeeper2:2888:3888
server.3=zookeeper3:2888:3888
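One way to make sure every node really gets the same address settings is to render the server.N block from a single host list. A minimal sketch; the class name is hypothetical, and it assumes server IDs are numbered from 1 in list order and match each node's myid file.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper: render one server.N block from a single host list, so
// every node's zoo.cfg gets exactly the same lines (no per-node 0.0.0.0 entry).
public class ServerLinesSketch {
    static String serverLines(List<String> hosts, int quorumPort, int electionPort) {
        StringJoiner lines = new StringJoiner("\n");
        for (int i = 0; i < hosts.size(); i++) {
            // assumption: IDs are 1-based and match each node's myid file
            lines.add("server." + (i + 1) + "=" + hosts.get(i) + ":" + quorumPort + ":" + electionPort);
        }
        return lines.toString();
    }

    public static void main(String[] args) {
        String block = "quorumListenOnAllIPs=true\n"
                + serverLines(List.of("zookeeper1", "zookeeper2", "zookeeper3"), 2888, 3888);
        System.out.println(block);
    }
}
```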

Kind regards,
Mate

On Wed, Feb 19, 2020 at 6:08 AM Steve Jerman  wrote:

> Hello folks,
>
> Wonder if anyone can help me. Suspect it must be something simple but I
> can't see it. Any suggestions about how to diagnose would be gratefully
> received.
>
> I have a three-node ZK cluster; when it starts up, only two of the nodes
> form a quorum. If I restart the leader, the quorum reforms with the other
> two nodes…
>
> Thanks in advance for any help
> Steve
>
> This is the ‘stat’ for the leader/follow…
>
> bash-5.0$ echo stat | nc zookeeper1 2181
> This ZooKeeper instance is not currently serving requests
>
> bash-5.0$ echo stat | nc zookeeper2 2181
> Zookeeper version: 3.5.7-f0fdd52973d373ffd9c86b81d99842dc2c7f660e, built
> on 02/10/2020 11:30 GMT
> Clients:
> /10.0.1.152:44910[1](queued=0,recved=151,sent=151)
> /10.0.1.140:53138[1](queued=0,recved=187,sent=187)
> /10.0.1.143:57422[1](queued=0,recved=151,sent=151)
> /10.0.1.152:59242[0](queued=0,recved=1,sent=0)
> /10.0.1.143:40826[1](queued=0,recved=1139,sent=1139)
> /10.0.1.152:49188[1](queued=0,recved=200,sent=203)
> /10.0.1.152:59548[1](queued=0,recved=1157,sent=1159)
> /10.0.1.140:36624[1](queued=0,recved=151,sent=151)
>
> Latency min/avg/max: 0/0/5
> Received: 3338
> Sent: 3342
> Connections: 8
> Outstanding: 0
> Zxid: 0xc00f3
> Mode: follower
> Node count: 181
>
> bash-5.0$ echo stat | nc zookeeper3 2181
> Zookeeper version: 3.5.7-f0fdd52973d373ffd9c86b81d99842dc2c7f660e, built
> on 02/10/2020 11:30 GMT
> Clients:
> /10.0.1.152:49428[0](queued=0,recved=1,sent=0)
> /10.0.1.140:32912[1](queued=0,recved=1426,sent=1429)
>
> Latency min/avg/max: 0/0/4
> Received: 1684
> Sent: 1686
> Connections: 2
> Outstanding: 0
> Zxid: 0xc00f4
> Mode: leader
> Node count: 181
> Proposal sizes last/min/max: 32/32/406
> bash-5.0$
>
> The trace for the failing node is:
>
> server.1=0.0.0.0:2888:3888
> server.2=zookeeper2:2888:3888
> server.3=zookeeper3:2888:3888
> ZooKeeper JMX enabled by default
> Using config: /opt/zookeeper/bin/../conf/zoo.cfg
> 2020-02-19 04:23:27,759 [myid:] - INFO  [main:QuorumPeerConfig@135] -
> Reading configuration from: /opt/zookeeper/bin/../conf/zoo.cfg
> 2020-02-19 04:23:27,764 [myid:] - INFO  [main:QuorumPeerConfig@387] -
> clientPortAddress is 0.0.0.0:2181
> 2020-02-19 04:23:27,764 [myid:] - INFO  [main:QuorumPeerConfig@391] -
> secureClientPort is not set
> 2020-02-19 04:23:27,771 [myid:1] - INFO  [main:DatadirCleanupManager@78]
> - autopurge.snapRetainCount set to 3
> 2020-02-19 04:23:27,772 [myid:1] - INFO  [main:DatadirCleanupManager@79]
> - autopurge.purgeInterval set to 24
> 2020-02-19 04:23:27,772 [myid:1] - INFO
> [PurgeTask:DatadirCleanupManager$PurgeTask@138] - Purge task started.
> 2020-02-19 04:23:27,773 [myid:1] - INFO  [main:ManagedUtil@46] - Log4j
> found with jmx enabled.
> 2020-02-19 04:23:27,774 [myid:1] - INFO  [PurgeTask:FileTxnSnapLog@115] -
> zookeeper.snapshot.trust.empty : false
> 2020-02-19 04:23:27,780 [myid:1] - INFO
> [PurgeTask:DatadirCleanupManager$PurgeTask@144] - Purge task completed.
> 2020-02-19 04:23:27,781 [myid:1] - INFO  [main:QuorumPeerMain@141] -
> Starting quorum peer
> 2020-02-19 04:23:27,786 [myid:1] - INFO  [main:ServerCnxnFactory@135] -
> Using org.apache.zookeeper.server.NIOServerCnxnFactory as server connection
> factory
> 2020-02-19 04:23:27,788 [myid:1] - INFO  [main:NIOServerCnxnFactory@673]
> - Configuring NIO connection handler with 10s sessionless connection
> timeout, 2 selector thread(s), 32 worker threads, and 64 kB direct buffers.
> 2020-02-19 04:23:27,791 [myid:1] - INFO  [main:NIOServerCnxnFactory@686]
> - binding to port 0.0.0.0/0.0.0.0:2181
> 2020-02-19  04:23:27,809 [myid:1]
> - INFO  [main:Log@169] - Logging initialized @249ms to
> org.eclipse.jetty.util.log.Slf4jLog
> 2020-02-19 04:23:27,913 [myid:1] - WARN  [main:ContextHandler@1520] -
> o.e.j.s.ServletContextHandler@5abca1e0{/,null,UNAVAILABLE} contextPath
> ends with /*
> 2020-02-19 04:23:27,913 [myid:1] - WARN  [main:ContextHandler@1531] -
> Empty contextPath
> 2020-02-19 04:23:27,922 [myid:1] - INFO  [main:X509Util@79] - Setting -D
> jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated
> TLS renegotiation
> 2020-02-19 04:23:27,923 [myid:1] - INFO  [main:FileTxnSnapLog@115] -
>

Re: Enabling Auth between Zookeeper Servers

2020-02-18 Thread Szalay-Bekő Máté

congrats! :)

> @Mate: as I copied the jaas.conf from your repo is that the exact file
> you used for testing? Because changing the "user_zookeeper" to "user_kafka"
> in the server-part fixed it.

if you mean this file, then yes, I used this for testing:
https://github.com/symat/zookeeper-docker-test/blob/master/conf/digest_jaas.conf
and it worked for me... strange that in your case you had to change it.

Regarding the usefulness of the error message: I am not sure we can change
that; it should come from a Java system library. ZooKeeper just catches the
SecurityException and doesn't really analyze its content.
Still, it would be great to at least print out the exact security
exception (at least with debug logging) and also to update the wiki /
documentation, highlighting that different user names may need to be
used.
If you create such a ticket, please also add the exact Java version you
used in the docker image.
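Until the log message is improved, one way to see what the JDK's own JAAS parser makes of a jaas.conf is to load it outside ZooKeeper. The sketch below is hypothetical tooling, not part of ZooKeeper. It relies only on the standard javax.security.auth.login.Configuration API with the "JavaLoginConfig" type, lists the user_&lt;name&gt; entries declared in a section, and returns null when the section is missing. It does not load the login module classes, so the ZooKeeper jars are not needed. It makes it easy to compare which user names the Server and QuorumServer sections accept, which is where the mismatch in this thread turned out to be.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.NoSuchAlgorithmException;
import java.security.URIParameter;
import java.util.Set;
import java.util.TreeSet;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.Configuration;

// Hypothetical diagnostic (not part of ZooKeeper): parse jaas.conf content with
// the JDK's JAAS parser and report which user_<name> entries a section declares.
public class JaasSectionCheck {
    // Returns the user names declared via user_<name>="..." options, or null if
    // the section does not exist in the given file content.
    static Set<String> declaredUsers(String jaasText, String section) {
        try {
            Path file = Files.createTempFile("jaas", ".conf");
            try {
                Files.write(file, jaasText.getBytes(StandardCharsets.UTF_8));
                Configuration config =
                        Configuration.getInstance("JavaLoginConfig", new URIParameter(file.toUri()));
                AppConfigurationEntry[] entries = config.getAppConfigurationEntry(section);
                if (entries == null) {
                    return null;
                }
                Set<String> users = new TreeSet<>();
                for (AppConfigurationEntry entry : entries) {
                    for (String key : entry.getOptions().keySet()) {
                        if (key.startsWith("user_")) {
                            users.add(key.substring("user_".length()));
                        }
                    }
                }
                return users;
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("could not load the JAAS configuration", e);
        }
    }

    public static void main(String[] args) {
        String jaas = String.join("\n",
                "QuorumServer {",
                "  org.apache.zookeeper.server.auth.DigestLoginModule required",
                "  user_zookeeper=\"test\";",
                "};",
                "Server {",
                "  org.apache.zookeeper.server.auth.DigestLoginModule required",
                "  user_kafka=\"test\";",
                "};");
        System.out.println("QuorumServer users: " + declaredUsers(jaas, "QuorumServer"));
        System.out.println("Server users: " + declaredUsers(jaas, "Server"));
        System.out.println("Client section: " + declaredUsers(jaas, "Client"));
    }
}
```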

Good luck for the Kafka work! :)
Mate

On Mon, Feb 17, 2020 at 8:40 PM Sebastian Schmitz <
sebastian.schm...@propellerhead.co.nz> wrote:

> Hello,
>
> I think I found the issue...
>
> One can't use the same username for clients and quorums. I configured
> all of them to be "zookeeper", but in the server-part of the jaas.conf
> it should probably be more like "kafka" as it's Kafka which
> authenticates to the zookeeper in that case and zookeepers are using the
> quorum-part to authenticate to each other. Correct?
> If that's correct the exception message is completely wrong. It can find
> the file, it can read the file and it even finds the server-part, but
> the server-part itself has wrong configuration.
>
> At least with the changed username in the server-part, I got a new exception:
>
> 2020-02-17 19:28:17,994 [myid:1] - ERROR [main:ZooKeeperServerMain@83] -
> Unexpected exception, exiting abnormally
> java.io.IOException: No snapshot found, but there are log entries.
> Something is broken!
>
> Which was probably caused by non-cleaned folders of some previous
> deployments. So I added the "snapshot.trust.empty=true" to the config to
> have it start and rebuild the snapshot. And now my zookeeper is running
> just fine! :)
>
> @Mate: as I copied the jaas.conf from your repo is that the exact file
> you used for testing? Because changing the "user_zookeeper" to
> "user_kafka" in the server-part fixed it.
>
> My next task now is to get Kafka authenticated to zookeeper and get ACLs
> working. Will be fun :)
> And I should probably create a ticket to get the jaas.conf-error message
> fixed!?
>
> Best regards
>
> Sebastian
>
>
> On 17-Feb-20 1:50 PM, Sebastian Schmitz wrote:
> > Hey,
> >
> > I also just tried using 3.5.7, but same problem...
> >
> > Best regards
> >
> > Sebastian
> >
> >
> > On 17-Feb-20 11:34 AM, Sebastian Schmitz wrote:
> >> Hi Mate,
> >>
> >> that's what I also tried. I copied it to the
> >> /opt/zookeeper-cluster/-folder and got the same exception just with
> >> the new path.
> >>
> >> So, if that config works on your side it might be my environment
> >> then!? Maybe it's a problem with the base-image
> >> openjdk:11-jre-stretch which I use for the container... I'll try
> >> using the openjdk:8u222-jre you're using.
> >>
> >> Best regards
> >>
> >> Sebastian
> >>
> >>
> >> On 17-Feb-20 9:19 AM, Szalay-Bekő Máté wrote:
> >>> Hi Sebastian,
> >>>
> >>> It's strange indeed... I also see the owner is root. That should
> >>> work in
> >>> docker usually, given that you run the zookeeper process with the root
> >>> user. Maybe copying it to a different folder? I see that the conf
> >>> folder
> >>> has different owner, maybe the java security library doesn't like that?
> >>>
> >>> But honestly, I don't have any useful explanation.
> >>>
> >>> Good luck!
> >>> Mate
> >>>
> >>> On Sun, Feb 16, 2020, 20:06 Sebastian Schmitz <
> >>> sebastian.schm...@propellerhead.co.nz> wrote:
> >>>
> >>>> Hey Mate,
> >>>>
> >>>> now it gets really weird. I get the file not found exception:
> >>>>
> >>>> 2020-02-16 18:27:50,530 [myid:1] - ERROR
> >>>> [main:ServerCnxnFactory@246] -
> >>>> No JAAS configuration section named 'Server' was found in
> >>>> '/opt/zookeeper-cluster/zookeeper/conf/jaas.conf
> >>>> java.lang.SecurityException

Re: Enabling Auth between Zookeeper Servers

2020-02-16 Thread Szalay-Bekő Máté

orumServer {
> org.apache.zookeeper.server.auth.DigestLoginModule required
> user_zookeeper="test";
> };
> QuorumClient {
> org.apache.zookeeper.server.auth.DigestLoginModule required
> username="zookeeper"
> password="test";
> };
> Server {
> org.apache.zookeeper.server.auth.DigestLoginModule required
> user_zookeeper="test";
> };
> Client {
> org.apache.zookeeper.server.auth.DigestLoginModule required
> username="zookeeper"
> password="test";
> };
>
> The weird part now is that the access is set exactly the same as the
> zoo.cfg which it can read without problems.
>
> Also changing the access to 666 doesn't change anything. And using your
> config doesn't help either:
>
> jaas.conf:
> QuorumServer {
>  org.apache.zookeeper.server.auth.DigestLoginModule required
>  user_zookeeper="test";
> };
> QuorumLearner {
>  org.apache.zookeeper.server.auth.DigestLoginModule required
>  username="zookeeper"
>  password="test";
> };
> Server {
>  org.apache.zookeeper.server.auth.DigestLoginModule required
>  user_zookeeper="test";
> };
>
> zoo.cfg:
> tickTime=2000
> initLimit=10
> syncLimit=5
>
> dataDir=/mnt/zk_data
>
> clientPort=2181
>
> standaloneEnabled=true
> admin.enableServer=true
> localSessionsEnabled=true
> localSessionsUpgradingEnabled=true
>
> 4lw.commands.whitelist=stat, ruok, conf, isro, wchc, wchp, srvr, mntr, cons
>
> clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty
> serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
>
> authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
> quorum.auth.enableSasl=true
> quorum.auth.learnerRequireSasl=false
> quorum.auth.serverRequireSasl=false
> quorum.auth.learner.saslLoginContext=QuorumLearner
> quorum.auth.server.saslLoginContext=QuorumServer
> dataLogDir=/mnt/zk_data_log
> autopurge.snapRetainCount=3
> autopurge.purgeInterval=24
> quorum.cnxn.threads.size=20
> server.1=0.0.0.0:2888:3888
>
> I have no idea what's different now. I'll try to run the stuff from your
> repo and see if that works.
>
> Best regards
>
> Sebastian
>
> On 14-Feb-20 8:11 PM, Szalay-Bekő Máté wrote:
> > Hi Sebastian!
> >
> > I was able to set up digest authentication and uploaded my results here:
> > https://github.com/symat/zookeeper-docker-test
> > You can see my docker compose file:
> > https://github.com/symat/zookeeper-docker-test/blob/master/3_nodes_digest_quorum_auth.yml
> > also the zoo.cfg template:
> > https://github.com/symat/zookeeper-docker-test/blob/master/conf/digest_zoo.cfg
> > and the jaas.conf file:
> > https://github.com/symat/zookeeper-docker-test/blob/master/conf/digest_jaas.conf
> >
> > It works for me using ZooKeeper 3.5.6, although I haven't followed your
> > config everywhere.
> >
> > Still, I wasn't able to reproduce your exception, except when I actually
> > deleted the JAAS config file. Are you sure that the ZooKeeper process in
> > Docker can see / open that file?
> >
> > I created a patched ZooKeeper 3.5.6 for you (you can download it from here:
> > https://drive.google.com/open?id=1KEPjNkiKf937jMJHAicwW9WATEuyRZIo), which
> > prints more details in case of errors. E.g. in my case, when I deleted
> > the JAAS config file, I got:
> >
> > zoo1_1  | 2020-02-14 07:04:33,288 [myid:1] - ERROR [main:ServerCnxnFactory@246] - No JAAS configuration section named 'Server' was found in '/scripts/conf/digest_jaas.conf'.
> > zoo1_1  | java.lang.SecurityException: java.io.IOException: /scripts/conf/digest_jaas.conf (No such file or directory)
> > zoo1_1  |   at sun.security.provider.ConfigFile$Spi.<init>(ConfigFile.java:137)
> > zoo1_1  |   at sun.security.provider.ConfigFile.<init>(ConfigFile.java:102)
> > zoo1_1  |   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> > zoo1_1  |   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> > zoo1_1  |   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> > zoo1_1  |   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> > zoo1_1  |   at java.lang.Class.newInstance(Class.java:442)
> > zoo1_1  |   at javax.security.auth.login.Configuration$2

Re: Upgrade guide from 3.4.x to 3.5.x?

2020-02-14 Thread Szalay-Bekő Máté

Another page on the wiki you can check:
https://cwiki.apache.org/confluence/display/ZOOKEEPER/Upgrade+FAQ

Kind regards,
Mate

On Sat, Feb 15, 2020, 05:55 Alexander Shraer  wrote:

> Hi, please see “upgrading to 3.5” section here:
> https://zookeeper.apache.org/doc/r3.5.4-beta/zookeeperReconfig.html
>
> On Fri, Feb 14, 2020 at 8:48 PM shrikant kalani 
> wrote:
>
> > Hi Allen
> >
> > We recently upgraded our ZooKeeper clusters from 3.4.13 to 3.5.5.
> >
> > Yes, rolling upgrades are possible and it is backward compatible, meaning
> > a zkclient running version 3.4.13 can still interact with a 3.5.5
> > zkserver.
> >
> > Unless you want to leverage the dynamic reconfiguration options, the rest
> > of the configuration is very similar. With the new version there are other
> > interesting features like authentication with Kerberos and TLS, and the
> > Admin UI, all of which are optional.
> >
> > Thanks
> > Srikant Kalani
> > Sent from my iPhone
> >
> > > On 15 Feb 2020, at 6:11 AM, allen chan 
> > wrote:
> > >
> > > Hello
> > >
> > > I have been trying to find a guide that describes upgrade process from
> > > 3.4.x to 3.5.x.
> > > I cannot find anything on the main zookeeper page.
> > > What i am looking for are breaking changes, configuration changes,
> > > compatibility matrix, is rolling upgrade ok?
> > >
> > > Thanks
> > > --
> > > Allen Michael Chan
> >
>

Re: Enabling Auth between Zookeeper Servers

2020-02-13 Thread Szalay-Bekő Máté

Hi Sebastian!

I was able to set up digest authentication and uploaded my results here:
https://github.com/symat/zookeeper-docker-test
You can see my docker compose file:
https://github.com/symat/zookeeper-docker-test/blob/master/3_nodes_digest_quorum_auth.yml
also the zoo.cfg template:
https://github.com/symat/zookeeper-docker-test/blob/master/conf/digest_zoo.cfg
and the jaas.conf file:
https://github.com/symat/zookeeper-docker-test/blob/master/conf/digest_jaas.conf

It works for me using ZooKeeper 3.5.6, although I haven't followed your
config everywhere.

Still, I wasn't able to reproduce your exception, except when I actually
deleted the JAAS config file. Are you sure that the ZooKeeper process in
Docker can see / open that file?
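If it helps to rule out the file-access question, here is a minimal sketch, assuming you can run a small Java program inside the container as the same user that runs zkServer.sh. JaasFileAccessCheck is a hypothetical helper (not part of ZooKeeper); it only asks the JVM, via java.nio.file.Files, whether the path exists, is a regular file and is readable:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper, not part of ZooKeeper: checks whether *this* JVM
// (run as the same user as zkServer.sh) can see and read a JAAS config file.
public class JaasFileAccessCheck {

    /** Returns human-readable problems; an empty list means the file looks readable. */
    public static List<String> problems(Path jaasFile) {
        List<String> problems = new ArrayList<>();
        if (!Files.exists(jaasFile)) {
            problems.add("does not exist for this JVM: " + jaasFile.toAbsolutePath());
            return problems; // nothing else worth checking
        }
        if (!Files.isRegularFile(jaasFile)) {
            problems.add("is not a regular file: " + jaasFile);
        }
        if (!Files.isReadable(jaasFile)) {
            problems.add("is not readable by user '" + System.getProperty("user.name") + "'");
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JaasFileAccessCheck /path/to/jaas.conf");
            return;
        }
        Path jaas = Paths.get(args[0]);
        List<String> problems = problems(jaas);
        if (problems.isEmpty()) {
            // The owner is printed for information only (e.g. to compare with the conf folder).
            System.out.println("readable, owner: " + Files.getOwner(jaas));
        } else {
            problems.forEach(p -> System.out.println("PROBLEM: " + p));
        }
    }
}
```

With Java 11 the source file can be run directly, e.g. `java JaasFileAccessCheck.java /opt/zookeeper-cluster/zookeeper/conf/jaas.conf`.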

I created a patched ZooKeeper 3.5.6 for you (you can download it from here:
https://drive.google.com/open?id=1KEPjNkiKf937jMJHAicwW9WATEuyRZIo), which
prints more details in case of errors. E.g. in my case, when I deleted
the JAAS config file, I got:

zoo1_1  | 2020-02-14 07:04:33,288 [myid:1] - ERROR [main:ServerCnxnFactory@246] - No JAAS configuration section named 'Server' was found in '/scripts/conf/digest_jaas.conf'.
zoo1_1  | java.lang.SecurityException: java.io.IOException: /scripts/conf/digest_jaas.conf (No such file or directory)
zoo1_1  |   at sun.security.provider.ConfigFile$Spi.<init>(ConfigFile.java:137)
zoo1_1  |   at sun.security.provider.ConfigFile.<init>(ConfigFile.java:102)
zoo1_1  |   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
zoo1_1  |   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
zoo1_1  |   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
zoo1_1  |   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
zoo1_1  |   at java.lang.Class.newInstance(Class.java:442)
zoo1_1  |   at javax.security.auth.login.Configuration$2.run(Configuration.java:255)
zoo1_1  |   at javax.security.auth.login.Configuration$2.run(Configuration.java:247)
zoo1_1  |   at java.security.AccessController.doPrivileged(Native Method)
zoo1_1  |   at javax.security.auth.login.Configuration.getConfiguration(Configuration.java:246)
zoo1_1  |   at org.apache.zookeeper.server.ServerCnxnFactory.configureSaslLogin(ServerCnxnFactory.java:210)
zoo1_1  |   at org.apache.zookeeper.server.NettyServerCnxnFactory.configure(NettyServerCnxnFactory.java:383)
zoo1_1  |   at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:148)
zoo1_1  |   at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:123)
zoo1_1  |   at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)
zoo1_1  | Caused by: java.io.IOException: /scripts/conf/digest_jaas.conf (No such file or directory)
zoo1_1  |   at sun.security.provider.ConfigFile$Spi.ioException(ConfigFile.java:666)
zoo1_1  |   at sun.security.provider.ConfigFile$Spi.init(ConfigFile.java:262)
zoo1_1  |   at sun.security.provider.ConfigFile$Spi.<init>(ConfigFile.java:135)
zoo1_1  |   ... 15 more

Kind regards,
Mate

On Fri, Feb 14, 2020 at 7:12 AM sagar shukla 
wrote:

> O
> Sent from Yahoo Mail on Android
>
> On Fri, Feb 14, 2020 at 11:02 AM, Szalay-Bekő Máté <
> szalay.beko.m...@gmail.com> wrote:
> Hi Sebastian,
>
> > But I still get the same exception.
> At this point I don't know why this happens... Adding the Server section to
> the JAAS config should have helped. Unfortunately, the exact exception is
> not printed to the logs, only the error message, so it is hard to
> find out more details.
>
> I will try to reproduce your case with 3.5.6 locally and see if it works. I
> have never actually used digest authentication before... we always use
> Kerberos in production. If it works, I will share my configs / Dockerfiles
> and send you a patched version with more debug info printed out.
>
> > Why would configuring quorum-auth also enable client-server-auth?
> It is not very logical indeed... If I read the code right, once
> you set the java.security.auth.login.config property, ZooKeeper
> assumes you want to use client-server SASL authentication. I guess the
> quorum-auth feature was added later and an 'enable' config property was
> introduced for it, but the same config was never introduced for client
> authentication. I also guess most people are interested in client
> authentication, and it is rare that someone doesn't need that but does
> need quorum auth. Still, I think the current behaviour is not good. I will
> submit a Jira ticket requesting an improvement here when I have time,
> but feel free to submit it yourself if you wish.
>
> Kind regards,
> Mate
>
> On Thu, Feb 13, 2020 at 7:41 PM Sebastian Schmitz <

Re: Enabling Auth between Zookeeper Servers

2020-02-13 Thread Szalay-Bekő Máté

Hi Sebastian,

> But I still get the same exception.
At this point I don't know why this happens... Adding the Server section to
the JAAS config should have helped. Unfortunately, the exact exception is
not printed to the logs, only the error message, so it is hard to
find out more details.

I will try to reproduce your case with 3.5.6 locally and see if it works. I
have never actually used digest authentication before... we always use
Kerberos in production. If it works, I will share my configs / Dockerfiles
and send you a patched version with more debug info printed out.

> Why would configuring quorum-auth also enable client-server-auth?
It is not very logical indeed... If I read the code right, once
you set the java.security.auth.login.config property, ZooKeeper
assumes you want to use client-server SASL authentication. I guess the
quorum-auth feature was added later and an 'enable' config property was
introduced for it, but the same config was never introduced for client
authentication. I also guess most people are interested in client
authentication, and it is rare that someone doesn't need that but does
need quorum auth. Still, I think the current behaviour is not good. I will
submit a Jira ticket requesting an improvement here when I have time,
but feel free to submit it yourself if you wish.
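To make the behaviour described above concrete, here is a simplified model of it as a sketch. It is an assumption based on this reading of the code, not ZooKeeper's actual implementation; ClientSaslSection and expectedServerSection are hypothetical names, while the property names java.security.auth.login.config and zookeeper.sasl.serverconfig and the default section name Server are the ones discussed in this thread:

```java
import java.util.Optional;
import java.util.Properties;

// Simplified model (an assumption based on this thread, not ZooKeeper's code):
// the server attempts the client-server SASL login only when a JAAS file is
// configured, and then looks up a single named section in that file.
public class ClientSaslSection {

    static final String LOGIN_CONFIG_KEY = "java.security.auth.login.config";
    static final String SERVER_SECTION_KEY = "zookeeper.sasl.serverconfig";
    static final String DEFAULT_SERVER_SECTION = "Server";

    /** Empty when no JAAS file is configured; otherwise the JAAS section name the server needs. */
    public static Optional<String> expectedServerSection(Properties systemProps) {
        if (systemProps.getProperty(LOGIN_CONFIG_KEY) == null) {
            return Optional.empty();
        }
        return Optional.of(systemProps.getProperty(SERVER_SECTION_KEY, DEFAULT_SERVER_SECTION));
    }

    public static void main(String[] args) {
        // Inspect this JVM's own system properties, e.g. those passed via SERVER_JVMFLAGS.
        String message = expectedServerSection(System.getProperties())
                .map(section -> "a JAAS section named '" + section + "' is expected")
                .orElse("no JAAS file configured");
        System.out.println(message);
    }
}
```

In this model, setting java.security.auth.login.config only for quorum auth is already enough to make the Server section (or whatever zookeeper.sasl.serverconfig names) a requirement.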

Kind regards,
Mate

On Thu, Feb 13, 2020 at 7:41 PM Sebastian Schmitz <
sebastian.schm...@propellerhead.co.nz> wrote:

> Hey Mate,
>
> I checked the java.env-file and it contains:
>
>
> SERVER_JVMFLAGS="-Djava.security.auth.login.config=/opt/zookeeper-cluster/zookeeper/conf/jaas.conf"
>
> which is exactly the place where the pasted jaas.conf is placed.
>
> I also just changed the config to be saslLoginContext and added the
> missing semicolon.
>
> But I still get the same exception.
>
> Why would configuring quorum-auth also enable client-server-auth?
>
> Thanks
>
> Sebastian
>
>
> On 13-Feb-20 5:50 AM, Szalay-Bekő Máté wrote:
> > Hi Sebastian,
> >
> > thanks for the more details!
> >
> > One thing I found in your config is that you should use:
> > quorum.auth.learner.saslLoginContext=QuorumLearner
> > quorum.auth.server.saslLoginContext=QuorumServer
> >
> > So instead of loginContext, use saslLoginContext in both lines. I found
> > this in the source code; I think the wiki is wrong (I will fix it later).
> > However, this doesn't actually change anything, as the default values
> > are QuorumLearner and QuorumServer anyway, so you can even skip these
> > lines from the config.
> >
> > I think Rakesh is right: the exceptions you are seeing are related not to
> > the QuorumSasl but to the ClientSasl. This is why ZooKeeper tries to find
> > the 'Server' section (which configures the server side of the
> > client-server authentication). The name of this section can be overridden
> > with the "zookeeper.sasl.serverconfig" system property.
> >
> > Based on the exception, ZooKeeper cannot find the 'Server' section in
> > the /opt/zookeeper-cluster/zookeeper/conf/jaas.conf file. Are you sure this
> > is the correct jaas.conf? Does the ZooKeeper process have permission
> > to open this file? You can specify the JAAS config file path for ZooKeeper
> > by providing a custom system property, e.g. by exporting
> > SERVER_JVMFLAGS="-Djava.security.auth.login.config=/path/to/jaas.conf"
> > before starting zkServer.sh.
> >
> > Also, in the jaas.conf you copied here, the semicolon is missing from the
> > end of the last line in the Server block. I am not sure if it causes
> > a parsing error, but I always add the semicolon to the end of the last
> > line in the block.
> >
> > Mate
> >
> > On Tue, Feb 11, 2020 at 7:53 PM Sebastian Schmitz <
> > sebastian.schm...@propellerhead.co.nz> wrote:
> >
> >> Hello Rakesh,
> >>
> >> as mentioned in the other mail, adding the "Server" to jaas.conf didn't
> >> help.
> >>
> >> Here are the Configs and Logs (with the Server-part included):
> >>
> >> jaas.conf:
> >> QuorumServer {
> >>  org.apache.zookeeper.server.auth.DigestLoginModule required
> >>  user_zookeeper="test";
> >> };
> >>
> >> QuorumClient {
> >>  org.apache.zookeeper.server.auth.DigestLoginModule required
> >>  username="zookeeper"
> >>  password="test";
> >> };
> >>
> >> Server {
> >>  org.apache.zookeeper.server.auth.DigestLoginModule required

Re: Enabling Auth between Zookeeper Servers

2020-02-12 Thread Szalay-Bekő Máté

Hi Sebastian,

thanks for the more details!

One thing I found in your config is that you should use:
quorum.auth.learner.saslLoginContext=QuorumLearner
quorum.auth.server.saslLoginContext=QuorumServer

So instead of loginContext, use saslLoginContext in both lines. I found
this in the source code; I think the wiki is wrong (I will fix it later).
However, this doesn't actually change anything, as the default values are
QuorumLearner and QuorumServer anyway, so you can even skip these lines
from the config.
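A minimal sketch of this check, assuming zoo.cfg is read as a Java properties file (key=value lines). QuorumLoginContexts is a hypothetical helper, not a ZooKeeper tool; it resolves the two quorum login context names with the defaults above and lists any loginContext keys, which (per the reading of the source above) ZooKeeper does not use:

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Hypothetical checker, not a ZooKeeper tool. Key names and defaults are the ones
// given in this mail; the "ignored" keys are the wiki's loginContext spelling.
public class QuorumLoginContexts {

    static final String LEARNER_KEY = "quorum.auth.learner.saslLoginContext";
    static final String SERVER_KEY = "quorum.auth.server.saslLoginContext";
    static final String[] IGNORED_KEYS = {
        "quorum.auth.learner.loginContext", "quorum.auth.server.loginContext"
    };

    /** JAAS section names used for quorum SASL, falling back to the defaults. */
    public static Map<String, String> resolve(Properties zooCfg) {
        Map<String, String> contexts = new LinkedHashMap<>();
        contexts.put("learner", zooCfg.getProperty(LEARNER_KEY, "QuorumLearner").trim());
        contexts.put("server", zooCfg.getProperty(SERVER_KEY, "QuorumServer").trim());
        return contexts;
    }

    /** Keys present in zoo.cfg that (per the mail) have no effect. */
    public static List<String> ignoredKeys(Properties zooCfg) {
        List<String> found = new ArrayList<>();
        for (String key : IGNORED_KEYS) {
            if (zooCfg.containsKey(key)) {
                found.add(key);
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java QuorumLoginContexts /path/to/zoo.cfg");
            return;
        }
        Properties zooCfg = new Properties(); // zoo.cfg uses key=value properties syntax
        try (Reader in = Files.newBufferedReader(Paths.get(args[0]))) {
            zooCfg.load(in);
        }
        System.out.println("quorum JAAS sections: " + resolve(zooCfg));
        ignoredKeys(zooCfg).forEach(k -> System.out.println("ignored key: " + k));
    }
}
```

Whatever names this prints are the section names that should exist in jaas.conf.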

I think Rakesh is right: the exceptions you are seeing are related not to the
QuorumSasl but to the ClientSasl. This is why ZooKeeper tries to find the
'Server' section (which configures the server side of the client-server
authentication). The name of this section can be overridden with the
"zookeeper.sasl.serverconfig" system property.

Based on the exception, ZooKeeper cannot find the 'Server' section in
the /opt/zookeeper-cluster/zookeeper/conf/jaas.conf file. Are you sure this
is the correct jaas.conf? Does the ZooKeeper process have permission
to open this file? You can specify the JAAS config file path for ZooKeeper
by providing a custom system property, e.g. by exporting
SERVER_JVMFLAGS="-Djava.security.auth.login.config=/path/to/jaas.conf"
before starting zkServer.sh.
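To check the JAAS file independently of ZooKeeper, a minimal sketch could parse it with the JDK's own JAAS file parser (Configuration.getInstance("JavaLoginConfig", new URIParameter(...))) and look up each section by name. JaasSectionCheck is a hypothetical helper; a missing or unreadable file, or a syntax error, should make getInstance throw instead of returning a configuration, so the cause chain is printed:

```java
import java.io.File;
import java.security.URIParameter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.Configuration;

// Hypothetical diagnostic, not part of ZooKeeper: parses a JAAS file with the JDK's
// "JavaLoginConfig" implementation and reports which named sections are missing.
public class JaasSectionCheck {

    /**
     * Returns the requested section names that are absent or empty.
     * Throws if the file cannot be read or parsed.
     */
    public static List<String> missingSections(File jaasFile, List<String> sections) throws Exception {
        Configuration config =
                Configuration.getInstance("JavaLoginConfig", new URIParameter(jaasFile.toURI()));
        List<String> missing = new ArrayList<>();
        for (String name : sections) {
            AppConfigurationEntry[] entries = config.getAppConfigurationEntry(name);
            if (entries == null || entries.length == 0) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        if (args.length < 1) {
            System.err.println("usage: java JaasSectionCheck /path/to/jaas.conf [Section ...]");
            return;
        }
        // Default section names are the ones discussed in this thread.
        List<String> sections = args.length > 1
                ? Arrays.asList(args).subList(1, args.length)
                : Arrays.asList("Server", "QuorumServer", "QuorumLearner");
        try {
            List<String> missing = missingSections(new File(args[0]), sections);
            System.out.println(missing.isEmpty() ? "all sections found" : "missing: " + missing);
        } catch (Exception e) {
            // A missing/unreadable file or a parse error ends up here; print the cause chain.
            System.out.println("could not load " + args[0] + ": " + e);
            for (Throwable c = e.getCause(); c != null; c = c.getCause()) {
                System.out.println("  caused by: " + c);
            }
        }
    }
}
```

Running it against a copy of the jaas.conf, with and without the trailing semicolon in the Server block, would show whether the JDK parser accepts that form.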

Also, in the jaas.conf you copied here, the semicolon is missing from the
end of the last line in the Server block. I am not sure if it causes a
parsing error, but I always add the semicolon to the end of the last
line in the block.

Mate

On Tue, Feb 11, 2020 at 7:53 PM Sebastian Schmitz <
sebastian.schm...@propellerhead.co.nz> wrote:

> Hello Rakesh,
>
> as mentioned in the other mail, adding the "Server" to jaas.conf didn't help.
>
> Here are the Configs and Logs (with the Server-part included):
>
> jaas.conf:
> QuorumServer {
> org.apache.zookeeper.server.auth.DigestLoginModule required
> user_zookeeper="test";
> };
>
> QuorumClient {
> org.apache.zookeeper.server.auth.DigestLoginModule required
> username="zookeeper"
> password="test";
> };
>
> Server {
> org.apache.zookeeper.server.auth.DigestLoginModule required
> user_zookeeper="test"
> };
>
> Client {
> org.apache.zookeeper.server.auth.DigestLoginModule required
> username="zookeeper"
> password="test";
> };
>
> zoo.cfg:
> # The number of milliseconds of each tick
> tickTime=2000
> # The number of ticks that the initial
> # synchronization phase can take
> initLimit=10
> # The number of ticks that can pass between
> # sending a request and getting an acknowledgement
> syncLimit=5
> # the directory where the snapshot is stored.
> # do not use /tmp for storage, /tmp here is just
> # example sakes.
> dataDir=/mnt/zk_data
> # the port at which the clients will connect
> clientPort=2181
> # the maximum number of client connections.
> # increase this if you need to handle more clients
> #maxClientCnxns=60
> #
> # Be sure to read the maintenance section of the
> # administrator guide before turning on autopurge.
> #
> # http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
> #
> # The number of snapshots to retain in dataDir
> #autopurge.snapRetainCount=3
> # Purge task interval in hours
> # Set to "0" to disable auto purge feature
> #autopurge.purgeInterval=1
> dataLogDir=/mnt/zk_data_log
> autopurge.snapRetainCount=3
> autopurge.purgeInterval=24
> quorum.auth.enableSasl=true
> quorum.auth.learnerRequireSasl=false
> quorum.auth.serverRequireSasl=false
> quorum.auth.learner.loginContext=QuorumLearner
> quorum.auth.server.loginContext=QuorumServer
> quorum.cnxn.threads.size=20
> authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
> secureClientPort=2281
> server.1=0.0.0.0:2888:3888
> server.2=kafkad02.x.azure.com:2888:3888
> server.3=kafkad03.x.azure.com:2888:3888
>
> Server-Log:
> Using config: /opt/zookeeper-cluster/zookeeper/bin/../conf/zoo.cfg
> Feb 11, 2020 18:43:53 + [1 1] com.newrelic INFO: New Relic Agent:
> Loading configuration file "/opt/zookeeper-cluster/newrelic/./newrelic.yml"
> Feb 11, 2020 18:43:53 + [1 1] com.newrelic INFO: Using default
> collector host: collector.newrelic.com
> Feb 11, 2020 18:43:53 + [1 1] com.newrelic INFO: New Relic Agent:
> Writing to log file:
> /opt/zookeeper-cluster/newrelic/logs/newrelic_agent.log
> WARNING: An illegal reflective access operation has occurred
> WARNING: Illegal reflective access by
> com.newrelic.weave.weavepackage.NewClassAppender
> (file:/opt/zookeeper-cluster/newrelic/newrelic.jar) to method
> java.net.URLClassLoader.addURL(java.net.URL)
> WARNING: Please consider reporting this to the maintainers of
> com.newrelic.weave.weavepackage.NewClassAppender
> WARNING: Use --illegal-access=warn to enable warnings of further illegal
> reflective access operations
> WARNING: All illegal access operations will be denied in a future release
> 2020-02-11 18:43:59,257 [myid:] - INFO

Re: Enabling Auth between Zookeeper Servers

2020-02-11 Thread Szalay-Bekő Máté

Hello Sebastian,

In general, I think you configured ZooKeeper just fine. A few remarks:
- I am not sure how ZooKeeper server-server authentication is expected to
work when you only use a single server. Would you mind trying to start a
quorum with e.g. 3 servers?
- Also, I think it is a good idea to avoid using 0.0.0.0 as the hostname,
especially if you run the cluster on multiple different servers / Docker
containers. Use the fully qualified domain name for a multi-server setup,
or, if you just test multiple ZooKeeper servers on the same machine, use
127.0.0.1. (Maybe it has no effect in the current case, but for SSL or for
dynamic reconfig it might be bad to use 0.0.0.0. I also remember problems
with rolling restarts when using 0.0.0.0 in the config.)
- Is there a reason why you set 'quorum.auth.learnerRequireSasl' and
'quorum.auth.serverRequireSasl' to false? Using false is usually a good idea
during a rolling upgrade, but if you start a new cluster and want to use
server-server authentication, then you can just set them to true.
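The remarks above can be turned into a few quick checks. This is a minimal sketch, assuming zoo.cfg is loaded as a Java properties file; QuorumConfigLint is a hypothetical helper, and its checks (at least 3 server.N entries, no 0.0.0.0, both *RequireSasl flags true once quorum SASL is enabled) simply mirror the suggestions in this mail, not rules enforced by ZooKeeper:

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical lint, not a ZooKeeper tool: checks a parsed zoo.cfg for the points
// raised above (quorum size, 0.0.0.0 addresses, the *RequireSasl flags).
public class QuorumConfigLint {

    public static List<String> warnings(Properties zooCfg) {
        List<String> warnings = new ArrayList<>();
        List<String> serverKeys = new ArrayList<>();
        for (String key : zooCfg.stringPropertyNames()) {
            if (key.startsWith("server.")) {
                serverKeys.add(key);
            }
        }
        if (serverKeys.size() < 3) {
            warnings.add("only " + serverKeys.size() + " server.N entries; try e.g. 3 servers");
        }
        for (String key : serverKeys) {
            if (zooCfg.getProperty(key).trim().startsWith("0.0.0.0:")) {
                warnings.add(key + " uses 0.0.0.0; prefer an FQDN (or 127.0.0.1 on one machine)");
            }
        }
        boolean quorumSasl =
                Boolean.parseBoolean(zooCfg.getProperty("quorum.auth.enableSasl", "false").trim());
        if (quorumSasl) {
            for (String flag : new String[] {"quorum.auth.learnerRequireSasl", "quorum.auth.serverRequireSasl"}) {
                if (!Boolean.parseBoolean(zooCfg.getProperty(flag, "false").trim())) {
                    warnings.add(flag + " is false: fine for a rolling upgrade, true for a new cluster");
                }
            }
        }
        return warnings;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java QuorumConfigLint /path/to/zoo.cfg");
            return;
        }
        Properties zooCfg = new Properties();
        try (Reader in = Files.newBufferedReader(Paths.get(args[0]))) {
            zooCfg.load(in);
        }
        warnings(zooCfg).forEach(w -> System.out.println("WARN: " + w));
    }
}
```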

I don't understand why you got the exception "No JAAS configuration
section named 'Server' was found"... Setting the loginContext should have
fixed that. If you still see the same issue with the 3-server setup, can
you please share the config files, the command you use to start ZooKeeper
and also the log files with us, so that we can look deeper?

Kind regards,
Mate

On Tue, Feb 11, 2020 at 2:56 AM Sebastian Schmitz <
sebastian.schm...@propellerhead.co.nz> wrote:

> Hello,
>
> I'm currently looking into enabling the Auth between Zookeeper-Servers
> and found this documentation:
>
>
> https://cwiki.apache.org/confluence/display/ZOOKEEPER/Server-Server+mutual+authentication
>
> However, when I use the config from the document (for Digest-MD5) I get
> this exception in Zookeeper 3.4.14 and also 3.5.6, which I tried because
> I thought using latest version could help:
> java.io.IOException: No JAAS configuration section named 'Server' was
> found in '/opt/zookeeper-cluster/zookeeper/conf/jaas.conf
>
> And of course that's right, because there's only QuorumServer and
> QuorumClient in the jaas.conf:
>
> jaas.conf:
> QuorumServer {
> org.apache.zookeeper.server.auth.DigestLoginModule required
> user_zookeeper="test";
> };
>
> QuorumClient {
> org.apache.zookeeper.server.auth.DigestLoginModule required
> username="zookeeper"
> password="test";
> };
>
> I also tried renaming the QuorumServer to just "Server". No change.
>
> My zoo.cfg:
> tickTime=2000
> initLimit=10
> syncLimit=5
> dataDir=/mnt/zk_data
> clientPort=2181
> dataLogDir=/mnt/zk_data_log
> autopurge.snapRetainCount=3
> autopurge.purgeInterval=24
> quorum.auth.enableSasl=true
> quorum.auth.learnerRequireSasl=false
> quorum.auth.serverRequireSasl=false
> quorum.auth.learner.loginContext=QuorumLearner
> quorum.auth.server.loginContext=QuorumServer
> quorum.cnxn.threads.size=20
> authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
> secureClientPort=2281
> server.1=0.0.0.0:2888:3888
>
> Any idea what I could try? Or maybe there's some better document on how
> to achieve this?
>
> Thank you
>
> Sebastian
>
>
>

Re: Zookeeper server not responding.

2020-01-24 Thread Szalay-Bekő Máté

Hi Pramod,

ZooKeeper 3.5.1-alpha is not a stable release (it is also quite an old
one). Did you choose it for a reason? If any 3.5 ZooKeeper would be OK for
you, then I suggest the latest stable one: ZooKeeper 3.5.6.
See the release notes:
https://zookeeper.apache.org/releases.html#releasenotes

Kind regards,
Mate

On Fri, Jan 24, 2020 at 2:27 AM Pramod Srinivasan
 wrote:

> Hello Everyone,
>
> I am using Zookeeper 3.5.1-alpha and I see a problem when I am using a 2
> node setup.
>
> Node 1 Zookeeper logs:
>
> 2020-01-11 11:29:52,141 [myid:2147483653] - INFO
> [QuorumPeerListener:QuorumCnxManager$Listener@631] - My election bind
> port: 0.0.0.0/0.0.0.0:61898
> 2020-01-11  11:29:52,149
> [myid:2147483653] - ERROR
> [WorkerSender[myid=2147483653]:NIOServerCnxnFactory$1@92] - Thread
> Thread[WorkerSender[myid=2147483653],5,main] died
> java.lang.NullPointerException
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown
> Source)
> at java.util.concurrent.LinkedBlockingQueue.poll(Unknown Source)
> at
> org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:462)
> at java.lang.Thread.run(Unknown Source)
> 2020-01-11 11:29:52,161 [myid:2147483653] - INFO
> [QuorumPeer[myid=2147483653](plain=/0:0:0:0:0:0:0:0:61896)(secure=disabled):QuorumPeer@986]
> - LOOKING
>
> Node 2 Zookeeper logs:
>
> 2020-01-11 11:29:51,852 [myid:2147483652] - WARN
> [WorkerSender[myid=2147483652]:QuorumCnxManager@459] - Cannot open
> channel to 2147483653 at election address /128.0.0.5:61898
> java.net.ConnectException: Connection refused
> at java.net.PlainSocketImpl.socketConnect(Native Method)
> at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
> at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown
> Source)
> at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
> at java.net.SocksSocketImpl.connect(Unknown Source)
> at java.net.Socket.connect(Unknown Source)
> at
> org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:444)
> at
> org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:485)
> at
> org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:421)
> at
> org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:486)
> at
> org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:465)
> at java.lang.Thread.run(Unknown Source)
>
> Zookeeper server on the nodes never recover from this state and clients
> are unable to connect to the server. Any hint on what the problem is based
> on the back trace on Node 1 logs? Is this a Zookeeper server code issue or
> a setup issue?
>
> Thanks,
> Pramod
>
>
>

Re: compilation fails on centos 7

2020-01-24 Thread Szalay-Bekő Máté

Hi Kannan,

We build, test and run ZooKeeper in production on CentOS 7.5, among other
Linux versions, so 7.5 should definitely work. I am not sure about CentOS 7
in general, though.

Do you actually need to run the tests?
If not, then you can use the following command to skip the test executions:
mvn clean install -DskipTests

or if you also need the native libraries:
mvn clean install -DskipTests -Pfull-build

Also, if you think the tests are flaky in your case, you can ask Maven to
rerun the failing tests (here up to 3 times):
mvn -Dsurefire.rerunFailingTestsCount=3 test -fae


I often build ZooKeeper in Docker on my MacBook Pro, using an
official CentOS 7.5 image. These are my steps:
```
docker run --volume ~/git:/git -it --rm  centos:7.5.1804 /bin/bash

# then inside docker I install all the dependencies I need (also everything
# for the native C library builds and the zkpython build)
yum -y install autoconf automake mc git maven java cppunit \
  python-setuptools screen libtool python-devel cppunit-devel gcc-c++ telnet \
  openssl openssl-devel

# if you also need SASL for the C client (only available on the master
# branch)
yum -y install cyrus-sasl-md5 cyrus-sasl-gssapi cyrus-sasl-devel

cd /git/zookeeper

# this gives you a clean state, if you use git (not needed if you just
# downloaded the source tarball)
git clean -xdf

# full build incl. native C code, skipping the tests
mvn clean install -Pfull-build -DskipTests
```

On Fri, Jan 24, 2020 at 8:52 AM KANNAN VARADHAN  wrote:

> Hi Marnix:
>
> My initial selinux state was “Permissive”.
> I rebooted with selinux disabled, and did the build.  Same end result.
>
> org.apache.zookeeper.test.AsyncHammerTest has some variability (more or
> less failures in any given run) but
> org.apache.zookeeper.ClientRequestTimeoutTest and
> org.apache.zookeeper.test.LocalSessionRequestTest
> are consistent in the type of failure.
>
> Kannan
>
> > On Jan 23, 2020, at 9:12 PM, Marnix Janse  wrote:
> >
> > Hi Kannan,
> >
> > Could it be that SELinux is active and the compiled files and test
> > scripts have an incorrect security context? To me it seems that the unit
> > test can’t connect to the processes, or the processes are unable to accept
> > connections due to a missing or incorrect security policy.
> >
> > Cheers,
> > Marnix
> >
> >> On 24 Jan 2020, at 00:52, KANNAN VARADHAN  wrote:
> >>
> >> 
> >> Hi:
> >>
> >> I am trying to fix (or understand, hence how to fix) this compile error
> >> with zookeeper, v3.5.6, and am seeing failures.
> >> Specifically, mvn test fails in some very specific tests.  I am not
> >> sure that it has anything to do directly with centos, per se.
> >>
> >> The environment:
> >>
> >> [➜  apache-zookeeper-3.5.6 ]  uname -snr
> >> Linux a5s1 5.4.13-1.el7.elrepo.x86_64
> >> [➜  apache-zookeeper-3.5.6 ]
> >> [➜  apache-zookeeper-3.5.6 ]  java -version
> >> openjdk version "1.8.0_232"
> >> OpenJDK Runtime Environment (build 1.8.0_232-b09)
> >> OpenJDK 64-Bit Server VM (build 25.232-b09, mixed mode)
> >> [➜  apache-zookeeper-3.5.6 ]
> >> [➜  apache-zookeeper-3.5.6 ]  cat /etc/os-release
> >> NAME="CentOS Linux"
> >> VERSION="7 (Core)"
> >> ID="centos"
> >> ID_LIKE="rhel fedora"
> >> VERSION_ID="7"
> >> PRETTY_NAME="CentOS Linux 7 (Core)"
> >> ANSI_COLOR="0;31"
> >> CPE_NAME="cpe:/o:centos:centos:7"
> >> HOME_URL="https://www.centos.org/"
> >> BUG_REPORT_URL="https://bugs.centos.org/"
> >>
> >> CENTOS_MANTISBT_PROJECT="CentOS-7"
> >> CENTOS_MANTISBT_PROJECT_VERSION="7"
> >> REDHAT_SUPPORT_PRODUCT="centos"
> >> REDHAT_SUPPORT_PRODUCT_VERSION="7"
> >>
> >> [➜  apache-zookeeper-3.5.6 ]
> >>
> >>
> >> The errors are in three tests and the pattern appears to be:
> >> [ERROR] Tests run: 11, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 20.583 s <<< FAILURE! - in org.apache.zookeeper.server.ZooKeeperServerMainTest
> >> java.lang.AssertionError: waiting for server being up at org.apache.zookeeper.server.ZooKeeperServerMainTest.testReadOnlySnapshotDir(ZooKeeperServerMainTest.java:234)
> >> java.lang.AssertionError: waiting for server being up at org.apache.zookeeper.server.ZooKeeperServerMainTest.testReadOnlyTxnLogDir(ZooKeeperServerMainTest.java:273)
> >>
> >> [ERROR] Tests run: 107, Failures: 0, Errors: 6, Skipped: 0, Time elapsed: 295.05 s <<< FAILURE! - in org.apache.zookeeper.test.NettyNettySuiteTest
> >> java.io.IOException: Couldn't instantiate org.apache.zookeeper.server.NettyServerCnxnFactory at org.apache.zookeeper.test.AsyncOpsTest.setUp(AsyncOpsTest.java:48)
> >> java.lang.NullPointerException at org.apache.zookeeper.test.AsyncOpsTest.tearDown(AsyncOpsTest.java:59)
> >> java.io.IOException: Couldn't instantiate org.apache.zookeeper.server.NettyServerCnxnFactory at org.apache.zookeeper.test.AsyncOpsTest.setUp(AsyncOpsTest.java:48)
> >> java.lang.NullPointerException at org.apache.zookeeper.test.AsyncOpsTest.tearDown(AsyncOpsTest.java:59)
> >> java.io.IOException: Couldn't instantiate
> org.apache.zookeeper.server.NettyServerCnxnFactory at
>

Re: [ANNOUNCE] Enrico Olivelli new ZooKeeper PMC Member

2020-01-21 Thread Szalay-Bekő Máté

Congratulations! :)

On Wed, Jan 22, 2020, 06:36 David Mollitor  wrote:

> You've been a great help to me.  Well deserved!
>
> On Wed, Jan 22, 2020, 12:09 AM Mohammad arshad  >
> wrote:
>
> > Congratulations Enrico!
> >
> >
> > -Original Message-
> > From: rammohan ganapavarapu [mailto:rammohanga...@gmail.com]
> > Sent: Wednesday, January 22, 2020 3:45 AM
> > To: user@zookeeper.apache.org
> > Cc: DevZooKeeper 
> > Subject: Re: [ANNOUNCE] Enrico Olivelli new ZooKeeper PMC Member
> >
> > Congratulations Enrico!!
> >
> > On Tue, Jan 21, 2020 at 1:41 PM Flavio Junqueira  wrote:
> >
> > > I'm pleased to announce that Enrico Olivelli recently became the
> > > newest ZooKeeper PMC member. Enrico has contributed immensely to this
> > > community; he became a ZooKeeper committer in May 2019 and now he joins
> > the PMC.
> > >
> > > Join me in congratulating him on the achievement. Congrats, Enrico!
> > >
> > > -Flavio on behalf of the Apache ZooKeeper PMC
> >
>

Re: ZooKeeper in secure mode

2020-01-17 Thread Szalay-Bekő Máté

For testing it on a real Hadoop cluster with real Kerberos, we used
ZooKeeper 3.5.5.
AFAIK ZooKeeper 3.5.6 should behave just the same in terms of SASL and SSL.

(I also created unit tests for SASL + SSL in the PRs of the Jira ticket I
mentioned; those can give you configuration examples for the 3.5 and
master branches.)

Regards,
Mate
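Praveen's stack trace in this thread ends in a ClassNotFoundException for org.apache.zookeeper.ClientCnxnSocketNetty while running ZooKeeper 3.4.13. As far as I know, the Netty-based client socket only ships from the 3.5 line onwards, which would fit both the missing class and the advice to move to 3.5.x. A minimal sketch of a quick check: the hypothetical helper below (ClasspathCheck is not part of ZooKeeper) tries to load the class names used in CLIENT_JVMFLAGS / SERVER_JVMFLAGS without initializing them.

```java
import java.util.List;

// Hypothetical helper, not part of ZooKeeper: reports whether a class can be
// loaded from the current classpath without running its static initializers.
final class ClasspathCheck {
    private ClasspathCheck() {}

    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate and load the class, do not run static blocks
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class names taken from the JVM flags quoted in this thread
        List<String> names = List.of(
                "org.apache.zookeeper.ClientCnxnSocketNetty",
                "org.apache.zookeeper.server.NettyServerCnxnFactory");
        for (String name : names) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

Run it with the same classpath as zkCli.sh or the server. A MISSING line for the class named in zookeeper.clientCnxnSocket most likely points at a version or classpath problem rather than at the TLS settings themselves.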

On Fri, Jan 17, 2020 at 4:40 AM Praveen Kumar K S 
wrote:

> Thanks Mate. May I know the version of zookeeper you are using?
>
> Regards,
> Praveen Kumar K S
> +91-9986855625
>
>
> On Thu, Jan 16, 2020 at 8:45 PM Szalay-Bekő Máté <
> szalay.beko.m...@gmail.com>
> wrote:
>
> > Hi Praveen,
> >
> > Regarding SASL, some useful links:
> > - https://cwiki.apache.org/confluence/display/ZOOKEEPER/Client-Server+mutual+authentication
> >   (I just updated this page today)
> > - I was also checking the Kerberos JAAS configs when I tried these things
> >   locally:
> >   https://docs.oracle.com/javase/8/docs/jre/api/security/jaas/spec/com/sun/security/auth/module/Krb5LoginModule.html
> > - this is a good howto as well: https://github.com/ekoontz/zookeeper/wiki
> > - https://cwiki.apache.org/confluence/display/ZOOKEEPER/Server-Server+mutual+authentication
> >
> > In this Jira case you can see some zoo.cfg and client configs that we used
> > to test SASL + SSL:
> > https://issues.apache.org/jira/browse/ZOOKEEPER-3482?focusedCommentId=16998033=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16998033
> >
> > With these configs we managed to use ZooKeeper SASL + SSL on a real
> > Hadoop cluster using MIT Kerberos.
> >
> > Mate
> >
> > On Thu, Jan 16, 2020 at 10:39 AM Praveen Kumar K S <
> > prav...@securelyshare.com> wrote:
> >
> > > Thanks Enrico. I was also looking at
> > > https://issues.apache.org/jira/browse/ZOOKEEPER-2220 who is facing
> same
> > > issue.
> > >
> > > I will try with your suggestion. My requirement is to enable SASL based
> > > authentication between server-server and client-server.
> > >
> > > Please advise if I'm looking at the right place or is there any better
> > > documentation.
> > >
> > > Regards,
> > > Praveen Kumar K S
> > > +91-9986855625
> > >
> > >
> > > On Thu, Jan 16, 2020 at 3:01 PM Enrico Olivelli - Diennea <
> > > enrico.olive...@diennea.com> wrote:
> > >
> > > > Praveen
> > > > In order to use Netty for TLS, it is better for you to use 3.5.6, which
> > > > contains Netty 4; ZooKeeper 3.4.x uses the deprecated Netty 3, which is
> > > > known to have security flaws and is no longer maintained.
> > > >
> > > > Btw your problem looks like there is a missing class and it is weird
> > > >
> > > > Enrico
> > > >
> > > > On 16/01/20, 10:25, "Praveen Kumar K S" <prav...@securelyshare.com>
> > > > wrote:
> > > >
> > > > Hello,
> > > >
> > > > I'm looking for help on enabling authentication in zookeeper. Please
> > > > note below approach I have tried.
> > > >
> > > > 1. I followed
> > > > https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZooKeeper+SSL+User+Guide
> > > > 2. I'm deploying zookeeper as single node using docker
> > > > 3. Zookeeper version is 3.4.13
> > > > 4. Below are some important environmental variables in zookeeper
> > > > container
> > > >
> > > > CLIENT_JVMFLAGS=-Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty
> > > > -Dzookeeper.client.secure=true
> > > > -Dzookeeper.ssl.keyStore.location=/opt/vault/zookeeper/ssl/KeyStore.jks
> > > > -Dzookeeper.ssl.keyStore.password=XX@123
> > > > -Dzookeeper.ssl.trustStore.location=/opt/vault/zookeeper/ssl/truststore.jks
> > > > -Dzookeeper.ssl.trustStore.password=XX@123
> > > >
> > > > SERVER_JVMFLAGS=-Dzookeeper.serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
> > > > -Dzookeeper.ssl.keyStore.location=/opt/vault/zookeeper/ssl/KeyStore.jk

Re: ZooKeeper in secure mode

2020-01-16 Thread Szalay-Bekő Máté

Hi Praveen,

Regarding SASL, some useful links:
- https://cwiki.apache.org/confluence/display/ZOOKEEPER/Client-Server+mutual+authentication
  (I just updated this page today)
- I was also checking the Kerberos JAAS configs when I tried these things
  locally:
  https://docs.oracle.com/javase/8/docs/jre/api/security/jaas/spec/com/sun/security/auth/module/Krb5LoginModule.html
- this is a good howto as well: https://github.com/ekoontz/zookeeper/wiki
- https://cwiki.apache.org/confluence/display/ZOOKEEPER/Server-Server+mutual+authentication

In this Jira case you can see some zoo.cfg and client configs that we used
to test SASL + SSL:
https://issues.apache.org/jira/browse/ZOOKEEPER-3482?focusedCommentId=16998033=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16998033

With these configs we managed to use ZooKeeper SASL + SSL on a real
Hadoop cluster using MIT Kerberos.

Mate
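For the client side of the Kerberos setup, the Krb5LoginModule page linked above documents the options of a JAAS login section. A minimal sketch, assuming a keytab-based login: the helper below renders a section named "Client" (as far as I know, the default section name the ZooKeeper client looks up) and points the standard java.security.auth.login.config property at it. JaasClientConfig, the keytab path and the principal are hypothetical placeholders; the exact option set depends on your environment.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: render a JAAS "Client" section for a keytab-based Kerberos login
// (options as documented for com.sun.security.auth.module.Krb5LoginModule)
// and point the JVM at it. Keytab path and principal are hypothetical.
final class JaasClientConfig {
    private JaasClientConfig() {}

    static String clientSection(String keyTab, String principal) {
        return "Client {\n"
                + "  com.sun.security.auth.module.Krb5LoginModule required\n"
                + "  useKeyTab=true\n"
                + "  keyTab=\"" + keyTab + "\"\n"
                + "  storeKey=true\n"
                + "  useTicketCache=false\n"
                + "  principal=\"" + principal + "\";\n"
                + "};\n";
    }

    public static void main(String[] args) throws IOException {
        String jaas = clientSection("/etc/security/keytabs/zkclient.keytab",
                "zkclient/client.example.com@EXAMPLE.COM");
        Path file = Files.createTempFile("zk-client-jaas", ".conf");
        Files.writeString(file, jaas);
        // Standard JAAS property; set it before creating the ZooKeeper client
        System.setProperty("java.security.auth.login.config", file.toString());
        System.out.print(jaas);
    }
}
```

In practice the same file is usually passed with -Djava.security.auth.login.config=... in CLIENT_JVMFLAGS instead of being generated at runtime; the generator only makes the expected layout explicit.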

On Thu, Jan 16, 2020 at 10:39 AM Praveen Kumar K S <
prav...@securelyshare.com> wrote:

> Thanks Enrico. I was also looking at
> https://issues.apache.org/jira/browse/ZOOKEEPER-2220 who is facing same
> issue.
>
> I will try with your suggestion. My requirement is to enable SASL based
> authentication between server-server and client-server.
>
> Please advise if I'm looking at the right place or is there any better
> documentation.
>
> Regards,
> Praveen Kumar K S
> +91-9986855625
>
>
> On Thu, Jan 16, 2020 at 3:01 PM Enrico Olivelli - Diennea <
> enrico.olive...@diennea.com> wrote:
>
> > Praveen
> > In order to use Netty for TLS, it is better for you to use 3.5.6, which
> > contains Netty 4; ZooKeeper 3.4.x uses the deprecated Netty 3, which is
> > known to have security flaws and is no longer maintained.
> >
> > Btw your problem looks like there is a missing class and it is weird
> >
> > Enrico
> >
> > On 16/01/20, 10:25, "Praveen Kumar K S" <prav...@securelyshare.com>
> > wrote:
> >
> > Hello,
> >
> > I'm looking for help on enabling authentication in zookeeper. Please
> > note below approach I have tried.
> >
> > 1. I followed
> > https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZooKeeper+SSL+User+Guide
> > 2. I'm deploying zookeeper as single node using docker
> > 3. Zookeeper version is 3.4.13
> > 4. Below are some important environmental variables in zookeeper
> > container
> >
> > CLIENT_JVMFLAGS=-Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty
> > -Dzookeeper.client.secure=true
> > -Dzookeeper.ssl.keyStore.location=/opt/vault/zookeeper/ssl/KeyStore.jks
> > -Dzookeeper.ssl.keyStore.password=XX@123
> > -Dzookeeper.ssl.trustStore.location=/opt/vault/zookeeper/ssl/truststore.jks
> > -Dzookeeper.ssl.trustStore.password=XX@123
> >
> > SERVER_JVMFLAGS=-Dzookeeper.serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
> > -Dzookeeper.ssl.keyStore.location=/opt/vault/zookeeper/ssl/KeyStore.jks
> > -Dzookeeper.ssl.keyStore.password=XX@123
> > -Dzookeeper.ssl.trustStore.location=/opt/vault/zookeeper/ssl/truststore.jks
> > -Dzookeeper.ssl.trustStore.password=XX@123
> >
> > zookeeper.serverCnxnFactory="org.apache.zookeeper.server.NettyServerCnxnFactory"
> >
> > 5. Below is conf file
> > server.1=0.0.0.0:2888:3888
> > secureClientPort=2281
> > initLimit=5
> > syncLimit=2
> > tickTime=2000
> > clientPort=2181
> > clientPortAddress=zookeeper
> > dataLogDir=/opt/vault/zookeeper/logs
> > dataDir=/opt/vault/zookeeper/data
> >
> > 6. Zookeeper is healthy
> > 7. I tried connecting to Zookeeper server from my machine using zkCli.sh.
> > But getting below error
> >
> > 2020-01-16 14:21:27,798 [myid:] - INFO  [main:ZooKeeper@442] - Initiating client connection, connectString=zookeeper:2281 sessionTimeout=3 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@531d72ca
> > Exception in thread "main" java.io.IOException: Couldn't instantiate org.apache.zookeeper.ClientCnxnSocketNetty
> >     at org.apache.zookeeper.ZooKeeper.getClientCnxnSocket(ZooKeeper.java:1851)
> >     at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:453)
> >     at org.apache.zookeeper.ZooKeeperMain.connectToZK(ZooKeeperMain.java:283)
> >     at org.apache.zookeeper.ZooKeeperMain.<init>(ZooKeeperMain.java:297)
> >     at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:290)
> > Caused by: java.lang.ClassNotFoundException: org.apache.zookeeper.ClientCnxnSocketNetty
> >     at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> >     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> >     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
> >     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> >     at java.lang.Class.forName0(Native Method)
> >     at java.lang.Class.forName(Class.java:264)
> >     at
> >

Re: Zookeeper and curator SASL authentication

2020-01-16 Thread Szalay-Bekő Máté

great! :)
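Enrico's hint quoted below about kerberos.removeHostFromPrincipal and kerberos.removeRealmFromPrincipal affects which id ends up in sasl ACLs. Below is a simplified, hypothetical model of that mapping (SaslAuthorizedId is not a ZooKeeper class): it splits a primary/instance@REALM principal and drops the host and/or the realm depending on the two flags. It is not ZooKeeper's code; as far as I know the real server also applies Kerberos name rules when shortening the primary part, which this sketch ignores.

```java
// Simplified model of how kerberos.removeHostFromPrincipal and
// kerberos.removeRealmFromPrincipal shape the authorized SASL id.
// Not ZooKeeper's code: Kerberos auth_to_local name rules are ignored.
final class SaslAuthorizedId {
    private SaslAuthorizedId() {}

    static String authorizedId(String principal, boolean removeHost, boolean removeRealm) {
        // Split off the realm: "primary[/instance]" + "@REALM"
        String realm = null;
        String nameAndHost = principal;
        int at = principal.lastIndexOf('@');
        if (at >= 0) {
            realm = principal.substring(at + 1);
            nameAndHost = principal.substring(0, at);
        }
        // Split off the host (instance) part: "primary" + "/instance"
        String primary = nameAndHost;
        String host = null;
        int slash = nameAndHost.indexOf('/');
        if (slash >= 0) {
            primary = nameAndHost.substring(0, slash);
            host = nameAndHost.substring(slash + 1);
        }
        // Re-append host and realm unless the corresponding flag removes them
        StringBuilder id = new StringBuilder(primary);
        if (!removeHost && host != null) {
            id.append('/').append(host);
        }
        if (!removeRealm && realm != null) {
            id.append('@').append(realm);
        }
        return id.toString();
    }

    public static void main(String[] args) {
        String principal = "zkclient/client.example.com@EXAMPLE.COM"; // hypothetical
        System.out.println(authorizedId(principal, false, false)); // zkclient/client.example.com@EXAMPLE.COM
        System.out.println(authorizedId(principal, true, false));  // zkclient@EXAMPLE.COM
        System.out.println(authorizedId(principal, true, true));   // zkclient
    }
}
```

In this model, with both flags set an ACL such as sasl:zkclient:cdrwa matches the zkclient principal from any host; without them, the ACL id has to be the full principal.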

On Wed, Jan 15, 2020 at 6:38 PM Arpit Jain  wrote:

> I managed to create ACL with authenticated client principal using below
> lines of code in client:
>
> curator.create().creatingParentContainersIfNeeded()
>         .withACL(ZooDefs.Ids.CREATOR_ALL_ACL).forPath("/mynode");
>
>
> ZooDefs.Ids.CREATOR_ALL_ACL gives permissions to the client which is
> authenticated.
>
> To test this, I logged in using zkCli.sh on the ZK server, ran getAcl
> /mynode and was able to browse the znodes; I can see that the node has all
> (CDRWA) permissions for authenticated users. If I log in with an
> unauthenticated principal, I am not able to see the znodes tree even though
> I manage to connect to the ZK server.
>
> On Wed, Jan 15, 2020 at 12:19 PM Enrico Olivelli - Diennea <
> enrico.olive...@diennea.com> wrote:
>
>> Yes, they are system properties
>>
>> You can take this guide (about Kafka) as example
>>
>> https://docs.confluent.io/current/kafka/authentication_sasl/authentication_sasl_gssapi.html
>>
>>
>>
>> On 15/01/20, 13:17, "Arpit Jain" wrote:
>>
>> I have not passed those parameters. Is this something I need to set in
>> Zookeeper (zoo.cfg) ?
>>
>> On Wed, Jan 15, 2020 at 12:12 PM Enrico Olivelli - Diennea <
>> enrico.olive...@diennea.com> wrote:
>>
>> > Usually with SASL auth you are using:
>> > kerberos.removeHostFromPrincipal=true
>> > kerberos.removeRealmFromPrincipal=true
>> >
>> > is this the case for you ?
>> >
>> > Enrico
>> >
>> > On 15/01/20, 13:01, "Arpit Jain" wrote:
>> >
>> > I have asked in Curator mailing list as well but not much help. I am
>> > able to set ACL with sasl scheme by using zkCli.sh client in Zookeeper
>> > server.
>> > The idea is to use Curator to set the ACLs so that only my client
>> > application can access its Znodes.
>> >
>> >
>> > On Wed, Jan 15, 2020 at 9:21 AM Szalay-Bekő Máté <szalay.beko.m...@gmail.com>
>> > wrote:
>> >
>> > > I am not sure what is wrong with the code... I am not familiar with
>> > > Curator. I can try to google / reproduce this and see what is wrong, but
>> > > it will take a while. So first I would ask the others: maybe there is
>> > > someone on this mailing list who knows both ZooKeeper SASL and Curator
>> > > and can help you more. If no one replies, I will try to set up a dummy
>> > > project with Curator to test this.
>> > >
>> > > Did you also ask on the Curator mailing list? Would it help if I sent
>> > > you code for setting the ACLs using plain ZooKeeper (without Curator)?
>> > >
>> > > On Tue, Jan 14, 2020 at 2:48 PM Arpit Jain <jain.arp...@gmail.com>
>> > > wrote:
>> > >
>> > >> Thanks for the clarification.
>> > >> I am able to authenticate client with Zookeeper. However, when I
>> > >> started to set ACLs with the same client, I get error messages. This is
>> > >> how I am creating curator client for setting ACLs
>> > >>
>> > >> CuratorFrameworkFactory.Builder builder = CuratorFrameworkFactory.builder()
>> > >>         .connectString(coordinatorHosts)
>> > >>         .retryPolicy(retryPolicy)
>> > >>         .connectionTimeoutMs(coordinatorConnectionTimeout)
>> > >>         .sessionTimeoutMs(coordinatorSessionTimeout);
>> > >>
>> > >> final CuratorFramework curatorFramework = builder
>> > >>         .authorization("sasl", "zkclient/z...@example.com".getBytes())
>> > >>         .aclProvider(new ACLProvider() {
>> > >>
>> > >>             @Override
>>

Re: Zookeeper and curator SASL authentication

2020-01-15 Thread Szalay-Bekő Máté

I am not sure what is wrong with the code... I am not familiar with
Curator. I can try to google / reproduce this and see what is wrong, but it
will take a while. So first I would ask the others: maybe there is someone
on this mailing list who knows both ZooKeeper SASL and Curator and can help
you more. If no one replies, I will try to set up a dummy project with
Curator to test this.

Did you also ask on the Curator mailing list? Would it help if I sent you
code for setting the ACLs using plain ZooKeeper (without Curator)?
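Since the question is which ACL actually ends up on the znode: ZooDefs.Ids.CREATOR_ALL_ACL uses the special "auth" scheme, and as far as I know the server replaces it with every authenticated id of the creating session (for example the sasl id coming from the JAAS/Kerberos login), keeping the permissions, and rejects the ACL if the session has no authenticated id. Below is a simplified, hypothetical model of that expansion using plain strings; CreatorAclSketch and the session-id list are illustrative, not ZooKeeper classes. Also as far as I know, the sasl id only comes from the SASL handshake at connection time and the server rejects an explicit auth packet for the sasl scheme, which may explain the "Authentication failed for scheme: sasl" warning in the log quoted below.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Simplified model of expanding an "auth"-scheme ACL (what CREATOR_ALL_ACL holds)
// into concrete "scheme:id:perms" entries. Illustrative only, not ZooKeeper's code.
final class CreatorAclSketch {
    // Permission bits, same values as ZooDefs.Perms
    static final int READ = 1, WRITE = 2, CREATE = 4, DELETE = 8, ADMIN = 16;
    static final int ALL = READ | WRITE | CREATE | DELETE | ADMIN;

    private CreatorAclSketch() {}

    // zkCli-style permission letters, e.g. ALL -> "cdrwa"
    static String permString(int perms) {
        StringBuilder sb = new StringBuilder();
        if ((perms & CREATE) != 0) sb.append('c');
        if ((perms & DELETE) != 0) sb.append('d');
        if ((perms & READ) != 0) sb.append('r');
        if ((perms & WRITE) != 0) sb.append('w');
        if ((perms & ADMIN) != 0) sb.append('a');
        return sb.toString();
    }

    // sessionIds: "scheme:id" entries attached to the creating session (hypothetical input);
    // only schemes listed as authenticated are copied into the resulting ACL.
    static List<String> expandAuthAcl(int perms, List<String> sessionIds, Set<String> authenticatedSchemes) {
        List<String> acl = new ArrayList<>();
        for (String id : sessionIds) {
            int colon = id.indexOf(':');
            String scheme = colon < 0 ? id : id.substring(0, colon);
            if (authenticatedSchemes.contains(scheme)) {
                acl.add(id + ":" + permString(perms));
            }
        }
        if (acl.isEmpty()) {
            throw new IllegalArgumentException("no authenticated id on this session; the ACL would be rejected");
        }
        return acl;
    }

    public static void main(String[] args) {
        // sasl id shortened as if both kerberos.remove*FromPrincipal flags were set
        List<String> session = List.of("ip:172.30.0.6", "sasl:zkclient");
        System.out.println(expandAuthAcl(ALL, session, Set.of("sasl", "digest"))); // prints [sasl:zkclient:cdrwa]
    }
}
```

The model only illustrates why CREATOR_ALL_ACL needs an authenticated session; it does not reproduce ZooKeeper's exact data types or error codes.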

On Tue, Jan 14, 2020 at 2:48 PM Arpit Jain  wrote:

> Thanks for the clarification.
> I am able to authenticate client with Zookeeper. However, when I started
> to set ACLs with the same client, I get error messages. This is how I am
> creating curator client for setting ACLs
>
> CuratorFrameworkFactory.Builder builder = CuratorFrameworkFactory.builder()
>         .connectString(coordinatorHosts)
>         .retryPolicy(retryPolicy)
>         .connectionTimeoutMs(coordinatorConnectionTimeout)
>         .sessionTimeoutMs(coordinatorSessionTimeout);
>
> final CuratorFramework curatorFramework = builder
>         .authorization("sasl", "zkclient/z...@example.com".getBytes())
>         .aclProvider(new ACLProvider() {
>
>             @Override
>             public List<ACL> getDefaultAcl() {
>                 return ZooDefs.Ids.CREATOR_ALL_ACL;
>             }
>
>             @Override
>             public List<ACL> getAclForPath(String path) {
>                 return ZooDefs.Ids.CREATOR_ALL_ACL;
>             }
>         }).build();
>
>
> I see below logs in Zookeeper node:
>
> 2020-01-14 13:27:53,174 [myid:1] - INFO  [NIOWorkerThread-3:SaslServerCallbackHandler@120] - Successfully authenticated client: authenticationID=zkclient/z...@example.com;  authorizationID=zkclient/z...@example.com.
> 2020-01-14 13:27:53,175 [myid:1] - INFO  [NIOWorkerThread-3:SaslServerCallbackHandler@136] - Setting authorizedID: zkclient/z...@example.com
> 2020-01-14 13:27:53,175 [myid:1] - INFO  [NIOWorkerThread-3:ZooKeeperServer@1170] - adding SASL authorization for authorizationID: zkclient/z...@example.com
> 2020-01-14 13:27:53,182 [myid:1] - INFO  [NIOWorkerThread-7:ZooKeeperServer@1095] - got auth packet /172.30.0.6:36658
> 2020-01-14 13:27:53,183 [myid:1] - WARN  [NIOWorkerThread-7:ZooKeeperServer@1123] - Authentication failed for scheme: sasl
>
> Is this not the correct way to do it ?
>
>
>
> On Tue, Jan 14, 2020 at 11:52 AM Szalay-Bekő Máté <
> szalay.beko.m...@gmail.com> wrote:
>
>> The system property name is a bit misleading... this parameter actually
>> specifies the username used in the ZooKeeper server principal (in your
>> case the server principal is: zookeeper/z...@example.com).
>> AFAIK the ZooKeeper client (after authenticating as
>> zkclient/z...@example.com in Kerberos, based on the jaas.conf file) needs
>> to know the ZooKeeper server principal in order to request a service
>> ticket from Kerberos that can be read by the ZooKeeper server.
>>
>> In 3.5.5 (or 3.5.6) you can use the zookeeper.sasl.client.username
>> parameter (plus some other parameters) to configure how the client
>> determines the server principal.
>> See:
>> https://github.com/apache/zookeeper/blob/c11b7e26bc554b8523dc929761dd28808913f091/zookeeper-server/src/main/java/org/apache/zookeeper/SaslServerPrincipal.java#L48
>>
>> In future releases (3.5.7, 3.6, ...) you can also use
>> the zookeeper.server.principal parameter (a much better name, I think) to
>> set a fixed server principal name in the client.
>> See:
>> https://github.com/apache/zookeeper/blob/1c5d135d74f16275876c024401dc2de92909b20a/zookeeper-server/src/main/java/org/apache/zookeeper/SaslServerPrincipal.java#L50
>>
>> On Mon, Jan 13, 2020 at 6:03 PM Arpit Jain  wrote:
>>
>>> Does this user name have to be "Zookeeper"
>>> (-Dzookeeper.sasl.client.username=zookeeper) always ?
>>> And the client principal name is different than this username..Correct
>>> me if I am wrong ?
>>>
>>> On Mon, Jan 13, 2020 at 4:58 PM Arpit Jain 
>>> wrote:
>>>
>>>> Thank you so much!
>>>> It worked finally. I had to change
>>>> -Dzookeeper.sasl.client.username=zookeeper parameter.
>>>>
>>>> On Mon, Jan 13, 2020 at 4:40 PM Szalay-Bekő Máté <
>>>> szalay.beko.m...@gmail.com> wrote:
>>>>
>>>>> You are using 3.5.5 or 3.5.6, right?

Re: Zookeeper and curator SASL authentication

2020-01-14 Thread Szalay-Bekő Máté

The system property name is a bit misleading... this parameter actually
specifies the username used in the ZooKeeper server principal (in your
case the server principal is: zookeeper/z...@example.com).
AFAIK the ZooKeeper client (after authenticating as zkclient/z...@example.com
in Kerberos, based on the jaas.conf file) needs to know the ZooKeeper server
principal in order to request a service ticket from Kerberos that can be
read by the ZooKeeper server.

In 3.5.5 (or 3.5.6) you can use the zookeeper.sasl.client.username
parameter (plus some other parameters) to configure how the client
determines the server principal.
See:
https://github.com/apache/zookeeper/blob/c11b7e26bc554b8523dc929761dd28808913f091/zookeeper-server/src/main/java/org/apache/zookeeper/SaslServerPrincipal.java#L48

In future releases (3.5.7, 3.6, ...) you can also use
the zookeeper.server.principal parameter (a much better name, I think) to
set a fixed server principal name in the client.
See:
https://github.com/apache/zookeeper/blob/1c5d135d74f16275876c024401dc2de92909b20a/zookeeper-server/src/main/java/org/apache/zookeeper/SaslServerPrincipal.java#L50
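The lookup described above can be modeled roughly as follows. This is a simplified sketch, not SaslServerPrincipal itself: ServerPrincipalSketch is hypothetical, it only uses the two property names mentioned in this thread, assumes the default username is "zookeeper", and leaves out hostname canonicalization and realm handling. Under these assumptions it also shows why a client with zookeeper.sasl.client.username set to zkclient would presumably ask Kerberos for a zkclient/... service ticket, as in the TGS_REQ lines quoted below.

```java
import java.util.Properties;

// Simplified model of the client-side server principal lookup discussed above.
// Not ZooKeeper's implementation: hostname canonicalization and realm handling are omitted.
final class ServerPrincipalSketch {
    // Property names as mentioned in this thread
    static final String SERVER_PRINCIPAL = "zookeeper.server.principal"; // 3.5.7 / 3.6+
    static final String CLIENT_USERNAME = "zookeeper.sasl.client.username";
    // Assumed default username when nothing is configured
    static final String DEFAULT_USERNAME = "zookeeper";

    private ServerPrincipalSketch() {}

    static String serverPrincipal(Properties config, String serverHost) {
        // A fixed server principal, when configured, is used as-is
        String fixed = config.getProperty(SERVER_PRINCIPAL);
        if (fixed != null) {
            return fixed;
        }
        // Otherwise the principal is built as <username>/<server host>
        String user = config.getProperty(CLIENT_USERNAME, DEFAULT_USERNAME);
        return user + "/" + serverHost;
    }

    public static void main(String[] args) {
        // Reads -Dzookeeper.sasl.client.username / -Dzookeeper.server.principal if given
        System.out.println(serverPrincipal(System.getProperties(), "zk1.example.com"));
    }
}
```

So leaving zookeeper.sasl.client.username at its default (or setting it to the service name the server actually uses) is what makes the client request a ticket for zookeeper/..., matching the fix reported further down in this thread.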

On Mon, Jan 13, 2020 at 6:03 PM Arpit Jain  wrote:

> Does this user name have to be "Zookeeper"
> (-Dzookeeper.sasl.client.username=zookeeper) always ?
> And the client principal name is different than this username..Correct me
> if I am wrong ?
>
> On Mon, Jan 13, 2020 at 4:58 PM Arpit Jain  wrote:
>
>> Thank you so much!
>> It worked finally. I had to change
>> -Dzookeeper.sasl.client.username=zookeeper parameter.
>>
>> On Mon, Jan 13, 2020 at 4:40 PM Szalay-Bekő Máté <
>> szalay.beko.m...@gmail.com> wrote:
>>
>>> You are using 3.5.5 or 3.5.6, right?
>>> I think you need to specify: -Dzookeeper.sasl.client.username=zookeeper
>>> can you give it a try? If it doesn't work then I can take a deeper look
>>> (also we can enable some debug logging)
>>>
>>> On Mon, Jan 13, 2020 at 5:31 PM Arpit Jain 
>>> wrote:
>>>
>>>> Hi
>>>>
>>>> I have Kerberos, Zookeeper and my application (using curator) running
>>>> in 3 docker containers with ZK SASL authentication enabled. The ZK can
>>>> login to Kerberos and starts successfully.
>>>>
>>>> The ZK server principal is zookeeper/z...@example.com
>>>> The client principal is : zkclient/z...@example.com
>>>>
>>>> While starting my application, I am seeing failure while obtaining TGS.
>>>> See the log at Kerberos side:
>>>>
>>>> Jan 13 15:22:19 kdc krb5kdc[20](info): AS_REQ (2 etypes {18 17}) 172.30.0.6: NEEDED_PREAUTH: zkclient/z...@example.com for krbtgt/example@example.com, Additional pre-authentication required
>>>> Jan 13 15:22:19 kdc krb5kdc[20](info): AS_REQ (2 etypes {18 17}) 172.30.0.6: ISSUE: authtime 1578928939, etypes {rep=18 tkt=18 ses=18}, zkclient/z...@example.com for krbtgt/example@example.com
>>>> Jan 13 15:22:19 kdc krb5kdc[20](info): TGS_REQ (4 etypes {18 17 16 23}) 172.30.0.6: ISSUE: authtime 1578928939, etypes {rep=18 tkt=18 ses=18}, zkclient/z...@example.com for zkclient/z...@example.com
>>>>
>>>> However, if I use the zkCli.sh to login to Zookeeper, it successfully
>>>> logs in. See the log on Kerberos side. See the difference in the last line
>>>> while requesting TGS.
>>>>
>>>> Jan 13 15:26:14 kdc krb5kdc[20](info): AS_REQ (2 etypes {18 17}) 172.30.0.3: NEEDED_PREAUTH: zkclient/z...@example.com for krbtgt/example@example.com, Additional pre-authentication required
>>>> Jan 13 15:26:14 kdc krb5kdc[20](info): AS_REQ (2 etypes {18 17}) 172.30.0.3: ISSUE: authtime 1578929174, etypes {rep=18 tkt=18 ses=18}, zkclient/z...@example.com for krbtgt/example@example.com
>>>> Jan 13 15:26:14 kdc krb5kdc[20](info): TGS_REQ (4 etypes {18 17 16 23}) 172.30.0.3: ISSUE: authtime 1578929174, etypes {rep=18 tkt=18 ses=18}, zkclient/z...@example.com for zookeeper/z...@example.com
>>>>
>>>> The client section in JAAS config file is same in both the cases but
>>>> the server it is looking for is different in both the cases.
