CVE-2024-23944: Apache ZooKeeper: Information disclosure in persistent watcher handling

2024-03-14 Thread Andor Molnar
Severity: critical

Affected versions:

- Apache ZooKeeper 3.9.0 through 3.9.1
- Apache ZooKeeper 3.8.0 through 3.8.3
- Apache ZooKeeper 3.6.0 through 3.7.2

Description:

An information disclosure vulnerability exists in the persistent watcher
handling of Apache ZooKeeper because of a missing ACL check. It allows an
attacker to monitor child znodes by attaching a persistent watcher (addWatch
command) to a parent znode that the attacker already has access to. The
ZooKeeper server does not perform an ACL check when the persistent watcher is
triggered, and as a consequence the full path of each znode that triggers a
watch event is exposed to the owner of the watcher. It is important to note
that only the path is exposed by this vulnerability, not the data of the
znode. However, since a znode path can contain sensitive information such as
a user name or login ID, this issue is potentially critical.

Users are recommended to upgrade to version 3.9.2 or 3.8.4, which fix the issue.
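
To make the missing check concrete, here is a minimal Python sketch of the
behaviour described above. It is a toy in-memory model, not ZooKeeper code:
ToyServer, add_watch, create, the check_acl_on_trigger switch, and all user
and path names are invented for illustration. It also assumes that a fixed
server checks READ permission on the znode that triggered the event before
notifying the watcher.

```python
from dataclasses import dataclass, field

READ = "r"


@dataclass
class ToyServer:
    """Toy in-memory model of watch delivery; NOT ZooKeeper's implementation."""

    # path -> {user: set of permissions}
    acls: dict = field(default_factory=dict)
    # list of (watcher_user, watched_root) pairs
    watchers: list = field(default_factory=list)

    def add_watch(self, user, root):
        # Attaching the watcher requires READ on the watched parent itself.
        if READ not in self.acls.get(root, {}).get(user, set()):
            raise PermissionError(f"{user} cannot read {root}")
        self.watchers.append((user, root))

    def create(self, path, acl, check_acl_on_trigger):
        """Create a znode and return the (user, path) events that get delivered."""
        self.acls[path] = acl
        events = []
        for user, root in self.watchers:
            # Only descendants of the watched root trigger this watcher.
            if not path.startswith(root.rstrip("/") + "/"):
                continue
            allowed = READ in acl.get(user, set())
            if check_acl_on_trigger and not allowed:
                continue  # assumed fixed behaviour: do not reveal the path
            events.append((user, path))
        return events


server = ToyServer()
server.acls["/app"] = {"mallory": {READ}, "alice": {READ}}
server.add_watch("mallory", "/app")
private_acl = {"alice": {READ}}  # mallory has no rights on the child
print(server.create("/app/users/alice-login", private_acl, check_acl_on_trigger=False))
print(server.create("/app/users/alice-login-2", private_acl, check_acl_on_trigger=True))
```

With the check disabled, the watcher owner receives the full child path even
though the child's ACL grants that user nothing. With the check enabled, no
event is delivered.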

Credit:

周吉安(寒泉)  (reporter)

References:

https://zookeeper.apache.org/
https://www.cve.org/CVERecord?id=CVE-2024-23944



[ANNOUNCE] Apache ZooKeeper 3.7 End-of-Life 2nd Feb, 2024

2024-02-01 Thread Andor Molnar
The Apache ZooKeeper community would like to officially announce the
End-of-Life of the 3.7 release line. It will be effective on 2nd of
February, 2024 00:01 AM (PDT). From that day forward, the 3.7 version
of Apache ZooKeeper won't be supported by the community, which means
we won't

- accept patches on the 3.7.x branch, 
- run automated tests on any JDK version, 
- create new releases from 3.7.x branch, 
- resolve security issues, CVEs or critical bugs.

The latest released version of Apache ZooKeeper 3.7 (currently 3.7.2)
will be available on the download page for another year (until 2nd of
February, 2025). After that, it will be accessible among other
historical versions in the Apache Archives.

=== Upgrade ===

We recommend that users of Apache ZooKeeper 3.7 plan their production
upgrades according to the following supported upgrade path:

1) Upgrade to latest 3.8.x version
2) (Optional) Upgrade to latest 3.9.x version.

Please find known upgrade issues and workarounds on the following wiki
page: Upgrade FAQ [1]

In addition to that the user@ mailing list is open 24/7 to help and
answer your questions as usual.

=== Compatibility ===

Our backward compatibility rules still apply and can be found here:
Backward compatibility rules [2]

If you follow the recommended upgrade path with a rolling upgrade, the
ZooKeeper quorum will remain available at all times, as long as clients
do not start using new features.

Regards,
Andor

[1] https://cwiki.apache.org/confluence/display/ZOOKEEPER/Upgrade+FAQ
[2] https://cwiki.apache.org/confluence/display/ZOOKEEPER/ReleaseManagement





CVE-2023-44981: Apache ZooKeeper: Authorization bypass in SASL Quorum Peer Authentication

2023-10-11 Thread Andor Molnar
Severity: critical

Affected versions:

- Apache ZooKeeper 3.9.0
- Apache ZooKeeper 3.8.0 through 3.8.2
- Apache ZooKeeper 3.7.0 through 3.7.1
- Apache ZooKeeper before 3.7.0

Description:

Authorization Bypass Through User-Controlled Key vulnerability in Apache
ZooKeeper. If SASL Quorum Peer authentication is enabled in ZooKeeper
(quorum.auth.enableSasl=true), authorization is done by verifying that the
instance part of the SASL authentication ID is listed in the zoo.cfg server
list. The instance part of the SASL auth ID is optional, and if it is missing,
as in 'e...@example.com', the authorization check is skipped. As a result, an
arbitrary endpoint could join the cluster and begin propagating counterfeit
changes to the leader, essentially giving it complete read-write access to the
data tree. Quorum Peer authentication is not enabled by default.

Users are recommended to upgrade to version 3.9.1, 3.8.3, or 3.7.2, which fix
the issue.

Alternatively, ensure that the ensemble election/quorum communication is
protected by a firewall, as this will mitigate the issue.

See the documentation for more details on correct cluster administration.
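
The flaw described above can be illustrated with a short Python sketch. This
is not ZooKeeper's code: the function names, the example principals, and the
host names are hypothetical. The strict variant assumes that the fix is to
reject any authentication ID that lacks an instance part.

```python
def instance_part(auth_id):
    """Return the instance part of 'primary/instance@REALM', or None if absent."""
    principal = auth_id.split("@", 1)[0]
    parts = principal.split("/", 1)
    return parts[1] if len(parts) == 2 and parts[1] else None


def authorize_flawed(auth_id, quorum_hosts):
    """Models the bug class: a missing instance part skips the check."""
    host = instance_part(auth_id)
    if host is None:
        return True  # check skipped, so any such ID is accepted
    return host in quorum_hosts


def authorize_strict(auth_id, quorum_hosts):
    """Assumed fixed behaviour: the instance part is mandatory and must be listed."""
    host = instance_part(auth_id)
    return host is not None and host in quorum_hosts


hosts = {"zk1.example.com", "zk2.example.com"}  # stands in for the zoo.cfg server list
for auth_id in ("zkquorum/zk1.example.com@EXAMPLE.COM", "eve@EXAMPLE.COM"):
    print(auth_id, authorize_flawed(auth_id, hosts), authorize_strict(auth_id, hosts))
```

The second ID has no instance part, so the flawed check lets it through while
the strict one rejects it.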

Credit:

Damien Diederen  (reporter)

References:

https://zookeeper.apache.org/
https://www.cve.org/CVERecord?id=CVE-2023-44981



[ANNOUNCE] Apache ZooKeeper 3.7.2

2023-10-09 Thread Andor Molnar
The Apache ZooKeeper team is proud to announce Apache ZooKeeper version
3.7.2

ZooKeeper is a high-performance coordination service for distributed
applications. It exposes common services - such as naming,
configuration management, synchronization, and group services - in a
simple interface so you don't have to write them from scratch. You can
use it off-the-shelf to implement consensus, group management, leader
election, and presence protocols. And you can build on it for your
own, specific needs.

For ZooKeeper release details and downloads, visit:
https://zookeeper.apache.org/releases.html

ZooKeeper 3.7.2 Release Notes are at:
https://zookeeper.apache.org/doc/r3.7.2/releasenotes.html

We would like to thank the contributors that made the release possible.

Regards,

The ZooKeeper Team





[ANNOUNCE] Apache ZooKeeper 3.8.3

2023-10-09 Thread Andor Molnar
The Apache ZooKeeper team is proud to announce Apache ZooKeeper version
3.8.3

ZooKeeper is a high-performance coordination service for distributed
applications. It exposes common services - such as naming,
configuration management, synchronization, and group services - in a
simple interface so you don't have to write them from scratch. You can
use it off-the-shelf to implement consensus, group management, leader
election, and presence protocols. And you can build on it for your
own, specific needs.

For ZooKeeper release details and downloads, visit:
https://zookeeper.apache.org/releases.html

ZooKeeper 3.8.3 Release Notes are at:
https://zookeeper.apache.org/doc/r3.8.3/releasenotes.html

We would like to thank the contributors that made the release possible.

Regards,

The ZooKeeper Team




[ANNOUNCE] Apache ZooKeeper 3.9.1

2023-10-09 Thread Andor Molnar
The Apache ZooKeeper team is proud to announce Apache ZooKeeper version
3.9.1

ZooKeeper is a high-performance coordination service for distributed
applications. It exposes common services - such as naming,
configuration management, synchronization, and group services - in a
simple interface so you don't have to write them from scratch. You can
use it off-the-shelf to implement consensus, group management, leader
election, and presence protocols. And you can build on it for your
own, specific needs.

For ZooKeeper release details and downloads, visit:
https://zookeeper.apache.org/releases.html

ZooKeeper 3.9.1 Release Notes are at:
https://zookeeper.apache.org/doc/r3.9.1/releasenotes.html

We would like to thank the contributors that made the release possible.

Regards,

The ZooKeeper Team





[ANNOUNCE] Apache ZooKeeper 3.9.0

2023-08-04 Thread Andor Molnar
The Apache ZooKeeper team is proud to announce Apache ZooKeeper version
3.9.0

ZooKeeper is a high-performance coordination service for distributed
applications. It exposes common services - such as naming,
configuration management, synchronization, and group services - in a
simple interface so you don't have to write them from scratch. You can
use it off-the-shelf to implement consensus, group management, leader
election, and presence protocols. And you can build on it for your
own, specific needs.

For ZooKeeper release details and downloads, visit:
https://zookeeper.apache.org/releases.html

ZooKeeper 3.9.0 Release Notes are at:
https://zookeeper.apache.org/doc/r3.9.0/releasenotes.html

We would like to thank the contributors that made the release possible.

Regards,

The ZooKeeper Team




Re: ZK is binding an ephemeral port

2023-06-07 Thread Andor Molnar
I've dug up some more info about this:

https://www.baeldung.com/jmx-ports

"First of all, we can disable exposing an application for local
connection from JConsole with the -XX:+DisableAttachMechanism option"

"Furthermore, starting from JDK 16, we can set the local port number"

"There's an additional option
-Dcom.sun.management.jmxremote.rmi.port=1234 that allows us to set the
RMI port to the same value as the JMX port."

Interesting. Shall we take advantage of any of these in ZooKeeper?

Andor
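
As a small illustration of why the extra listener gets a different port
number on every start: a process that binds a listening socket to port 0 lets
the operating system pick a free ephemeral port. The Python sketch below shows
only that OS mechanism. That the JVM's JMX connector does the same when no
fixed port is configured is an assumption here, and bind_ephemeral is a
made-up helper name.

```python
import socket


def bind_ephemeral():
    """Bind a listening TCP socket to port 0 and return (socket, assigned_port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", 0))  # port 0 means "let the OS choose"
    sock.listen(1)
    return sock, sock.getsockname()[1]


sock, port = bind_ephemeral()
print(f"listening on ephemeral port {port}")
sock.close()
```

Running it several times typically prints a different port each time, which
matches the observation in this thread that the port keeps changing.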








Re: ZK is binding an ephemeral port

2023-06-07 Thread Andor Molnar
Confirmed on branch-3.5 too.

Thanks Enrico & Tison for the quick help. I'll close the ticket with
the explanation.

Andor



On Wed, 2023-06-07 at 10:03 +0800, tison wrote:
> Confirmed on master.
> 
> With JMXDISABLE=1 bin/zkServer.sh start-foreground the extra port
> doesn't occur. So this is consistent with Enrico's analysis.
> 
> Best,
> tison.
> 
> 
> Enrico Olivelli wrote on Tue, Jun 6, 2023 at 22:39:
> 
> > Andor
> > 
> > "Log4j 1.2 jmx support not found; jmx disabled."
> > 
> > This means that "log4j" JMX support is disabled.
> > 
> > You wrote in the very first line of your logs
> > "$ bin/zkServer.sh start-foreground
> > ZooKeeper JMX enabled by default"
> > 
> > 
> > Enrico
> > 



Re: ZK is binding an ephemeral port

2023-06-06 Thread Andor Molnar
Looks like we already have a ticket about it:

https://issues.apache.org/jira/browse/ZOOKEEPER-2910

Sure, I'll try to repro on master in a minute.



On Tue, 2023-06-06 at 21:24 +0800, tison wrote:
> Can you reproduce this on master? Or does it only happen on 3.4 and 3.5?
> 
> Best,
> tison.
> 
> 



Re: ZK is binding an ephemeral port

2023-06-06 Thread Andor Molnar
JMX is disabled by ManagedUtil:

2023-06-06 14:56:48,384 [myid:] - INFO  [main:ManagedUtil@47] - Log4j 1.2 jmx support not found; jmx disabled.

Hence I don't see the 9010 TCP port bound. But why would a random
ephemeral port still be open?

It is important to note that this port is random and changes at every
startup.

Andor



On Tue, 2023-06-06 at 15:08 +0200, Enrico Olivelli wrote:
> Andor,
> 
> On Tue, Jun 6, 2023 at 14:59, Andor Molnar wrote:
> > Hi folks,
> > 
> > I cannot find an answer for this which annoys the hell out of me.
> > Please help to understand why ZooKeeper is binding a local ephemeral
> > port right after start without any quorum or client socket opened.
> > This is what I see when I start a 3.4 or 3.5 server in standalone
> > mode. (The same behaviour is observed with quorum too.)
> > 
> > $ bin/zkServer.sh start-foreground
> > ZooKeeper JMX enabled by default
> > Using config: /home/andor/git/my-zookeeper/bin/../conf/zoo.cfg
> > 2023-06-06 14:56:48,378 [myid:] - INFO  [main:QuorumPeerConfig@154] - Reading configuration from: /home/andor/git/my-zookeeper/bin/../conf/zoo.cfg
> > ...
> > 2023-06-06 14:56:48,648 [myid:] - INFO  [main:NIOServerCnxnFactory@689] - binding to port 0.0.0.0/0.0.0.0:2181
> > 
> > ZooKeeper is running with PID 57126. Admin server is disabled.
> > 
> > $ sudo netstat -plnt | grep 57126
> > tcp6   0  0 :::2181    :::*    LISTEN  57126/java
> > tcp6   0  0 :::45669   :::*    LISTEN  57126/java
> > 
> > What is the second line??
> 
> It should be JMX
> 
> Enrico
> 
> > Thanks,
> > Andor
> > 
> > 
> > 



ZK is binding an ephemeral port

2023-06-06 Thread Andor Molnar
Hi folks,

I cannot find an answer for this which annoys the hell out of me.
Please help to understand why ZooKeeper is binding a local ephemeral
port right after start without any quorum or client socket opened. This
is what I see when I start a 3.4 or 3.5 server in standalone mode. (The
same behaviour is observed with quorum too.)

$ bin/zkServer.sh start-foreground
ZooKeeper JMX enabled by default
Using config: /home/andor/git/my-zookeeper/bin/../conf/zoo.cfg
2023-06-06 14:56:48,378 [myid:] - INFO  [main:QuorumPeerConfig@154] - Reading configuration from: /home/andor/git/my-zookeeper/bin/../conf/zoo.cfg
...
2023-06-06 14:56:48,648 [myid:] - INFO  [main:NIOServerCnxnFactory@689] - binding to port 0.0.0.0/0.0.0.0:2181

ZooKeeper is running with PID 57126. Admin server is disabled.

$ sudo netstat -plnt | grep 57126
tcp6   0  0 :::2181    :::*    LISTEN  57126/java
tcp6   0  0 :::45669   :::*    LISTEN  57126/java

What is the second line??

Thanks,
Andor





New releases page: endoflife.date

2022-07-27 Thread Andor Molnar
Hi ZK folks,

I'm letting you know that I've added ZooKeeper to this page:

https://endoflife.date/zookeeper

It's pretty neat to track releases there. We don't need to manually
update it, because it monitors the tags on GitHub mirror page, so it
should be all automatic.

Hope you like it.

Andor





Re: Custom hostname when using quorum TLS

2022-06-03 Thread Andor Molnar
Hi Sam,

What do you mean by ‘present’?

AFAIK (I'm not 100% sure about this off the top of my head), ZK does not
present its name to other quorum members, only the certificate. Something like:

1. ZK reads the quorum members from zoo.cfg at startup.
2. It connects to all other nodes via TCP and presents the certificate: here I
am and that's my ID.
3. Each other quorum member does a reverse DNS lookup on the IP and
compares the result with the hostname in the cert.
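
The steps above can be sketched in Python as follows. This models only the
description in this message, which is explicitly unverified, and is not
ZooKeeper's actual verification code. The function names, the 192.0.2.x
address, and the host names are hypothetical. The reverse lookup is passed in
as a parameter so the logic can be tried without DNS.

```python
import socket


def dns_reverse_lookup(ip):
    """Real reverse lookup; returns None when the address cannot be resolved."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def peer_matches_cert(peer_ip, cert_hostnames, reverse_lookup=dns_reverse_lookup):
    """Accept the peer only if its reverse-resolved name appears in the cert."""
    name = reverse_lookup(peer_ip)
    if name is None:
        return False
    wanted = name.rstrip(".").lower()
    return any(wanted == h.rstrip(".").lower() for h in cert_hostnames)


# Stand-in for DNS: the peer's IP reverse-resolves to zoo1.example.com.
fake_dns = {"192.0.2.11": "zoo1.example.com"}.get
print(peer_matches_cert("192.0.2.11", ["zoo1.example.com"], fake_dns))
print(peer_matches_cert("192.0.2.11", ["my-server"], fake_dns))
```

If verification works this way, the name a node appears under is determined
by what its IP reverse-resolves to and by the names in its certificate, not
by /etc/hostname.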

I need to dig into this to be sure.

Andor




> On 2022. Mar 24., at 13:25, Sam Lee  wrote:
> 
> I have followed the steps in the documentation to set up "Quorum TLS"
> for encrypted communication between ZooKeeper nodes.
> ( https://zookeeper.apache.org/doc/r3.6.3/zookeeperAdmin.html#Quorum+TLS )
> 
> Now, I am looking to change the hostname that a ZooKeeper node
> presents to other ZooKeeper nodes. For example, my server's
> /etc/hostname is 'my-server' but I want the ZooKeeper instance running
> on that server to use another hostname instead (e.g. 'zoo1'). Is this
> possible?



Re: Question: zookeeper 3.8.0

2022-06-03 Thread Andor Molnar
Hi,

Not yet decided.
The most recent EoL announcement was for the 3.5 branch, effective from 1st of
June, 2022.

ZK 3.8.0 is the most recent stable release of ZooKeeper, so it will be
supported for a long time. Years.

Regards,
Andor




> On 2022. Jun 2., at 23:44, harsha vardhan  wrote:
> 
> Hi,
> 
> I would like to ask about the end of vendor support date for Apache
> ZooKeeper version 3.8.0.
> I have searched through the documentation and release notes, but was unable
> to find the EOVS date for ZooKeeper version 3.8.0.
> Please let me know.
> 
> Thanks and regards
> Harshavardhan



Re: Getting too many debug logs in zookeeper 3.8.0. How to change log level?

2022-04-22 Thread Andor Molnar
Hi Aman,

I did the following:

1. Downloaded apache-zookeeper-3.8.0-bin.tar.gz, verified sig, unpacked
2. Copied the sample config: cp conf/zoo_sample.cfg conf/zoo.cfg
3. Started the server: bin/zkServer.sh start-foreground

I didn’t see any DEBUG logs on the screen. The default logback.xml that we ship 
with ZooKeeper is in the conf directory.

Andor





> On 2022. Apr 22., at 8:51, Aman Jain  wrote:
> 
> I am running version 3.8.0 of ZooKeeper, which uses logback.
> I am getting too many debug logs: in 7 hours it logged more than 300M worth
> of logs.
> 11:38:58.325 [ProcessThread(sid:0 cport:2181):] DEBUG 
> org.apache.zookeeper.server.SessionTrackerImpl - Checking session 
> 0x1020a4e931f0035
> 11:38:58.325 [SyncThread:0] DEBUG 
> org.apache.zookeeper.server.FinalRequestProcessor - Processing request:: 
> sessionid:0x1020a4e931f0035 type:ping cxid:0xfffe 
> zxid:0xfffe txntype:unknown reqpath:n/a
> 11:38:58.325 [SyncThread:0] DEBUG 
> org.apache.zookeeper.server.FinalRequestProcessor - 
> sessionid:0x1020a4e931f0035 type:ping cxid:0xfffe 
> zxid:0xfffe txntype:unknown reqpath:n/a
> 11:38:59.659 [ProcessThread(sid:0 cport:2181):] DEBUG 
> org.apache.zookeeper.server.SessionTrackerImpl - Checking session 
> 0x1020a4e931f0035
> 11:38:59.659 [SyncThread:0] DEBUG 
> org.apache.zookeeper.server.FinalRequestProcessor - Processing request:: 
> sessionid:0x1020a4e931f0035 type:ping cxid:0xfffe 
> zxid:0xfffe txntype:unknown reqpath:n/a
> 11:38:59.659 [SyncThread:0] DEBUG 
> org.apache.zookeeper.server.FinalRequestProcessor - 
> sessionid:0x1020a4e931f0035 type:ping cxid:0xfffe 
> zxid:0xfffe txntype:unknown reqpath:n/a
> 11:38:59.869 [ProcessThread(sid:0 cport:2181):] DEBUG 
> org.apache.zookeeper.server.SessionTrackerImpl - Checking session 
> 0x1020a4e931f0024
> 11:38:59.869 [SyncThread:0] DEBUG 
> org.apache.zookeeper.server.FinalRequestProcessor - Processing request:: 
> sessionid:0x1020a4e931f0024 type:ping cxid:0xfffe 
> zxid:0xfffe txntype:unknown reqpath:n/a
> 11:38:59.869 [SyncThread:0] DEBUG 
> org.apache.zookeeper.server.FinalRequestProcessor - 
> sessionid:0x1020a4e931f0024 type:ping cxid:0xfffe 
> zxid:0xfffe txntype:unknown reqpath:n/a
> 11:39:00.993 [ProcessThread(sid:0 cport:2181):] DEBUG 
> org.apache.zookeeper.server.SessionTrackerImpl - Checking session 
> 0x1020a4e931f0035
> 11:39:00.993 [SyncThread:0] DEBUG 
> org.apache.zookeeper.server.FinalRequestProcessor - Processing request:: 
> sessionid:0x1020a4e931f0035 type:ping cxid:0xfffe 
> zxid:0xfffe txntype:unknown reqpath:n/a
> 11:39:00.993 [SyncThread:0] DEBUG 
> org.apache.zookeeper.server.FinalRequestProcessor - 
> sessionid:0x1020a4e931f0035 type:ping cxid:0xfffe 
> zxid:0xfffe txntype:unknown reqpath:n/a
> 11:39:02.327 [ProcessThread(sid:0 cport:2181):] DEBUG 
> org.apache.zookeeper.server.SessionTrackerImpl - Checking session 
> 0x1020a4e931f0035
> 11:39:02.327 [SyncThread:0] DEBUG 
> org.apache.zookeeper.server.FinalRequestProcessor - Processing request:: 
> sessionid:0x1020a4e931f0035 type:ping cxid:0xfffe 
> zxid:0xfffe txntype:unknown reqpath:n/a
> 11:39:02.327 [SyncThread:0] DEBUG 
> org.apache.zookeeper.server.FinalRequestProcessor - 
> sessionid:0x1020a4e931f0035 type:ping cxid:0xfffe 
> zxid:0xfffe txntype:unknown reqpath:n/a
> 11:39:03.661 [ProcessThread(sid:0 cport:2181):] DEBUG 
> org.apache.zookeeper.server.SessionTrackerImpl - Checking session 
> 0x1020a4e931f0035
> 11:39:03.661 [SyncThread:0] DEBUG 
> org.apache.zookeeper.server.FinalRequestProcessor - Processing request:: 
> sessionid:0x1020a4e931f0035 type:ping cxid:0xfffe 
> zxid:0xfffe txntype:unknown reqpath:n/a
> 11:39:03.661 [SyncThread:0] DEBUG 
> org.apache.zookeeper.server.FinalRequestProcessor - 
> sessionid:0x1020a4e931f0035 type:ping cxid:0xfffe 
> zxid:0xfffe txntype:unknown reqpath:n/a
> 11:39:04.995 [ProcessThread(sid:0 cport:2181):] DEBUG 
> org.apache.zookeeper.server.SessionTrackerImpl - Checking session 
> 0x1020a4e931f0035
> 11:39:04.995 [SyncThread:0] DEBUG 
> org.apache.zookeeper.server.FinalRequestProcessor - Processing request:: 
> sessionid:0x1020a4e931f0035 type:ping cxid:0xfffe 
> zxid:0xfffe txntype:unknown reqpath:n/a
> 11:39:04.995 [SyncThread:0] DEBUG 
> org.apache.zookeeper.server.FinalRequestProcessor - 
> sessionid:0x1020a4e931f0035 type:ping cxid:0xfffe 
> zxid:0xfffe txntype:unknown reqpath:n/a
> 11:39:05.874 [ProcessThread(sid:0 cport:2181):] DEBUG 
> org.apache.zookeeper.server.SessionTrackerImpl - Checking session 
> 0x1020a4e931f0024
> 11:39:05.874 [SyncThread:0] DEBUG 
> org.apache.zookeeper.server.FinalRequestProcessor - Processing request:: 

Re: how use command deleteall option -b batch size??

2022-03-07 Thread Andor Molnar
I think it's already available even in 3.5.x versions.



On Mon, 2022-03-07 at 17:16 +0800, 一直以来 wrote:
> Is the command only supported in version 3.8? Is it not supported in 3.7?




Re: how use command deleteall option -b batch size ??

2022-03-07 Thread Andor Molnar
Hi,

"-b" option is the batch size.
Here's the doc:
https://zookeeper.apache.org/doc/r3.8.0/zookeeperCLI.html

Looks like this option is not documented, but it's outlined in the
help:

deleteall path [-b batch size]
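
A rough Python sketch of what a batch size means for a recursive delete. This
is a toy model and not the ZooKeeper CLI's implementation: delete_order and
batches are invented names, and it assumes that -b only controls how many
delete operations are grouped into one request.

```python
def delete_order(tree, root):
    """Return every path under root, children before parents (post-order)."""
    out = []
    for child in sorted(tree.get(root, [])):
        out.extend(delete_order(tree, child))
    out.append(root)
    return out


def batches(paths, batch_size):
    """Split the delete list into consecutive groups of at most batch_size."""
    return [paths[i:i + batch_size] for i in range(0, len(paths), batch_size)]


# Parent path -> child paths, mirroring the /a/b/c/d example from the question.
tree = {"/a": ["/a/b"], "/a/b": ["/a/b/c"], "/a/b/c": ["/a/b/c/d"]}
order = delete_order(tree, "/a")
for size in (1, 3):
    print(size, batches(order, size))
```

Under that assumption, the batch size changes how the work is split into
requests, not which nodes are removed. The whole subtree is deleted either
way, which is consistent with what "deleteall /a -b 1" did.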

Regards,
Andor



On Wed, 2022-02-16 at 17:19 +0800, 一直以来 wrote:
> create /a
> create /a/b
> create /a/b/c
> create /a/b/c/d
> 
> 
> I ran:
> deleteall /a -b 1
> 
> 
> Result:
> /a and all of its child nodes were deleted!
> 
> 
> So how is the -b option used? Thank you!




[ANNOUNCE] Apache ZooKeeper 3.5 End-of-Life 1st June, 2022

2022-03-03 Thread Andor Molnar
Hi,

The Apache ZooKeeper community would like to officially announce the End-of-Life
of the 3.5 release line. It will be effective on 1st of June, 2022 00:01 AM
(PDT). From that day forward, the 3.5 version of Apache ZooKeeper won't be
supported by the community, which means we won't

- accept patches on the 3.5.x branch,
- run automated tests on any JDK version,
- create new releases from 3.5.x branch,
- resolve security issues, CVEs or critical bugs.

The latest released version of Apache ZooKeeper 3.5 (currently 3.5.9) will be
available on the download page for another year (until 1st of June, 2023). After
that, it will be accessible among other historical versions in the Apache
Archives.

=== Upgrade ===

We recommend that users of Apache ZooKeeper 3.5 plan their production upgrades
according to the following supported upgrade path:

1) Upgrade to latest 3.5.x version 
2) Upgrade to latest 3.6.x version
3) (Optional) Upgrade to latest 3.7.x version.

Please find known upgrade issues and workarounds on the following wiki page:
https://cwiki.apache.org/confluence/display/ZOOKEEPER/Upgrade+FAQ

In addition to that the user@ mailing list is open 24/7 to help and answer your
questions as usual.

=== Compatibility ===

Our backward compatibility rules still apply and can be found here:
https://cwiki.apache.org/confluence/display/ZOOKEEPER/ReleaseManagement

If you follow the recommended upgrade path with a rolling upgrade, the ZooKeeper
quorum will remain available at all times, as long as clients do not start using
new features.

Best Regards,

-Andor




Re: Kafka connect to zookeeper - Secure connection using cert and keyfile

2021-10-21 Thread Andor Molnar
Hi John,

I'm not familiar with how Kafka sets up the Zookeeper client
internally. Does it support TLS encryption already? Which version of
Kafka is this?

You need to put your certificates into Java keystores and pass their
locations to the ZK client, which Kafka should do when TLS is enabled.
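
As a hedged sketch of what "passing the location to the ZK client" can look
like: the ZooKeeper Java client reads the system properties below for TLS.
The small Python helper only assembles them as -D JVM flags. The helper name,
the keystore paths, and the passwords are placeholders, and whether a given
Kafka version forwards such flags to its ZooKeeper client (for example via
KAFKA_OPTS) is an assumption to verify against the Kafka documentation.

```python
def zk_client_tls_flags(keystore, keystore_password, truststore, truststore_password):
    """Build -D flags for the ZooKeeper client's TLS-related system properties."""
    props = {
        "zookeeper.client.secure": "true",
        # TLS requires the Netty-based client connection socket.
        "zookeeper.clientCnxnSocket": "org.apache.zookeeper.ClientCnxnSocketNetty",
        "zookeeper.ssl.keyStore.location": keystore,
        "zookeeper.ssl.keyStore.password": keystore_password,
        "zookeeper.ssl.trustStore.location": truststore,
        "zookeeper.ssl.trustStore.password": truststore_password,
    }
    return [f"-D{key}={value}" for key, value in props.items()]


# Placeholder paths and passwords; substitute the real keystore files.
flags = zk_client_tls_flags(
    "/path/to/client.keystore.jks", "changeit",
    "/path/to/client.truststore.jks", "changeit",
)
print(" ".join(flags))
```

The client certificate and key from the openssl command would first have to
be imported into the keystore, and the CA chain into the truststore.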

Best,
Andor



On Tue, 2021-08-17 at 07:42 +0800, john mark wrote:
> Hi,
> 
> I am running this command to test my zookeeper SSL connection:
> 
> openssl s_client -showcerts -connect 55.55.55.55:2280 -CAfile
> /certs/ca-chain.cert.pem -cert
> /root/ca/intermediate/certs/intermediate.cert.pem  -key
> /root/ca/intermediate/private/intermediate.key.pem
> 
> It works just fine so that's for openssl s_client to connect to
> zookeeper.
> 
> How can I connect my Kafka server using -cert and -key option like the
> command mentioned above?
> 
> I need to use that to avoid getting SSL errors because I cannot use
> ssl.clientAuth in my zookeeper config that is because I have a version
> 3.5.5 only (does not support ssl.clientAuth).
> 
> Any ideas on how can I connect my Kafka server using -cert and -key
> option?
> 
> Best regards,
> 
> John Mark Causing




Re: Quorum TLS encryption issue - IP used instead of DNS name

2021-10-21 Thread Andor Molnar


Hi Marc,

I need to take a closer look.
Would you please share how you requested the certificates from
Let's Encrypt?
Is this an IPv6-only environment? Do those hostnames resolve only to
IPv6 addresses?

Regards,
Andor



On Wed, 2021-10-13 at 15:47 +0200, Marc Richter wrote:
> Hi everyone,
> 
> for some days now, I am trying to wrap my head around TLS encryption
> for the quorum-traffic. The hosts running Zookeeper do have a
> publicly available DNS name and I am using those to issue SSL
> certificates from Let's Encrypt.
> This seems to work - but it seems like Zookeeper decides to validate
> the SSL certificates against the IP(v6) of the connecting nodes
> instead of their hostnames.
> 
> In the `zookeeper.properties` of all my 3 nodes, I have set the
> servers by their DNS names like this:
> 
> ```
> server.1=zookeeper1.ourdomain.cloud:2888:3888
> server.2=zookeeper2.ourdomain.cloud:2888:3888
> server.3=zookeeper3.ourdomain.cloud:2888:3888
> ```
> 
> I requested SSL certificates from Let's Encrypt for these DNS names
> and added the certificate/key pairs to the Keystores of the nodes.
> 
> In the logs of the `zookeeper2` node, I now see something like this
> when the `zookeeper3` node tries to connect:
> 
> ```
> [2021-10-13 15:13:49,960] INFO Received connection request from
> /2a01:--CUT--:750:47566
> (org.apache.zookeeper.server.quorum.QuorumCnxManager)
> [2021-10-13 15:13:50,094] ERROR Failed to verify host address: 2a01:-
> -CUT--:750 (org.apache.zookeeper.common.ZKTrustManager)
> javax.net.ssl.SSLPeerUnverifiedException: Certificate for <2a01:--
> CUT--:750> doesn't match any of the subject alternative names:
> [zookeeper3.ourdomain.cloud]
> ```
> 
> Zookeeper seems to ignore the hostnames and complains that the
> IPv6 address is not listed in the SAN of the presented certificate. Since
> most open CAs do not sign IP addresses (Let's Encrypt does not do
> that at all, ZeroSSL only for http auth, etc.), this behaviour
> forces me to have an internal CA and work with self signed
> certificates; including all the negative things that come with it and
> a lot of extra effort.
> 
> How can I make Zookeeper resolve this correctly?
> 
> Best regards,
> Marc




Re: Apache ZooKeeper Meets the Dining Philosophers!

2021-06-25 Thread Andor Molnar
Hi Paul,

That’s a great article, thanks for sharing it.
I added the link to our Wiki here:
https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZooKeeperArticles

Thanks

Andor



> On 2021. Jun 16., at 3:56, Paul Brebner  wrote:
> 
> Hi Zookeeper people, I recently wrote a blog on Apache Zookeeper and
> Curator (used to solve the classic Dining Philosophers problem), and
> thought that it may be of interest to the wider Apache community? Here's
> the current link:
> 
> https://www.instaclustr.com/apache-zookeeper-meets-the-dining-philosophers/
> 
> If you would like to republish it on the Zookeeper site, or link to it etc
> please let me know,
> 
> Regards, Paul Brebner (Instaclustr Tech Evangelist)



Re: write performance issue in 3.6.2

2021-04-23 Thread Andor Molnar
Hi folks,

As previously mentioned the community won’t be able to help if you don’t share 
more information about your scenario. We need to see the following:

- which version of Zookeeper is being used,
- how many nodes are you running in the ZK cluster,
- what is the server configuration? any custom setting is in place?
- what is the hardware and software setup? on-prem or cloud? instance type? 
CPU, memory, disk properties, operating system, etc.
- network characteristics
- how many clients are connected and what are they doing? share the relevant 
source code of your client or the command that you’re running,
- 3.6 has advanced monitoring capabilities: set up Prometheus and share 
screenshots of relevant metrics,
- server and client logs, debug enabled if possible,
- security settings: TLS, Kerberos, etc.
- ...anything else which could be important

In a nutshell, you either have to share information about your production 
system or provide a reproduction setup. Performance issues are pretty hard to 
resolve because there are so many moving parts. The community is willing to help, 
but you need to share information for that to succeed.

shrikant,
ZK 3.6 has throttling for both client connections and requests. Request 
throttling can be disabled and it’s disabled by default, but connection 
throttling is not. From the log messages we can tell which throttling is in 
effect for your scenario.

Regards,
Andor



> On 2021. Apr 21., at 5:25, shrikant kalani  wrote:
> 
> Hello Everyone,
> 
> We are also using zookeeper 3.6.2 with ssl turned on both sides. We
> observed the same behaviour where under high write load the ZK server
> starts expiring the session. There are no jvm related issues. During high
> load the max latency increases significantly.
> 
> Also the session expiration message is not accurate. We do have session
> expiration set to 40 sec but ZK server disconnects the client within 10 sec.
> 
> Also the logs prints throttling the request but ZK documentation says
> throttling is disabled by default. Can someone check the code once to see
> if it is enabled or disabled. I am not a developer and hence not familiar
> with java code.
> 
> Thanks
> Srikant Kalani
> 
> On Wed, 21 Apr 2021 at 11:03 AM, Michael Han  wrote:
> 
>> What is the workload looking like? Is it pure write, or mixed read write?
>> 
>> A couple of ideas to move this forward:
>> * Publish the performance benchmark so the community can help.
>> * Bisect git commit and find the bad commit that caused the regression.
>> * Use the fine grained metrics introduced in 3.6 (e.g per processor stage
>> metrics) to measure where time spends during writes. We might have to add
>> these metrics on 3.4 to get a fair comparison.
>> 
>> For the throttling - the RequestThrottler introduced in 3.6 does introduce
>> latency, but should not impact throughput this much.
>> 
>> On Thu, Mar 11, 2021 at 11:46 AM Li Wang  wrote:
>> 
>>> The CPU usage of both server and client are normal (< 50%) during the
>> test.
>>> 
>>> Based on the investigation, the server is too busy with the load.
>>> 
>>> The issue doesn't exist in 3.4.14. I wonder why there is a significant
>>> write performance degradation from 3.4.14 to 3.6.2 and how we can address
>>> the issue.
>>> 
>>> Best,
>>> 
>>> Li
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Thu, Mar 11, 2021 at 11:25 AM Andor Molnar  wrote:
>>> 
>>>> What is the CPU usage of both server and client during the test?
>>>> 
>>>> Looks like server is dropping the clients because either the server or
>>>> both are too busy to deal with the load.
>>>> This log line is also concerning: "Too busy to snap, skipping"
>>>> 
>>>> If that’s the case I believe you'll have to profile the server process
>> to
>>>> figure out where the perf bottleneck is.
>>>> 
>>>> Andor
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On 2021. Feb 22., at 5:31, Li Wang  wrote:
>>>>> 
>>>>> Thanks, Patrick.
>>>>> 
>>>>> Yes, we are using the same JVM version and GC configurations when
>>>>> running the two tests. I have checked the GC metrics and also the
>> heap
>>>> dump
>>>>> of the 3.6, the GC pause and the memory usage look okay.
>>>>> 
>>>>> Best,
>>>>> 
>>>>> Li
>>>>> 
>>>>> On Sun, Feb 21, 2021

Re: write performance issue in 3.6.2

2021-03-11 Thread Andor Molnar
What is the CPU usage of both server and client during the test?

Looks like server is dropping the clients because either the server or both are 
too busy to deal with the load.
This log line is also concerning: "Too busy to snap, skipping"

If that’s the case I believe you'll have to profile the server process to 
figure out where the perf bottleneck is.

Andor




> On 2021. Feb 22., at 5:31, Li Wang  wrote:
> 
> Thanks, Patrick.
> 
> Yes, we are using the same JVM version and GC configurations when
> running the two tests. I have checked the GC metrics and also the heap dump
> of the 3.6, the GC pause and the memory usage look okay.
> 
> Best,
> 
> Li
> 
> On Sun, Feb 21, 2021 at 3:34 PM Patrick Hunt  wrote:
> 
>> On Sun, Feb 21, 2021 at 3:28 PM Li Wang  wrote:
>> 
>>> Hi Enrico, Sushant,
>>> 
>>> I re-run the perf test with the data consistency check feature disabled
>>> (i.e. -Dzookeeper.digest.enabled=false), the write performance issue of
>> 3.6
>>> is still there.
>>> 
>>> With everything exactly the same, the throughput of 3.6 was only 1/2 of
>> 3.4
>>> and the max latency was more than 8 times.
>>> 
>>> Any other points or thoughts?
>>> 
>>> 
>> In the past I've noticed a big impact of GC when doing certain performance
>> measurements. I assume you are using the same JVM version and GC when
>> running the two tests? Perhaps our memory footprint has expanded over time.
>> You should rule out GC by running with gc logging turned on with both
>> versions and compare the impact.
>> 
>> Regards,
>> 
>> Patrick
>> 
>> 
>>> Cheers,
>>> 
>>> Li
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Sat, Feb 20, 2021 at 9:04 PM Li Wang  wrote:
>>> 
>>>> Thanks Sushant and Enrico!
>>>> 
>>>> This is a really good point.  According to the 3.6 documentation, the
>>>> feature is disabled by default.
>>>> https://zookeeper.apache.org/doc/r3.6.2/zookeeperAdmin.html#ch_administration
>>>> However, checking the code, the default is enabled.
>>>> 
>>>> Let me set the zookeeper.digest.enabled to false and see how the write
>>>> operation performs.
>>>> 
>>>> Best,
>>>> 
>>>> Li
>>>> 
>>>> 
>>>> On Fri, Feb 19, 2021 at 1:32 PM Sushant Mane wrote:
>>>> 
>>>>> Hi Li,
>>>>> 
>>>>> On 3.6.2 consistency checker (adhash based) is enabled by default:
>>>>> https://github.com/apache/zookeeper/blob/803c7f1a12f85978cb049af5e4ef23bd8b688715/zookeeper-server/src/main/java/org/apache/zookeeper/server/ZooKeeperServer.java#L136
>>>>> It is not present in ZK 3.4.14.
>>>>> 
>>>>> This feature does have some impact on write performance.
>>>>> 
>>>>> Thanks,
>>>>> Sushant
>>>>> 
>>>>> 
>>>>> On Fri, Feb 19, 2021 at 12:50 PM Enrico Olivelli wrote:
>>>>> 
>>>>>> Li,
>>>>>> I wonder if we have some new throttling/back pressure mechanism that is
>>>>>> enabled by default.
>>>>>> 
>>>>>> Does anyone have some pointer to relevant implementations?
>>>>>> 
>>>>>> 
>>>>>> Enrico
>>>>>> 
>>>>>> On Fri, 19 Feb 2021, 19:46 Li Wang wrote:
>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> We switched to Netty on both client side and server side and the
>>>>>>> performance issue is still there.  Anyone has any insights on what could
>>>>>>> be the cause of higher latency?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> 
>>>>>>> Li
>>>>>>> 
>>>>>>> 
>>>>>>> On Mon, Feb 15, 2021 at 2:17 PM Li Wang wrote:
>>>>>>> 
>>>>>>>> Hi Enrico,
>>>>>>>> 
>>>>>>>> Thanks for the reply.
>>>>>>>> 
>>>>>>>> 1. We are using NIO based stack, not Netty based yet.
>>>>>>>> 
>>>>>>>> 2. Yes, here are some metrics on the client side.
>>>>>>>> 
>>>>>>>> 3.6: throughput: 7K, failure: 81215228, Avg Latency: 57ms,  Max Latency 31s
>>>>>>>> 
>>>>>>>> 3.4: throughput: 15k, failure: 0,  Avg Latency: 30ms,  Max Latency: 1.6s
>>>>>>>> 
>>>>>>>> 3. Yes, the JVM and zoo.cfg config are the exact same
>>>>>>>> 
>>>>>>>> 10G of Heap
>>>>>>>> 
>>>>>>>> 13G of Memory
>>>>>>>> 
>>>>>>>> 5 Participants
>>>>>>>> 
>>>>>>>> 5 Observers
>>>>>>>> 
>>>>>>>> Client session timeout: 3000ms
>>>>>>>> 
>>>>>>>> Server min session time: 4000ms
>>>>>>>> 
>>>>>>>> 4. Yes, there are two types of  WARN logs and many “Expiring session”
>>>>>>>> INFO log
>>>>>>>> 
>>>>>>>> 2021-02-15 22:04:36,506 [myid:4] - WARN
>>>>>>>> [NIOWorkerThread-7:NIOServerCnxn@365] - Unexpected exception
>>>>>>>> 
>>>>>>>> EndOfStreamException: Unable to read additional data from client, it
>>>>>>>> probably closed the socket: address = /100.108.63.116:43366, session =
>>>>>>>> 0x400189fee9a000b
>>>>>>>> 
>>>>>>>> at
>>>>>>>> org.apache.zookeeper.server.NIOServerCnxn.handleFailedRead(NIOServerCnxn.java:164)
>>>>>>>> 
>>>>>>>> at
>>>>>>>> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:327)
>>>>>>>> 
>>>>>>>> at

Re: ZOOKEEPER-1634: hardening security by teaching server to enforce client authentication

2020-05-05 Thread Andor Molnar
Oh, that’s great!

Have you enabled ‘sessionRequireClientSASLAuth’?

Andor



> On 2020. May 5., at 17:59, Enrico Olivelli  wrote:
> 
> Andor
> With ZOOKEEPER-1634 it has been implemented
> 
> I think it was a contribution from Twitter
> 
> Enrico
> 
> Il Mar 5 Mag 2020, 17:51 Andor Molnar  ha scritto:
> 
>> Authentication cannot be enforced currently.
>> What you can do is to enable SASL on one hand and set up ACLs for znodes
>> to be protected on the other hand.
>> 
>> Andor
>> 
>> 
>> 
>>> On 2020. May 5., at 17:25, Guilherme Ramos  wrote:
>>> 
>>> Hello.
>>> As requested by Enrico Olivelli, I'm trying to use ZooKeeper 3.6.1
>>> with SASL and JAAS (with a Server section defined), and it is still
>>> accepting client connections without credentials.
>>> 
>>> What can be done here? Is there a roadmap to fix this?
>>> 
>>> Thank you.
>> 
>> 



Re: ZOOKEEPER-1634: hardening security by teaching server to enforce client authentication

2020-05-05 Thread Andor Molnar
Authentication cannot be enforced currently.
What you can do is to enable SASL on one hand and set up ACLs for znodes to be 
protected on the other hand.

Andor



> On 2020. May 5., at 17:25, Guilherme Ramos  wrote:
> 
> Hello.
> As requested by Enrico Olivelli, I'm trying to use ZooKeeper 3.6.1
> with SASL and JAAS (with a Server section defined), and it is still
> accepting client connections without credentials.
> 
> What can be done here? Is there a roadmap to fix this?
> 
> Thank you.



Re: ZK not starting during upgrade to use 3.6.1 with SSL communication

2020-05-02 Thread Andor Molnar
Please include server logs.

Andor



> On 2020. May 2., at 1:36, blb.dev  wrote:
> 
> Hi, during testing upgrade to 3.6.1 version and using secure quorum ssl
> communication my zookeeper is not starting up. 
> 
> *config file:*
> dataDir=/data
> dataLogDir=/datalog
> tickTime=2000
> initLimit=10
> syncLimit=5
> maxClientCnxns=0
> autopurge.snapRetainCount=10
> autopurge.purgeInterval=24
> admin.enableServer=false
> snapshot.trust.empty=true
> reconfigEnabled=true
> audit.enable=true
> clientPort=2181
> secureClientPort=2281
> sslQuorum=true
> portUnification=true
> serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
> client.portUnification=true
> clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty
> ssl.client.enable=true
> clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty
> ssl.quorum.keyStore.location=/apache-zookeeper-3.6.1-bin/java/node1.ks
> ssl.quorum.keyStore.password=
> ssl.quorum.trustStore.location=/apache-zookeeper-3.6.1-bin/java/truststore.ks
> ssl.quorum.trustStore.password=
> 4lw.commands.whitelist=*
> quorumListenOnAllIPs=true
> server.2=:2888:3888:participant
> server.3=:2888:3888:participant
> server.1=:2888:3888:participant
> 
> *# bin/zkServer.sh status*
> /usr/bin/java
> ZooKeeper JMX enabled by default
> Using config: /conf/zoo.cfg
> Client port found: 2181. Client address: localhost.
> Error contacting service. It is probably not running.
> 
> *# cat /logs/zookeeper_audit.log  *
> 2020-05-01 22:06:12,213 INFO audit.Log4jAuditLogger: user=zookeeper
> operation=serverStart result=success
> 2020-05-01 23:08:30,859 INFO audit.Log4jAuditLogger: user=zookeeper
> operation=serverStart result=success
> 
> *# bin/zkCli.sh -server 127.0.0.1:2181*
> /usr/bin/java
> Connecting to 127.0.0.1:2181
> 2020-05-01 23:19:20,035 [myid:] - INFO  [main:Environment@98] - Client
> environment:zookeeper.version=3.6.1--104dcb3e3fb464b30c5186d229e00af9f332524b,
> built on 04/21/2020 15:01 GMT
> 2020-05-01 23:19:20,039 [myid:] - INFO  [main:Environment@98] - Client
> environment:host.name=zoo1
> 2020-05-01 23:19:20,039 [myid:] - INFO  [main:Environment@98] - Client
> environment:java.version=1.8.0_252
> 2020-05-01 23:19:20,042 [myid:] - INFO  [main:Environment@98] - Client
> environment:java.vendor=Oracle Corporation
> 2020-05-01 23:19:20,042 [myid:] - INFO  [main:Environment@98] - Client
> environment:java.home=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.252.b09-2.el7_8.x86_64/jre
> 2020-05-01 23:19:20,042 [myid:] - INFO  [main:Environment@98] - Client
> 

[ANNOUNCE] Apache ZooKeeper 3.4 End-of-Life 1st June, 2020

2020-04-09 Thread Andor Molnar
Hi,

The Apache ZooKeeper community would like to make the official announcement of 
3.4 release line End-of-Life. It will be effective on 1st of June, 2020 00:01 
AM (PDT). From that day forward the 3.4 version of Apache ZooKeeper won’t be 
supported by the community which means we won’t 

- accept patches on the 3.4.x branch,
- run automated tests on any JDK version,
- create new releases from 3.4.x branch,
- resolve security issues, CVEs or critical bugs.

Latest released version of Apache ZooKeeper 3.4 (currently 3.4.14) will be 
available on the download page for another year (until 1st of June, 2021), 
after that it will be accessible among other historical versions from Apache 
Archives.

=== Upgrade ===

We recommend that users of Apache ZooKeeper 3.4 plan their production upgrades 
according to the following supported upgrade path:

1) Upgrade to latest 3.4.x version 
2) Upgrade to latest 3.5.x version
3) (Optional) Upgrade to latest 3.6.x version.

Please find known upgrade issues and workarounds on the following wiki page:
https://cwiki.apache.org/confluence/display/ZOOKEEPER/Upgrade+FAQ

In addition, the user@ mailing list is open 24/7 to help and answer your 
questions as usual.

=== Compatibility ===

Our backward compatibility rules still apply and can be found here: 
https://cwiki.apache.org/confluence/display/ZOOKEEPER/ReleaseManagement

If you follow the recommended upgrade path with the rolling upgrade process, the 
ZooKeeper quorum will remain available at all times, as long as clients do not 
start using new features.

Best Regards,

-Andor




[ANNOUNCE] New ZooKeeper committer: Mate Szalay-Beko

2020-04-03 Thread Andor Molnar
The Apache ZooKeeper PMC recently extended committer karma to Mate and he has 
accepted. 
Mate has made some great contributions (including C client!) and we are looking 
forward to even more. :) 

Congratulations and welcome aboard, Mate!




Re: systemd failed to stop zookeeper-server.

2020-03-09 Thread Andor Molnar
Hi,

Strange. It’s actually normal that the ZooKeeper process gets stopped with the kill 
signal. That’s “business as usual”, but why does systemd have problems with 
that and since when has this been happening?

Andor



> On 2020. Mar 4., at 15:44, Anastasios Lisgaras 
>  wrote:
> 
> Hello zookeeper community,
> 
> After the last system update I had a problem with the zookeeper-server.
> The problem was about the same as this:
> https://issues.apache.org/jira/browse/BIGTOP-3302
> So, I went to the file :
> /etc/init.d/zookeeper-server
> 
> and I replaced "su" with "runuser". If you want to see it :
> 
> # cat /etc/init.d/zookeeper-server : https://termbin.com/gupi
> # cat /usr/lib/zookeeper/bin/zkServer.sh : https://termbin.com/3xpb
> 
> After this change the server runs flawlessly, but when i try to stop it
> via "systemctl" the system (systemd) complains.
> 
> ```
> # systemctl status zookeeper-server
> ● zookeeper-server.service - LSB: ZooKeeper is a centralized service for
> maintaining configuration information, naming, providing distributed
> synchronization, and providing group services.
>   Loaded: loaded (/etc/rc.d/init.d/zookeeper-server; bad; vendor
> preset: disabled)
>   Active: active (running) since Wed 2020-03-04 16:16:51 EET; 2s ago
> Docs: man:systemd-sysv-generator(8)
>  Process: 30509 ExecStart=/etc/rc.d/init.d/zookeeper-server start
> (code=exited, status=0/SUCCESS)
> Main PID: 30557 (java)
>   CGroup: /system.slice/zookeeper-server.service
>   └─30557 /usr/lib/jvm/jre-openjdk/bin/java
> -Dzookeeper.datadir.autocreate=false
> -Dzookeeper.log.dir=/var/log/zookeeper -Dzookee...
> 
> Mar 04 16:16:50 myserver systemd[1]: Starting LSB: ZooKeeper is a
> centralized service for maintaining configuration information,...ices
> Mar 04 16:16:50 myserver runuser[30541]: pam_unix(runuser:session):
> session opened for user zookeeper by (uid=0)
> Mar 04 16:16:50 myserver zookeeper-server[30509]: JMX enabled by default
> Mar 04 16:16:50 myserver zookeeper-server[30509]: Using config:
> /etc/zookeeper/conf/zoo.cfg
> Mar 04 16:16:51 myserver zookeeper-server[30509]: Starting zookeeper ...
> STARTED
> Mar 04 16:16:51 myserver runuser[30541]: pam_unix(runuser:session):
> session closed for user zookeeper
> Mar 04 16:16:51 myserver systemd[1]: Started LSB: ZooKeeper is a
> centralized service for maintaining configuration information, ...rvices..
> Hint: Some lines were ellipsized, use -l to show in full.
> 
> 
> # systemctl stop zookeeper-server
> 
> 
> # systemctl status zookeeper-server -l
> ● zookeeper-server.service - LSB: ZooKeeper is a centralized service for
> maintaining configuration information, naming, providing distributed
> synchronization, and providing group services.
>   Loaded: loaded (/etc/rc.d/init.d/zookeeper-server; bad; vendor
> preset: disabled)
>   Active: failed (Result: signal) since Wed 2020-03-04 16:17:00 EET; 4s ago
> Docs: man:systemd-sysv-generator(8)
>  Process: 30605 ExecStop=/etc/rc.d/init.d/zookeeper-server stop
> (code=exited, status=0/SUCCESS)
>  Process: 30509 ExecStart=/etc/rc.d/init.d/zookeeper-server start
> (code=exited, status=0/SUCCESS)
> Main PID: 30557 (code=killed, signal=KILL)
> 
> Mar 04 16:16:51 myserver systemd[1]: Started LSB: ZooKeeper is a
> centralized service for maintaining configuration information, naming,
> providing distributed synchronization, and providing group services..
> Mar 04 16:17:00 myserver systemd[1]: Stopping LSB: ZooKeeper is a
> centralized service for maintaining configuration information, naming,
> providing distributed synchronization, and providing group services
> Mar 04 16:17:00 myserver runuser[30642]: pam_unix(runuser:session):
> session opened for user zookeeper by (uid=0)
> Mar 04 16:17:00 myserver zookeeper-server[30605]: JMX enabled by default
> Mar 04 16:17:00 myserver zookeeper-server[30605]: Using config:
> /etc/zookeeper/conf/zoo.cfg
> Mar 04 16:17:00 myserver zookeeper-server[30605]: Stopping zookeeper ...
> STOPPED
> Mar 04 16:17:00 myserver systemd[1]: zookeeper-server.service: main
> process exited, code=killed, status=9/KILL
> Mar 04 16:17:00 myserver systemd[1]: Stopped LSB: ZooKeeper is a
> centralized service for maintaining configuration information, naming,
> providing distributed synchronization, and providing group services..
> Mar 04 16:17:00 myserver systemd[1]: Unit zookeeper-server.service
> entered failed state.
> Mar 04 16:17:00 myserver systemd[1]: zookeeper-server.service failed.
> 
> 
> # systemctl is-active zookeeper-server
> failed
> 
> 
> # systemctl is-failed zookeeper-server
> failed
> 
> 
> # ps -aux | grep zookeeper
> root 31181  0.0  0.0 112712   968 pts/0S+   16:21   0:00 grep
> --color=auto zookeeper
> 
> 
> # systemctl restart zookeeper-server -l
> 
> 
> # systemctl status zookeeper-server -l
> ● zookeeper-server.service - LSB: ZooKeeper is a centralized service for
> maintaining configuration information, naming, providing distributed
> 

Re: Zookeeper resolving to old host IP addresses

2020-01-22 Thread Andor Molnar
Yep, give it a try.

https://cr.openjdk.java.net/~iris/se/11/latestSpec/api/java.base/java/net/doc-files/net-properties.html

I don't have personal experience with these settings.
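
One thing worth checking, going by the page linked above: as far as I
can tell, networkaddress.cache.ttl and networkaddress.cache.negative.ttl
are security properties, not system properties, so passing them with -D
may have no effect. A minimal sketch of setting them programmatically,
assuming you control the code that starts the JVM (the class and method
names are made up for illustration):

```java
import java.security.Security;

public final class DnsCacheTtl {

    /** Turns off JVM-level caching of successful and failed DNS lookups. */
    static void disableDnsCaching() {
        // 0 means "do not cache"; values are given in seconds.
        Security.setProperty("networkaddress.cache.ttl", "0");
        Security.setProperty("networkaddress.cache.negative.ttl", "0");
    }

    public static void main(String[] args) {
        // Should run before the first hostname lookup in this JVM,
        // because the cache policy is read when networking initializes.
        disableDnsCaching();
        System.out.println("networkaddress.cache.ttl="
                + Security.getProperty("networkaddress.cache.ttl"));
        System.out.println("networkaddress.cache.negative.ttl="
                + Security.getProperty("networkaddress.cache.negative.ttl"));
    }
}
```

If you cannot change the code, the same two entries can be set in the
JVM's java.security file instead.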

Andor




On Wed, 2020-01-22 at 10:37 -0800, rammohan ganapavarapu wrote:
> Hi Andor,
> 
> On the OS side the hostname resolves to the new IP, so it could be
> the JVM that is caching. Are there any settings on the JVM to
> invalidate the cache? In some other posts I saw someone recommending
> these, but I'm not sure if they work.
> 
> -Dnetworkaddress.cache.ttl=0
> 
> -Dnetworkaddress.cache.negative.ttl=0
> 
> Ram
> 
> On Wed, Jan 22, 2020 at 2:03 AM Andor Molnar 
> wrote:
> 
> > Hi Ram,
> > 
> > As far as I can see from the code, ZooKeeper uses the standard Java
> > calls
> > getByName() and getAllByName() every time it’s trying to connect to
> > a
> > server.
> > 
> > 
> > // zookeeper.ipReachableTimeout is not defined
> > if (ipReachableTimeout <= 0) {
> > address = InetAddress.getByName(this.hostname);
> > } else {
> > address = getReachableAddress(this.hostname,
> > ipReachableTimeout);
> > }
> > 
> > 
> > ZK doesn’t (and definitely should not) cache IP addresses. It’s
> > either the
> > cache of JVM or your DNS server.
> > 
> > Dynamic reconfig is available in 3.5.x versions which are already
> > stable
> > now and I think with that you don’t need to reuse existing
> > hostnames.
> > Instead use reconfig commands to properly remove old nodes and add
> > new
> > ones. Sounds like more cumbersome, but maybe more reliable.
> > 
> > Andor
> > 
> > 
> > 
> > 
> > > On 2020. Jan 21., at 23:14, rammohan ganapavarapu <
> > rammohanga...@gmail.com> wrote:
> > > But still happening for me, is there any config on zookeeper side
> > > to make
> > > this fix to work?
> > > 
> > > Ram
> > > 
> > > On Tue, Jan 21, 2020 at 2:12 PM Michael Han 
> > > wrote:
> > > 
> > > > Could be ZOOKEEPER-1506, though this should be fixed already in
> > > > 3.4.14.
> > > > 
> > > > On Tue, Jan 21, 2020 at 2:01 PM rammohan ganapavarapu <
> > > > rammohanga...@gmail.com> wrote:
> > > > 
> > > > > Hi Enrico,
> > > > > 
> > > > > I see same with both 3.4.5 and 3.4.14
> > > > > 
> > > > > Ram
> > > > > 
> > > > > On Tue, Jan 21, 2020 at 1:53 PM Enrico Olivelli <
> > > > > eolive...@gmail.com>
> > > > > wrote:
> > > > > 
> > > > > > Hi,
> > > > > > Which version of ZK are you using?
> > > > > > Enrico
> > > > > > 
> > > > > > 
> > > > > > Il mar 21 gen 2020, 22:51 rammohan ganapavarapu <
> > > > rammohanga...@gmail.com
> > > > > > ha scritto:
> > > > > > 
> > > > > > > Hi,
> > > > > > > 
> > > > > > > Does zookeeper cache the host IP? if so how long does it
> > > > > > > cache? I
> > > > have
> > > > > a
> > > > > > > zk cluster in autoscaling groups and when a new node
> > > > > > > comes up, other
> > > > > > nodes
> > > > > > > still resolving to old IP. Is there any setting to
> > > > > > > invalidate dns
> > > > cache
> > > > > > for
> > > > > > > zookeeper? or is it jvm dns cache? until other nodes
> > > > > > > resolves to new
> > > > > IP,
> > > > > > > this node not able to join the cluster.
> > > > > > > 
> > > > > > > Thanks,
> > > > > > > Ram
> > > > > > > 



Re: Zookeeper resolving to old host IP addresses

2020-01-22 Thread Andor Molnar
Hi Ram,

As far as I can see from the code, ZooKeeper uses the standard Java calls 
getByName() and getAllByName() every time it’s trying to connect to a server.


// zookeeper.ipReachableTimeout is not defined
if (ipReachableTimeout <= 0) {
    address = InetAddress.getByName(this.hostname);
} else {
    address = getReachableAddress(this.hostname, ipReachableTimeout);
}


ZK doesn’t (and definitely should not) cache IP addresses. It’s either the 
cache of JVM or your DNS server.

Dynamic reconfig is available in the 3.5.x versions, which are already stable, 
and I think with that you don’t need to reuse existing hostnames. Instead, use 
reconfig commands to properly remove old nodes and add new ones. That sounds 
more cumbersome, but may be more reliable.
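
To make the reconfig idea a bit more concrete, here is a small sketch
that only builds the arguments for such a change: a server spec for the
joining node and a comma-separated list of leaving server ids. The spec
format follows the dynamic reconfiguration guide as I remember it; the
host name, ids and ports are invented, and the helper names are mine.
It does not talk to ZooKeeper; the strings are what you would hand to
the reconfig command in zkCli (-add / -remove) or to the admin API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

public final class ReconfigArgs {

    /** Formats one participant as server.<id>=<host>:<quorum>:<election>:participant;<clientPort>. */
    static String serverSpec(int id, String host, int quorumPort, int electionPort, int clientPort) {
        return "server." + id + "=" + host + ":" + quorumPort + ":" + electionPort
                + ":participant;" + clientPort;
    }

    /** Joins the ids of the servers to remove, e.g. "1,2". */
    static String leavingServers(List<Integer> ids) {
        StringJoiner joiner = new StringJoiner(",");
        for (Integer id : ids) {
            joiner.add(String.valueOf(id));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Hypothetical replacement: node 4 joins under a new hostname, node 1 leaves.
        String joining = serverSpec(4, "zk4.example.com", 2888, 3888, 2181);
        String leaving = leavingServers(Arrays.asList(1));
        System.out.println("reconfig -add " + joining + " -remove " + leaving);
    }
}
```

With this approach a replacement node gets a new server id and hostname,
so nothing depends on an old name resolving to a new address.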

Andor




> On 2020. Jan 21., at 23:14, rammohan ganapavarapu  
> wrote:
> 
> But still happening for me, is there any config on zookeeper side to make
> this fix to work?
> 
> Ram
> 
> On Tue, Jan 21, 2020 at 2:12 PM Michael Han  wrote:
> 
>> Could be ZOOKEEPER-1506, though this should be fixed already in 3.4.14.
>> 
>> On Tue, Jan 21, 2020 at 2:01 PM rammohan ganapavarapu <
>> rammohanga...@gmail.com> wrote:
>> 
>>> Hi Enrico,
>>> 
>>> I see same with both 3.4.5 and 3.4.14
>>> 
>>> Ram
>>> 
>>> On Tue, Jan 21, 2020 at 1:53 PM Enrico Olivelli 
>>> wrote:
>>> 
>>>> Hi,
>>>> Which version of ZK are you using?
>>>> Enrico
>>>> 
>>>> 
>>>> On Tue, 21 Jan 2020, 22:51 rammohan ganapavarapu <rammohanga...@gmail.com>
>>>> wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> Does zookeeper cache the host IP? if so how long does it cache? I have a
>>>>> zk cluster in autoscaling groups and when a new node comes up, other nodes
>>>>> still resolving to old IP. Is there any setting to invalidate dns cache for
>>>>> zookeeper? or is it jvm dns cache? until other nodes resolves to new IP,
>>>>> this node not able to join the cluster.
>>>>> 
>>>>> Thanks,
>>>>> Ram
>>>>> 
>>>> 
>>> 
>> 



Re: [ANNOUNCE] Enrico Olivelli new ZooKeeper PMC Member

2020-01-22 Thread Andor Molnar
Hi Enrico,

Repeating myself, but again: Congratulations! :-)

Andor



> On 2020. Jan 22., at 6:45, Szalay-Bekő Máté  
> wrote:
> 
> Congratulations! :)
> 
> On Wed, Jan 22, 2020, 06:36 David Mollitor  wrote:
> 
>> You've been a great help to me.  Well deserved!
>> 
>> On Wed, Jan 22, 2020, 12:09 AM Mohammad arshad >> 
>> wrote:
>> 
>>> Congratulations Enrico!
>>> 
>>> 
>>> -Original Message-
>>> From: rammohan ganapavarapu [mailto:rammohanga...@gmail.com]
>>> Sent: Wednesday, January 22, 2020 3:45 AM
>>> To: user@zookeeper.apache.org
>>> Cc: DevZooKeeper 
>>> Subject: Re: [ANNOUNCE] Enrico Olivelli new ZooKeeper PMC Member
>>> 
>>> Congratulations Enrico!!
>>> 
>>> On Tue, Jan 21, 2020 at 1:41 PM Flavio Junqueira  wrote:
>>> 
 I'm pleased to announce that Enrico Olivelli recently became the
 newest ZooKeeper PMC member. Enrico has contributed immensely to this
 community; he became a ZooKeeper committer in May 2019 and now he joins
>>> the PMC.
 
 Join me in congratulating him on the achievement. Congrats, Enrico!
 
 -Flavio on behalf of the Apache ZooKeeper PMC
>>> 
>> 



Re: Admin server deadlocks?

2020-01-21 Thread Andor Molnar
Hi Craig!

That's very good to know, thanks for reporting!
Enrico, I think I'll step up for RM of 3.5.7 this week. Hope I can find
some free cycles. Stay tuned.

Andor



On Tue, 2020-01-21 at 14:49 +, Craig.Condit wrote:
> I don’t have stack traces handy, but was able to eliminate the
> problem in our environment by updating Jetty to 9.4.24 (ZOOKEEPER-
> 3638 backport).
> 
> Craig
> 
> From: "Chris T." 
> Reply-To: "user@zookeeper.apache.org" 
> Date: Tuesday, January 21, 2020 at 5:09 AM
> To: "user@zookeeper.apache.org" 
> Subject: Re: Admin server deadlocks?
> 
> Yes, attached.
> 
> On Tue, Jan 21, 2020 at 6:19 AM Enrico Olivelli <eolive...@gmail.com> wrote:
> Do you have a dump of the stack traces of the JVM during the problem?
> 
> 
> Enrico
> 
> On Mon, 20 Jan 2020, 21:06 Cee Tee <c.turks...@gmail.com> wrote:
> 
> > Hey that sounds familiar, we do curl to the stats api and they pretty
> > consistently start to hang after a while (minutes to hours).
> > I thought it was something in our environment but I'm happy to read I'm
> > not the only one. :)
> > I used to run a snapshot 3.6 that didn't suffer from this. It appeared
> > after 'upgrading' to the official 3.5.6.
> > Regards
> > Chris
> > 
> > On 20 January 2020 20:54:51 Craig.Condit <craig.con...@target.com> wrote:
> > 
> > > We have been running Zookeeper 3.5.6 on several clusters for a while
> > > now, and have noticed (pretty consistently) that the new Admin Server
> > > seems to stop responding (hangs) after the ZK service has been up and
> > > running for a while. The stack dumps we have done seem to indicate some
> > > sort of lock being held, probably by Jetty. This makes sense, as when we
> > > see the hang start, it happens for all URLs (even the root / 404 not
> > > found page).
> > > 
> > > 
> > > Has anyone encountered this? I was not able to find a related
> > > JIRA.
> > 
> > 
> > 



Re: Secure Configuration of Zookeeper

2020-01-10 Thread Andor Molnar
Hi Michael,

Very nice topic indeed. Answers inline.


> On 2020. Jan 10., at 19:20, Michael Angel  wrote:
> 
> 
> 
> What resources are available to help harden a Zookeeper installation?


Not much unfortunately. I’m thinking about a new wiki page which would be ideal 
for this. Currently you can find security related topics in the admin guide, 
but that’s probably far from complete. What we have currently is:

- Quorum TLS - wire encryption between quorum members:
https://zookeeper.apache.org/doc/r3.5.6/zookeeperAdmin.html#Quorum+TLS

- Client-Server TLS - wire encryption between client and server:
https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZooKeeper+SSL+User+Guide

- Server-Server and Client-Server mutual authentication:
https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZooKeeper+and+SASL

- ACL system
https://zookeeper.apache.org/doc/r3.5.6/zookeeperProgrammers.html#sc_ZooKeeperAccessControl


> What Zookeeper files should be watched with custom auditing rules?
> Reviewing the Zookeeper documentation we don't see many security 
> configuration recommendations beyond the ACL section.


What do you mean by ‘custom auditing rules’?


> Background: we are running a 3 node Zookeeper for most projects under RHEL 
> 7.7 Systems minimal installs with SELinux, FIPS, and STIG standards.
> Zookeeper we are using to support a 3 node Kafka installation.
> We are offloading Zookeeper logs to our Central Logging system.
> We are blocking the Zookeeper management tcp port 2181.


That’s usually the standard secure client port. You could also disable the 
non-secure client port to close that door too.
Setting up an RBAC system or SELinux would also be nice, but we don’t provide 
rulesets for them.

Andor




Re: Question regarding latest Zookeeper version 3.5.6 removal of setQuorumPeers

2020-01-07 Thread Andor Molnar
Hi,

It was removed long ago, in this patch:
https://issues.apache.org/jira/browse/ZOOKEEPER-1411

It’s a 3.5.x change which I believe allows API changes. What have you used this 
method for?

Andor




> On 2020. Jan 7., at 13:37, Veerabhadra rao Mallavarapu 
>  wrote:
> 
> Hi All
> I used to use the method QuorumPeer.setQuorumPeers(), however that method was 
> removed in the latest version. Will not setting the peer nodes impact my 
> existing code? Please confirm.
> If it does, what is the alternative approach to set this?
> Regards
> Veeru



Re: Zookeeper server and client authentication

2020-01-06 Thread Andor Molnar
Thanks, great stuff! I’ve already forgotten about it.

So, this is the approach of forcing clients to authenticate during 
connection. I recall another one which would let clients postpone 
authentication with the ‘addAuth’ command:
https://issues.apache.org/jira/browse/ZOOKEEPER-2462

But that’s still open. Not a problem though, 3.6.0 is already super cool with 
this.

Andor




> On 2020. Jan 6., at 16:09, Enrico Olivelli  wrote:
> 
> Take a look at
> https://issues.apache.org/jira/browse/ZOOKEEPER-1634
> 
> Enrico
> 
> On Mon, 6 Jan 2020, 13:52 Andor Molnar wrote:
> 
>> Are we going to release client authentication enforcement in 3.6?
>> I can’t remember a patch which implements it.
>> 
>> Andor
>> 
>> 
>> 
>> 
>>> On 2019. Dec 30., at 15:17, Enrico Olivelli  wrote:
>>> 
>>> On Mon, 30 Dec 2019, 14:55 shrikant kalani wrote:
>>> 
>>>> Enrico,
>>>> 
>>>> Is 3.6 going to be available soon ? Within 1 month ?
>>>> 
>>> 
>>> I can't make promises.
>>> It is up to the community.
>>> I can say we are actively preparing the release.
>>> You will see, hopefully next week, a VOTE email thread on
>>> d...@zookeeper.apache.org mailing list.
>>> 
>>> If you try it and report that it is working for you, this will be a good
>>> contribution to the community
>>> 
>>> Cheers
>>> Enrico
>>> 
>>>> 
>>>> Thanks
>>>> Srikant Kalani
>>>> 
>>>> Sent from my iPhone
>>>> 
>>>>> On 30 Dec 2019, at 9:23 PM, Enrico Olivelli wrote:
>>>>> 
>>>>> If you try to use wrong credentials, corrupted keytab...you won't be
>>>>> able to read/write.
>>>>> The connection itself may still be allowed.
>>>>> 
>>>>> Enrico
>>>>> 
>>>>> On Mon, 30 Dec 2019, 14:19 Arpit Jain wrote:
>>>>> 
>>>>>> Just to confirm the settings I have in my environment:
>>>>>> 
>>>>>> 1. On ZK side, my JAAS file looks like this:
>>>>>> Server {
>>>>>> com.sun.security.auth.module.Krb5LoginModule required
>>>>>> useKeyTab=true
>>>>>> keyTab="/conf/zoo1.keytab"
>>>>>> storeKey=true
>>>>>> useTicketCache=false
>>>>>> principal="zookeeper/z...@example.com";
>>>>>> };
>>>>>> The principal "zookeeper/z...@example.com" has been created in the
>>>>>> Kerberos server running locally. I am able to start ZK with this
>>>>>> principal and I can see ticket exchange between ZK and Kerberos for
>>>>>> this principal.
>>>>>> 
>>>>>> 2. On client (Curator) side, JAAS file looks like below. Principal
>>>>>> "zkcli...@example.com" is present in Kerberos server. Curator is able
>>>>>> to connect properly to ZK (with or without principal) even though SASL
>>>>>> is enabled. Maybe I should use ZK 3.6 as you pointed out to enforce
>>>>>> authentication.
>>>>>> Client {
>>>>>> com.sun.security.auth.module.Krb5LoginModule required
>>>>>> useKeyTab=true
>>>>>> keyTab="/tmp/zkclient.keytab"
>>>>>> storeKey=true
>>>>>> useTicketCache=false
>>>>>> principal="zkcli...@example.com";
>>>>>> };
>>>>>> 
>>>>>> Just want to make sure my settings are correct.
>>>>>> 
>>>>>> Thanks
>>>>>> 
>>>>>>> On Mon, Dec 30, 2019 at 12:47 PM Enrico Olivelli <eolive...@gmail.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>> Arpit,
>>>>>>> Up to 3.5.x you can leverage auth only in conjunction with ACLs.
>>>>>>> 
>>>>>>> I hope we are able to release 3.6.0 within a couple of weeks.
>>>>>>> 
>>>>>>> If you have time you can build from branch-3.6 and run the server
>>>>>> enabling
>>>>>>> that feature tha

Re: Issues with leader shutdown in a 3-node zookeeper cluster

2020-01-06 Thread Andor Molnar
Hi Sushil,

None of your leftover servers are responding to the client session creation 
requests (client timeouts), but the socket can be established correctly. Would 
you please share your server logs too?

Andor



> On 2019. Dec 3., at 1:14, Sushil Kumar  wrote:
> 
> I am still struggling to find the fix for this issue.
> Another problem I am facing is I don't get any other emails except for
> Damien, I am not telling that you guys do not reply, I am saying I am not
> receiving those emails, not sure what is going on, they are not even in the
> spam folder.
> 
> On Wed, Nov 27, 2019 at 8:09 AM Sushil Kumar  wrote:
> 
>> Thanks Damien for the reply.
>> 
>> That was something I had already tried.
>> I wrote single ip in my notes to show that even specific running nodes are
>> also not providing the connection.
>> 
>> Can you by any chance include in this email the other people who have
>> replied earlier? I don't have their email addresses since I never received
>> their replies, and the archive does not show email addresses.
>> 
>> 
>> On Tue, Nov 26, 2019, 11:41 PM Damien Diederen 
>> wrote:
>> 
>>> 
>>> Sushil,
>>> 
>>>> I have put the gist of connection string and mntr outputs, i tried
>>>> connecting to the left-over quorum cluster without any luck.
>>>> https://gist.github.com/sushilkm/b8a540acc487830adaa5acae3a166d51
>>> 
>>> Combining this, from your notes:
>>> 
>>>$ zkCli.sh -server "10.251.0.6:2181"
>>> 
>>> with what Andor pointed out:
>>> 
>>>>> zkCli.sh is trying to connect localhost only by default, if you run
>>>>> it without parameters.
>>>>> 
>>>>> If the node that you're trying to connect to is down (which is
>>>>> completely fine, if you still have quorum), you should provide a
>>>>> connection string (list of nodes) with at least 1 running server.
>>> 
>>> You are not running zkCli.sh without parameters, but you are only
>>> telling it about a single server; it thus doesn't have anywhere to fall
>>> back when that single node becomes unreachable.
>>> 
>>> Try something like:
>>> 
>>>$ zkCli.sh -server "10.251.0.6:2181,10.251.0.X:2181,10.251.0.Y:2181"
>>> 
>>> where 10.251.0.X and 10.251.0.Y are replaced by the addresses of the
>>> other ensemble members.
>>> 
>>> (This is not specific to the "CLI"; other clients also have to be given
>>> a "sufficient" connection string to be able to failover.  It doesn't
>>> *have* to reference the full ensemble, but providing a single member
>>> definitely won't cut it.)
>>> 
>>> HTH, -D
>>> 
>> 
> 
> -- 
> -- 
> 
> Thanks
> 
> Sushil Kumar
> +1-(206)-698-4116



Re: zookeeper / solr cloud problems

2020-01-06 Thread Andor Molnar
Hi Koji,

I reckon the best would be to raise this issue on Solr user list. 
I’m not sure if you could get any more help about it here.

Andor




> On 2019. Dec 14., at 1:09, Kojo  wrote:
> 
> Shawn,
> unfortunately, these ulimit values are for the solr user. I already checked
> for the zk user; we set the same values.
> There is no constraint on process creation.
> 
> This box has 128 GB, and Solr starts with 32 GB of heap memory. There is only
> one small collection, ~400k documents.
> 
> I see no resource constraints.
> I see nothing at the application level (Python) doing anything wrong.
> 
> I am looking for any clue to solve this problem.
> 
> Is it useful if I start Solr with a memory dump enabled, in case of a crash?
> 
>   /opt/solr-6.6.2/bin/solr -m 32g -e cloud -z localhost:2181 -a
>   "-XX:+HeapDumpOnOutOfMemoryError" -a
>   "-XX:HeapDumpPath=/opt/solr-6.6.2/example/cloud/node1/logs/archived"
> 
> 
> Thank you,
> Koji
> 
> 
> On Fri, 13 Dec 2019 at 18:37, Shawn Heisey wrote:
> 
>> On 12/13/2019 11:01 AM, Kojo wrote:
>>> We had already changed the OS configuration before the last crash, so I
>>> think that the problem is not there.
>>> 
>>> ulimit -a
>>> core file size          (blocks, -c) 0
>>> data seg size           (kbytes, -d) unlimited
>>> scheduling priority             (-e) 0
>>> file size               (blocks, -f) unlimited
>>> pending signals                 (-i) 257683
>>> max locked memory       (kbytes, -l) 64
>>> max memory size         (kbytes, -m) unlimited
>>> open files                      (-n) 65535
>>> pipe size            (512 bytes, -p) 8
>>> POSIX message queues     (bytes, -q) 819200
>>> real-time priority              (-r) 0
>>> stack size              (kbytes, -s) 8192
>>> cpu time               (seconds, -t) unlimited
>>> max user processes              (-u) 65535
>>> virtual memory          (kbytes, -v) unlimited
>>> file locks                      (-x) unlimited
>> 
>> Are you running this ulimit command as the same user that is running
>> your Solr process?  It must be the same user to learn anything useful.
>> This output indicates that the user that's running the ulimit command is
>> allowed to start 64K processes, which I would think should be enough.
>> 
>> Best guess here is that the actual user that's running Solr does *NOT*
>> have its limits increased.  It may be a different user than you're using
>> to run the ulimit command.
>> 
>>> When Solr tries to delete a znode? I am sorry, because I understand
>>> nothing about this process, and it is the only point that seems
>>> suspicious to me.
>>> Do you think that it can cause inconsistency leading to the OOM problem?
>> 
>> OOME isn't caused by inconsistencies at the application level.  It's a
>> low-level problem, an indication that Java tried to do something
>> required to run the program that it couldn't do.
>> 
>> I assume that it's Solr trying to delete the znode, because the node
>> path has solr in it.  It will be the ZK client running inside Solr
>> that's actually trying to do the work, but Solr code probably initiated it.
>> 
>>> Just after this INFO message above, ZK log starts to log thousands of
>>> this block of lines below. Where it seems that ZK creates and closes
>>> thousands of sessions.
>> 
>> I responded to this thread because I have some knowledge about Solr.  I
>> really have no idea what these additional ZK server logs might mean.
>> The one that you quoted before was pretty straightforward, so I was able
>> to understand it.
>> 
>> Anything that gets logged after an OOME is suspect and may be useless.
>> The execution of a Java program after OOME is unpredictable, because
>> whatever was being run when the OOME was thrown did NOT successfully
>> execute.
>> 
>> Thanks,
>> Shawn
>> 
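Shawn's advice above is that `ulimit -a` only tells you something when it is run as the user that owns the process. On Linux, one way to sidestep the question is to read the limits of the running process itself from `/proc/<pid>/limits`. The following is a minimal sketch, assuming a Linux host; the function names are hypothetical and the PID has to be looked up separately (for example with `pgrep`).

```python
import re


def parse_proc_limits(text):
    """Parse the contents of /proc/<pid>/limits into {name: (soft, hard)}."""
    limits = {}
    for line in text.splitlines()[1:]:  # the first line is the column header
        # Columns are separated by runs of two or more spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) < 3:
            continue
        name, soft, hard = fields[0], fields[1], fields[2]
        limits[name] = (soft, hard)
    return limits


def read_limits(pid):
    """Read the effective limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as handle:
        return parse_proc_limits(handle.read())
```

With the PID of the Solr JVM, `read_limits(pid)["Max processes"]` and `read_limits(pid)["Max open files"]` show the values that actually apply to that process, regardless of which user runs the check.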



Re: Zookeeper spec file for building libzookeeper and libzookeeper-devel rpm

2020-01-06 Thread Andor Molnar
Hi,

Packaging is part of Apache BigTop project. You might be looking for this:
https://github.com/apache/bigtop/blob/master/bigtop-packages/src/rpm/zookeeper/SPECS/zookeeper.spec

Andor



> On 2019. Dec 20., at 6:43, Pradeep Choudhary  
> wrote:
> 
> Hi,
> 
> Does anybody have zookeeper spec file for building libzookeeper and 
> libzookeeper-devel rpm for version >= 3.5.5 ? I want to leverage that in my 
> project.
> 
> Thanks,
> Pradeep



Re: User Interface for Zookeeper/Kafka administration?

2020-01-06 Thread Andor Molnar
Hi,

There’s no such User Interface built-in for ZooKeeper and I’m not sure about 
Kafka. Hadoop companies like Cloudera and MapR create proprietary software 
that is able to “visualize” clusters in one way or another.

I’m not aware of such open source projects.

Andor
 


> On 2019. Dec 23., at 10:31, Sunil CHAUDHARI  
> wrote:
> 
> Hi,
> I have set up a 3-node ZooKeeper and a 3-broker Kafka cluster on Linux servers.
> Is there any user interface available where I can see a complete picture of 
> both my Kafka and ZooKeeper nodes?
> Or do I have only the command line interface ☹☹, which is quite time consuming 
> and presents very selective data based on commands?
> 
> 
> Thanks
> Sunil.
> 



Re: Zookeeper server and client authentication

2020-01-06 Thread Andor Molnar
Are we going to release client authentication enforcement in 3.6?
I can’t remember a patch which implements it.

Andor




> On 2019. Dec 30., at 15:17, Enrico Olivelli  wrote:
> 
> On Mon, 30 Dec 2019, 14:55 shrikant kalani wrote:
> 
>> Enrico,
>> 
>> Is 3.6 going to be available soon ? Within 1 month ?
>> 
> 
> I can't make promises.
> It is up to the community.
> I can say we are actively preparing the release.
> You will see, hopefully next week, a VOTE email thread on
> d...@zookeeper.apache.org mailing list.
> 
> If you try it and report that it is working for you, this will be a good
> contribution to the community
> 
> Cheers
> Enrico
> 
>> 
>> Thanks
>> Srikant Kalani
>> 
>> Sent from my iPhone
>> 
>>> On 30 Dec 2019, at 9:23 PM, Enrico Olivelli  wrote:
>>> 
>>> If you try to use wrong credentials, corrupted keytab...you won't be
>>> able to read/write.
>>> The connection itself may still be allowed.
>>> 
>>> Enrico
>>> 
>>> On Mon, 30 Dec 2019, 14:19 Arpit Jain wrote:
>>> 
>>>> Just to confirm the settings I have in my environment:
>>>> 
>>>> 1. On ZK side, my JAAS file looks like this:
>>>> Server {
>>>>  com.sun.security.auth.module.Krb5LoginModule required
>>>>  useKeyTab=true
>>>>  keyTab="/conf/zoo1.keytab"
>>>>  storeKey=true
>>>>  useTicketCache=false
>>>>  principal="zookeeper/z...@example.com";
>>>> };
>>>> The principal "zookeeper/z...@example.com" has been created in the
>>>> Kerberos server running locally. I am able to start ZK with this
>>>> principal and I can see ticket exchange between ZK and Kerberos for this
>>>> principal.
>>>> 
>>>> 2. On client (Curator) side, JAAS file looks like below. Principal
>>>> "zkcli...@example.com" is present in Kerberos server. Curator is able
>>>> to connect properly to ZK (with or without principal) even though SASL is
>>>> enabled. Maybe I should use ZK 3.6 as you pointed out to enforce
>>>> authentication.
>>>> Client {
>>>>  com.sun.security.auth.module.Krb5LoginModule required
>>>>  useKeyTab=true
>>>>  keyTab="/tmp/zkclient.keytab"
>>>>  storeKey=true
>>>>  useTicketCache=false
>>>>  principal="zkcli...@example.com";
>>>> };
>>>> 
>>>> Just want to make sure my settings are correct.
>>>> 
>>>> Thanks
>>>> 
>>>>> On Mon, Dec 30, 2019 at 12:47 PM Enrico Olivelli wrote:
>>>>> 
>>>>> Arpit,
>>>>> Up to 3.5.x you can leverage auth only in conjunction with ACLs.
>>>>> 
>>>>> I hope we are able to release 3.6.0 within a couple of weeks.
>>>>> 
>>>>> If you have time you can build from branch-3.6 and run the server
>>>>> enabling that feature that you are pointing to.
>>>>> It is a server-side change only, so you can use 3.5 in your application.
>>>>> 
>>>>> Enrico
>>>>> 
>>>>> On Mon, 30 Dec 2019, 13:23 shrikant kalani wrote:
>>>>> 
>>>>>> A couple of things which you can check:
>>>>>> 1) if your ZooKeeper server is not running with the "zookeeper" id, then
>>>>>> you need to set zookeeper.sasl.client.username
>>>>>> 2) set java.security.auth.login.config
>>>>>> 
>>>>>> And I also faced the same issue that there is no strict enforcement to
>>>>>> allow only authenticated clients. Unless someone is aware of a way, I
>>>>>> suspect we may need to wait for 3.6.
>>>>>> 
>>>>>> Thanks
>>>>>> Srikant
>>>>>> 
>>>>>> Sent from my iPhone
>>>>>> 
>>>>>>> On 30 Dec 2019, at 8:11 PM, Arpit Jain wrote:
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I have configured Zookeeper 3.5.5 to use SASL authentication using
>>>>>>> Kerberos. I am able to authenticate ZK with Kerberos server but I
>>>>>>> don't see any authentication happening between Zookeeper client
>>>>>>> (curator) and ZK server. I have put the following setting in zoo.cfg
>>>>>>> and followed this guide
>>>>>>> https://cwiki.apache.org/confluence/display/ZOOKEEPER/Client-Server+mutual+authentication
>>>>>>> .
>>>>>>> 
>>>>>>> authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
>>>>>>> requireClientAuthScheme=sasl
>>>>>>> 
>>>>>>> What additional setting I need to provide so that only authenticated
>>>>>>> clients (for which principals are present in Kerberos server) can
>>>>>>> connect to ZK server ?
>>>>>>> I also found this link
>>>>>>> https://github.com/apache/zookeeper/pull/118/commits which
>>>>>>> mentions that it will be strict only from ZK 3.6 onwards and
>>>>>>> currently ZK does not enforce it even if we have the configuration.
>>>>>>> 
>>>>>>> Thanks



Re: Zookeeper 3.5 SSL and Kerberos authentication

2019-12-17 Thread Andor Molnar
"We were using early 3.5.3 or something like that.”

Netty stack had a major refactor in 3.5.5

Andor



> On 2019. Dec 17., at 16:40, Enrico Olivelli  wrote:
> 
> On Tue, 17 Dec 2019 at 16:26, Szalay-Bekő Máté <
> szalay.beko.m...@gmail.com> wrote:
> 
>> I added a comment on Jira. This is something we will also need to fix in my
>> company soon.
>> 
>> @enrico you wrote:
>>> in my company we set up some ZK with TLS and SASL, using TLS for
>> encryption and SASL for auth. We were using early 3.5.3 or something like
>> that.
>> 
> 
> Unfortunately we do not have that setup anymore, we had to drop it because
> at that time (and still nowadays) from the same JVMs we had also to connect
> to an HBase cluster with ZK 3.4
> that does not support TLS.
> 
> Currently we are using only SASL and not TLS
> Sorry
> 
> Enrico
> 
> 
>> 
>> According to this, the scenario should work. Maybe we just misconfigured
>> something, or this was something got broken in a later version? Can you
>> share the config you use? Maybe you are setting `zookeeper.ssl.clientAuth`
>> and `zookeeper.ssl.quorum.clientAuth` to `none` or `want` ?
>> 
>> Regards,
>> Mate
>> 
>> On Tue, Dec 17, 2019 at 10:48 AM Andor Molnar  wrote:
>> 
>>> Hi Jorn,
>>> 
>>> Sorry for coming back late to this. I’ve just validated the scenario on
>>> my test cluster. Looks like the issue is valid: Kerberos auth and SSL are
>>> mutually exclusive currently. With Kerberos set up, when I try to connect
>>> to the secure port I get an infinite reconnect loop on the client side:
>>> 
>>> 2019-12-17 01:43:30,984 [myid:barbaresco-1.vpc.cloudera.com:2182] - WARN
>>> [Thread-39:Login$1@197] - TGT renewal thread has been interrupted and
>>> will exit.
>>> 2019-12-17 01:43:30,987 [myid:barbaresco-1.vpc.cloudera.com:2182] - INFO
>>> [main-SendThread(barbaresco-1.vpc.cloudera.com:2182):Login@302] - Client
>>> successfully logged in.
>>> 2019-12-17 01:43:30,987 [myid:barbaresco-1.vpc.cloudera.com:2182] - INFO
>>> [Thread-40:Login$1@135] - TGT refresh thread started.
>>> 2019-12-17 01:43:30,987 [myid:barbaresco-1.vpc.cloudera.com:2182] - INFO
>>> [main-SendThread(barbaresco-1.vpc.cloudera.com:2182):SecurityUtils$1@124]
>>> - Client will use GSSAPI as SASL mechanism.
>>> 2019-12-17 01:43:30,988 [myid:barbaresco-1.vpc.cloudera.com:2182] - INFO
>>> [main-SendThread(barbaresco-1.vpc.cloudera.com:2182):ClientCnxn$SendThread@1112]
>>> - Opening socket connection to server
>>> barbaresco-1.vpc.cloudera.com/10.65.25.98:2182. Will attempt to
>>> SASL-authenticate using Login Context section 'Client'
>>> 2019-12-17 01:43:30,988 [myid:barbaresco-1.vpc.cloudera.com:2182] - INFO
>>> [main-SendThread(barbaresco-1.vpc.cloudera.com:2182):ClientCnxn$SendThread@959]
>>> - Socket connection established, initiating session, client:
>>> /10.65.25.98:45362, server:
>>> barbaresco-1.vpc.cloudera.com/10.65.25.98:2182
>>> 2019-12-17 01:43:30,989 [myid:barbaresco-1.vpc.cloudera.com:2182] - INFO
>>> [Thread-40:Login@320] - TGT valid starting at: Tue Dec 17 01:43:30
>>> PST 2019
>>> 2019-12-17 01:43:30,989 [myid:barbaresco-1.vpc.cloudera.com:2182] - INFO
>>> [Thread-40:Login@321] - TGT expires: Thu Jan 16 01:43:30
>>> PST 2020
>>> 2019-12-17 01:43:30,989 [myid:barbaresco-1.vpc.cloudera.com:2182] - INFO
>>> [Thread-40:Login$1@193] - TGT refresh sleeping until: Fri Jan 10 20:23:33
>>> PST 2020
>>> 2019-12-17 01:43:30,989 [myid:barbaresco-1.vpc.cloudera.com:2182] - INFO
>>> [main-SendThread(barbaresco-1.vpc.cloudera.com:2182):ClientCnxn$SendThread@1240]
>>> - Unable to read additional data from
>>> server sessionid 0x0, likely server has closed socket, closing socket
>>> connection and attempting reconnect
>>> 
>>> And the following error on server side:
>>> 
>>> 2019-12-17 01:43:33,002 INFO
>>> org.apache.zookeeper.server.NettyServerCnxnFactory: SSL handler added for
>>> channel: [id: 0xcf37c14b, L:/10.65.25.98:2182 - R:/10.65.25.98:45380]
>>> 2019-12-17 01:43:33,003 ERROR
>>> org.apache.zookeeper.server.NettyServerCnxnFactory: Unsuccessful
>>> handshake with session 0x0
>>> 2019-12-17 01:43:33,003 WARN
>>> org.apache.zookeeper.server.NettyServer

Re: Zookeeper 3.5 SSL and Kerberos authentication

2019-12-17 Thread Andor Molnar
Hi Jorn,

Sorry for coming back late to this. I’ve just validated the scenario on my test 
cluster. Looks like the issue is valid: Kerberos auth and SSL are mutually 
exclusive currently. With Kerberos set up, when I try to connect to the secure 
port I get an infinite reconnect loop on the client side:

2019-12-17 01:43:30,984 [myid:barbaresco-1.vpc.cloudera.com:2182] - WARN  [Thread-39:Login$1@197] - TGT renewal thread has been interrupted and will exit.
2019-12-17 01:43:30,987 [myid:barbaresco-1.vpc.cloudera.com:2182] - INFO  [main-SendThread(barbaresco-1.vpc.cloudera.com:2182):Login@302] - Client successfully logged in.
2019-12-17 01:43:30,987 [myid:barbaresco-1.vpc.cloudera.com:2182] - INFO  [Thread-40:Login$1@135] - TGT refresh thread started.
2019-12-17 01:43:30,987 [myid:barbaresco-1.vpc.cloudera.com:2182] - INFO  [main-SendThread(barbaresco-1.vpc.cloudera.com:2182):SecurityUtils$1@124] - Client will use GSSAPI as SASL mechanism.
2019-12-17 01:43:30,988 [myid:barbaresco-1.vpc.cloudera.com:2182] - INFO  [main-SendThread(barbaresco-1.vpc.cloudera.com:2182):ClientCnxn$SendThread@1112] - Opening socket connection to server barbaresco-1.vpc.cloudera.com/10.65.25.98:2182. Will attempt to SASL-authenticate using Login Context section 'Client'
2019-12-17 01:43:30,988 [myid:barbaresco-1.vpc.cloudera.com:2182] - INFO  [main-SendThread(barbaresco-1.vpc.cloudera.com:2182):ClientCnxn$SendThread@959] - Socket connection established, initiating session, client: /10.65.25.98:45362, server: barbaresco-1.vpc.cloudera.com/10.65.25.98:2182
2019-12-17 01:43:30,989 [myid:barbaresco-1.vpc.cloudera.com:2182] - INFO  [Thread-40:Login@320] - TGT valid starting at: Tue Dec 17 01:43:30 PST 2019
2019-12-17 01:43:30,989 [myid:barbaresco-1.vpc.cloudera.com:2182] - INFO  [Thread-40:Login@321] - TGT expires: Thu Jan 16 01:43:30 PST 2020
2019-12-17 01:43:30,989 [myid:barbaresco-1.vpc.cloudera.com:2182] - INFO  [Thread-40:Login$1@193] - TGT refresh sleeping until: Fri Jan 10 20:23:33 PST 2020
2019-12-17 01:43:30,989 [myid:barbaresco-1.vpc.cloudera.com:2182] - INFO  [main-SendThread(barbaresco-1.vpc.cloudera.com:2182):ClientCnxn$SendThread@1240] - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect

And the following error on server side:

2019-12-17 01:43:33,002 INFO org.apache.zookeeper.server.NettyServerCnxnFactory: SSL handler added for channel: [id: 0xcf37c14b, L:/10.65.25.98:2182 - R:/10.65.25.98:45380]
2019-12-17 01:43:33,003 ERROR org.apache.zookeeper.server.NettyServerCnxnFactory: Unsuccessful handshake with session 0x0
2019-12-17 01:43:33,003 WARN org.apache.zookeeper.server.NettyServerCnxnFactory: Exception caught
io.netty.handler.codec.DecoderException: io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 002d7530001000
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:475)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:283)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1422)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:931)
    at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:792)
    at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:483)
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:383)
    at io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1044)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)

I will update the Jira too.

Andor
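The server-side NotSslRecordException above suggests that the first bytes arriving on the secure port were not a TLS record, which would be the case if the client spoke the plain ZooKeeper protocol to port 2182. The check can be illustrated as follows. This is a minimal sketch and not Netty's actual implementation: it only inspects the content type and major version of the TLS record header, and the function name is hypothetical.

```python
# TLS record content types: change_cipher_spec, alert, handshake,
# application_data.
TLS_CONTENT_TYPES = {20, 21, 22, 23}


def looks_like_tls_record(first_bytes):
    """Heuristically decide whether a stream starts with a TLS record header.

    A TLS record begins with one content-type byte followed by a two-byte
    protocol version whose major byte is 3 (SSL 3.0 through TLS 1.3).
    """
    if len(first_bytes) < 3:
        return False
    content_type, major = first_bytes[0], first_bytes[1]
    return content_type in TLS_CONTENT_TYPES and major == 3
```

A TLS ClientHello starts with bytes such as `16 03 01`, whereas a plaintext ZooKeeper request starts with a 4-byte big-endian length prefix, so its first byte is normally `00`. That would be consistent with the dump in the exception message starting with zeros, although the dump as quoted here is too short to decode with confidence.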





> On 2019. Nov 8., at 20:31, Jörn Franke  wrote:
> 
> Thanks. Can you please share the configuration file?
> 
> I tried with 3.5.5 - without SSL Kerberos works, but once I configured client 
> ssl it said authentication fail (I have to check if I can dig up the log 
> files) and as far as I remember this was related to x509 authentication. The 
> 

Re: Any interest in a gRPC version of ZooKeeper

2019-11-28 Thread Andor Molnar
Unfortunately non-committers are having some difficulties joining that channel.

For example:
"@Mate Szalay-Beko is a Multi-Channel Guest. Only your Workspace Admin can 
invite them to a public channel."

Does anyone have an idea what that means?

Andor



> On 2019. Nov 27., at 19:18, Jordan Zimmerman  
> wrote:
> 
> FYI
> 
> We have an open discussion regarding replacing Jute, using gRPC and related 
> things in this sub channel on the ASF Slack board. All are welcome to join in:
> 
> https://the-asf.slack.com/archives/CQKS7A3FT 
> 
> 
> -Jordan
> 
>> On Nov 18, 2019, at 9:25 AM, Jordan Zimmerman  
>> wrote:
>> 
>> Hi Folks,
>> 
>> I've written a proof of concept implementation of a ServerCnxnFactory that 
>> implements gRPC. The goal is to make it possible to easily write ZooKeeper 
>> clients in non-JVM languages. Using the proof of concept I was able to write 
>> a Golang client easily. What's the interest level of something like this? 
>> Let's discuss if it's worth pursuing. I'd be willing to move this from proof 
>> of concept to production but I'll need help (1 or 2 co-developers).
>> 
>> If you want to try it, I've pushed the Golang client and some instructions 
>> here (let me know if you have any issues - I'm a go neophyte). Note: 
>> "zookeeper/test.go" is the interesting file:
>> 
>>  https://github.com/Randgalt/zkgrpc 
>> 
>> Here's the proof of concept on the ZK server side (the interesting files are 
>> RpcServerCnxn.java, RpcServerCnxnFactory.java, RpcZooKeeperServer.java and 
>> zookeeper.proto):
>> 
>>  https://github.com/apache/zookeeper/compare/master...Randgalt:wip-grpc 
>>  
>> 
>> Issues:
>> - Writing a client, even with gRPC, will require some work. Sessions have to 
>>   be maintained, watchers have to be maintained, etc.
>> - Currently, Jute is deeply embedded in ZooKeeper. The proof of concept has to 
>>   emulate Jute byte buffers. Ideally, this will be abstracted so that only 
>>   records could be used so that the gRPC connection doesn't have to keep 
>>   marshalling/unmarshalling byte buffers.
>> - I don't know enough about the gRPC client/server implementations to know if 
>>   it will meet the needs of ZooKeeper. Anyone have experience here?
>> - I haven't completely thought through how much work it will take to write 
>>   useful clients. As I've shown with the proof of concept simple ZK CRUD db 
>>   operations work well. I need to spend time writing a recipe such as Leader 
>>   Election to see how much work is required.
>> - I'm not sure how things like SASL and reconfig would work with gRPC.
>> 
>> -Jordan
> 



Re: Does ZK 3.4.14 support Netty 4.1.42.Final?

2019-11-26 Thread Andor Molnar
Oh, great, I still have my incomplete patch for that Jira locally.
I abandoned it a while ago, but I think I can come back to it, possibly tomorrow.

Thanks for the heads up! :)

Andor



> On 2019. Nov 25., at 19:39, Daniel Chan  wrote:
> 
> Thanks Patrick and Tamas for the information.
> 
> Is there any ETA on https://issues.apache.org/jira/browse/ZOOKEEPER-3568?
> 
> We are currently running on 3.4.9 server and 3.4.6 client. If moving to 
> 3.5.6, should we upgrade the server or client first?
> 
> Thanks,
> Daniel
> 
> -Original Message-
> From: Patrick Hunt  
> Sent: Monday, November 25, 2019 9:55 AM
> To: UserZooKeeper 
> Subject: Re: Does ZK 3.4.14 support Netty 4.1.42.Final?
> 
> This was discussed relatively recently:
> https://lists.apache.org/thread.html/680038b345da49a3d5cb452de5d54d62f14d1df0747690980c218c1a@%3Cdev.zookeeper.apache.org%3E
> 
> Gist is that while the identified issue didn't affect us directly folks 
> should move to 3.5 (or don't use netty in 3.4) given 3.4 is using a version 
> of netty that's no longer supported and too difficult to upgrade.
> 
> Patrick
> 
> 
> On Sat, Nov 23, 2019 at 12:36 AM Tamas Penzes 
> wrote:
> 
>> Hi Daniel,
>> 
>> I remember that the migration from Netty 3 to 4 wasn't a trivial task, 
>> so I would not expect it in any future ZK 3.4 release.
>> 
>> But we have ZK 3.5.5 and 3.5.6 and the migration to any of them is not 
>> really problematic since they are backward compatible. We have done it 
>> for many Hadoop components, without big code changes (if you use 
>> Curator, don't forget to use 4.2.0+ and exclude its own beta ZK).
>> 
>> So the best is to try ZK 3.5.6.
>> 
>> Regards, Tamas
>> 
>> On Sat, Nov 23, 2019, 00:52 Daniel Chan  wrote:
>> 
>>> Hi,
>>> 
>>> From
>>> https://mvnrepository.com/artifact/org.apache.zookeeper/zookeeper/3.4.14,
>>> Zookeeper depends on Netty 3.10.6.Final.
>>> 
>>> However, Netty has CVEs for versions prior to 4.1.42.Final as per 
>>> https://nvd.nist.gov/vuln/detail/CVE-2019-16869:
>>> Netty before 4.1.42.Final mishandles whitespace before the colon in 
>>> HTTP headers (such as a "Transfer-Encoding : chunked" line), which 
>>> leads to HTTP request smuggling.
>>> 
>>> Will Zookeeper (both client and server) work if we use Netty 
>>> 4.1.42.Final or above instead?
>>> 
>>> Also what jars are needed for the Zookeeper Client?
>>> 
>>> Thanks,
>>> Daniel
>>> 
>> 



Re: Native client on MacOS and Windows?

2019-11-22 Thread Andor Molnar
Hi deepak,

I’m trying to refresh my memory about this. I think I gave up building on 
MacOS because, when we call the linker at the end of the compilation, we call 
it with a parameter that is not supported by OSX’s standard toolchain (Xcode).

Which means I managed to get around the issue you mentioned, but I 
can’t recall the details.

Btw I usually use CentOS docker/VM to compile and validate the C client. Why do 
you need native Mac support?

Andor





> On 2019. Nov 6., at 5:11, deepak  wrote:
> 
> Hi All,
> 
> I see from the support matrix that the native client is not supported on
> Windows and MacOS.  I am curious to
> know if anyone has had success in getting it to work on these platforms?
> On MacOS, naturally, we are mostly interested in development, and on
> Windows, we would like to use it in production.
> 
> At this point, I am hitting compile problems on MacOS (see below for
> details).
> 
> On Windows, I was able to run "cmake --build ." successfully, but running
> "ctest -V" gives me "No tests were found!!!".  It seems there are no unit
> tests configured to run on Windows?  Or am I missing something?
> 
> PS:
> Compile problems on MacOS:
> zookeeper-client-c$ make check
> /Library/Developer/CommandLineTools/usr/bin/make  zktest-st zktest-mt
> g++ -DHAVE_CONFIG_H -I.  -I./include -I./tests -I./generated
> -DUSE_STATIC_LIB -I/opt/local/include
> -DZKSERVER_CMD="\"./tests/zkServer.sh\"" -DZOO_IPV6_ENABLED  -g -O2 -MT
> zktest_st-LibCSymTable.o -MD -MP -MF .deps/zktest_st-LibCSymTable.Tpo -c -o
> zktest_st-LibCSymTable.o `test -f 'tests/LibCSymTable.cc' || echo
> './'`tests/LibCSymTable.cc
> In file included from tests/LibCSymTable.cc:19:
> ./tests/LibCSymTable.h:85:36: error: unknown type name 'clockid_t'; did you
> mean 'clock_t'?
>DECLARE_SYM(int,clock_gettime,(clockid_t clk_id, struct timespec*));
>   ^
>   clock_t
> ./tests/LibCSymTable.h:51:29: note: expanded from macro 'DECLARE_SYM'
>typedef ret (*sym##_sig)sig; \
>^
> /usr/include/sys/_types/_clock_t.h:31:33: note: 'clock_t' declared here
> typedef __darwin_clock_tclock_t;
>^
> 1 error generated.
> make[1]: *** [zktest_st-LibCSymTable.o] Error 1
> make: *** [check-am] Error 2
> 
> Trying to work around this by using "clock_t" instead of "clockid_t", I get
> past this to hit this next error:
> 
> [...snip...]
> In file included from tests/TestZookeeperInit.cc:19:
> In file included from
> /opt/local/include/cppunit/extensions/HelperMacros.h:9:
> /opt/local/include/cppunit/TestCaller.h:121:28: error: no member named
> 'bind' in namespace 'std';
>  did you mean 'find'?
>m_test_function( std::bind(test, m_fixture) )
> ~^~~~
>  find



Re: Zookeeper ACL Digest Scheme permission override

2019-11-20 Thread Andor Molnar
I'm not sure I understand the problem.

Please provide the exact steps, in order, that you took to reproduce the
problem. The output of getAcl for the affected node(s) would also be useful.
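
For anyone comparing getAcl output against what they expect, here is a minimal sketch of how a digest-scheme id is derived, to the best of my knowledge: the user name, a colon, and the Base64-encoded SHA-1 of the whole "user:password" string. The class and method names are made up for illustration (this is not the ZooKeeper API itself), and the user names and passwords are placeholders.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public class DigestIdSketch {
    // Assumption: mirrors what the server-side digest provider computes,
    // i.e. user + ":" + base64(sha1("user:password")).
    static String digestId(String userColonPassword) {
        String user = userColonPassword.split(":", 2)[0];
        try {
            byte[] sha1 = MessageDigest.getInstance("SHA-1")
                    .digest(userColonPassword.getBytes(StandardCharsets.UTF_8));
            return user + ":" + Base64.getEncoder().encodeToString(sha1);
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a mandatory JDK algorithm, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder credentials: one admin user, one read-only user.
        System.out.println("digest:" + digestId("admin:adminpw") + ":cdrwa");
        System.out.println("digest:" + digestId("reader:readerpw") + ":r");
    }
}
```

Each user should show up as its own digest entry with its own permission string. If getAcl lists only one entry, a later setAcl call has probably replaced the ACL list instead of extending it.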

Andor


-Original Message-
From: Chelambarasan Rajendran <
chelambarasan.rajend...@walmartlabs.com.INVALID>
Reply-To: user@zookeeper.apache.org
To: user@zookeeper.apache.org 
Subject: Zookeeper ACL Digest Scheme permission override
Date: Wed, 20 Nov 2019 05:20:46 +

Hi Team,

When we tried setting up ZooKeeper ACLs with the digest scheme, the
permissions of the second user overrode those of the first user as well.

We tried creating the users and setting permissions for the admin and
read-only users separately, and also creating them together in a single
step. In both cases, the permissions of the second user override those of
the first user.

The ZooKeeper versions we tried are 3.4.9 and 3.5.5. We created a ticket
with ZooKeeper with all the steps; the details are below.

http://zookeeper-user.578899.n2.nabble.com/zk-digest-ACL-permissions-gets-overridden-td7584490.html

https://issues.apache.org/jira/browse/ZOOKEEPER-3617

Could you please check and confirm whether this is a bug or whether we are
missing anything?


Thanks,
Chelambarasan




Re: Issues with leader shutdown in a 3-node zookeeper cluster

2019-11-20 Thread Andor Molnar
Hi Sushil,

By default, zkCli.sh only tries to connect to localhost if you run it
without parameters.

If the node that you're trying to connect to is down (which is
completely fine, if you still have quorum), you should provide a
connection string (list of nodes) with at least 1 running server.
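
On the command line this would look something like `zkCli.sh -server zk1:2181,zk2:2181,zk3:2181` (the host names are placeholders). As a rough illustration of the connection string format, here is a minimal sketch that splits such a string into host/port pairs. The class is hypothetical and much simpler than the real client parser; it assumes a default port of 2181, ignores the content of any chroot suffix, and does not handle IPv6 literals.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

public class ConnectStringSketch {
    static final int DEFAULT_PORT = 2181; // assumed default client port

    static List<InetSocketAddress> parse(String connectString) {
        // Drop an optional chroot suffix such as "/app" after the host list.
        int slash = connectString.indexOf('/');
        String hosts = slash >= 0 ? connectString.substring(0, slash) : connectString;

        List<InetSocketAddress> servers = new ArrayList<>();
        for (String part : hosts.split(",")) {
            part = part.trim();
            if (part.isEmpty()) {
                continue;
            }
            int colon = part.lastIndexOf(':');
            String host = colon >= 0 ? part.substring(0, colon) : part;
            int port = colon >= 0 ? Integer.parseInt(part.substring(colon + 1)) : DEFAULT_PORT;
            // Unresolved on purpose: no DNS lookup happens in this sketch.
            servers.add(InetSocketAddress.createUnresolved(host, port));
        }
        return servers;
    }

    public static void main(String[] args) {
        for (InetSocketAddress s : parse("zk1:2181,zk2:2181,zk3:2181")) {
            System.out.println(s.getHostString() + " port " + s.getPort());
        }
    }
}
```

With all three servers listed, the client can fall back to another server when the one it tried first is down.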

Andor



-Original Message-
From: Sushil Kumar 
Reply-To: user@zookeeper.apache.org
To: user@zookeeper.apache.org
Subject: Issues with leader shutdown in a 3-node zookeeper cluster
Date: Tue, 19 Nov 2019 17:09:08 -0800

Hello


I am trying to run a 3-node ZooKeeper cluster.
It starts up fine and I am able to access it.
However, as soon as I shut down the leader, one of the remaining nodes
becomes the primary node, which I believe is working as expected.

However, if I try to connect using zkCli.sh in this state, it cannot
connect; it always remains in the connecting state, and there is no way
for me to access my ZooKeeper cluster.

The only way I have been able to fix this is to stop all nodes and start
them in sequence.

A couple of questions.
First of all, that zkCli.sh behavior with the cluster does not look like
a happy path to me, and I doubt that my cluster is behaving well. If the
cluster is not working, why does the status of each remaining node show
as working ("LEADER/FOLLOWER")?

I tried this with a 5-node cluster and noticed exactly the same behavior.
So I wonder how people generally manage a working ZooKeeper cluster when
the leader goes down.

Thanks
Sushil Kumar



Re: Failing C API tests on Linux

2019-11-08 Thread Andor Molnar
Thanks Deepak!

I'm not familiar with our Makefile, but to the best of my knowledge the
build, tests, etc. should all be run through Maven to work properly.

Andor



-Original Message-
From: deepak 
Reply-To: user@zookeeper.apache.org
To: user@zookeeper.apache.org
Subject: Re: Failing C API tests on Linux
Date: Tue, 5 Nov 2019 18:42:30 -0600

Hi Andor,

Thank you for the suggestion.  It turns out I had a stale zkdata directory
under /tmp that was causing permission denied errors from within the tests.
Once I removed it, the tests passed.
As a side note, it seems the "make check" target expects base_dir to be
set.  So I had to set it manually from the command line (just like how it's
set in the Maven project's "test" phase).

--
Deepak

On Sat, Nov 2, 2019 at 3:13 AM Andor Molnar  wrote:

> Hi deepak,
> 
> It’s a single test failure originally, not a build error.
> Removing TestClient.cc will cause all kinds of problems, because I
> believe
> most tests are dependent on it.
> 
> You can try to run the tests multiple times and if it still fails may
> try
> debugging it.
> 
> What system are you running the build on?
> 
> Andor
> 
> 
> 
> 
> > On 2019. Oct 31., at 22:03, deepak  wrote:
> > 
> > It turns out I had not built the ZooKeeper core before building the
> > C API
> > (it had a class not found exception in logs).  After building the
> ZooKeeper
> > core, trying to build the C API still gives me failures (see
> > below).
> Could
> > someone help me with this?  How should I go about figuring out what
> > is
> > wrong?
> > 
> > Zookeeper_simpleSystem::testAsyncWatcherAutoResetterminate called
> > after
> > throwing an instance of 'CppUnit::Exception'
> >  what():  equality assertion failed
> > - Expected: -101
> > - Actual  : -4
> > 
> > /bin/sh: line 5:  7241 Aborted ZKROOT=./../..
> > CLASSPATH=$CLASSPATH:$CLOVER_HOME/lib/clover*.jar ${dir}$tst
> > FAIL: zktest-mt
> > ==
> > 1 of 2 tests failed
> > Please report to user@zookeeper.apache.org
> > ==
> > 
> > If I uncomment just the testAsyncWatcherAutoReset in
> > tests/TestClient.cc,
> > then I get a whole bunch of other failures:
> > 
> > Zookeeper_simpleSystem::testDeserializeString : elapsed 0 : OK
> > Zookeeper_simpleSystem::testFirstServerDown : assertion : elapsed
> > 11002
> > Zookeeper_simpleSystem::testNullData : assertion : elapsed 1001
> > Zookeeper_simpleSystem::testIPV6 : assertion : elapsed 1000
> > Zookeeper_simpleSystem::testCreate : assertion : elapsed 1000
> > Zookeeper_simpleSystem::testCreateContainer : assertion : elapsed
> > 1000
> > Zookeeper_simpleSystem::testCreateTtl : assertion : elapsed 1000
> > Zookeeper_simpleSystem::testPath : assertion : elapsed 1000
> > Zookeeper_simpleSystem::testPathValidation : assertion : elapsed
> > 1000
> > Zookeeper_simpleSystem::testPing : assertion : elapsed 2001
> > Zookeeper_simpleSystem::testAcl : assertion : elapsed 1000
> > Zookeeper_simpleSystem::testChroot : assertion : elapsed 2001
> > Zookeeper_simpleSystem::testAuth : assertion : elapsed 1000
> > Zookeeper_simpleSystem::testHangingClient : elapsed 1001 : OK
> > Zookeeper_simpleSystem::testWatcherAutoResetWithGlobal : assertion
> > :
> > elapsed 1000
> > Zookeeper_simpleSystem::testWatcherAutoResetWithLocal : assertion :
> elapsed
> > 1000
> > Zookeeper_simpleSystem::testGetChildren2 : assertion : elapsed 1000
> > Zookeeper_simpleSystem::testLastZxid : assertion : elapsed 2001
> > Zookeeper_simpleSystem::testRemoveWatchers : assertion : elapsed
> > 1000
> > Zookeeper_readOnly::testReadOnly./tests/zkServer.sh: line 55: kill:
> (19172)
> > - No such process
> > this target is for unit tests only
> > : assertion : elapsed 11
> > tests/TestClientRetry.cc:137: Assertion: equality assertion failed
> > [Expected: 1, Actual  : 0]
> > tests/TestMulti.cc:213: Assertion: equality assertion failed
> > [Expected:
> 1,
> > Actual  : 0]
> > tests/TestMulti.cc:213: Assertion: equality assertion failed
> > [Expected:
> 1,
> > Actual  : 0]
> > tests/TestMulti.cc:213: Assertion: equality assertion failed
> > [Expected:
> 1,
> > Actual  : 0]
> > tests/TestMulti.cc:213: Assertion: equality assertion failed
> > [Expected:
> 1,
> > Actual  : 0]
> > tests/TestMulti.cc:213: Assertion: equality assertion failed
> > [Expected:
> 1,
> > Actual  : 0]
> > tests/TestMulti.cc:213: Assertion: equality assertion failed
>

Re: Zookeeper 3.5 SSL and Kerberos authentication

2019-11-08 Thread Andor Molnar
Hi Jörn!

Thanks for reporting the problem; I answered on the Jira.
I'll set up a small test environment soon, but we need more specifics
on how you set up your environment and what exactly the failure is.

Regards,
Andor



-Original Message-
From: Jörn Franke 
Reply-To: user@zookeeper.apache.org
To: user@zookeeper.apache.org
Subject: Zookeeper 3.5 SSL and Kerberos authentication
Date: Wed, 6 Nov 2019 22:28:32 +0100

Dear all,

it seems that ZooKeeper 3.5 with SSL enabled does not support Kerberos
authentication, but only X509 authentication. Kerberos is used in many
enterprise environments and is supported by Apache Solr. Is this a bug?
Or am I missing something?


I created a Jira for this:
https://issues.apache.org/jira/browse/ZOOKEEPER-3482


thank you.

best regards



Re: Failing C API tests on Linux

2019-11-02 Thread Andor Molnar
Hi deepak,

It’s a single test failure originally, not a build error.
Removing TestClient.cc will cause all kinds of problems, because I believe most 
tests are dependent on it.

You can try running the tests multiple times and, if it still fails, try 
debugging it.

What system are you running the build on?

Andor




> On 2019. Oct 31., at 22:03, deepak  wrote:
> 
> It turns out I had not built the ZooKeeper core before building the C API
> (it had a class not found exception in logs).  After building the ZooKeeper
> core, trying to build the C API still gives me failures (see below).  Could
> someone help me with this?  How should I go about figuring out what is
> wrong?
> 
> Zookeeper_simpleSystem::testAsyncWatcherAutoResetterminate called after
> throwing an instance of 'CppUnit::Exception'
>  what():  equality assertion failed
> - Expected: -101
> - Actual  : -4
> 
> /bin/sh: line 5:  7241 Aborted ZKROOT=./../..
> CLASSPATH=$CLASSPATH:$CLOVER_HOME/lib/clover*.jar ${dir}$tst
> FAIL: zktest-mt
> ==
> 1 of 2 tests failed
> Please report to user@zookeeper.apache.org
> ==
> 
> If I uncomment just the testAsyncWatcherAutoReset in tests/TestClient.cc,
> then I get a whole bunch of other failures:
> 
> Zookeeper_simpleSystem::testDeserializeString : elapsed 0 : OK
> Zookeeper_simpleSystem::testFirstServerDown : assertion : elapsed 11002
> Zookeeper_simpleSystem::testNullData : assertion : elapsed 1001
> Zookeeper_simpleSystem::testIPV6 : assertion : elapsed 1000
> Zookeeper_simpleSystem::testCreate : assertion : elapsed 1000
> Zookeeper_simpleSystem::testCreateContainer : assertion : elapsed 1000
> Zookeeper_simpleSystem::testCreateTtl : assertion : elapsed 1000
> Zookeeper_simpleSystem::testPath : assertion : elapsed 1000
> Zookeeper_simpleSystem::testPathValidation : assertion : elapsed 1000
> Zookeeper_simpleSystem::testPing : assertion : elapsed 2001
> Zookeeper_simpleSystem::testAcl : assertion : elapsed 1000
> Zookeeper_simpleSystem::testChroot : assertion : elapsed 2001
> Zookeeper_simpleSystem::testAuth : assertion : elapsed 1000
> Zookeeper_simpleSystem::testHangingClient : elapsed 1001 : OK
> Zookeeper_simpleSystem::testWatcherAutoResetWithGlobal : assertion :
> elapsed 1000
> Zookeeper_simpleSystem::testWatcherAutoResetWithLocal : assertion : elapsed
> 1000
> Zookeeper_simpleSystem::testGetChildren2 : assertion : elapsed 1000
> Zookeeper_simpleSystem::testLastZxid : assertion : elapsed 2001
> Zookeeper_simpleSystem::testRemoveWatchers : assertion : elapsed 1000
> Zookeeper_readOnly::testReadOnly./tests/zkServer.sh: line 55: kill: (19172)
> - No such process
> this target is for unit tests only
> : assertion : elapsed 11
> tests/TestClientRetry.cc:137: Assertion: equality assertion failed
> [Expected: 1, Actual  : 0]
> tests/TestMulti.cc:213: Assertion: equality assertion failed [Expected: 1,
> Actual  : 0]
> tests/TestMulti.cc:213: Assertion: equality assertion failed [Expected: 1,
> Actual  : 0]
> tests/TestMulti.cc:213: Assertion: equality assertion failed [Expected: 1,
> Actual  : 0]
> tests/TestMulti.cc:213: Assertion: equality assertion failed [Expected: 1,
> Actual  : 0]
> tests/TestMulti.cc:213: Assertion: equality assertion failed [Expected: 1,
> Actual  : 0]
> tests/TestMulti.cc:213: Assertion: equality assertion failed [Expected: 1,
> Actual  : 0]
> tests/TestMulti.cc:213: Assertion: equality assertion failed [Expected: 1,
> Actual  : 0]
> tests/TestMulti.cc:213: Assertion: equality assertion failed [Expected: 1,
> Actual  : 0]
> tests/TestMulti.cc:213: Assertion: equality assertion failed [Expected: 1,
> Actual  : 0]
> tests/TestMulti.cc:213: Assertion: equality assertion failed [Expected: 1,
> Actual  : 0]
> tests/TestMulti.cc:213: Assertion: equality assertion failed [Expected: 1,
> Actual  : 0]
> tests/TestMulti.cc:213: Assertion: equality assertion failed [Expected: 1,
> Actual  : 0]
> tests/TestMulti.cc:213: Assertion: equality assertion failed [Expected: 1,
> Actual  : 0]
> tests/TestClient.cc:327: Assertion: assertion failed [Expression:
> ctx.waitForConnected(zk)]
> tests/TestClient.cc:783: Assertion: equality assertion failed [Expected: 0,
> Actual  : -4]
> tests/TestClient.cc:773: Assertion: equality assertion failed [Expected: 0,
> Actual  : -4]
> tests/TestClient.cc:686: Assertion: equality assertion failed [Expected: 0,
> Actual  : -4]
> tests/TestClient.cc:712: Assertion: equality assertion failed [Expected: 0,
> Actual  : -4]
> tests/TestClient.cc:725: Assertion: equality assertion failed [Expected: 0,
> Actual  : -4]
> tests/TestClient.cc:807: Assertion: equality assertion failed [Expected: 0,
> Actual  : -4]
> tests/TestClient.cc:489: Assertion: equality assertion failed [Expected: 0,
> Actual  : -4]
> tests/TestClient.cc:375: Assertion: equality assertion failed [Expected: 0,
> Actual  : -4]
> tests/TestClient.cc:548: Assertion: equality assertion failed [Expected: 0,
> 

Re: Is there any tool to verify zookeeper snapshot file?

2019-10-31 Thread Andor Molnar
Try SnapshotFormatter:

https://stackoverflow.com/questions/17894808/how-do-one-read-the-zookeeper-transaction-log

We might need to have a SnapshotToolkit tool to verify snapshot files similar 
to TxnLogToolkit.
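
Until such a tool exists, a rough sanity check can be sketched by hand. The code below is based on my reading of how FileSnap writes uncompressed snapshots in the 3.4/3.5 line, so treat the layout as an assumption: a header starting with the "ZKSN" magic, then the data, then an 8-byte Adler32 value followed by the string "/" as the end marker. The class name is made up, and the whole file is read into memory for simplicity.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.Adler32;

public class SnapshotSanityCheck {
    // "ZKSN" read as a big-endian int; assumed to match the snapshot magic.
    static final int SNAP_MAGIC =
            ByteBuffer.wrap("ZKSN".getBytes(StandardCharsets.US_ASCII)).getInt();
    // Assumed header: int magic, int version, long dbid.
    static final int HEADER_LEN = 4 + 4 + 8;
    // Assumed trailer: long checksum, then "/" as a length-prefixed string.
    static final int TRAILER_LEN = 8 + 4 + 1;

    static boolean looksValid(byte[] snap) {
        if (snap.length < HEADER_LEN + TRAILER_LEN) {
            return false; // too short to hold header and trailer
        }
        ByteBuffer buf = ByteBuffer.wrap(snap); // big-endian by default
        if (buf.getInt(0) != SNAP_MAGIC) {
            return false; // not a snapshot, or header damaged
        }
        int bodyLen = snap.length - TRAILER_LEN;
        long storedChecksum = buf.getLong(bodyLen);
        int markerLen = buf.getInt(bodyLen + 8);
        if (markerLen != 1 || snap[snap.length - 1] != '/') {
            return false; // end marker missing: snapshot is incomplete
        }
        Adler32 adler = new Adler32();
        adler.update(snap, 0, bodyLen);
        return adler.getValue() == storedChecksum;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: SnapshotSanityCheck <snapshot file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(looksValid(data) ? "looks valid" : "corrupt or incomplete");
    }
}
```

This only tells you the file is complete and its checksum matches; it does not prove that the contents deserialize into a sane data tree, which is what SnapshotFormatter would exercise.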

Andor



> On 2019. Oct 30., at 23:48, rammohan ganapavarapu  
> wrote:
> 
> There are cases where zk will fail to start with an invalid snapshot if the
> snapshot file is corrupt or incomplete. I wanted to verify that the snapshot
> is complete and not corrupt before restoring.
> 
> Ram
> 
> On Wed, Oct 30, 2019 at 2:25 PM Enrico Olivelli  wrote:
> 
>> Ram
>> 
>> On Wed, Oct 30, 2019, 21:23 rammohan ganapavarapu 
>> wrote:
>> 
>>> Hi,
>>> 
>>> I am trying to find out whether there is any tool available to verify a zk
>>> snapshot file. Does any such thing exist?
>>> 
>> 
>> What do you mean with 'verify'? To inspect the contents?
>> 
>> Enrico
>> 
>>> 
>>> Thanks,
>>> Ram
>>> 
>> 



Re: Kerberos login error: Message stream modified (41)

2019-10-29 Thread Andor Molnar
Hi Alessandro,

Thanks for the help. It looks like the issue is on our side: the KDC hasn’t been 
properly set up for ZooKeeper, and the required principals don’t exist. 

I just wonder why the error message cannot be more descriptive, and whether we 
could improve it by properly logging the original exception.

Andor




> On 2019. Oct 29., at 14:35, Alessandro Luccaroni - Diennea 
>  wrote:
> 
> Hi Andor,
> Enrico's colleague here.
> 
> If I remember correctly, the issue in our case was related to the 
> ticket_lifetime and renew_lifetime options.
> These two krb5.conf options didn't matter before Java 9 (see 
> https://bugs.openjdk.java.net/browse/JDK-8044500 and 
> https://bugs.openjdk.java.net/browse/JDK-8131051) and, as soon as we updated 
> the JDK version, we started to see weird issues related to ticket 
> expiration. We simply decided to remove the options from krb5.conf and use 
> the Kerberos defaults.
> 
> With JDK 8/Unlimited Strength, the problem was related to the enctype: I see 
> that you fixed it in krb5.conf by adding the option on the client; we 
> instead changed the option at the Kerberos level to ensure that the generated 
> keytabs were compatible (supported_enctypes option). I guess this is less 
> of a problem with modern JDKs.
> 
> Regards,
> Alessandro Luccaroni
> Platform Manager @ Diennea - MagNews
> Tel.: (+39) 0546 066100 Int. 924
> Viale G.Marconi 30/14 - 48018 Faenza (RA) - Italy
> 
>> -Original Message-
>> From: Enrico Olivelli 
>> Sent: Tuesday, October 29, 2019 14:23
>> To: UserZooKeeper 
>> Subject: Re: Kerberos login error: Message stream modified (41)
>> 
>> Andor
>> did you try with a smaller file ?
>> 
>> Enrico
>> 
>> On Tue, Oct 29, 2019 at 11:09, Enrico Olivelli - Diennea <
>> enrico.olive...@diennea.com> wrote:
>> 
>>> I would try to shrink the file to the minimum and add one line at a time.
>>> 
>>> With JDK8 we also had problems with Unlimited Strength policy stuff
>>> 
>>> Hope that helps
>>> 
>>> Enrico Olivelli
>>> MagNews Platform Development Manager @ Diennea – MagNews
>>> Tel.: (+39) 0546 066100 - Int. 125
>>> Viale G.Marconi 30/14 - 48018 Faenza (RA)
>>> 
>>> 
>>> 
>>> On 29/10/19, 10:55, "Andor Molnar"  wrote:
>>> 
>>>Thanks Enrico for the quick help.
>>> 
>>>Here’s my krb5.conf:
>>> 
>>>[libdefaults]
>>>default_realm = STREAMANALYTICS
>>>dns_lookup_kdc = false
>>>dns_lookup_realm = false
>>>ticket_lifetime = 86400
>>>renew_lifetime = 604800
>>>forwardable = true
>>>default_tgs_enctypes = aes256-cts aes128-cts des3-hmac-sha1
>>> arcfour-hmac des3-hmac-sha1 des-cbc-md5
>>>default_tkt_enctypes = aes256-cts aes128-cts des3-hmac-sha1
>>> arcfour-hmac des3-hmac-sha1 des-cbc-md5
>>>permitted_enctypes = aes256-cts aes128-cts des3-hmac-sha1
>>> arcfour-hmac
>>> des3-hmac-sha1 des-cbc-md5
>>>udp_preference_limit = 1
>>>kdc_timeout = 3000
>>>[realms]
>>>STREAMANALYTICS = {
>>>  kdc = ldap0.mydomain.com
>>>  admin_server = ldap0.mydomain.com
>>>}
>>>[domain_realm]
>>> 
>>>;
>>> 
>>>I wonder if the default encryption type settings could be the problem.
>>> I need to verify if it works with Java 8, because it might be a Java
>>> 11 or ZK 3.5 thing. Or both.
>>> 
>>>Andor
>>> 
>>> 
>>> 
>>> 
>>> 
>>>> On 2019. Oct 29., at 8:42, Enrico Olivelli - Diennea <
>>> enrico.olive...@diennea.com> wr

Re: Kerberos login error: Message stream modified (41)

2019-10-29 Thread Andor Molnar
Thanks Enrico for the quick help.

Here’s my krb5.conf:

[libdefaults]
default_realm = STREAMANALYTICS
dns_lookup_kdc = false
dns_lookup_realm = false
ticket_lifetime = 86400
renew_lifetime = 604800
forwardable = true
default_tgs_enctypes = aes256-cts aes128-cts des3-hmac-sha1 arcfour-hmac des3-hmac-sha1 des-cbc-md5
default_tkt_enctypes = aes256-cts aes128-cts des3-hmac-sha1 arcfour-hmac des3-hmac-sha1 des-cbc-md5
permitted_enctypes = aes256-cts aes128-cts des3-hmac-sha1 arcfour-hmac des3-hmac-sha1 des-cbc-md5
udp_preference_limit = 1
kdc_timeout = 3000
[realms]
STREAMANALYTICS = {
  kdc = ldap0.mydomain.com
  admin_server = ldap0.mydomain.com
}
[domain_realm]

;

I wonder if the default encryption type settings could be the problem. I need 
to verify if it works with Java 8, because it might be a Java 11 or ZK 3.5 
thing. Or both.

Andor





> On 2019. Oct 29., at 8:42, Enrico Olivelli - Diennea 
>  wrote:
> 
> Andor,
> this is a minimal krb5.conf file that works with ZooKeeper on JDK 8 through 
> JDK 13.
> 
> Maybe you can compare it with yours and start dropping configuration lines that 
> are not needed.
> 
> Java is adding more and more capabilities to GSSAPI support and this 
> sometimes leads to behavior changes
> 
> 
> [libdefaults]
> default_realm = MYDOMAIN
> 
> [realms]
> MYDOMAIN  = {
>  kdc = kerberos1.mydomain.com
>  kdc = kerberos2.mydomain.com
>  kdc = kerberos3.mydomain.com
> }
> 
> 
> 
> Enrico Olivelli
> MagNews Platform Development Manager @ Diennea – MagNews
> Tel.: (+39) 0546 066100 - Int. 125
> Viale G.Marconi 30/14 - 48018 Faenza (RA)
> 
> 
> 
> On 28/10/19, 17:56, "Enrico Olivelli"  wrote:
> 
>Andor
> 
>On Mon, Oct 28, 2019, 17:44 Andor Molnar  wrote:
> 
>> Hi,
>> 
>> I’m facing the following error message when trying to run ZooKeeper 3.5.5
>> on Java 11 with Kerberos authentication:
>> 
>> 2019-10-28 16:30:04,811 INFO
>> org.apache.zookeeper.server.ServerCnxnFactory: Using
>> org.apache.zookeeper.server.NIOServerCnxnFactory as server connection
>> factory
>> 2019-10-28 16:30:04,823 INFO org.apache.zookeeper.common.X509Util: Setting
>> -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable
>> client-initiated TLS renegotiation
>> 2019-10-28 16:30:05,012 ERROR
>> org.apache.zookeeper.server.quorum.QuorumPeerMain: Unexpected exception,
>> exiting abnormally
>> java.io.IOException: Could not configure server because SASL configuration
>> did not allow the  ZooKeeper server to authenticate itself properly:
>> javax.security.auth.login.LoginException: Message stream modified (41)
>>at
>> org.apache.zookeeper.server.ServerCnxnFactory.configureSaslLogin(ServerCnxnFactory.java:243)
>>at
>> org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:646)
>>at
>> org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:148)
>>at
>> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:123)
>>at
>> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)
>> …
>> 
>> zoo.cfg:
>> 
>> tickTime=2000
>> initLimit=10
>> syncLimit=5
>> 
>> 4lw.commands.whitelist=conf,cons,crst,dirs,dump,envi,gtmk,ruok,stmk,srst,srvr,stat,wchs,mntr,isro
>> dataDir=/var/lib/zookeeper
>> dataLogDir=/var/lib/zookeeper
>> clientPort=2181
>> maxClientCnxns=60
>> minSessionTimeout=4000
>> maxSessionTimeout=6
>> autopurge.purgeInterval=24
>> autopurge.snapRetainCount=5
>> quorum.auth.enableSasl=true
>> quorum.cnxn.threads.size=20
>> admin.enableServer=false
>> admin.serverPort=5181
>> server.1=cdf1-dc1.mydomain.com:3181:4181
>> server.2=cdf1-dc2.mydomain.com:3181:4181
>> server.3=cdf1-dc3.mydomain.com:3181:4181
>> leaderServes=yes
>> authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
>> kerberos.removeHostFromPrincipal=true
>> kerberos.removeRealmFromPrincipal=true
>> quorum.auth.kerberos.servicePrincipal=zookeeper/_HOST
>> quorum.auth.learnerRequireSasl=true
>> quorum.auth.serverRequireSasl=true
>> 
>> java -version:
>> ——
>> openjdk version "11.0.4" 2019-07-16
>> OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.4+11)
>> OpenJDK 64-Bit Server VM AdoptOpenJDK (build 11.0.4+11, mixed mode)
>> 
>> 
>> Has anyone seen this problem before?
>> What does the error message mean?
>> 
>> Unfortunately we swallow the original exception in ServerCnxnFactory a

Re: Why are errors stored in the transaction logs?

2019-10-29 Thread Andor Molnar
Hi Sylvain,

That should not be the case. Txns are not created for read requests, and 
reads do not hit SyncRequestProcessor, which is responsible for maintaining 
the txn and snap logs.

The error handling you see in processTxn() (in the catch branch) is for 
ignoring failures during log replay. It was introduced in ZOOKEEPER-1269 and, 
as far as I can see, it’s related to multis.

Andor




> On 2019. Oct 15., at 16:33, Sylvain Wallez  wrote:
> 
> Hi ZooKeepers,
> 
> We had an issue recently with a service hammering ZK to read a path that 
> doesn't exist. As a result the transaction logs grew quickly because errors 
> are stored there.
> 
> Looking at DataTree.processTxn() which is called from 
> FileTxnSnapLog.restore() it seems that errors read in the transaction log are 
> essentially ignored.
> 
> So what is storing errors in the txnlog useful for? Would it make sense to 
> have a configuration flag to not store them?
> 
> Thanks,
> Sylvain
> 



Re: String inconsistency issue when running ZK with OpenJDK 10 on SKL machines

2019-10-28 Thread Andor Molnar
Here’s the JDK issue that Fangmin mentioned:

https://bugs.openjdk.java.net/browse/JDK-8207746

It’s a JDK 10 & 11 bug that has been fixed since JDK 11 b27.
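
For deployments that cannot upgrade right away, a startup check along these lines might help catch the risky combination. It is only a sketch under the assumptions in this thread: JDK 9 and 10 are treated as affected, and an explicit -XX:UseAVX setting of 0, 1 or 2 is treated as the mitigation. The class and method names are made up, and the code cannot tell whether the CPU actually supports AVX-512.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class AvxWorkaroundCheck {
    // Turns "1.8" into 8 and "10" or "11" into 10 or 11.
    static int featureVersion(String specVersion) {
        String v = specVersion.startsWith("1.") ? specVersion.substring(2) : specVersion;
        int dot = v.indexOf('.');
        return Integer.parseInt(dot >= 0 ? v.substring(0, dot) : v);
    }

    // True if the JVM was started with AVX capped below AVX-512.
    static boolean avxLimited(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.equals("-XX:UseAVX=0") || arg.equals("-XX:UseAVX=1")
                    || arg.equals("-XX:UseAVX=2")) {
                return true;
            }
        }
        return false;
    }

    // Assumption from the thread: only JDK 9 and 10 need the workaround.
    static boolean shouldWarn(int feature, List<String> jvmArgs) {
        return (feature == 9 || feature == 10) && !avxLimited(jvmArgs);
    }

    public static void main(String[] args) {
        int feature = featureVersion(System.getProperty("java.specification.version"));
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        if (shouldWarn(feature, jvmArgs)) {
            System.out.println("JDK " + feature + " without -XX:UseAVX=2: consider the workaround or an upgrade");
        } else {
            System.out.println("No action suggested for JDK " + feature);
        }
    }
}
```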

Andor



> On 2019. Oct 28., at 8:00, Enrico Olivelli  wrote:
> 
> Fangmin,
> 
> On Mon, Oct 28, 2019, 02:23 Fangmin Lv  wrote:
> 
>> Hey everyone,
>> 
>> (Forgot to add subject in the previous email, resent with clear subject.)
>> 
>> I'd like to share some weird inconsistency bugs we saw recently on prod,
>> along with the root cause and potential fixes. It took us around a month to
>> investigate, reproduce and find the root cause; hopefully the information
>> here will help people avoid hitting the same issue.
>> 
>> [Trigger conditions and behavior]
>> 
>> The inconsistency issue only happened when running ZK with OpenJDK 10 on
>> SKL machines, and it's not because of bugs inside ZK but due to a
>> macro-assembly bug inside JDK.
>> 
>> And the behavior of the issues might be:
>> 
>> * NONODE returned by getData for a child that exists when queried with
>> getChildren, with no delete issued
>> * NONODE returned when trying to create a child under a parent node that
>> was just successfully created, with no delete issued
>> * No client able to acquire the lock even though the previous session
>> that held the lock is already dead
>> 
>> [Root cause]
>> 
>> The direct cause of the misbehavior above is that keys/values put into
>> the ZooKeeperServer.outstandingChangesForPath HashMap or the
>> DataNode.children HashSet are not visible to later get or remove calls.
>> As a result, outstanding changes are not visible when the leader prepares
>> the following txns, or a node is deleted but not removed from
>> DataNode.children.
>> 
>> The 'bad' HashMap/HashSet behavior is not caused by concurrency bugs
>> inside ZK, but by a bug in the macro-assembly used to generate the
>> String.equals intrinsic assembly code in JDK 9 and 10. The bug was
>> introduced in JDK-8144771, which added AVX-512 instruction support to the
>> JDK to optimize String.equals intrinsic performance with 512-bit vector
>> ops. Due to the bug, String.equals may return a false result when using
>> the high band of CPU registers (xmm16 - xmm31) with a non-empty stack
>> on SKL machines where AVX-512 is available.
>> 
>> The macro-assembly bug we hit is in vptest, which is used in the
>> string_compare macro assembly code
>> <http://hg.openjdk.java.net/jdk/jdk10/file/b09e56145e11/src/hotspot/cpu/x86/macroAssembler_x86.cpp#l4933>.
>> It uses add/sub instructions when temporarily saving register values to
>> and restoring them from the stack, which distorts the ZF (zero flag) in
>> the FLAGS register set by the previous test instruction.
>> 
>> In our case, if the key exists in the DataNode.children HashSet, the test
>> instruction result will be zero and the ZF bit will be set to 1. If the RSP
>> value is not 0 (e.g. the stack is not empty) after the addptr code, the ZF
>> bit will be changed to 0, so the String.equals comparison during removeNode
>> will return false and the key won't be removed.
>> 
>> There is a bug reported in JDK-8207746 where the behavior is different; we've
>> confirmed the issue by adding assembly code to log the issue in JDK 10.
>> 
>> [Solutions]
>> 
>> The possible mitigations are:
>> 
>> 1. Disabling the AVX-512 with JVM option -XX:UseAVX=2
>> 2. Using OpenJDK version higher than 10, which has fixed the issue in
>> JDK-8207746
>> 
>> Upgrading to OpenJDK 11+ is the better option, since 10 is not well
>> supported and AVX-512 does help improve performance.
>> 
>> We use JDK 10 due to the SSL quorum socket close stall issue mentioned in
>> ZOOKEEPER-3384, and the SO_LINGER option is not honored in JDK 11. We've
>> unblocked JDK 11 by asynchronously closing the quorum socket, and we're
>> upstreaming that in ZOOKEEPER-3574.
>> 
>> Thanks,
>> Fangmin
>> 
> 
> 
> Thank you for sharing this.
> Do you have any pointer to the jdk11 bugs? Is it solved in 12+?
> 
> I am running with jdk11-13 but without ssl, so never seen problems.
> 
> Enrico
> 
>> 



Re: Can I use czxid as unique sequence number?

2019-10-28 Thread Andor Molnar
Yes, I think you can do that.
zxid is currently an AtomicLong in the leader which we increment on every write 
request.
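
To make that concrete, here is a minimal sketch of ordering nodes by czxid. It assumes the usual zxid layout (leader epoch in the high 32 bits, a per-epoch counter in the low 32 bits) and that the czxid values were read from each node's Stat via getCzxid(). The class name, paths and zxid values are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CzxidOrderSketch {
    // High 32 bits: leader epoch (assumed layout).
    static long epoch(long zxid) {
        return zxid >>> 32;
    }

    // Low 32 bits: counter incremented per write within an epoch.
    static long counter(long zxid) {
        return zxid & 0xffffffffL;
    }

    // Returns the paths sorted by the zxid of the create transaction.
    static List<String> byCreationOrder(Map<String, Long> czxidByPath) {
        List<String> paths = new ArrayList<>(czxidByPath.keySet());
        paths.sort(Comparator.comparingLong((String p) -> czxidByPath.get(p)));
        return paths;
    }

    public static void main(String[] args) {
        // Hypothetical czxid values, as if read from Stat.getCzxid().
        Map<String, Long> nodes = new LinkedHashMap<>();
        nodes.put("/jobs/c", 0x200000001L);
        nodes.put("/jobs/a", 0x100000005L);
        nodes.put("/jobs/b", 0x100000009L);
        for (String path : byCreationOrder(nodes)) {
            long zxid = nodes.get(path);
            System.out.println(path + " epoch=" + epoch(zxid) + " counter=" + counter(zxid));
        }
    }
}
```

Two caveats, as far as I know: the values are not consecutive, because every write (not just creates) consumes a zxid, and nodes created together in a single multi would share one czxid, so it gives an order but not necessarily a strictly unique number per node.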

Andor


> On 2019. Oct 18., at 17:59, Vincent Ngan  wrote:
> 
> I want to determine the order of creation of a number of nodes without
> using ZooKeeper's sequence feature because I do not want the name of the
> nodes to be appended with a sequence number.
> 
> Regards,
> vngantk
> 
> On Thu, Oct 3, 2019 at 2:37 AM Enrico Olivelli  wrote:
> 
>> Can you give more details about your usecase?
>> 
>> Enrico
>> 
>> Il mer 2 ott 2019, 10:04 Vincent Ngan  ha scritto:
>> 
>>> Hi,
>>> 
>>> According to the definition of czxid as mentioned in the documentation,
>> can
>>> I assume that the czxid of every node is unique? If this is the case,
>> can I
>>> just use it for purpose of determining the unique sequence number of a
>>> node?
>>> 
>>> Regards,
>>> vngantk
>>> 
>> 



Kerberos login error: Message stream modified (41)

2019-10-28 Thread Andor Molnar
Hi,

I’m facing the following error message when trying to run ZooKeeper 3.5.5 on 
Java 11 with Kerberos authentication:

2019-10-28 16:30:04,811 INFO org.apache.zookeeper.server.ServerCnxnFactory: 
Using org.apache.zookeeper.server.NIOServerCnxnFactory as server connection 
factory
2019-10-28 16:30:04,823 INFO org.apache.zookeeper.common.X509Util: Setting -D 
jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS 
renegotiation
2019-10-28 16:30:05,012 ERROR 
org.apache.zookeeper.server.quorum.QuorumPeerMain: Unexpected exception, 
exiting abnormally
java.io.IOException: Could not configure server because SASL configuration did 
not allow the  ZooKeeper server to authenticate itself properly: 
javax.security.auth.login.LoginException: Message stream modified (41)
at 
org.apache.zookeeper.server.ServerCnxnFactory.configureSaslLogin(ServerCnxnFactory.java:243)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:646)
at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:148)
at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:123)
at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)
…

zoo.cfg:

tickTime=2000
initLimit=10
syncLimit=5
4lw.commands.whitelist=conf,cons,crst,dirs,dump,envi,gtmk,ruok,stmk,srst,srvr,stat,wchs,mntr,isro
dataDir=/var/lib/zookeeper
dataLogDir=/var/lib/zookeeper
clientPort=2181
maxClientCnxns=60
minSessionTimeout=4000
maxSessionTimeout=6
autopurge.purgeInterval=24
autopurge.snapRetainCount=5
quorum.auth.enableSasl=true
quorum.cnxn.threads.size=20
admin.enableServer=false
admin.serverPort=5181
server.1=cdf1-dc1.mydomain.com:3181:4181
server.2=cdf1-dc2.mydomain.com:3181:4181
server.3=cdf1-dc3.mydomain.com:3181:4181
leaderServes=yes
authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
kerberos.removeHostFromPrincipal=true
kerberos.removeRealmFromPrincipal=true
quorum.auth.kerberos.servicePrincipal=zookeeper/_HOST
quorum.auth.learnerRequireSasl=true
quorum.auth.serverRequireSasl=true

java -version:
——
openjdk version "11.0.4" 2019-07-16
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.4+11)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 11.0.4+11, mixed mode)


Has anyone seen this problem before?
What does the error message mean?

Unfortunately we swallow the original exception in ServerCnxnFactory and only 
log the message without stacktrace.

Thanks,
Andor




Removing Netty support from branch-3.4

2019-10-04 Thread Andor Molnar
Hi ZK users / devs,

ZooKeeper branch-3.4 is still on Netty 3 which is not maintained by the Netty 
team anymore. There’s no intention of updating it on our side, hence we’re 
planning to remove it from the codebase completely and ask existing users to 
upgrade to 3.5, if they still want to use Netty. 3.5 is a much better option 
anyway in various aspects: Netty 4 performs better, TLS support in both quorum 
and client communication, etc.

The default stack in 3.4 is NIO, so our gut feeling is that the impact on our 
existing users is low. However, the most important effect of this change is 
probably the loss of encrypted client connections.

Please share your thoughts about this change and let us know if upgrading to 
3.5 is not possible in your use case.

Tracking Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-3568

Regards,
Andor





Re: Migrate from 3.4.x to 3.5.5

2019-09-07 Thread Andor Molnar
Hi Kathryn,

Are you a contributor of that .Net client? Is it officially supported
by Microsoft?
Would it make sense to merge it into the main repository at some point?

Andor


-Original Message-
From: Kathryn Hogg 
Reply-To: user@zookeeper.apache.org
To: user@zookeeper.apache.org 
Subject: RE: Migrate from 3.4.x to 3.5.5
Date: Thu, 5 Sep 2019 13:13:15 +

Thanks!  That buys me some time from having to fork ZookeeperNetEx and
do a 3.5.x port myself.  Additionally, it should allow me to use Kafka
with a 3.5.x zookeeper.
--
Kathryn Hogg
Senior Technology Architect

-Original Message-
From: Andor Molnar [mailto:an...@apache.org]
Sent: Thursday, September 5, 2019 12:42 AM
To: user@zookeeper.apache.org
Subject: Re: Migrate from 3.4.x to 3.5.5

{External email message: This email is from an external source. Please
exercise caution prior to opening attachments, clicking on links, or
providing any sensitive information.}

Hi Kathryn,

That way should work without problems.

Andor

-Original Message-
From: Kathryn Hogg
Reply-To: user@zookeeper.apache.org
To: user@zookeeper.apache.org
Subject: RE: Migrate from 3.4.x to 3.5.5
Date: Wed, 4 Sep 2019 15:07:11 +

Question about the opposite:  We have some C# clients using
ZookeeperNetEx which hasn't released a 3.5 version yet.  Will 3.4
clients work with 3.5 servers?
--
Kathryn Hogg
Senior Technology Architect

-Original Message-
From: Andor Molnar [mailto:an...@apache.org]
Sent: Wednesday, September 4, 2019 9:52 AM
To: user@zookeeper.apache.org
Subject: Re: Migrate from 3.4.x to 3.5.5

{External email message: This email is from an external source. Please
exercise caution prior to opening attachments, clicking on links, or
providing any sensitive information.}

Hi Zili,

"If so, it seems upgrade client side force user to upgrade server side
also.”

Yes, if client is upgraded _and_ user wants to use a new feature in
3.5, then server side has to be upgraded too. ;)

Andor

> On 2019. Sep 3., at 13:20, Zili Chen  wrote:
> If so, it seems upgrade client side force user to upgrade server side
> also.




Re: Migrate from 3.4.x to 3.5.5

2019-09-04 Thread Andor Molnar
Hi Kathryn,

That way should work without problems.

Andor


-Original Message-
From: Kathryn Hogg 
Reply-To: user@zookeeper.apache.org
To: user@zookeeper.apache.org 
Subject: RE: Migrate from 3.4.x to 3.5.5
Date: Wed, 4 Sep 2019 15:07:11 +

Question about the opposite:  We have some C# clients using
ZookeeperNetEx which hasn't released a 3.5 version yet.  Will 3.4
clients work with 3.5 servers?
--
Kathryn Hogg
Senior Technology Architect

-Original Message-
From: Andor Molnar [mailto:an...@apache.org]
Sent: Wednesday, September 4, 2019 9:52 AM
To: user@zookeeper.apache.org
Subject: Re: Migrate from 3.4.x to 3.5.5
{External email message: This email is from an external source. Please
exercise caution prior to opening attachments, clicking on links, or
providing any sensitive information.}
Hi Zili,
"If so, it seems upgrade client side force user to upgrade server side
also.”
Yes, if client is upgraded _and_ user wants to use a new feature in
3.5, then server side has to be upgraded too. ;)
Andor


> On 2019. Sep 3., at 13:20, Zili Chen  wrote:
> If so, it seems upgrade client side force user to upgrade server side
> also.




Re: Migrate from 3.4.x to 3.5.5

2019-09-04 Thread Andor Molnar
Hi Zili,

"If so, it seems upgrade client side force
user to upgrade server side also.”

Yes, if client is upgraded _and_ user wants to use a new feature in 3.5, then 
server side has to be upgraded too. ;)

Andor



> On 2019. Sep 3., at 13:20, Zili Chen  wrote:
> 
> If so, it seems upgrade client side force
> user to upgrade server side also.



ZooKeeper swag on RedBubble available

2019-09-03 Thread Andor Molnar
Hi ZooKeeper fans,

Get your favourite ZooKeeper swag today! :)

https://www.redbubble.com/people/comdev/works/40935715-apache-zookeeper?asc=u

Best,
Andor




Re: The current epoch, 7, is older than the last zxid, 8589935882

2019-08-28 Thread Andor Molnar
Thanks for the info, I’m still looking.
So, this is an Ubuntu packaged version of ZooKeeper.

Andor



> On 2019. Aug 27., at 14:13, Debraj Manna  wrote:
> 
> No I don't see the updatingEpoch file in /var/lib/zookeeper/version-2
> 
> I started zookeeper by adding set -x in /usr/bin/zookeeper-server I can see
> zookeeper is getting started with 3.4.13 as shown below . The complete logs
> are placed in the below gist
> 
> https://gist.github.com/debraj-manna/509ec3d497016c4a249ee2b8dace05d9
> 
> nohup java -Dzookeeper.datadir.autocreate=false
> -Dzookeeper.log.dir=/var/log/zookeeper
> -Dzookeeper.root.logger=INFO,ROLLINGFILE -cp
> '/usr/lib/zookeeper/bin/../build/classes:/usr/lib/zookeeper/bin/../build/lib/*.jar:/usr/lib/zookeeper/bin/../lib/slf4j-log4j12.jar:/usr/lib/zookeeper/bin/../lib/slf4j-log4j12-1.7.5.jar:/usr/lib/zookeeper/bin/../lib/slf4j-api-1.7.5.jar:/usr/lib/zookeeper/bin/../lib/netty-3.10.5.Final.jar:/usr/lib/zookeeper/bin/../lib/log4j-1.2.16.jar:/usr/lib/zookeeper/bin/../lib/jline-2.11.jar:/usr/lib/zookeeper/bin/../zookeeper-3.4.13.jar:/usr/lib/zookeeper/bin/../src/java/lib/*.jar:/etc/zookeeper/conf::/etc/zookeeper/conf:/usr/lib/zookeeper/*:/usr/lib/zookeeper/lib/*'
> -Dzookeeper.log.threshold=INFO -Dcom.sun.management.jmxremote
> -Dcom.sun.management.jmxremote.local.only=false
> org.apache.zookeeper.server.quorum.QuorumPeerMain
> /etc/zookeeper/conf/zoo.cfg
> + sleep 1
> + echo STARTED
> STARTED
> 
> The content of zookeeper.log is placed in the below gist after the start
> 
> https://gist.github.com/debraj-manna/9800c5bef32837c62bdfb324c0589ad6
> 
> Let me know if you need any more logs.
> 
> On Mon, Aug 26, 2019 at 9:21 PM Andor Molnar  wrote:
> 
>> I confirmed that the fix is included in 3.4.13. That’s why I asked if you
>> can see ‘updatingEpoch’ file in the data folder.
>> 
>> I don’t think the issue is not related, but I want to make sure that
>> you’re running the right version by verifying the beginning of ZK logs.
>> 
>> Andor
>> 
>> 
>> 
>>> On 2019. Aug 26., at 13:43, Debraj Manna 
>> wrote:
>>> 
>>> Below is the content of currentEpoch.tmp
>>> 
>>> support@platform2:/var/lib/zookeeper/version-2$ sudo cat acceptedEpoch
>>> 8support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch
>>> 7support@platform2:/var/lib/zookeeper/version-2$ sudo cat
>> currentEpoch.tmp
>>> 8support@platform2
>>> 
>>> Starting zookeeper logs are rolled over as the issue was there for some
>>> time. Will the current log with the node in this state help? Btw why do
>> you
>>> think this issue may not be related to zookeeper?
>>> 
>>> 
>>> 
>>> On Mon, Aug 26, 2019 at 4:56 PM Andor Molnar  wrote:
>>> 
>>>> Hi Debraj,
>>>> 
>>>> The fix should be in all 3.4 versions from 3.4.6 onward, including
>> 3.4.13.
>>>> Can you see ‘updatingEpoch’ file in /var/lib/zookeeper/version-2 ?
>>>> Also what is ‘currentEpoch.tmp’ ? I’m not sure if it relates to
>> ZooKeeper.
>>>> 
>>>> Would you please share full startup logs of the failing node?
>>>> 
>>>> Regards,
>>>> Andor
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On 2019. Aug 23., at 18:53, Debraj Manna 
>>>> wrote:
>>>>> 
>>>>> Can someone answer by below query?
>>>>> 
>>>>> I am getting confused after going through ZOOKEEPER-1653
>>>>> <https://issues.apache.org/jira/browse/ZOOKEEPER-1653> and
>>>> ZOOKEEPER-2354
>>>>> <https://issues.apache.org/jira/browse/ZOOKEEPER-2354> . The issues
>> say
>>>> it
>>>>> is fixed in 3.4.6 but exists in 3.5.x. But I am seeing the issue in
>>>> 3.4.13
>>>>> also. Can someone let me know if the issue is present in 3.4.13 also?
>>>>> 
>>>>> 
>>>>> On Wed 21 Aug, 2019, 12:35 PM Debraj Manna, 
>>>>> wrote:
>>>>> 
>>>>>> With the other two zookeeper servers running I stopped the zookeeper
>> in
>>>>>> the broken node and the deleted all the contents inside
>>>> /var/lib/zookeeper/version-2
>>>>>> and started the zookeeper back on the node. It is running fine now and
>>>> got
>>>>>> all the data from the other servers.
>>>>>> 
>>>>>> I am getting confused after going through ZOOKEEPER-1653
>>>>>> <https://issues.apache.org

Re: The current epoch, 7, is older than the last zxid, 8589935882

2019-08-26 Thread Andor Molnar
I confirmed that the fix is included in 3.4.13. That’s why I asked if you can 
see the ‘updatingEpoch’ file in the data folder. 

I don’t think the issue is unrelated to ZooKeeper, but I want to make sure that 
you’re running the right version by verifying the beginning of the ZK logs.

Andor



> On 2019. Aug 26., at 13:43, Debraj Manna  wrote:
> 
> Below is the content of currentEpoch.tmp
> 
> support@platform2:/var/lib/zookeeper/version-2$ sudo cat acceptedEpoch
> 8support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch
> 7support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch.tmp
> 8support@platform2
> 
> Starting zookeeper logs are rolled over as the issue was there for some
> time. Will the current log with the node in this state help? Btw why do you
> think this issue may not be related to zookeeper?
> 
> 
> 
> On Mon, Aug 26, 2019 at 4:56 PM Andor Molnar  wrote:
> 
>> Hi Debraj,
>> 
>> The fix should be in all 3.4 versions from 3.4.6 onward, including 3.4.13.
>> Can you see ‘updatingEpoch’ file in /var/lib/zookeeper/version-2 ?
>> Also what is ‘currentEpoch.tmp’ ? I’m not sure if it relates to ZooKeeper.
>> 
>> Would you please share full startup logs of the failing node?
>> 
>> Regards,
>> Andor
>> 
>> 
>> 
>> 
>>> On 2019. Aug 23., at 18:53, Debraj Manna 
>> wrote:
>>> 
>>> Can someone answer by below query?
>>> 
>>> I am getting confused after going through ZOOKEEPER-1653
>>> <https://issues.apache.org/jira/browse/ZOOKEEPER-1653> and
>> ZOOKEEPER-2354
>>> <https://issues.apache.org/jira/browse/ZOOKEEPER-2354> . The issues say
>> it
>>> is fixed in 3.4.6 but exists in 3.5.x. But I am seeing the issue in
>> 3.4.13
>>> also. Can someone let me know if the issue is present in 3.4.13 also?
>>> 
>>> 
>>> On Wed 21 Aug, 2019, 12:35 PM Debraj Manna, 
>>> wrote:
>>> 
>>>> With the other two zookeeper servers running I stopped the zookeeper in
>>>> the broken node and the deleted all the contents inside
>> /var/lib/zookeeper/version-2
>>>> and started the zookeeper back on the node. It is running fine now and
>> got
>>>> all the data from the other servers.
>>>> 
>>>> I am getting confused after going through ZOOKEEPER-1653
>>>> <https://issues.apache.org/jira/browse/ZOOKEEPER-1653> and
>> ZOOKEEPER-2354
>>>> <https://issues.apache.org/jira/browse/ZOOKEEPER-2354> . The issues say
>>>> it is fixed in 3.4.6 but exists in 3.5.x. But I am seeing the issue in
>>>> 3.4.13 also. Can someone let me know if the issue is present in 3.4.13
>> also?
>>>> 
>>>> 
>>>> 
>>>> On Wed, Aug 21, 2019 at 8:54 AM Debraj Manna 
>>>> wrote:
>>>> 
>>>>> Thanks for replying.
>>>>> 
>>>>> What is the recommended way to remove a node and delete all data from
>> it
>>>>> and make it start fresh?
>>>>> 
>>>>> On Wed 21 Aug, 2019, 12:58 AM Enrico Olivelli, 
>>>>> wrote:
>>>>> 
>>>>>> Hello,
>>>>>> Sorry for so late reply.
>>>>>> If you have 3 servers you can nuke the broken one and make it start
>> from
>>>>>> scratch, it will join the cluster and then recover data from the other
>>>>>> servers
>>>>>> 
>>>>>> Try it in a staging env, not in production
>>>>>> 
>>>>>> Enrico
>>>>>> 
>>>>>> Il mar 20 ago 2019, 20:30 Debraj Manna  ha
>>>>>> scritto:
>>>>>> 
>>>>>>> The same has been asked in stackoverflow
>>>>>>> <
>>>>>>> 
>>>>>> 
>> https://stackoverflow.com/questions/57574298/zookeeper-error-the-current-epoch-is-older-than-the-last-zxid
>>>>>>>> 
>>>>>>> also. But no response there also.
>>>>>>> 
>>>>>>> Anyone any thoughts on this one?
>>>>>>> 
>>>>>>> On Tue, Aug 20, 2019 at 4:43 PM Debraj Manna <
>> subharaj.ma...@gmail.com
>>>>>>> 
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Posted wrong Jira link. I meant
>>>>>>>> https://issues.apache.org/jira/browse/ZOOKEEPER-2354.  Can someone
>>>>>> let
>>>&g

Re: Migrate from 3.4.x to 3.5.5

2019-08-26 Thread Andor Molnar
Hi Zili,

There’s no migration guide available for 3.5, because it shouldn’t break any 
existing functionality and there is no need to upgrade the database either.

I’ve created a wiki page to collect upgrade experiences from users which could 
give you some hint if you’re facing problems: 
https://cwiki.apache.org/confluence/display/ZOOKEEPER/Upgrade+FAQ

You can always drop an email here too to get help.

Andor



> On 2019. Aug 26., at 14:12, Zili Chen  wrote:
> 
> Detailedly, in Flink community we try to bump ZooKeeper version from 3.4.10
> to 3.5.5 but without accurate idea about how it would break existing
> systems.
> Mainly we make use of the "client" of ZooKeeper.
> 
> 
> Zili Chen  于2019年8月26日周一 下午8:02写道:
> 
>> Hi,
>> 
>> Is there any migration guide for potentially breaking changes and how to
>> deal with them?
>> 
>> Best,
>> tison.
>> 



Re: The current epoch, 7, is older than the last zxid, 8589935882

2019-08-26 Thread Andor Molnar
Hi Debraj,

The fix should be in all 3.4 versions from 3.4.6 onward, including 3.4.13.
Can you see ‘updatingEpoch’ file in /var/lib/zookeeper/version-2 ?
Also what is ‘currentEpoch.tmp’ ? I’m not sure if it relates to ZooKeeper.

Would you please share full startup logs of the failing node?
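
For readers following the error message: a zxid is a 64-bit number whose high 32 bits carry the epoch and whose low 32 bits carry a counter within that epoch. The sketch below only does that arithmetic on the zxid from the error log quoted further down (34359738370) and compares it with the content of the currentEpoch file (7). The class and method names are made up for the example, and the comparison is a simplified reading of the startup check, not the server code itself.

```java
final class ZxidParts {
    // The high 32 bits of a zxid hold the epoch.
    static long epochOf(long zxid) {
        return zxid >> 32;
    }

    // The low 32 bits hold the counter within that epoch.
    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    public static void main(String[] args) {
        long lastZxid = 34359738370L; // value taken from the quoted error log
        long currentEpoch = 7L;       // content of the currentEpoch file

        // prints epoch=8 counter=2
        System.out.println("epoch=" + epochOf(lastZxid) + " counter=" + counterOf(lastZxid));
        // prints currentEpoch older: true
        System.out.println("currentEpoch older: " + (currentEpoch < epochOf(lastZxid)));
    }
}
```

So the last logged transaction belongs to epoch 8, which matches acceptedEpoch and currentEpoch.tmp, while currentEpoch still says 7.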

Regards,
Andor




> On 2019. Aug 23., at 18:53, Debraj Manna  wrote:
> 
> Can someone answer by below query?
> 
> I am getting confused after going through ZOOKEEPER-1653
>  and ZOOKEEPER-2354
>  . The issues say it
> is fixed in 3.4.6 but exists in 3.5.x. But I am seeing the issue in 3.4.13
> also. Can someone let me know if the issue is present in 3.4.13 also?
> 
> 
> On Wed 21 Aug, 2019, 12:35 PM Debraj Manna, 
> wrote:
> 
>> With the other two zookeeper servers running I stopped the zookeeper in
>> the broken node and the deleted all the contents inside 
>> /var/lib/zookeeper/version-2
>> and started the zookeeper back on the node. It is running fine now and got
>> all the data from the other servers.
>> 
>> I am getting confused after going through ZOOKEEPER-1653
>>  and ZOOKEEPER-2354
>>  . The issues say
>> it is fixed in 3.4.6 but exists in 3.5.x. But I am seeing the issue in
>> 3.4.13 also. Can someone let me know if the issue is present in 3.4.13 also?
>> 
>> 
>> 
>> On Wed, Aug 21, 2019 at 8:54 AM Debraj Manna 
>> wrote:
>> 
>>> Thanks for replying.
>>> 
>>> What is the recommended way to remove a node and delete all data from it
>>> and make it start fresh?
>>> 
>>> On Wed 21 Aug, 2019, 12:58 AM Enrico Olivelli, 
>>> wrote:
>>> 
>>>> Hello,
>>>> Sorry for so late reply.
>>>> If you have 3 servers you can nuke the broken one and make it start from
>>>> scratch, it will join the cluster and then recover data from the other
>>>> servers
>>>> 
>>>> Try it in a staging env, not in production
>>>> 
>>>> Enrico
>>>> 
>>>> Il mar 20 ago 2019, 20:30 Debraj Manna  ha
>>>> scritto:
>>>> 
> The same has been asked in stackoverflow
> <
> 
 https://stackoverflow.com/questions/57574298/zookeeper-error-the-current-epoch-is-older-than-the-last-zxid
>> 
> also. But no response there also.
> 
> Anyone any thoughts on this one?
> 
> On Tue, Aug 20, 2019 at 4:43 PM Debraj Manna  
> wrote:
> 
>> Posted wrong Jira link. I meant
>> https://issues.apache.org/jira/browse/ZOOKEEPER-2354.  Can someone
 let
> me
>> know what is the recommended way to recover the node?
>> 
>> support@platform2:/var/lib/zookeeper/version-2$ sudo cat
 acceptedEpoch
>> 8support@platform2:/var/lib/zookeeper/version-2$ sudo cat
 currentEpoch
>> 7support@platform2:/var/lib/zookeeper/version-2$ sudo cat
> currentEpoch.tmp
>> 8support@platform2
>> 
>> On Tue, Aug 20, 2019 at 3:14 PM Debraj Manna <
 subharaj.ma...@gmail.com>
>> wrote:
>> 
>>> Hi
>>> 
>>> I am using a zookeeper ensemble of 3 nodes running 3.4.13. Sometimes
>>> after reboot of machine zookeeper is not starting and I am seeing
 the
> below
>>> errors in logs.
>>> 
>>> I have seen https://issues.apache.org/jira/browse/ZOOKEEPER-1653 .
 Can
>>> someone let me if this is fixed in 3.4.13 or not as I can see the
 issue
>>> still open? Also can somone suggest what is the recommended way to
> recover
>>> the set-up ?
>>> 
>>> 2019-08-19 04:18:36,906 [myid:2] - ERROR [main:QuorumPeer@692] -
 Unable
>>> to load database on disk
>>> java.io.IOException: The current epoch, 7, is older than the last
 zxid,
>>> 34359738370
>>> at
>>> 
> 
 org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:674)
>>> at
>>> 
 org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:635)
>>> at
>>> 
> 
 org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:170)
>>> at
>>> 
> 
 org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
>>> at
>>> 
> 
 org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
>>> 2019-08-19 04:18:36,908 [myid:2] - ERROR [main:QuorumPeerMain@92] -
>>> Unexpected exception, exiting abnormally
>>> java.lang.RuntimeException: Unable to run quorum server
>>> at
>>> 
> 
 org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:693)
>>> at
>>> 
 org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:635)
>>> at
>>> 
> 
 org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:170)
>>> at
>>> 
> 
 

Re: jute.maxbuffer ignored for Client ssl connections ?

2019-08-16 Thread Andor Molnar
Hi,

It shouldn’t make any difference. Are you using NIO for non-SSL port 
communication?
SSL is only supported in Netty, so that could be a difference.

What’s the number after “Len error” and what’s your server-side jute.maxbuffer 
setting?
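
To show why the number matters, here is a simplified sketch of a length-prefix check of the kind that produces this message: the first four bytes of a request are read as a big-endian length and compared against the configured limit. This is not the actual server code; the class and method names are made up, and the fallback of 0xfffff for jute.maxbuffer is my assumption about the default, so please check it against your version.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

final class LenCheck {
    // Reads the 4-byte big-endian length prefix and rejects frames above the limit.
    static int readLength(byte[] wire, int maxBuffer) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire));
        int len = in.readInt();
        if (len < 0 || len > maxBuffer) {
            throw new IOException("Len error " + len);
        }
        return len;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default; the real limit comes from the jute.maxbuffer system property.
        int maxBuffer = Integer.getInteger("jute.maxbuffer", 0xfffff);

        byte[] small = ByteBuffer.allocate(4).putInt(1024).array();
        System.out.println(readLength(small, maxBuffer));

        byte[] large = ByteBuffer.allocate(4).putInt(Integer.MAX_VALUE).array();
        try {
            readLength(large, maxBuffer);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If the reported number is only slightly above the limit, the setting probably did not reach that code path; if it is huge or looks random, the bytes being read are likely not a ZooKeeper request at all.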

Andor



> On 2019. Aug 1., at 18:30, Jörn Franke  wrote:
> 
> Hi,
> 
> I have zookeeper 3.5.5 working fine, but there are some issues related to ssl 
> on secureClient port. If I connect to the normal unsecured client port there 
> is no issue, but if I connect to the secureClientPort then I receive the “Len 
> error” . I do increase the jute.maxbuffer and that works on the client port, 
> but not the secureClientPort. 
> Do you have an idea what could it be?
> 
> Thank you.
> 
> Best regards



Re: Issues with using ZooKeeper 3.5.5 together with Solr 8.2.0

2019-08-15 Thread Andor Molnar
https://issues.apache.org/jira/browse/ZOOKEEPER-3511
Andor

-Original Message-
From: Patrick Hunt
Reply-To: user@zookeeper.apache.org
To: UserZooKeeper <user@zookeeper.apache.org>
Subject: Re: Issues with using ZooKeeper 3.5.5 together with Solr 8.2.0
Date: Mon, 5 Aug 2019 09:04:44 -0700

It sounds to me like a regression. We always had the properties format
for 4lw, this (membership:) breaks that. I'd recommend fixing it in the
next 3.5/3.6. ie. output the membership on a single line "membership:
 \n". Should be a pretty simple change - anyone interested in taking
it on?

Also agree that folks should move off 4lw to the new (better) options,
esp as we plan to deprecate 4lw at some point.

Patrick

On Sun, Aug 4, 2019 at 12:15 PM Enrico Olivelli 
wrote:
> Il sab 3 ago 2019, 21:41 Shawn Heisey  ha
> scritto:
> > On 8/2/2019 10:33 AM, Patrick Hunt wrote:
> > > Right, it prints the membership of the quorum, see (for majority
> > > case which is typical):
> > > org.apache.zookeeper.server.quorum.flexible.QuorumMaj#toString
> > > https://github.com/apache/zookeeper/blob/faa7cec71fddfb959a7d67923acffdb67d93c953/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/flexible/QuorumMaj.java#L112
> >
> > For our purposes (the Solr project) the output of the "conf" 4lw
> > command is inconsistent, changing when there is a multi-server
> > ensemble.  All of the lines except the "membership: " one use an
> > equals sign as a separator.  Our parsing code fails on that line
> > because there is no equals sign.
> >
> > Whether or not the ZK project should consider this a bug is the
> > question that I am asking.
> >
> > While getting to the bottom of that question, another one
> > arises:  Who are the intended audiences of the "conf" 4lw
> > output?  If one of those audiences is ZK itself, then the output of
> > the command probably will work perfectly for that audience, as ZK
> > uses Java's "properties" API to read its config file, which means
> > that both = and : will work as separators.
> >
> > The current output also works great for a human audience.  Humans
> > are quite flexible.
> >
> > The difficulty is machine-based parsers like the one in Solr, which
> > is very simple and just splits lines on an equal sign.  How
> > much consistency can an audience like this expect?  I would
> > personally say that the way "membership: " is output is a bug.  That
> > line probably should be entirely removed, or the colon could be
> > replaced with an equal sign.  I think that the line only makes sense
> > for a human audience, and that audience probably doesn't really need
> > it.
> >
> > An alternate path:  One statement in the documentation would remove
> > all difficulty, without any code changes in ZK:
> >
> > "The output from the conf 4lw command should be parsed by the
> > Java Properties API for best results."
> 
> I think the best option is to switch to the Admin, HTTP + json based,
> as it is possible to integrate better with other automatic tools.
> We are working on docs for 3.6 (especially the http admin server).
> We also added many new 'commands' to the admin API, which is supposed to
> be the future for the mid/long term
> Enrico
> 
> 
> > If that statement is added, then Solr just needs to utilize
> > the Properties API, which is very easy to do, and all is well again.
> > So... I'm thinking we should open an issue in Jira, and then leave
> > it up to the ZK committers whether it's better to change the output
> > or adjust the documentation.  I can supply a patch either way.  What
> > does the community think?
> > Thanks,
> > Shawn
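
The Properties-based parsing discussed in this thread could look roughly like the sketch below. It relies only on java.util.Properties, which accepts both '=' and ':' as key/value separators. The sample text is made up for the example and is not captured from a real server, and the class name is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

final class ConfOutputParser {
    // Parses "conf"-style output; Properties accepts both '=' and ':' as separators.
    static Properties parse(String confOutput) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(confOutput));
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample, not taken from a running ensemble.
        String sample = "clientPort=2181\ntickTime=2000\nmembership: \n";
        Properties props = parse(sample);
        System.out.println(props.getProperty("clientPort"));  // prints 2181
        System.out.println(props.containsKey("membership"));  // prints true
    }
}
```

A parser that splits each line on '=' would stumble on the "membership: " line, while this one simply records it as a key with an empty value.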


Re: create or setData in transaction?

2019-08-14 Thread Andor Molnar
"it's said that ZooKeeper has a transaction mechanism”

I’m still confused with this. ZooKeeper doesn’t support transactions to my best 
knowledge. It has a `multi` operation feature, but that’s more like a bulk 
operation, not transaction.

"I want to tell ZooKeeper that the check and setData should be successful”

I don’t think you can do that. ZK has no check-and-set support either.

Maybe we should step back first and see what’s your use case exactly that 
you’re trying to solve with ZooKeeper. I suspect that you’re trying to follow 
the wrong approach or misusing ZooKeeper.

Have you checked our tutorial and recipes page? 
You can find some recommended usage patterns:
https://zookeeper.apache.org/doc/r3.5.5/recipes.html
https://zookeeper.apache.org/doc/r3.5.5/zookeeperTutorial.html

If that’s not enough, you could also try Curator which has even more built-in 
high level functionalities on top of basic ZK commands.

Andor



> On 2019. Aug 14., at 17:52, Zili Chen  wrote:
> 
> Hi Andor,
> 
> Thanks for your attention.
> 
> The problem is that in concurrent scenario zk.setData() could still failed
> if there is another thread delete the node. I know with proper lock strategy
> and ownership separation this can be avoid but it's said that ZooKeeper has
> a transaction mechanism so I'd like to see whether I can make use of it.
> 
> There is where I turn to
> 
> zk.multi(Op.check(path1), Op.setData(path2, data)); // path1 == or != path2
> is irrelevant
> 
> when the existence of a mark node(path1) guarded a condition and I want to
> make
> sure that setData successes only if the mark node exist. If I check the
> existence
> first and commit setData, a remove to the node could break the guard. In
> other
> words, I want to tell ZooKeeper that the check and setData should be
> successful
> committed or fail to be committed atomically.
> 
> Best,
> tison.
> 
> 
> Andor Molnar  于2019年8月14日周三 下午11:12写道:
> 
>> Hi Zili,
>> 
>> There’s no such functionality in ZooKeeper as far as I’m concerned. I
>> think your multi example (zk.multi(Op.check(path), Op.setData(path, data)))
>> is already a usage pattern which multi is not designed to support.
>> 
>> Why do you need to do this in “transactions” (there’s no transaction in
>> ZK)?
>> 
>> In Java you can do:
>> 
>> try {
>>  zk.create();
>> } catch (NodeExistsException e) {
>>  // swallow exception
>> }
>> zk.setData();
>> …
>> 
>> Regards,
>> Andor
>> 
>> 
>> 
>> 
>>> On 2019. Aug 6., at 14:44, Zili Chen  wrote:
>>> 
>>> Hi Enrico,
>>> 
>>> Thanks for your reply.
>>> 
>>>> In this case usually you use conditional setData, using the 'version' of
>>>> thr znode
>>> 
>>> 
>>> what if the caller has no idea on whether the node exist?(see also
>>> my if-else pseudo-code above.)
>>> 
>>> IIRC if we call `setData` on a non-exist path a NoNodeException
>>> will be thrown.
>>> Best,
>>> tison.
>>> 
>>> 
>>> Enrico Olivelli  于2019年8月6日周二 下午8:27写道:
>>> 
>>>> Il mar 6 ago 2019, 13:47 Zili Chen  ha scritto:
>>>> 
>>>>> Any ideas?
>>>>> 
>>>>> 
>>>>> Zili Chen  于2019年7月29日周一 上午11:12写道:
>>>>> 
>>>>>> Hi ZooKeepers,
>>>>>> 
>>>>>> Currently our transaction mechanism supports doing
>>>>>> create/setData/checkExist/delete in transaction. However, taking this
>>>>>> scenario into consideration, we want to put data in path "/path" but
>>>>>> don't know whether the znode exists or not. Let's say we program as
>>>>>> below
>>>>>> 
>>>>>> if (zk.exist(path)) {
>>>>>> zk.setData(path, data);
>>>>>> } else {
>>>>>> zk.create(path, data);
>>>>>> }
>>>>> 
>>>> 
>>>> Do you need to perform other ops in the same transaction?
>>>> In this case usually you use conditional setData, using the 'version' of
>>>> thr znode
>>>> 
>>>> 
>>>> Enrico
>>>> 
>>>>> 
>>>>>> if we want to do the check and "put" in transaction, it would be like
>>>>>> 
>>>>>> zk.multi(Op.check(path), Op.setData(path, data));
>>>>>> 
>>>>>> but we cannot add a "else" branch. ZooKeeper's transaction would all
>>>>>> success or fail.
>>>>>> 
>>>>>> Is there any way to do an "if-else" transaction?
>>>>>> 
>>>>>> Best,
>>>>>> tison.
>>>>>> 
>>>>> 
>>>> 
>> 
>> 



Re: create or setData in transaction?

2019-08-14 Thread Andor Molnar
Hi Zili,

There’s no such functionality in ZooKeeper as far as I know. I think 
your multi example (zk.multi(Op.check(path), Op.setData(path, data))) is 
already a usage pattern which multi is not designed to support.

Why do you need to do this in “transactions” (there’s no transaction in ZK)?

In Java you can do:

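try {
  // OPEN_ACL_UNSAFE and PERSISTENT are placeholders, use what fits your setup
  zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
} catch (KeeperException.NodeExistsException e) {
  // node is already there, fall through to setData
}
zk.setData(path, data, -1); // -1 matches any version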
…
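
If another client may delete the node between the two calls, the same idea can be wrapped in a small retry loop. The sketch below is only an illustration and does not use the ZooKeeper client: FakeStore is a made-up in-memory stand-in, and NodeExists / NoNode are hypothetical substitutes for KeeperException.NodeExistsException and KeeperException.NoNodeException. Unlike the snippet above, it returns right after a successful create instead of also calling setData.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class CreateOrSet {
    // Hypothetical stand-ins for the two KeeperException subclasses.
    static final class NodeExists extends Exception {}
    static final class NoNode extends Exception {}

    // Tiny in-memory store mimicking create/setData semantics; not a ZooKeeper client.
    static final class FakeStore {
        private final ConcurrentMap<String, String> nodes = new ConcurrentHashMap<>();

        void create(String path, String data) throws NodeExists {
            if (nodes.putIfAbsent(path, data) != null) {
                throw new NodeExists();
            }
        }

        void setData(String path, String data) throws NoNode {
            if (nodes.replace(path, data) == null) {
                throw new NoNode();
            }
        }

        void delete(String path) {
            nodes.remove(path);
        }

        String get(String path) {
            return nodes.get(path);
        }
    }

    // Tries create first, falls back to setData, and starts over if the node
    // disappeared between the two calls.
    static void put(FakeStore store, String path, String data, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                store.create(path, data);
                return;
            } catch (NodeExists e) {
                // node already exists, try setData next
            }
            try {
                store.setData(path, data);
                return;
            } catch (NoNode e) {
                // node was deleted in the meantime, loop and try create again
            }
        }
        throw new IllegalStateException("gave up after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore();
        put(store, "/path", "v1", 3); // node absent, so it is created
        put(store, "/path", "v2", 3); // node present, so its data is replaced
        System.out.println(store.get("/path")); // prints v2
    }
}
```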

Regards,
Andor




> On 2019. Aug 6., at 14:44, Zili Chen  wrote:
> 
> Hi Enrico,
> 
> Thanks for your reply.
> 
>> In this case usually you use conditional setData, using the 'version' of
>> thr znode
> 
> 
> what if the caller has no idea on whether the node exist?(see also
> my if-else pseudo-code above.)
> 
> IIRC if we call `setData` on a non-exist path a NoNodeException
> will be thrown.
> Best,
> tison.
> 
> 
> Enrico Olivelli  于2019年8月6日周二 下午8:27写道:
> 
>> Il mar 6 ago 2019, 13:47 Zili Chen  ha scritto:
>> 
>>> Any ideas?
>>> 
>>> 
>>> Zili Chen  于2019年7月29日周一 上午11:12写道:
>>> 
>>>> Hi ZooKeepers,
>>>> 
>>>> Currently our transaction mechanism supports doing
>>>> create/setData/checkExist/delete in transaction. However, taking this
>>>> scenario into consideration, we want to put data in path "/path" but
>>>> don't know whether the znode exists or not. Let's say we program as
>>>> below
>>>> 
>>>> if (zk.exist(path)) {
>>>>  zk.setData(path, data);
>>>> } else {
>>>>  zk.create(path, data);
>>>> }
>>> 
>> 
>> Do you need to perform other ops in the same transaction?
>> In this case usually you use conditional setData, using the 'version' of
>> thr znode
>> 
>> 
>> Enrico
>> 
>>> 
>>>> if we want to do the check and "put" in transaction, it would be like
>>>> 
>>>> zk.multi(Op.check(path), Op.setData(path, data));
>>>> 
>>>> but we cannot add a "else" branch. ZooKeeper's transaction would all
>>>> success or fail.
>>>> 
>>>> Is there any way to do an "if-else" transaction?
>>>> 
>>>> Best,
>>>> tison.
>>>> 
>>> 
>> 



Re: Clarification: SSL Client: Need of keystore?

2019-08-14 Thread Andor Molnar
Hi Jorn,

I cannot test this unfortunately, because I don’t have a working Kerberos 
environment at the moment. If you comment out keystore.location, ZooKeeper 
won’t start, because it’s unable to build the TrustManager.

Would you please try to create a fake (possibly empty) truststore and see how 
it goes?

Andor
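
One way to produce an empty store of the kind suggested above, using only
the JDK. This is a sketch: the JKS type, the file name and the "changeit"
password are assumptions, and whether ZooKeeper accepts an empty store in
this setup has not been verified here.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.KeyStore;

class EmptyStore {
    // Writes a JKS store that contains no entries and returns its path.
    static Path createEmptyStore(Path file, char[] password) throws Exception {
        KeyStore store = KeyStore.getInstance("JKS");
        store.load(null, password);  // initialise an empty in-memory store
        try (OutputStream out = Files.newOutputStream(file)) {
            store.store(out, password);
        }
        return file;
    }

    // Loads the store back and reports how many entries it holds.
    static int countEntries(Path file, char[] password) throws Exception {
        KeyStore store = KeyStore.getInstance("JKS");
        try (InputStream in = Files.newInputStream(file)) {
            store.load(in, password);
        }
        return store.size();
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("empty-store", ".jks");
        char[] password = "changeit".toCharArray();  // placeholder password
        createEmptyStore(file, password);
        System.out.println(file + " entries=" + countEntries(file, password));
    }
}
```

The resulting file could then be referenced from the client properties
quoted below (zookeeper.ssl.keyStore.location / zookeeper.ssl.keyStore.password
or their trustStore counterparts).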



> On 2019. Jul 30., at 20:49, Jörn Franke  wrote:
> 
> Hi,
> 
> I have a kerberized Zookeeper cluster and would like to add SSL on the
> client side and to the quorum.
> 
> So far the server configuration is clear. However, according to
> https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZooKeeper+SSL+User+Guide
> 
> I need to specify on the client side
> zookeeper.ssl.keyStore.location="/path/to/your/keystore"
> zookeeper.ssl.keyStore.password="keystore_password"
> zookeeper.ssl.trustStore.location="/path/to/your/truststore"
> zookeeper.ssl.trustStore.password="truststore_password"
> 
> I do understand the need to provide a truststore, but why does the client
> need a keystore? As far as I understood, the keystore is only needed for
> X509 authentication, but I use Kerberos authentication.
> 
> Does it mean the SSL client connection requires X509 authentication and
> Kerberos is not possible?
> Can you please clarify?
> 
> thank you.
> 
> best regards



Re: Issue migrating from Zookeeper 3.4.14 to 3.5.5

2019-08-14 Thread Andor Molnar
After some digging it turned out that this is an outstanding issue in 3.4->3.5 
upgrade. I’ve found the following e-mail thread about it:
https://markmail.org/thread/rbhzbro6nszypwwp

…and an open Jira:
https://issues.apache.org/jira/browse/ZOOKEEPER-3056

Unfortunately, a patch is still not available, but essentially the solution is 
to force ZooKeeper to create a snapshot file somehow. Sorry, the Admin interface 
is not available in 3.4; it was my mistake to recommend it.

In the last Jira comment there’s a workaround:
To perform an upgrade (3.4 -> 3.5):
• download the "snapshot.0" file attached
• copy it to the versioned directory (e.g. "version-2") within your 
data directory (parameter "dataDir" in your config - this is the directory 
containing the "myid" file for a peer)
• restart the peer
• upgrade the peer (this can be combined with the above step if you 
like)
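
The copy step of the workaround above could be scripted roughly as follows.
This is only a sketch: the class and method names are made up for
illustration, and the two paths are supplied by the caller. It refuses to
run if the versioned directory is missing and does not overwrite an
existing snapshot.0.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;

class SnapshotSeed {
    // Copies the downloaded file to <dataDir>/version-2/snapshot.0.
    static Path seed(Path dataDir, Path downloadedSnapshot) throws IOException {
        Path versionDir = dataDir.resolve("version-2");
        if (!Files.isDirectory(versionDir)) {
            throw new NoSuchFileException(versionDir.toString());
        }
        // Without REPLACE_EXISTING, Files.copy throws FileAlreadyExistsException
        // if a snapshot.0 is already present, so nothing gets overwritten.
        return Files.copy(downloadedSnapshot, versionDir.resolve("snapshot.0"));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: SnapshotSeed <dataDir> <downloaded snapshot.0>");
            return;
        }
        System.out.println("copied to " + seed(Paths.get(args[0]), Paths.get(args[1])));
    }
}
```

The peer still has to be restarted afterwards, as described in the steps
above.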

Would you please give it a try?

Andor




> On 2019. Aug 14., at 10:44, Andor Molnar  wrote:
> 
> Hi Jorn,
> 
> Thanks for reaching out to us, this is a very important exercise to make sure 
> the upgrade path works as expected.
> 
> - Please do an `ls -al` in your data dir to make sure you have valid snapshot 
> files.
> - It would be also useful to expose the Admin port (8080/tcp by default) and 
> check the output of `lastSnapshotCommand`.
> 
> Regards,
> Andor
> 
> 
> 
> 
> 
>> On 2019. Aug 14., at 7:13, Jörn Franke  wrote:
>> 
>> For me the issue occurred only in standalone mode. With the ensemble I 
>> simply cleared the data directory and it received the zookeeper data from 
>> the quorum. 
>> 
>>> On 13.08.2019 at 15:42, Koen De Groote wrote:
>>> 
>>> I would also like to know if this is possible.
>>> 
>>> From going over the github page, it seems there is a JMX method to force
>>> the creation of a snapshot. Yet the docker image is configured as such that
>>> a port will never be assigned to the JMX process.
>>> 
>>> Is there any way to bypass this?
>>> 
>>>> On Tue, Jul 30, 2019 at 8:51 AM Jörn Franke  wrote:
>>>> 
>>>> Thanks. Is it possible to force Zookeeper to create a snapshot? I will
>>>> check; I think the snapshot count is set to 1 in the cfg
>>>> 
>>>>> On 30.07.2019 at 08:06, Enrico Olivelli wrote:
>>>>> 
>>>>> On Mon, Jul 29, 2019 at 23:59, Jörn Franke <jornfra...@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> ok, then let me verify tomorrow if a snapshot file is indeed there. If
>>>> it
>>>>>> is missing then I wonder why it was missing. There was no crash or
>>>> whatever
>>>>>> and 3.4.14 works without issue, but of course it could have loaded them
>>>>>> from the log files. However, then I wonder why it does not create one.
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> I remember now that some other user, I think Sijie, reported a similar
>>>>> problem some month ago, that it is not possible to upgrade from 3.4 to
>>>> 3.5
>>>>> if no snapshot is present.
>>>>> IIRC The fix was to force the creation of at least one snapshot file and
>>>>> then upgrade
>>>>> 
>>>>> Enrico
>>>>> 
>>>>> 
>>>>>> 
>>>>>> On Mon, Jul 29, 2019 at 11:45 PM Michael Han  wrote:
>>>>>> 
>>>>>>>>> I just wonder why it does not find a valid snapshot.
>>>>>>> 
>>>>>>> If there are local snapshot files and the files are valid, then it's a
>>>>>> bug
>>>>>>> that server fails to load them.
>>>>>>> 
>>>>>>>>> Is it because the format changed in 3.5.5 compared to 3.4.14?
>>>>>>> 
>>>>>>> Not I am aware of. There are some format changes (added compression
>>>>>>> support) in master branch, but that's not shipped with 3.5.5.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Mon, Jul 29, 2019 at 2:31 PM Jörn Franke 
>>>>>> wrote:
>>>>>>> 
>>>>>>>> ok, then it affects basically all standalone nodes? This is fine,
>>>>>> despite
>>>>>>>> that it means some extra work (for uncritical lab environments)

Re: Issue migrating from Zookeeper 3.4.14 to 3.5.5

2019-08-14 Thread Andor Molnar
Hi Jorn,

Thanks for reaching out to us, this is a very important exercise to make sure 
the upgrade path works as expected.

- Please do an `ls -al` in your data dir to make sure you have valid snapshot 
files.
- It would be also useful to expose the Admin port (8080/tcp by default) and 
check the output of `lastSnapshotCommand`.

Regards,
Andor
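
The `ls -al` check above can also be done programmatically. The sketch
below assumes the usual file naming inside the versioned directory
(snapshot.* for snapshots, log.* for transaction logs) and uses "non-empty"
as a rough proxy for "valid"; it does not parse the files.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class SnapshotCheck {
    // Counts non-empty regular files in dir whose names match the glob.
    static int countNonEmpty(Path dir, String glob) throws IOException {
        int count = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, glob)) {
            for (Path f : files) {
                if (Files.isRegularFile(f) && Files.size(f) > 0) count++;
            }
        }
        return count;
    }

    // True when transaction logs exist but no non-empty snapshot does.
    static boolean logsWithoutSnapshot(Path versionDir) throws IOException {
        return countNonEmpty(versionDir, "log.*") > 0
                && countNonEmpty(versionDir, "snapshot.*") == 0;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: SnapshotCheck <dataDir>/version-2");
            return;
        }
        Path versionDir = Paths.get(args[0]);
        System.out.println("snapshots: " + countNonEmpty(versionDir, "snapshot.*"));
        System.out.println("txn logs:  " + countNonEmpty(versionDir, "log.*"));
        System.out.println("logs without snapshot: " + logsWithoutSnapshot(versionDir));
    }
}
```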





> On 2019. Aug 14., at 7:13, Jörn Franke  wrote:
> 
> For me the issue occurred only in standalone mode. With the ensemble I simply 
> cleared the data directory and it received the zookeeper data from the 
> quorum. 
> 
>> On 13.08.2019 at 15:42, Koen De Groote wrote:
>> 
>> I would also like to know if this is possible.
>> 
>> From going over the github page, it seems there is a JMX method to force
>> the creation of a snapshot. Yet the docker image is configured as such that
>> a port will never be assigned to the JMX process.
>> 
>> Is there any way to bypass this?
>> 
>>> On Tue, Jul 30, 2019 at 8:51 AM Jörn Franke  wrote:
>>> 
>>> Thanks. Is it possible to force Zookeeper to create a snapshot? I will
>>> check; I think the snapshot count is set to 1 in the cfg
>>> 
>>>> On 30.07.2019 at 08:06, Enrico Olivelli wrote:
>>>> 
>>>> On Mon, Jul 29, 2019 at 23:59, Jörn Franke <jornfra...@gmail.com>
>>>> wrote:
>>>> 
>>>>> ok, then let me verify tomorrow if a snapshot file is indeed there. If
>>>>> it is missing then I wonder why it was missing. There was no crash or
>>>>> whatever and 3.4.14 works without issue, but of course it could have
>>>>> loaded them from the log files. However, then I wonder why it does not
>>>>> create one.
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> I remember now that some other user, I think Sijie, reported a similar
>>>> problem some months ago: it is not possible to upgrade from 3.4 to 3.5
>>>> if no snapshot is present.
>>>> IIRC the fix was to force the creation of at least one snapshot file and
>>>> then upgrade.
>>>> 
>>>> Enrico
>>>> 
>>>> 
>>>>> 
>>>>> On Mon, Jul 29, 2019 at 11:45 PM Michael Han wrote:
>>>>> 
>>>>>>>> I just wonder why it does not find a valid snapshot.
>>>>>> 
>>>>>> If there are local snapshot files and the files are valid, then it's a
>>>>>> bug that the server fails to load them.
>>>>>> 
>>>>>>>> Is it because the format changed in 3.5.5 compared to 3.4.14?
>>>>>> 
>>>>>> Not that I am aware of. There are some format changes (added compression
>>>>>> support) in master branch, but that's not shipped with 3.5.5.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Mon, Jul 29, 2019 at 2:31 PM Jörn Franke wrote:
>>>>>> 
>>>>>>> ok, then it affects basically all standalone nodes? This is fine,
>>>>>>> despite that it means some extra work (for uncritical lab environments).
>>>>>>> I am not sure it is ZOOKEEPER-2325 (but I don't know the full history
>>>>>>> behind it). The logs are fine (it works in 3.4.14 without issues, even
>>>>>>> after downgrading back). There is no issue with disk space and there
>>>>>>> are no 0 byte files. I just wonder why it does not find a valid
>>>>>>> snapshot. Is it because the format changed in 3.5.5 compared to 3.4.14?
>>>>>>> 
>>> On Mon, Jul 29, 2019 at 11:25 PM Michael Han  wrote:
>>> 
>> java.io.IOException: No snapshot found, but there are log entries.
 Something is broken!
 
 This is expected behavior introduced in ZOOKEEPER-2325. We don't want
>> to
 end up with potential inconsistent state across the ensemble when
 recovering from empty snapshot.
 
 To continue upgrade, just delete all txn log files and let the node
>> sync
 the snapshot from the quorum.
 
 
 On Mon, Jul 29, 2019 at 1:38 PM Enrico Olivelli > 
 wrote:
 
> Il lun 29 lug 2019, 22:32 Jörn Franke  ha
>>> scritto:
> 
>> It also seems that 3.5.5 does not attempt to read all of the
>> logfiles
 (I
>> have to still confirm), but the two it reads exist, it has access
>> and
> they
>> are much more than 0 byte
>> 
> 
> We should have the stackstace of the EOFException.
> 
> Anyone on this list has a better idea?
> 
> Enrico
> 
> 
>> On Mon, Jul 29, 2019 at 10:13 PM Jörn Franke <
> jornfra...@gmail.com
>>> 
> wrote:
>> 
>>> (of course i do not run them at the same time)
>>> 
>>> On Mon, Jul 29, 2019 at 10:10 PM Jörn Franke <
>> jornfra...@gmail.com
 
>> wrote:
>>> 
 thank you for the quick reply. They read from the same disk
>> paths
 and
 have the same access rights (in fact the RHEL service executes
>>> them
 as
>> the
 same specific user).
 
 On Mon, Jul 29, 2019 at 10:09 PM Enrico Olivelli <
 eolive...@gmail.com
>> 

Re: Zookeeper latency calculation

2019-07-17 Thread Andor Molnar
If I recall correctly, avg_latency is an int, not a float. I remember
someone wanted to replace it with a float sometime recently.
Correct me if I'm wrong.
Andor
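
If the average is indeed computed with integer arithmetic, a sub-millisecond
average would be reported as 0. The sketch below only illustrates that
truncation effect; the method names and the sample numbers are made up, and
this is not the server's actual code.

```java
class AvgLatency {
    // Integer division drops the fractional part.
    static long integerAverage(long totalLatencyMs, long requestCount) {
        return requestCount == 0 ? 0 : totalLatencyMs / requestCount;
    }

    // Floating-point division keeps it.
    static double floatingAverage(long totalLatencyMs, long requestCount) {
        return requestCount == 0 ? 0.0 : (double) totalLatencyMs / requestCount;
    }

    public static void main(String[] args) {
        long totalLatencyMs = 400;   // hypothetical: 400 ms spent in total
        long requestCount = 1000;    // hypothetical: across 1000 requests
        System.out.println(integerAverage(totalLatencyMs, requestCount));   // prints 0
        System.out.println(floatingAverage(totalLatencyMs, requestCount));  // prints 0.4
    }
}
```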


-Original Message-
From: Norbert Kalmar <nkal...@cloudera.com.INVALID>
Reply-To: user@zookeeper.apache.org
To: user@zookeeper.apache.org
Subject: Re: Zookeeper latency calculation
Date: Wed, 17 Jul 2019 09:27:35 +0200

Hi Ram,

ZooKeeper is very fast if deployed according to recommendations (nodes
on the same network, directly connected). It's possible it gives 0
latency on avg, although usually it's a bit higher.
I can recommend Patrick's smoke test if you want to test performance,
especially zk-smoketest and zk-latencies.py. It has a good readme:
https://github.com/phunt/zk-smoketest

But the "mntr" command is pretty much the easiest tool out of the
box, especially if you are on 3.4.x.

Regards,
Norbert

On Tue, Jul 16, 2019 at 7:05 PM rammohan ganapavarapu <
rammohanga...@gmail.com> wrote:
> Hi,
> I am trying to understand how zookeeper latency is calculated. The mntr
> command always gives avg_latency "0"; can someone help with how to
> calculate avg request latency in zookeeper?
> 
> Thanks,
> Ram


Re: ZK 3.5.5 : SecureClientPort and Server Specs

2019-07-10 Thread Andor Molnar
Marked the more recent ticket as duplicate.
Thanks Fred.

Andor



> On 2019. Jul 9., at 16:22, Fred Eisele  wrote:
> 
> There is already an issue.
> https://issues.apache.org/jira/browse/ZOOKEEPER-3166



Re: ZK 3.5.5 : SecureClientPort and Server Specs

2019-07-02 Thread Andor Molnar
Got it. Thanks Alex. I’m not familiar with dynamic config unfortunately.
I agree that we need to open a Jira for it.

Andor



> On 2019. Jul 2., at 6:55, Alexander Shraer  wrote:
> 
> I think that Fred is correct - secureClientPort and secureClientPortAddress
> were not made part of the dynamic configuration (yet?), so unlike other
> parameters, they are static.
> Fred, perhaps you could open a Jira to ask for this feature?
> 
> Thanks,
> Alex
> 
> On Mon, Jul 1, 2019 at 2:58 PM Andor Molnar  wrote:
> 
>> Hi Fred,
>> 
>> I don’t think this server spec is accurate.
>> clientPort and clientPortAddress as well as secureClientPort and
>> secureClientPortAddress are defined in the main section of config file, not
>> within Cluster Options:
>> 
>> 
>> https://zookeeper.apache.org/doc/r3.5.5/zookeeperAdmin.html#sc_configuration
>> 
>> e.g. You should have something like:
>> 
>> clientPort=2181
>> clientPortAddress=127.0.0.1
>> secureClientPort=1181
>> secureClientPortAddress=…
>> 
>> server.1=…
>> server.2=…
>> 
>> In your zoo.cfg config file.
>> 
>> Regards,
>> Andor
>> 
>> 
>> 
>>> On 2019. Jun 19., at 17:28, Fred Eisele 
>> wrote:
>>> 
>>> The server specification is ...
>>> server.<positive id> = <address1>:<port1>:<port2>[:role];[<client port address>:]<client port>
>>> The clientPort and clientPortAddress are accommodated but I do not see a
>>> provision for secureClientPort.
>>> I presume this means it is a static parameter as before?
>> 
>> 



Re: Reconfigure SSLQuorum

2019-07-01 Thread Andor Molnar
Hi Tyler,

Sorry, it looks like we forgot to add this feature to the documentation.
It should be supported from 3.5.5 according to this PR:

https://github.com/apache/zookeeper/pull/737 


You can find some useful information in the description.
Please let me know if it works for you as expected.

Thanks,
Andor




> On 2019. Jun 14., at 23:13, Tyler Lubeck  wrote:
> 
>  



Re: client session expired after timeout after errors and warning logs in zk server logs

2019-07-01 Thread Andor Molnar
Hi Prashant,

It’s quite difficult to tell what could have happened from your log file 
snippet. It looks like multiple things are going on at the same time: 
dynamic reconfig, leader election, network issues?

However, you’re using an early alpha version of ZooKeeper, which is not 
recommended for production use. I suggest upgrading to a recent stable release 
(3.5.5) and seeing if the problem still occurs.

Regards,
Andor




> On 2019. Jun 20., at 13:44, prashantkumar dhotre 
>  wrote:
> 
> Hi
> We use 3.5.1-alpha version.
> We are seeing session expiry issue in VM set up.
> This is running in replicated mode (two servers + node mastership as one
> vote for quorum).
> we see client session expired after session timeout (of 10 sec).
> This connection was to local zk server.
> session timeout is 10 sec.
> 
> This session got established at 17:40:18 and ZK server expired this at
> 17:40:57, after 39 seconds of establishment.
> 
> in between this time, i see few errors and warnings in zookeeper server
> logs (as shown below).
> 
> I see below errors/warning in between this time before session expiry.
> 
> This issue is not very easy to replicate , so far we have seen only twice.
> 
> Could you please help me identify root cause and let me know if this is
> fixed in later release ?
> 
> Thanks,
> 
> Prashant
> 
> 
> Logs:
> 
> 
> 
> ZK server errors and warnings in ZK server logs:
> 
> 
> 
> 2711:2019-06-05 17:40:38,766 [myid:2147483652] - INFO  [
> 0.0.0.0/0.0.0.0:61864:QuorumCnxManager$Listener@637] - Received connection
> request /128.0.0.5:38020
> 
> 2724:2019-06-05 17:40:38,931 [myid:2147483652] - WARN
> [RecvWorker:2147483653:QuorumCnxManager$RecvWorker@917] - Connection broken
> for id 2147483653, my id = 2147483652, error =
> 
> 2725:java.io.EOFException
> 
> 2726:   at java.io.DataInputStream.readInt(Unknown Source)
> 
> 2727:   at
> org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:902)
> 
> 2728:2019-06-05 17:40:38,943 [myid:2147483652] - WARN
> [RecvWorker:2147483653:QuorumCnxManager$RecvWorker@920] - Interrupting
> SendWorker
> 
> 2730:2019-06-05 17:40:38,950 [myid:2147483652] - WARN
> [SendWorker:2147483653:QuorumCnxManager$SendWorker@834] - Interrupted while
> waiting for message on queue
> 
> 2731:java.lang.InterruptedException
> 
> 2732:   at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(Unknown
> Source)
> 
> 2733:   at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown
> Source)
> 
> 2734:   at java.util.concurrent.ArrayBlockingQueue.poll(Unknown Source)
> 
> 2735:   at
> org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:986)
> 
> 2736:   at
> org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:63)
> 
> 2737:   at
> org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:822)
> 
> 2738:2019-06-05 17:40:38,965 [myid:2147483652] - WARN
> [SendWorker:2147483653:QuorumCnxManager$SendWorker@843] - Send worker
> leaving thread  id 2147483653 my id = 2147483652
> 
> 2740:2019-06-05 17:40:38,978 [myid:2147483652] - INFO  [
> 0.0.0.0/0.0.0.0:61864:QuorumCnxManager$Listener@637] - Received connection
> request /128.0.0.5:38022
> 
> 2741:2019-06-05 17:40:38,986 [myid:2147483652] - INFO  [
> 0.0.0.0/0.0.0.0:61864:QuorumCnxManager$Listener@637] - Received connection
> request /128.0.0.5:38024
> 
> 2742:2019-06-05 17:40:38,987 [myid:2147483652] - WARN
> [RecvWorker:2147483653:QuorumCnxManager$RecvWorker@920] - Interrupting
> SendWorker
> 
> 2743:2019-06-05 17:40:39,017 [myid:2147483652] - ERROR
> [SendWorker:2147483653:QuorumCnxManager$SendWorker@810] - Failed to send
> last message. Shutting down thread.
> 
> 2744:java.net.SocketException: Socket closed
> 
> 2745:   at java.net.SocketOutputStream.socketWrite(Unknown Source)
> 
> 2746:   at java.net.SocketOutputStream.write(Unknown Source)
> 
> 2747:   at java.io.DataOutputStream.writeInt(Unknown Source)
> 
> 2748:   at
> org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.send(QuorumCnxManager.java:779)
> 
> 2749:   at
> org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:806)
> 
> 2750:2019-06-05 17:40:39,018 [myid:2147483652] - WARN
> [SendWorker:2147483653:QuorumCnxManager$SendWorker@843] - Send worker
> leaving thread  id 2147483653 my id = 2147483652
> 
> 2758:2019-06-05 17:40:39,125 [myid:2147483652] - INFO
> [LearnerHandler-/128.0.0.5:49888:LearnerHandler@385] - Follower sid:
> 2147483653 not in the current config 1
> 
> 2759:2019-06-05 17:40:39,130 [myid:2147483652] - INFO
> [LearnerHandler-/128.0.0.5:49888:LearnerHandler@683] - Synchronizing with
> Follower sid: 2147483653 maxCommittedLog=0x10e04 minCommittedLog=0x1000
> 
> 00c10 lastProcessedZxid=0x10e04 peerLastZxid=0x0
> 
> 2760:2019-06-05 17:40:39,132 [myid:2147483652] - WARN
> 

Re: ZK 3.5.5 : SecureClientPort and Server Specs

2019-07-01 Thread Andor Molnar
Hi Fred,

I don’t think this server spec is accurate.
clientPort and clientPortAddress as well as secureClientPort and 
secureClientPortAddress are defined in the main section of config file, not 
within Cluster Options:

https://zookeeper.apache.org/doc/r3.5.5/zookeeperAdmin.html#sc_configuration 


e.g. You should have something like:

clientPort=2181
clientPortAddress=127.0.0.1
secureClientPort=1181
secureClientPortAddress=…

server.1=…
server.2=…

In your zoo.cfg config file.
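
To illustrate the layout above: the client port settings are plain
top-level keys, while the ensemble members are the server.N keys. The
sketch below reads such a file with java.util.Properties and separates the
two. The host names and the 2888/3888 ports in the sample are hypothetical
placeholders, not values from this thread.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

class ZooCfgSplit {
    // Parses key=value text in the style of zoo.cfg.
    static Properties parse(String text) throws IOException {
        Properties cfg = new Properties();
        cfg.load(new StringReader(text));
        return cfg;
    }

    // Returns only the server.N entries, keyed by N and sorted by N.
    static Map<Integer, String> serverEntries(Properties cfg) {
        Map<Integer, String> servers = new TreeMap<>();
        for (String key : cfg.stringPropertyNames()) {
            if (key.startsWith("server.")) {
                int id = Integer.parseInt(key.substring("server.".length()));
                servers.put(id, cfg.getProperty(key));
            }
        }
        return servers;
    }

    public static void main(String[] args) throws IOException {
        String sample =
              "clientPort=2181\n"
            + "clientPortAddress=127.0.0.1\n"
            + "secureClientPort=1181\n"
            + "server.1=host1.example:2888:3888\n"   // hypothetical member
            + "server.2=host2.example:2888:3888\n";  // hypothetical member
        Properties cfg = parse(sample);
        System.out.println("secureClientPort = " + cfg.getProperty("secureClientPort"));
        System.out.println("servers = " + serverEntries(cfg));
    }
}
```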

Regards,
Andor



> On 2019. Jun 19., at 17:28, Fred Eisele  wrote:
> 
> The server specification is ...
> server.<positive id> = <address1>:<port1>:<port2>[:role];[<client port address>:]<client port>
> The clientPort and clientPortAddress are accommodated but I do not see a
> provision for secureClientPort.
> I presume this means it is a static parameter as before?



Re: Updated SSL guide fro 3.5.5

2019-05-22 Thread Andor Molnar
Hi Ram,

We have a ZooKeeper SSL User Guide on the wiki, which currently contains the
client-server howto.
I need to update it with quorum TLS, but that would probably be just
copying over the admin guide.

https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZooKeeper+SSL+User+Guide#ZooKeeperSSLUserGuide-Quorum

Andor



On Wed, May 22, 2019 at 11:37 PM rammohan ganapavarapu <
rammohanga...@gmail.com> wrote:

> Hi,
>
> I did find doc for quorum tls
> https://zookeeper.apache.org/doc/r3.5.5/zookeeperAdmin.html#Quorum+TLS but
> looking for client - server tls (for existing client server upgrade)
>
> Ram
>
> On Wed, May 22, 2019 at 1:54 PM rammohan ganapavarapu <
> rammohanga...@gmail.com> wrote:
>
> > Hi,
> >
> > Since 3.5.5 is out is there any updated guide to configure SSL both for
> > server-server and client-server?
> >
> > Thanks,
> > Ram
> >
>


Re: Why does ZooKeeper follower shutdown itself when it can not read from leader

2019-05-22 Thread Andor Molnar
Hi Qian,

Which version of ZooKeeper are you using?
Would you please share the config files and leader logs too?
Also looks like you're trying to connect with an older client:
>>> Connection request from old client /10.249.255.10:42306; will be dropped if server is in r-o mode

Andor



On Wed, May 22, 2019 at 2:52 AM Qian Zhang  wrote:

> Anyone has any ideas?
>
> Regards,
> Qian Zhang
>
>
> On Sun, May 19, 2019 at 6:15 PM Qian Zhang  wrote:
>
> > Hi,
> >
> > I have a ZooKeeper cluster which has 5 nodes. Today the leader cannot be
> > connected due to a hardware issue, and then I found the 4 followers just
> > shutdown, here is the logs:
> >
> >> May 18 15:34:28 MD001076 java[29148]: [myid:1] WARN
> >> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when
> >> following the leader
> >>   java.net.SocketTimeoutException:
> >> Read timed out
> >> at
> >> java.net.SocketInputStream.socketRead0(Native Method)
> >> at
> >> java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> >> at
> >> java.net.SocketInputStream.read(SocketInputStream.java:171)
> >> at
> >> java.net.SocketInputStream.read(SocketInputStream.java:141)
> >> at
> >> java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
> >> at
> >> java.io.BufferedInputStream.read(BufferedInputStream.java:265)
> >> at
> >> java.io.DataInputStream.readInt(DataInputStream.java:387)
> >> at
> >> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> >> at
> >>
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
> >> at
> >>
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
> >> at
> >> org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
> >> at
> >>
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
> >> at
> >> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:937)
> >> May 18 15:34:28 MD001076 java[29148]: [myid:1] INFO
> >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] -
> >> Accepted socket connectio
> >> n from /10.249.255.10:42306
> >> May 18 15:34:28 MD001076 java[29148]: [myid:1] WARN
> >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@896] -
> >> Connection request from old cl
> >> ient /10.249.255.10:42306; will be dropped if server is in r-o mode
> >> May 18 15:34:28 MD001076 java[29148]: [myid:1] INFO
> >> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@942] -
> Client
> >> attempting to establish
> >>  new session at /10.249.255.10:42306
> >> May 18 15:34:28 MD001076 java[29148]: [myid:1] ERROR
> >> [FollowerRequestProcessor:1:ZooKeeperCriticalThread@49] - Severe
> >> unrecoverable error, from threa
> >> d : FollowerRequestProcessor:1
> >>   java.net.SocketException: Socket
> >> closed
> >> at
> >> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:118)
> >> at
> >> java.net.SocketOutputStream.write(SocketOutputStream.java:155)
> >> at
> >> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
> >> at
> >> java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
> >> at
> >> org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:139)
> >> at
> >> org.apache.zookeeper.server.quorum.Learner.request(Learner.java:188)
> >> at
> >>
> org.apache.zookeeper.server.quorum.FollowerRequestProcessor.run(FollowerRequestProcessor.java:90)
> >> May 18 15:34:28 MD001076 java[29148]: [myid:1] INFO
> >> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown
> called
> >>   java.lang.Exception: shutdown
> >> Follower
> >> at
> >> org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
> >> at
> >> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:941)
> >
> >
> > I am confused why all followers shutdown in this case which makes the
> > whole ZooKeeper unusable for a short period, shouldn't they elect a new
> > leader instead? Thanks!
> >
> >
> > Regards,
> > Qian Zhang
> >
>


[CVE-2019-0201] Information disclosure vulnerability in Apache ZooKeeper

2019-05-20 Thread Andor Molnar
CVE-2019-0201: Information disclosure vulnerability in Apache ZooKeeper
 
Severity: Critical
 
Vendor: The Apache Software Foundation
 
Versions Affected: ZooKeeper prior to 3.4.14, ZooKeeper 3.5.0-alpha through 
3.5.4-beta. The unsupported ZooKeeper 1.x through 3.3.x versions may be also 
affected.
 
Description: ZooKeeper’s getACL() command doesn’t check any permission when it 
retrieves the ACLs of the requested node, and it returns all information 
contained in the ACL Id field as a plaintext string. 
DigestAuthenticationProvider overloads the Id field with the hash value that 
is used for user authentication. As a consequence, if Digest Authentication is 
in use, the unsalted hash value will be disclosed by a getACL() request to 
unauthenticated or unprivileged users.
 
Mitigation: Use an authentication method other than Digest (e.g. Kerberos) or 
upgrade to 3.4.14 or later (3.5.5 or later if on the 3.5 branch).
 
Credit: This issue was identified by Harrison Neal, PatchAdvisor, Inc.
 
References: https://issues.apache.org/jira/browse/ZOOKEEPER-1392
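
To illustrate why an unsalted hash in the Id field matters: the sketch
below assumes the Id is built as user:base64(SHA-1("user:password")). That
construction is an assumption about DigestAuthenticationProvider and is not
stated in this advisory. Because no salt is involved, the same credentials
always yield the same string, so a disclosed value can be tested offline
against password guesses.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

class DigestId {
    // Builds "user:base64(sha1(user:password))"; no salt is involved.
    static String digestId(String user, String password) throws NoSuchAlgorithmException {
        byte[] hash = MessageDigest.getInstance("SHA-1")
                .digest((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return user + ":" + Base64.getEncoder().encodeToString(hash);
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        String first = digestId("alice", "secret");    // hypothetical credentials
        String second = digestId("alice", "secret");
        String guess = digestId("alice", "letmein");   // a wrong guess
        System.out.println(first);
        System.out.println("deterministic: " + first.equals(second));
        System.out.println("wrong guess matches: " + first.equals(guess));
    }
}
```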
 



[ANNOUNCE] Apache ZooKeeper 3.5.5

2019-05-20 Thread Andor Molnar
The Apache ZooKeeper team is proud to announce Apache ZooKeeper version 3.5.5

ZooKeeper is a high-performance coordination service for distributed
applications. It exposes common services - such as naming,
configuration management, synchronization, and group services - in a
simple interface so you don't have to write them from scratch. You can
use it off-the-shelf to implement consensus, group management, leader
election, and presence protocols. And you can build on it for your
own, specific needs.

For ZooKeeper release details and downloads, visit:
https://zookeeper.apache.org/releases.html

ZooKeeper 3.5.5 Release Notes are at:
https://zookeeper.apache.org/doc/r3.5.5/releasenotes.html

We would like to thank the contributors that made the release possible.

Regards,

The ZooKeeper Team




Re: Deprecated CSVInputArchive and XMLInputArchive

2019-05-09 Thread Andor Molnar
Hi Zili,

I'm surely not the best person to talk about ZooKeeper history, but as far
as I know these 2 input archives are not actively maintained and I've never
seen them used in production.
We probably don't have test coverage for them either, so keeping them in
the codebase could be questionable.

Regards,
Andor




On Sat, Apr 13, 2019 at 7:00 PM Zili Chen  wrote:

> Hi,
>
> I'm not sure whether user list is a proper place but seems dev list
> is filled of notifications.
>
> During an investigation of the possibility that ZooKeeper support multi
> serialization frameworks, I found that in jute, CSVInputArchive and
> XMLInputArchive are never in use. I wonder the story of these
> implementations and whether they are still valid.
>
> Best,
> tison.
>


[ANNOUNCE] Apache ZooKeeper 3.4.14

2019-04-02 Thread Andor Molnar
The Apache ZooKeeper team is proud to announce Apache ZooKeeper version 3.4.14

ZooKeeper is a high-performance coordination service for distributed
applications. It exposes common services - such as naming,
configuration management, synchronization, and group services - in a
simple interface so you don't have to write them from scratch. You can
use it off-the-shelf to implement consensus, group management, leader
election, and presence protocols. And you can build on it for your
own, specific needs.

For ZooKeeper release details and downloads, visit:
http://zookeeper.apache.org/releases.html

ZooKeeper 3.4.14 Release Notes are at:
http://zookeeper.apache.org/doc/r3.4.14/releasenotes.html

We would like to thank the contributors that made the release possible.

Regards,

The ZooKeeper Team



Re: TLS/SSL support to encrypt traffic between zookeeper nodes

2019-03-20 Thread Andor Molnar
We're in the middle of creating a maintenance release 3.4.14. Once it's
ready, I'll cut 3.5.5.
Hopefully in the next few weeks.

Andor



On Wed, Mar 20, 2019 at 7:32 AM Kaushal Shriyan 
wrote:

> Hi Andor,
>
> Thanks Andor for the email.  Any dates planned to release 3.5.5 version? I
> do not see 3.5.5 version in https://zookeeper.apache.org/releases.html
>
> Thanks in Advance and i look forward to hearing from you.
>
> Best Regards,
>
> Kaushal
>


Re: TLS/SSL support to encrypt traffic between zookeeper nodes

2019-03-19 Thread Andor Molnar
Hi Kaushal,

Yes, it will be released in 3.5.5 soon.

Andor



On Tue, Mar 19, 2019 at 10:40 AM Kaushal Shriyan 
wrote:

> Hi,
>
> Is there a TLS/SSL support to encrypt traffic between zookeeper nodes
> (internode communication)?
>
> Thanks in Advance and i look forward to hearing from you.
>
> Best Regards,
>
> Kaushal
>


Re: RR DNS name instead of list of server

2019-02-12 Thread Andor Molnar
Not sure what you mean by 'static'.
A ZK instance cannot change its myid; it's tied to the database.

Andor
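
Since each instance keeps its myid, any discovery step would have to map
every address to a stable id rather than rely on the order in which
addresses are returned. A rough sketch of generating the server.N lines
from such a mapping follows; the addresses, the class name and the
2888/3888 ports are hypothetical, and how the id-to-address map is obtained
(instance tags, for example) is left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class ServerLines {
    // Renders one "server.<myid>=<host>:<quorumPort>:<electionPort>" line per
    // member, sorted by myid so that discovery order does not matter.
    static String serverLines(Map<Integer, String> hostsByMyid, int quorumPort, int electionPort) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Integer, String> e : new TreeMap<>(hostsByMyid).entrySet()) {
            sb.append("server.").append(e.getKey()).append('=')
              .append(e.getValue()).append(':').append(quorumPort)
              .append(':').append(electionPort).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<Integer, String> discovered = new HashMap<>();
        discovered.put(3, "10.0.0.13");  // hypothetical addresses, inserted out of order
        discovered.put(1, "10.0.0.11");
        discovered.put(2, "10.0.0.12");
        System.out.print(serverLines(discovered, 2888, 3888));
    }
}
```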


On Tue, Feb 12, 2019 at 5:18 PM rammohan ganapavarapu <
rammohanga...@gmail.com> wrote:

> Andor,
>
> Thanks you, do we have to have a static myid? any alternatives to it?
>
> Ram
>
> On Tue, Feb 12, 2019 at 3:44 AM Andor Molnar 
> wrote:
>
> > Hi Ram / Alan,
> >
> > I quite like the idea of implementing some kind of autoconfiguration for
> > ZooKeeper, because currently it's entirely based on static config files
> > which is not 100% cloud-friendly. Starting the project with an initial
> > support for EC2 instances based on Alan's approach would be awesome.
> > There's no concept of "seed nodes" in ZK, like Cassandra, e.g. neither
> > clients nor servers are able to learn cluster topology from each other
> > (that could be another improvement). In order to start a participant we
> > have to provide "myid" (from instance tag), server IP addresses
> > (autoscaling group), election and quorum port numbers and participant
> type.
> > Basically replacing the "server.X" section of the config.
> >
> > RR DNS might not be a good option, because as Alan mentioned the order of
> > returning IPs is not guaranteed, so myid config would be cumbersome.
> >
> > Need to think about it more, but I believe it's definitely worth to
> raise a
> > Jira.
> >
> > Cool stuff.
> >
> > Regards,
> > Andor
> >
> >
> >
> >
> >
> > On Mon, Feb 11, 2019 at 5:48 PM rammohan ganapavarapu <
> > rammohanga...@gmail.com> wrote:
> >
> > > Jürgen,
> > >
> > > I have zk clusters in dynamic environment like Autoscalling groups and
> as
> > > you know in ASG it is quite common for a instance to get terminate and
> > new
> > > one comes up right, so in that case if i rely on static config it will
> be
> > > little bit hard to manage the cluster, i was thinking if we have RR DNS
> > > name atleast i can update the DNS entry when new nodes comes up or old
> > one
> > > terminate. I have not played with dynamic config option yet but if that
> > > solves the problem we see in dynamic environments i am good. And i am
> not
> > > comparing with consul but just pointing out the existing example.
> > >
> > >
> > >
> > > Alan,
> > >
> > > Yes i am looking for the similar solution.
> > >
> > > Thanks,
> > > Ram
> > >
> > > On Mon, Feb 11, 2019 at 6:52 AM Alan Scherger  >
> > > wrote:
> > >
> > > > Hey Jürgen,
> > > >
> > > > My intent was to simply suggest a more programmatic means for dynamic
> > > > configuration. In particular, the detecting of seed nodes and their
> > > > appropriate id numbers. One might imagine provisioning 3 nodes with
> > tags
> > > > like:
> > > >
> > > > zk_cluster=thebestcluster
> > > > zk_myid={1,2,3}
> > > >
> > > > and then in the zk configuration we might have:
> > > >
> > > > discovery=ec2Tags
> > > > discovery.ec2Tags.tagCluster=zk_cluster
> > > > discovery.ec2Tags.tagMyid=zk_myid
> > > >
> > > > This would allow a little code to parse the tags out of ec2 and build the
> > > > seed node configurations.
> > > >
> > > > Similarly we could build and maintain a custom auth provider that could use
> > > > the AWS Certificate Manager Private CA APIs or Hashicorp Vault PKI APIs to
> > > > automatically create and fetch the appropriate certificates and
> > > > configurations.
> > > >
> > > > To your point, the security of introducing autoconfiguration of settings
> > > > like these might not be appropriate for all folks or installations, but
> > > > environments where things like instance level IAM exist help mitigate some
> > > > risk assuming the proper access controls have been put in place.
> > >  > > > rant :) >
> > > >
> > > > I believe it's the lack of autoconfiguration in Zookeeper that has led to
> > > > the creation of tools like Exhibitor or other tools that have never been
> > > > open sourced for one reason or another. The introduction of Dynamic
> > > > Reconfiguration is quite great, but the 'Re' part might 

Re: RR DNS name instead of list of server

2019-02-10 Thread Andor Molnar
Hi Ram!

What exactly do you mean by "auto-discovery on cloud instance tags"?
Is there a standard way of doing that?

Regards,
Andor



On Sat, Feb 9, 2019 at 4:07 PM Norbert Kalmar 
wrote:

> Hi Ram,
>
> Unfortunately ZK does not support RR DNS name.
> As for plans on discovery based on cloud tags, I am not aware of any plans.
> You can create a jira for it if you'd like, but I can't tell you when that
> would make it into a release.
>
> Regards,
> Norbert
>
> On Fri, Feb 8, 2019 at 11:53 PM rammohan ganapavarapu <
> rammohanga...@gmail.com> wrote:
>
> > Hi,
> >
> > Does ZooKeeper support an RR DNS name in the config instead of listing each
> > server name/IP, like what Consul does to join the cluster?
> >
> >
> > server.1=server1
> > server.2=server2
> > server.3=server3
> >
> > vs
> > server=example.com
> > where example.com is RR of server1, server2 and server3
> >
> > And does anyone know if the ZK team has any plans to add cloud autodiscovery
> > based on cloud instance tags?
> >
> > Thanks,
> > Ram
> >
>


Re: [**SPAM**] RE: ZK Server does not join quorum after restart

2019-01-25 Thread Andor Molnar
Hi Ian,

Would you please attach logs from all participants of the ensemble or try
to find an exception from when the follower is trying to join?

Regards,
Andor



On Fri, Jan 25, 2019 at 1:37 AM Ian Spence 
wrote:

> Hi Daniel,
>
> Thanks for the quick reply. We use static IP addresses on all of the
> servers so it did not change after the reboot.
>
> Thanks,
> -Ian
>
> From: Daniel Chan  on behalf of Daniel Chan <
> daniel.cw.c...@oracle.com>
> Reply-To: "user@zookeeper.apache.org" 
> Date: Thursday, January 24, 2019 at 16:36
> To: "user@zookeeper.apache.org" 
> Subject: [**SPAM**] RE: ZK Server does not join quorum after restart
>
>
> If its IP address got changed, then you hit a known bug
> https://issues.apache.org/jira/browse/ZOOKEEPER-1506  and you need to
> bounce the cluster.
>
> Thanks,
> Daniel
>
> -Original Message-
> From: Ian Spence  ian.spe...@globalrelay.net>>
> Sent: Thursday, January 24, 2019 2:36 PM
> To: user@zookeeper.apache.org
> Subject: ZK Server does not join quorum after restart
>
> Hello
>
> We have a cluster of 5 ZK servers, all running ZK 3.4.6 on Java 1.8 on
> CentOS 6. These are physical devices, not virtual machines.
>
> One server required hardware maintenance, and was restarted. When the zk
> software was restarted, it did not rejoin the quorum as a follower.
>
> Running “stat” or “mntr” commands returns: “This ZooKeeper instance is not
> currently serving requests”
>
> I googled this message and came across this bug:
> https://issues.apache.org/jira/browse/ZOOKEEPER-2164
>
> Does anybody know if there is a work-around to this issue? We’ve seen this
> problem multiple times in the past and our current solution is to bring
> down the zk cluster (which is a huge outage-causing pain).
>
> Thanks
>
> - Ian
>
>


Re: acceptedEpoch and currentEpoch values not matching exception message

2019-01-22 Thread Andor Molnar
Hi,

It looks like there are a couple of overlapping Jiras around this issue, such as this one:
https://issues.apache.org/jira/browse/ZOOKEEPER-1621 


Mohammad Arshad has provided a patch for ZK-2307 and promised to create a 
rebased PR. Maybe you should ping him to do that, or create a PR yourself if you 
feel confident.

Either way, we can fix this in the 3.5 release at the earliest.

To work around the problem, I think you should manually update the epoch files so 
that they match and then let ZooKeeper start.
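
Before editing anything by hand, it may help to see what the two files currently contain. Below is a minimal, read-only Java sketch. It assumes the files are named `acceptedEpoch` and `currentEpoch`, that they sit in the `version-2` directory under your dataDir, and that each holds a single decimal number; the class name and the default path are made up for illustration. It does not modify anything, so take a backup of the directory before changing the files yourself.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: read-only check of the two epoch files.
class EpochFileCheck {

    // Reads one epoch file; assumes it holds a single decimal number.
    static long readEpoch(Path file) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        return Long.parseLong(text.trim());
    }

    // Returns a one-line report for the given snapshot directory.
    static String report(Path snapDir) throws IOException {
        long accepted = readEpoch(snapDir.resolve("acceptedEpoch"));
        long current = readEpoch(snapDir.resolve("currentEpoch"));
        String verdict = accepted == current ? "match" : "MISMATCH";
        return "acceptedEpoch=" + accepted + " currentEpoch=" + current + " -> " + verdict;
    }

    public static void main(String[] args) throws IOException {
        // Assumed layout: <dataDir>/version-2; pass your own path as the first argument.
        Path snapDir = Paths.get(args.length > 0 ? args[0] : "/var/lib/zookeeper/version-2");
        System.out.println(report(snapDir));
    }
}
```

An empty or truncated file (plausible after a full disk) makes `Long.parseLong` throw a `NumberFormatException`, which is itself a useful hint about which file was damaged.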

Regards,
Andor




> On 2019. Jan 22., at 12:09, hrvoje  wrote:
> 
> Hi,
> 
> I have zookeeper.version=3.4.10, and I have the same problem, and I see:
> 
> https://issues.apache.org/jira/browse/ZOOKEEPER-2307
> 
> So, this issue is still not resolved, what are my options then? The problem
> appeared due to disk full, and now I cannot start zookeeper.
> 
> 
> 
> --
> Sent from: http://zookeeper-user.578899.n2.nabble.com/



Re: zookeeper watcher infinite calling process() since zookkeeper is down.

2019-01-02 Thread Andor Molnar
Hi Devendra,

You need to explicitly call the close() method of the ZooKeeper client object to 
stop it from trying to reconnect and to shut it down properly before creating a new client.
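
One way to make sure this always happens is to route every URL change through a small holder that closes the previous client first. The sketch below is only an illustration of that pattern: `ClientSwitcher`, `Factory` and `Closer` are hypothetical names, and the holder is written against plain interfaces so that it does not depend on a particular ZooKeeper version. Note also that in the code from the question, the client created inside newZkConnection() is never closed when the 30-second wait times out, so it keeps retrying in the background.

```java
// Hypothetical holder: closes the old client before a new one is created.
class ClientSwitcher<C> {

    // Creates a client for a connect string (may throw, like the ZooKeeper constructor).
    interface Factory<C> {
        C create(String connectString) throws Exception;
    }

    // Shuts a client down (may throw, like ZooKeeper.close()).
    interface Closer<C> {
        void close(C client) throws Exception;
    }

    private final Factory<C> factory;
    private final Closer<C> closer;
    private C current;

    ClientSwitcher(Factory<C> factory, Closer<C> closer) {
        this.factory = factory;
        this.closer = closer;
    }

    // Closes the previous client, which stops its reconnect attempts and
    // watcher callbacks, then creates a client for the new connect string.
    synchronized C switchTo(String connectString) throws Exception {
        if (current != null) {
            closer.close(current);
            current = null;
        }
        current = factory.create(connectString);
        return current;
    }
}
```

With the real client this could be wired up roughly as `new ClientSwitcher<ZooKeeper>(url -> new ZooKeeper(url, 30000, watcher), ZooKeeper::close)`, where `watcher` and the 30000 ms session timeout are placeholders.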

Regards,
Andor




> On 2018. Dec 20., at 17:19, Devendra Jain  wrote:
> 
> Hi,
> 
> I am facing an issue in my project: when I create a ZooKeeper client and the
> connection fails because the ZooKeeper server URL is down, the client keeps
> trying to connect and keeps invoking the Watcher with the state
> "Disconnected". I am not sure how to stop this endless callback when I want
> to pass a new ZooKeeper URL which is up.
> Here is the code I have written.
> 
> import java.util.concurrent.CountDownLatch;
> import java.util.concurrent.TimeUnit;
> import org.apache.zookeeper.WatchedEvent;
> import org.apache.zookeeper.Watcher;
> import org.apache.zookeeper.ZooKeeper;
>
> class ZookeeperConnection {
>
>     private final String zkConnectionStr;
>
>     public ZookeeperConnection(String zkConnectionStr) {
>         this.zkConnectionStr = zkConnectionStr;
>     }
>
>     private ZooKeeper newZkConnection() {
>         CountDownLatch zkConnectionLatch = new CountDownLatch(1);
>         try {
>             ZooKeeper zkClient = new ZooKeeper(zkConnectionStr, 4, new Watcher() {
>                 int i = 0;
>
>                 @Override
>                 public void process(WatchedEvent watchedEvent) {
>                     String zkSessionID = getZkSessionID();
>                     switch (watchedEvent.getState()) {
>                         case ConnectedReadOnly:
>                             break;
>                         case SyncConnected:
>                             // Connected to ZooKeeper: release the waiting caller
>                             zkConnectionLatch.countDown();
>                             break;
>                         case Disconnected:
>                             break;
>                         case Expired:
>                             restartZkServices(); // calls this same new-connection method again
>                             break;
>                         case AuthFailed:
>                             break;
>                     }
>                 }
>             });
>             long zkConnectionTimeout = 30;
>             // Wait up to 30 seconds for the connection
>             zkConnectionLatch.await(zkConnectionTimeout, TimeUnit.SECONDS);
>             if (zkClient.getState().isConnected()) {
>                 return zkClient;
>             }
>         } catch (Exception ex) {
>             LOG.error("An exception happened during connecting to ZooKeeper [{}].", zkConnectionStr, ex);
>         }
>         return null;
>     }
> }
> In this code I create an object of the ZookeeperConnection class by
> passing the ZooKeeper server URL. Later, if the user wants to change the
> ZooKeeper server URL, we create a new object of the ZookeeperConnection
> class, and if that ZooKeeper is up I am able to connect through the new
> object. But the old object, which uses the old ZooKeeper URL, is still active,
> keeps getting callbacks, and continuously shows Disconnected.
> 
> Please help me on this.
> Thanks.
> Devendra Jain



Maven migration - main src dir moved

2018-10-05 Thread Andor Molnar
Hi,

Please be aware that the patch which moved ZooKeeper server’s src folder to the 
new location has been merged. You probably need to rebase your PRs and resolve 
conflicts to get them merged.

Sorry for the inconvenience.

Regards,
Andor




Re: Digest auth with classic TCP transport

2018-09-27 Thread Andor Molnar
I think they do the latter. ZooKeeper 3.4 is highly recommended to run on a
network which is isolated from the internet.
VPN could be an option, but we don't do any testing on VPN networks.
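
To illustrate why network isolation matters for digest auth on the classic transport, here is a minimal sketch. It assumes the 3.4 digest scheme, in which the ACL id stored on the server is `user:base64(sha1(user:password))` while the client passes the raw `user:password` bytes to addAuthInfo. The class and method names are made up for illustration, and only JDK classes are used.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical illustration of what is sent versus what is stored.
class DigestSketch {

    // What a client hands to addAuthInfo("digest", ...): the credentials as-is.
    static byte[] bytesOnTheWire(String user, String password) {
        return (user + ":" + password).getBytes(StandardCharsets.UTF_8);
    }

    // What ends up in the ACL (assumed scheme): user:base64(sha1(user:password)).
    static String aclId(String user, String password) throws NoSuchAlgorithmException {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        byte[] hash = sha1.digest(bytesOnTheWire(user, password));
        return user + ":" + Base64.getEncoder().encodeToString(hash);
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        System.out.println("sent:   " + new String(bytesOnTheWire("app", "secret"), StandardCharsets.UTF_8));
        System.out.println("stored: " + aclId("app", "secret"));
    }
}
```

The hashing only protects the value stored in the ACL; anyone who can capture the unencrypted connection sees the password itself.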

Andor



On Thu, Sep 27, 2018 at 5:14 PM, Jan Høydahl  wrote:

> Thanks.
>
> So what do people typically do to mitigate this? Other than restricting
> who has access to this network?
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > 27. sep. 2018 kl. 17:10 skrev Andor Molnar :
> >
> > Right. It's plaintext.
> >
> >
> > On Thu, Sep 27, 2018 at 4:54 PM, Jan Høydahl 
> wrote:
> >
> >> I am *explicitly* asking about old-style 3.4.x socket protocol, not the
> >> new Netty transport which I know supports SSL.
> >>
> >> My original question was whether authentication credentials are passed in
> >> plaintext across the wire and are thus easy for an attacker to pick up.
> >> And if that is true, whether there are known ways of working around the lack
> >> of SSL support for the TCP transport.
> >>
> >> Martin Gainty, I cannot see how I can easily plug in TLS1.3 in my existing
> >> connection between client and Zookeeper 3.4.x, but if there is a simple way
> >> to do so then please share how you did it.
> >>
> >> The only solution I see, as we're stuck with 3.4.x, is to set up IPSec
> >> tunnels at the OS level for all client/server traffic. I wanted to avoid that.
> >>
> >> --
> >> Jan Høydahl, search solution architect
> >> Cominvent AS - www.cominvent.com
> >>
> >>> 27. sep. 2018 kl. 16:14 skrev Andor Molnar  >:
> >>>
> >>> https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZooKeeper+SSL+User+Guide
> >>>
> >>> SSL (client-server) has been added in 3.5.1
> >>> SSL server-server support is being reviewed on GitHub.
> >>>
> >>> Regards,
> >>> Andor
> >>>
> >>>
> >>>
> >>> On Thu, Sep 27, 2018 at 3:46 PM, Jan Høydahl 
> >> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>>> if you're prevented from implementing SSL why not use TLSv1.3?
> >>>>
> >>>>
> >>>> I have not found any evidence that Zookeeper server nor (Java) client
> >>>> supports TLS in version 3.4.13. Please point me to some docs or tutorial.
> >>>> We don't want to fork Zookeeper to implement this stuff ourselves :)
> >>>>
> >>>> --
> >>>> Jan Høydahl, search solution architect
> >>>> Cominvent AS - www.cominvent.com
> >>>>
> >>>>> 27. sep. 2018 kl. 15:17 skrev Martin Gainty :
> >>>>>
> >>>>>
> >>>>> 
> >>>>> From: Jan Høydahl 
> >>>>> Sent: Thursday, September 27, 2018 5:12 AM
> >>>>> To: user@zookeeper.apache.org
> >>>>> Subject: Digest auth with classic TCP transport
> >>>>>
> >>>>> Hi
> >>>>>
> >>>>> We use ZK 3.4.13, and unfortunately cannot use Netty transport and SSL.
> >>>>> We plan to use digest authentication and Zookeeper ACL protection.
> >>>>>
> >>>>> Question is, since we cannot use SSL, is there some other way to make
> >>>>> sure the user credentials are not sniffed over the network and thus let an
> >>>>> attacker impersonate our application and change the content in Zookeeper?
> >>>>> Does the Zookeeper client do some smart moves to protect/hash the password
> >>>>> over the network? I suppose the binary transport is easy to decipher for
> >>>>> those who try.
> >>>>>
> >>>>> MG>if you're prevented from implementing SSL why not use TLSv1.3?
> >>>>> MG>with TLSv1.3 you can implement encryption/decryption with crypto
> >>>> private/public keys and x509 certs
> >>>>> https://en.wikipedia.org/wiki/Transport_Layer_Security
> >>>>> Transport Layer Security - Wikipedia<https://en.
> >>>> wikipedia.org/wiki/Transport_Layer_Security>
> >>>>> Transport Layer Security (TLS) – and its predecessor, Secure Sockets
> >>>> Layer (SSL), which is now deprecated by the Internet Engineering Task
> >> F

Re: Digest auth with classic TCP transport

2018-09-27 Thread Andor Molnar
https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZooKeeper+SSL+User+Guide

SSL (client-server) has been added in 3.5.1
SSL server-server support is being reviewed on GitHub.

Regards,
Andor



On Thu, Sep 27, 2018 at 3:46 PM, Jan Høydahl  wrote:

> Hi,
>
> > if you're prevented from implementing SSL why not use TLSv1.3?
>
>
> I have not found any evidence that Zookeeper server nor (Java) client
> supports TLS in version 3.4.13. Please point me to some docs or tutorial.
> We don't want to fork Zookeeper to implement this stuff ourselves :)
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > 27. sep. 2018 kl. 15:17 skrev Martin Gainty :
> >
> >
> > 
> > From: Jan Høydahl 
> > Sent: Thursday, September 27, 2018 5:12 AM
> > To: user@zookeeper.apache.org
> > Subject: Digest auth with classic TCP transport
> >
> > Hi
> >
> > We use ZK 3.4.13, and unfortunately cannot use Netty transport and SSL.
> > We plan to use digest authentication and Zookeeper ACL protection.
> >
> > Question is, since we cannot use SSL, is there some other way to make
> > sure the user credentials are not sniffed over the network and thus let an
> > attacker impersonate our application and change the content in Zookeeper?
> > Does the Zookeeper client do some smart moves to protect/hash the password
> > over the network? I suppose the binary transport is easy to decipher for
> > those who try.
> >
> > MG>if you're prevented from implementing SSL why not use TLSv1.3?
> > MG>with TLSv1.3 you can implement encryption/decryption with crypto
> private/public keys and x509 certs
> > https://en.wikipedia.org/wiki/Transport_Layer_Security
> > Transport Layer Security - Wikipedia <https://en.wikipedia.org/wiki/Transport_Layer_Security>
> > Transport Layer Security (TLS) – and its predecessor, Secure Sockets
> Layer (SSL), which is now deprecated by the Internet Engineering Task Force
> (IETF) – are cryptographic protocols that provide communications security
> over a computer network. Several versions of the protocols find widespread
> use in applications such as web browsing, email, instant messaging, and
> voice over IP (VoIP).
> > en.wikipedia.org
> >
> >
> > MG>path of least resistance is to contact verisign and ask them to
> generate keys, certs and allow them to act as CA
> > MG>Caveat: tls1.3 implementation is slow and is supported by Mozilla
> v60...and some versions of chrome
> > MG>as far as ciphers to prevent MIMA do not implement TLS_DH_anon and
> TLS_ECDH_anon key agreement methods MG>do not authenticate the server
> > MG>you will want public key size to be min 2048bit to conform to chrome
> secure transmission requirements
> > MG>securing message is done thru MD5 or SHA but you will need to
> incorporate selected algo into
> > MG>supported cipher-suite(s)
> > https://en.wikipedia.org/wiki/Cipher_suite
> > Cipher suite - Wikipedia
> > A cipher suite is a set of algorithms that help secure a network
> connection that uses Transport Layer Security (TLS) or Secure Socket Layer
> (SSL). The set of algorithms that cipher suites usually contain include: a
> key exchange algorithm, a bulk encryption algorithm, and a message
> authentication code (MAC) algorithm.. The key exchange algorithm is used to
> exchange a key between two devices.
> > en.wikipedia.org
> >
> >
> > HTH
> > Martin
> > --
> > Jan Høydahl
> > Cominvent AS - www.cominvent.com
> >
>
>


Re: Have smaller server identifier, so dropping the connection

2018-09-14 Thread Andor Molnar
Hi Ram,

I might be missing something from your explanation, but that error message 
alone is not an issue. All ZK nodes open connections to each other, but having 2 
connections between the same pair of nodes is redundant, so one of them has to be 
closed. To decide which one to close, ZK uses the server identifiers: the node with 
the smaller id closes the connection it initiated. That’s the rule, and the decision is 
shown in the logs.
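
The rule can be written down in a couple of lines. This is only a sketch of the decision as described above, not the server's actual code; the class and method names are hypothetical. It also assumes the pair in the log line is (remote id, local id), so "(3, 2)" would be server 2 dropping the connection it opened to server 3.

```java
// Hypothetical illustration of the tie-break rule between two servers.
class ConnectionTieBreak {

    // True if the node that initiated the connection should close it,
    // i.e. the initiator has the smaller server id.
    static boolean initiatorDrops(long initiatorSid, long remoteSid) {
        return initiatorSid < remoteSid;
    }

    public static void main(String[] args) {
        // Server 2 dials server 3: 2 drops its own attempt and the
        // connection opened from 3 to 2 is the one that is kept.
        System.out.println(initiatorDrops(2, 3)); // true
        System.out.println(initiatorDrops(3, 2)); // false
    }
}
```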

Andor



> On 2018. Sep 12., at 3:20, rammohan ganapavarapu  
> wrote:
> 
> Did this issue get fixed in 3.4.13? I thought it got fixed, but I am still
> seeing this when the leader node has a lower myid and I reboot a follower with
> a higher myid.
> 
> Have smaller server identifier, so dropping the connection: (3, 2)
> 
> Thanks,
> Ram



Re: can not know the process name from zk log

2018-09-14 Thread Andor Molnar
Hi,

What info exactly would you like to see about the client? What do you mean by 
‘process info’?

Process name? That’s ‘java’ in 90% of cases and probably not enough to fully 
identify the process.
Process ID? 

This information is currently not available in ZK, because the client doesn’t send 
it to the server.
I think as a workaround you can turn on SASL authentication, so you can see the 
id of the authenticated user, which might help.

Also, I think revealing the client process name and/or PID to the server will 
raise privacy/security concerns, but that’s a different question.
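
A client-side alternative that needs no protocol change is for each application to log its own PID next to its ZooKeeper session id, so that an entry in the server log can be traced back to a process later. A minimal sketch, assuming Java 9+ for `ProcessHandle` and that the session id would come from the client's getSessionId() once connected; the class name, the log format and the sample session id are made up.

```java
// Hypothetical helper for correlating a client process with its ZK session.
class ClientIdentity {

    // Builds a log line tying a PID to a ZooKeeper session id.
    static String describe(long pid, long sessionId) {
        return "pid=" + pid + " zkSession=0x" + Long.toHexString(sessionId);
    }

    public static void main(String[] args) {
        long pid = ProcessHandle.current().pid(); // Java 9+
        // Placeholder value; a real client would use its session id once connected.
        long sessionId = 0x1000abcL;
        System.out.println(describe(pid, sessionId));
    }
}
```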

Regards,
Andor




> On 2018. Sep 14., at 11:55, wangyongqiang0...@163.com wrote:
> 
> A port is not always in use by the same process; it may have been used at
> some time in the past.
> So, from the ZK log, I want to know which process accessed ZK.
> 
> 
> 
> wangyongqiang0...@163.com
> 
> From: Shawn Heisey
> Date: 2018-09-12 18:10
> To: user
> Subject: Re: can not know the process name from zk log
> On 9/12/2018 2:33 AM, wangyongqiang0...@163.com wrote:
>> From the ZK log I can get the IP and port. I think if ZK could print the
>> process info along with the IP and port, it would help us in some cases.
> 
> What precisely are you after?  A java program can typically report what 
> PID its process has, but I don't know that any other process information 
> is available.  I have not checked to see whether ZK logs the PID it's 
> using at any point.  Usually such information is logged at startup (if 
> it is ever logged at all) and not anywhere else.
> 
> With the port number, you can use a program like lsof or netstat to 
> determine the pid, and I think this works on both the client and server 
> side.  Here's an example of that for another Java program.  This isn't 
> zookeeper, but the same thing will work for ZK too.
> 
> root@smeagol:~# lsof -Pn -i:45499
> COMMAND  PID     USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
> java    8713 elyograg   35u  IPv6   95610      0t0  TCP 127.0.0.1:45499 (LISTEN)
> java    8713 elyograg   62u  IPv6 6442866      0t0  TCP 127.0.0.1:52686->127.0.0.1:45499 (CLOSE_WAIT)
> java    8713 elyograg   67u  IPv6 6443911      0t0  TCP 127.0.0.1:52792->127.0.0.1:45499 (CLOSE_WAIT)
> java    8713 elyograg   78u  IPv6 6446143      0t0  TCP 127.0.0.1:52814->127.0.0.1:45499 (ESTABLISHED)
> java    8713 elyograg   83u  IPv6 6444628      0t0  TCP 127.0.0.1:45499->127.0.0.1:52814 (ESTABLISHED)
> java    8713 elyograg   84u  IPv6 6443524      0t0  TCP 127.0.0.1:52710->127.0.0.1:45499 (CLOSE_WAIT)
> java    8713 elyograg   85u  IPv6 6442460      0t0  TCP 127.0.0.1:52360->127.0.0.1:45499 (CLOSE_WAIT)
> java    8713 elyograg   87u  IPv6 6445101      0t0  TCP 127.0.0.1:52766->127.0.0.1:45499 (CLOSE_WAIT)
> java    8713 elyograg  113u  IPv6 6443962      0t0  TCP 127.0.0.1:52844->127.0.0.1:45499 (ESTABLISHED)
> java    8713 elyograg  119u  IPv6 6444645      0t0  TCP 127.0.0.1:45499->127.0.0.1:52844 (ESTABLISHED)
> java    8713 elyograg  200u  IPv6 6441819      0t0  TCP 127.0.0.1:52656->127.0.0.1:45499 (CLOSE_WAIT)
> 
> The -Pn parameters instruct lsof to not translate port numbers or IP 
> addresses to names.  I do this to make the lsof program run faster.
> 
> Thanks,
> Shawn



Re: Leader election failing

2018-09-03 Thread Andor Molnar
Thanks for testing, Chris.

So, if I understand you correctly, you're running the latest version from
branch-3.5. Could we say that this is a 3.5-only problem?
Have you ever tested the same cluster with 3.4?

Regards,
Andor



On Tue, Aug 21, 2018 at 11:29 AM, Cee Tee  wrote:

> I've tested the patch and let it run 6 days. It did not help, result is
> still the same. (remaining ZKs form islands based on datacenter they are
> in).
>
> I have mitigated it by doing a daily rolling restart.
>
> Regards,
> Chris
>
> On Mon, Aug 13, 2018 at 2:06 PM Andor Molnar 
> wrote:
>
> > Hi Chris,
> >
> > Would you mind testing the following patch on your test clusters?
> > I'm not entirely sure, but the issue might be related.
> >
> > https://issues.apache.org/jira/browse/ZOOKEEPER-2930
> >
> > Regards,
> > Andor
> >
> >
> >
> > On Wed, Aug 8, 2018 at 6:51 PM, Camille Fournier 
> > wrote:
> >
> > > If you have the time and inclination, next time you see this problem in
> > > your test clusters get stack traces and any other diagnostics possible
> > > before restarting. I'm not an expert at network debugging but if you have
> > > someone who is you might want them to take a look at the connections and
> > > settings of any switches/firewalls/etc involved, see if there's any unusual
> > > configurations or evidence of other long-lived connections failing (even if
> > > their services handle the failures more gracefully). Send us the stack
> > > traces also; it would be interesting to take a look.
> > >
> > > C
> > >
> > >
> > > On Wed, Aug 8, 2018, 11:09 AM Chris  wrote:
> > >
> > > > Running 3.5.5
> > > >
> > > > I managed to recreate it on the acc and test clusters today, failing on
> > > > shutdown of the leader. Both had been running for over a week. After
> > > > restarting all ZooKeepers it runs fine no matter how many leader shutdowns
> > > > I throw at it.
> > > >
> > > > On 8 August 2018 5:05:34 pm Andor Molnar  >
> > > > wrote:
> > > >
> > > > > Some kind of a network split?
> > > > >
> > > > > It looks like 1-2 and 3-4 were able to communicate with each other, but
> > > > > the connection timed out between these 2 splits. When 5 came back online
> > > > > it started with supporters of (1,2) and later 3 and 4 also joined.
> > > > >
> > > > > There was no such issue the day after.
> > > > >
> > > > > Which version of ZooKeeper is this? 3.5.something?
> > > > >
> > > > > Regards,
> > > > > Andor
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Aug 8, 2018 at 4:52 PM, Chris 
> wrote:
> > > > >
> > > > >> Actually I have similar issues on my test and acceptance clusters, where
> > > > >> leader election fails if the cluster has been running for a couple of days.
> > > > >> If you stop/start the ZooKeepers once, they will work fine on further
> > > > >> disruptions that day. Not sure yet what the threshold is.
> > > > >>
> > > > >>
> > > > >> On 8 August 2018 4:32:56 pm Camille Fournier 
> > > > wrote:
> > > > >>
> > > > > >>> Hard to say. It looks like about 15 minutes after your first incident where
> > > > > >>> 5 goes down and then comes back up, servers 1 and 2 get socket errors to
> > > > > >>> their connections with 3, 4, and 6. It's possible if you had waited those
> > > > > >>> 15 minutes, once those errors cleared the quorum would've formed with the
> > > > > >>> other servers. But as for why there were those errors in the first place
> > > > > >>> it's not clear. Could be a network glitch, or an obscure bug in the
> > > > > >>> connection logic. Has anyone else ever seen this?
> > > > > >>> If you see it again, getting a stack trace of the servers when they can't
> > > > > >>> form quorum might be helpful.
> > > > >>>
> > > > >>> On Wed, Aug 8, 2018 at 11:52 AM C
