Re: [VOTE] Apache ZooKeeper 3.6.2 candidate 1

2020-09-04 Thread Michael Han
+1, with two minor issues:

There is one unit test deterministically failing for me locally on mac:
MultipleAddressesTest.testGetValidAddressWithNotValid Expected exception:
java.net.NoRouteToHostException

Also missing a couple of items in release notes: ZOOKEEPER-3794,
ZOOKEEPER-3797, ZOOKEEPER-3813

Neither is a blocker IMO, but it would be nice to update the release notes,
considering one of the missing items is a CVE-related fix.

On Fri, Sep 4, 2020 at 1:04 PM Patrick Hunt  wrote:

> +1. xsum/sig validate. RAT ran clean. Was able to build and do manual
> testing with various ensemble sizes successfully. lgtm.
>
> Patrick
>
> On Fri, Sep 4, 2020 at 6:01 AM Enrico Olivelli wrote:
>
> > This is a release candidate for 3.6.2.
> >
> > It is a minor release that fixes a few critical issues and brings a few
> > dependency upgrades.
> >
> > The full release notes are available at:
> >
> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310801&version=12347809
> >
> > *** Please download, test and vote by September 7th 2020, 23:59 UTC+0. ***
> >
> > Source files:
> > https://people.apache.org/~eolivelli/zookeeper-3.6.2-candidate-1/
> >
> > Maven staging repo:
> > https://repository.apache.org/content/repositories/orgapachezookeeper-1061/
> >
> > The release candidate tag in git to be voted upon: release-3.6.2-1
> > https://github.com/apache/zookeeper/tree/release-3.6.2-1
> >
> > ZooKeeper's KEYS file containing PGP keys we use to sign the release:
> > https://www.apache.org/dist/zookeeper/KEYS
> >
> > The staging version of the website is:
> > https://people.apache.org/~eolivelli/zookeeper-3.6.2-candidate-1/website/
> >
> > Should we release this candidate?
> > Enrico Olivelli
> >
>


Re: [VOTE] Apache ZooKeeper 3.6.2 candidate 1

2020-09-04 Thread Patrick Hunt
+1. xsum/sig validate. RAT ran clean. Was able to build and do manual
testing with various ensemble sizes successfully. lgtm.

Patrick

On Fri, Sep 4, 2020 at 6:01 AM Enrico Olivelli  wrote:

> This is a release candidate for 3.6.2.
>
> It is a minor release that fixes a few critical issues and brings a few
> dependency upgrades.
>
> The full release notes are available at:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310801&version=12347809
>
> *** Please download, test and vote by September 7th 2020, 23:59 UTC+0. ***
>
> Source files:
> https://people.apache.org/~eolivelli/zookeeper-3.6.2-candidate-1/
>
> Maven staging repo:
> https://repository.apache.org/content/repositories/orgapachezookeeper-1061/
>
> The release candidate tag in git to be voted upon: release-3.6.2-1
> https://github.com/apache/zookeeper/tree/release-3.6.2-1
>
> ZooKeeper's KEYS file containing PGP keys we use to sign the release:
> https://www.apache.org/dist/zookeeper/KEYS
>
> The staging version of the website is:
> https://people.apache.org/~eolivelli/zookeeper-3.6.2-candidate-1/website/
>
> Should we release this candidate?
> Enrico Olivelli
>


[jira] [Created] (ZOOKEEPER-3927) ZooKeeper Client Fault Tolerance Extensions

2020-09-04 Thread Josh Slocum (Jira)
Josh Slocum created ZOOKEEPER-3927:
--

 Summary: ZooKeeper Client Fault Tolerance Extensions
 Key: ZOOKEEPER-3927
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3927
 Project: ZooKeeper
  Issue Type: Improvement
Reporter: Josh Slocum


TL;DR: My team at Indeed has developed ZooKeeper functionality to handle
stateful retrying of ConnectionLoss for write operations, and we wanted to
reach out to discuss whether this is something the ZooKeeper team may be
interested in incorporating into the ZooKeeper client or in a separate wrapper.

 
Hi ZooKeeper Devs,

My team uses ZooKeeper extensively as part of a distributed key-value store
we've built at Indeed (think HBase replacement). Because our deployment
co-locates our database daemons with our large Hadoop cluster, and many of our
compute jobs are network-intensive, we were experiencing a large number of
transient ConnectionLoss errors. This was especially problematic for important
write operations, such as the creation and deletion of distributed locks/leases
or updating distributed state in the cluster.

We saw that some existing ZooKeeper client wrappers handled retrying in the
presence of ConnectionLoss, but all of the ones we looked at
([Curator|https://curator.apache.org/],
[Kazoo|https://github.com/python-zk/kazoo], etc.) retried writes the same way as
reads: blindly in a loop. This meant that when retrying a create, for example,
if the initial create had succeeded on the server but the client got
ConnectionLoss, we would get a NodeExists exception on the retried request,
even though the znode had in fact been created by the first attempt. This
resulted in many issues. In the distributed lock/lease example, it looked to
other nodes as if the calling node had successfully acquired the "lock", while
to the calling node it appeared that it had failed to acquire the "lock", which
results in a deadlock.

To solve this, we implemented a set of "connection-loss tolerant primitives"
for the main types of write operations. They handle a connection loss by
retrying the operation in a loop, but when a retry fails, they inspect the
current state to see whether it matches what an earlier attempt that got
ConnectionLoss would have left behind had it actually succeeded:
* createRetriable(String path, byte[] data)
* setDataRetriable(String path, byte[] newData, int currentVersion)
* deleteRetriable(String path, int currentVersion)
* compareAndDeleteRetriable(String path, byte[] currentData, int currentVersion)

For example, createRetriable retries the create on connection loss. If the
retried call gets a NodeExists exception, it checks whether
(getData(path) == data and dataVersion == 0). If so, it assumes the first
create succeeded and returns success; otherwise it propagates the NodeExists
exception.
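
A minimal sketch of the createRetriable check described above may help make it concrete. It is not the Indeed implementation and does not use the real ZooKeeper client API: the Client interface, Node class, exception types, FlakyClient fake and the maxAttempts parameter are all hypothetical names introduced for illustration only.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class RetriableCreateSketch {
    // Hypothetical stand-ins for the client's ConnectionLoss and NodeExists errors.
    static class ConnectionLossException extends Exception {}
    static class NodeExistsException extends Exception {}

    // Hypothetical minimal view of a znode: its data and its data version.
    static final class Node {
        final byte[] data;
        final int version;
        Node(byte[] data, int version) { this.data = data; this.version = version; }
    }

    // Hypothetical client interface; deliberately not the real ZooKeeper API.
    interface Client {
        void create(String path, byte[] data) throws ConnectionLossException, NodeExistsException;
        Node get(String path) throws ConnectionLossException; // null if the node is absent
    }

    static void createRetriable(Client zk, String path, byte[] data, int maxAttempts)
            throws ConnectionLossException, NodeExistsException {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        ConnectionLossException lastLoss = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                zk.create(path, data);
                return;
            } catch (ConnectionLossException e) {
                lastLoss = e; // outcome unknown: the create may or may not have been applied
            } catch (NodeExistsException e) {
                if (lastLoss == null) throw e; // no earlier attempt was lost: a real conflict
                Node n = zk.get(path); // a ConnectionLoss here simply propagates in this sketch
                if (n != null && n.version == 0 && Arrays.equals(n.data, data)) return;
                throw e;
            }
        }
        throw lastLoss; // every attempt ended in ConnectionLoss
    }

    // In-memory fake whose next create is applied but whose reply is "lost".
    static class FlakyClient implements Client {
        final Map<String, Node> nodes = new HashMap<>();
        boolean loseNextReply;

        public void create(String path, byte[] data) throws ConnectionLossException, NodeExistsException {
            if (nodes.containsKey(path)) throw new NodeExistsException();
            nodes.put(path, new Node(data.clone(), 0));
            if (loseNextReply) { loseNextReply = false; throw new ConnectionLossException(); }
        }

        public Node get(String path) { return nodes.get(path); }
    }

    // True if a create whose first reply was lost is still reported as a success.
    static boolean lostReplyIsTolerated() {
        FlakyClient zk = new FlakyClient();
        zk.loseNextReply = true;
        try {
            createRetriable(zk, "/lock", "owner-1".getBytes(), 3);
            return zk.nodes.containsKey("/lock");
        } catch (Exception e) {
            return false;
        }
    }

    // True if a node written by someone else still surfaces as NodeExists.
    static boolean foreignNodeStillConflicts() {
        FlakyClient zk = new FlakyClient();
        zk.nodes.put("/lock", new Node("owner-2".getBytes(), 0));
        try {
            createRetriable(zk, "/lock", "owner-1".getBytes(), 3);
            return false;
        } catch (NodeExistsException e) {
            return true;
        } catch (ConnectionLossException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(lostReplyIsTolerated());
        System.out.println(foreignNodeStillConflicts());
    }
}
```

Note that in this sketch the data-and-version check cannot tell our own lost create apart from another client creating a node with identical data, which is one way the semantics can differ slightly from a plain create.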

These primitives have allowed us to program our ZooKeeper layer as if
ConnectionLoss were not a transient state we have to worry about, since they
have essentially the same guarantees as the non-retriable functions in the
ZooKeeper API (with a slight difference in semantics).

Because this problem is not solved anywhere else that uses ZooKeeper (to my
knowledge), we think it could be a useful contribution to the ZooKeeper project.
However, if you are not looking for contributions that extend the ZooKeeper API
and prefer client extensions to live separately, for example in Curator, then we
would consider contributing there or open sourcing our implementation as a
standalone library.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [CANCEL] [VOTE] Apache ZooKeeper release 3.6.2 candidate 0

2020-09-04 Thread Patrick Hunt
On Fri, Sep 4, 2020 at 4:47 AM Enrico Olivelli  wrote:

> I am cancelling this VOTE.
>
> I will send RC1 soon, the problem about license files has been fixed.
>
>
The cli issue is 100% reproducible for me with 3.6/3.5 but not 3.4. I tried
a bunch of other stuff (turned off vpn, etc.) with no luck. Tried
increasing the log level but that isn't shedding light either. I'll try to
dig into it deeper if I have time, but I'd recommend not blocking any
release activities given I'm the only one to report it. Thanks for the
feedback Ted/Michael/et al. Very odd. :-)

Patrick


> Stay tuned
> Enrico
>
> On Fri, Sep 4, 2020 at 02:35, Michael Han wrote:
>
> > Haven't fully tested the RC but I didn't experience any of the lag
> > through the cli.
> >
> > On Thu, Sep 3, 2020 at 3:15 PM Ted Dunning wrote:
> >
> > > On Thu, Sep 3, 2020 at 1:58 PM Patrick Hunt wrote:
> > >
> > > > On Thu, Sep 3, 2020 at 1:54 PM Ted Dunning wrote:
> > > >
> > > > > OK. Did it with the correct version this time. I saw no typing
> > > > > delays in zkCli.sh.
> > > > ...
> > > > Hm, no idea - I tried the regular mac terminal (I use iterm2) and
> > > > also tried launching from sh vs bash but no changes. Very odd.
> > >
> > >
> > > I use the normal terminal on my mac so our environments are very
> > > similar.
> > >
> >
>


Re: [VOTE] Apache ZooKeeper 3.6.2 candidate 1

2020-09-04 Thread Ted Dunning
I did the following (minimal) checks:

* downloaded binary distro
* started single node ZK cluster
* verified simple operations in zkCli
* verified no lag in typing on OSX using JDK 12
* verified checksum and signature

* downloaded source distro
* verified that compile and packaging succeeded
* repeated minimal functional test on the resulting binary
* verified checksum and signature



On Fri, Sep 4, 2020 at 6:01 AM Enrico Olivelli  wrote:

> This is a release candidate for 3.6.2.
>
> It is a minor release that fixes a few critical issues and brings a few
> dependency upgrades.
>
> The full release notes are available at:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310801&version=12347809
>
> *** Please download, test and vote by September 7th 2020, 23:59 UTC+0. ***
>
> Source files:
> https://people.apache.org/~eolivelli/zookeeper-3.6.2-candidate-1/
>
> Maven staging repo:
> https://repository.apache.org/content/repositories/orgapachezookeeper-1061/
>
> The release candidate tag in git to be voted upon: release-3.6.2-1
> https://github.com/apache/zookeeper/tree/release-3.6.2-1
>
> ZooKeeper's KEYS file containing PGP keys we use to sign the release:
> https://www.apache.org/dist/zookeeper/KEYS
>
> The staging version of the website is:
> https://people.apache.org/~eolivelli/zookeeper-3.6.2-candidate-1/website/
>
> Should we release this candidate?
> Enrico Olivelli
>


[VOTE] Apache ZooKeeper 3.6.2 candidate 1

2020-09-04 Thread Enrico Olivelli
This is a release candidate for 3.6.2.

It is a minor release that fixes a few critical issues and brings a few
dependency upgrades.

The full release notes are available at:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310801&version=12347809

*** Please download, test and vote by September 7th 2020, 23:59 UTC+0. ***

Source files:
https://people.apache.org/~eolivelli/zookeeper-3.6.2-candidate-1/

Maven staging repo:
https://repository.apache.org/content/repositories/orgapachezookeeper-1061/

The release candidate tag in git to be voted upon: release-3.6.2-1
https://github.com/apache/zookeeper/tree/release-3.6.2-1

ZooKeeper's KEYS file containing PGP keys we use to sign the release:
https://www.apache.org/dist/zookeeper/KEYS

The staging version of the website is:
https://people.apache.org/~eolivelli/zookeeper-3.6.2-candidate-1/website/

Should we release this candidate?
Enrico Olivelli


Re: Test flakiness (was: [VOTE] Apache ZooKeeper release 3.6.2 candidate 0)

2020-09-04 Thread Enrico Olivelli
Damien,
you can use -Dsurefire-forkcount=1 to run only one test at a time.
This should reduce flakiness.

Enrico

On Thu, Sep 3, 2020 at 21:37, Damien Diederen <ddiede...@sinenomine.net> wrote:

>
> Hi Enrico, all,
>
> TL;DR: Builds of 3.6.2 pass with 8 (!) cores on Ubuntu 18.04, 20.04, and
> NixOS 20.03.
>
>
> I wrote:
>
> >> It took me a number of tries, because that is a VM and the tests are
> >> somewhat flaky in that environment.
>
> You suggested:
>
> > On 3.6.2 I don't see flaky tests in my local environment,
> > can you please start another email thread with your problems ?
> > They will deserve JIRA issues and investigations
>
> Okay; I did a bit more experimentation.
>
> As I mentioned, the builds which exhibited flakiness were run in a VM.
> That was not a "weak" VM, however: KVM, 4 cores, 16 GiB, backed by an
> Intel i7 CPU.
>
> I notably tried with an Ubuntu live DVD and builds on /tmp (tmpfs).
> Using "only" 4 cores results in frequent failures, but allocating 8
> cores to the VM seemingly allows builds to SUCCEED.
>
> I don't know if 8 cores is considered a reasonable requirement for the
> test suite.
>
>
> I also tried on a (real) NixOS laptop with 8 logical processors and 16
> GiB of RAM; that build passed.  Which means that so far, I have seen
> builds succeed on:
>
>   * Ubuntu 20.04;
>   * Ubuntu 18.04;
>   * NixOS 20.03.
>
>
> In that previous email, I also wrote:
>
> >> this is, as far as I know, a long-standing issue and completely
> >> unrelated to 3.6.2.
>
> I have experienced frequent failures when building 3.6.1 with tests on a
> 6-core Xeon with 64 GiB RAM.  That machine plays the role of a shared
> "build server," so other users can cause varying utilization.  But
> still, it is no slouch, and utilization was relatively low when I
> observed these issues—so I was a bit surprised to see such failures.
>
>
> I guess I should try building master on these various configurations (I
> haven't seen that kind of trouble there).
>
>
> Best,
> Damien
>
>
> P.-S. — For the record, here is the procedure I used (starting from the
> bare live Ubuntu DVD):
>
>     sudo add-apt-repository universe
>
>     sudo apt install \
>         autoconf \
>         build-essential \
>         git \
>         libcppunit-dev \
>         libsasl2-dev \
>         libtool-bin \
>         maven \
>         openjdk-11-jdk-headless \
>         pkg-config
>
>     rsync -av --delete /mnt/zookeeper/ /tmp/zookeeper/
>     cd /tmp/zookeeper
>
>     export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
>
>     LC_ALL=C mvn install -Pfull-build 2>&1 |
>         tee /mnt/ubuntu-mvn-tests.log
>
>
> And here is an example of failure—clearly a time out:
>
> [ERROR] testJvmPauseMonitorExceedWarnThreshold(org.apache.zookeeper.server.util.JvmPauseMonitorTest)  Time elapsed: 5.119 s  <<< ERROR!
> org.junit.runners.model.TestTimedOutException: test timed out after 5000 milliseconds
>     at app//org.apache.zookeeper.server.util.JvmPauseMonitorTest.testJvmPauseMonitorExceedWarnThreshold(JvmPauseMonitorTest.java:54)
>
>
> Here is another example, somewhat more murky (possibly related to some
> kind of race condition):
>
> [ERROR] testWatchAutoResetWithPending(org.apache.zookeeper.test.WatcherTest)  Time elapsed: 0.083 s  <<< FAILURE!
> java.lang.AssertionError: Unexpected bean exists! expected:<0> but was:<1>
>     at org.apache.zookeeper.test.WatcherTest.setUp(WatcherTest.java:84)
>
>
> I can provide more logs if we decide to open a ticket to track this topic.
>
>
>
>
> >> On Wed, Sep 2, 2020 at 20:42, Damien Diederen <ddiede...@sinenomine.net> wrote:
> >>
> >>>
> >>> Hi Enrico, all,
> >>>
> >>> I was also able to build and successfully run the tests of this release
> >>> candidate on Ubuntu 20.04.1, with the provided Java & Maven:
> >>>
> >>> $ grep VERSION= /etc/os-release
> >>> VERSION="20.04.1 LTS (Focal Fossa)"
> >>> $ java -version
> >>> openjdk version "11.0.8" 2020-07-14
> >>> $ mvn -version
> >>> Apache Maven 3.6.3
> >>>
> >>> (It took me a number of tries, because that is a VM and the tests are
> >>> somewhat flaky in that environment.  But this is, as far as I know, a
> >>> long-standing issue and completely unrelated to 3.6.2.  Please let me
> >>> know if you have tips/tricks for avoiding such temporary failures.)
> >>>
> >>> Cheers, -D
> >>>
> >>>
> >>>
> >>> Szalay-Bekő Máté  writes:
> >>> > +1 (non-binding)
> >>> >
> >>> > - I built the source code (-Pfull-build) on Ubuntu 18.04.3 using OpenJDK
> >>> > 8u242, OpenJDK 11.0.8 and maven 3.6.0.
> >>> > - all the unit tests passed (both Java and C-client).
> >>> > - I also built and executed unit tests for zkpython
> >>> > - checkstyle and spotbugs passed
> >>> > - apache-rat passed
> >>> > - owasp (CVE check) passed
> >>> > - fatjar built (-Pfatjar)
> >>> >
> >>> > On Tue, Sep 1, 2020 at 11:35 AM Enrico Olivelli wrote:
> >>> >
> >>> >> This is a release ca

[CANCEL] [VOTE] Apache ZooKeeper release 3.6.2 candidate 0

2020-09-04 Thread Enrico Olivelli
I am cancelling this VOTE.

I will send RC1 soon, the problem about license files has been fixed.

Stay tuned
Enrico

On Fri, Sep 4, 2020 at 02:35, Michael Han wrote:

> Haven't fully tested the RC but I didn't experience any of the lag through
> the cli.
>
> On Thu, Sep 3, 2020 at 3:15 PM Ted Dunning  wrote:
>
> > On Thu, Sep 3, 2020 at 1:58 PM Patrick Hunt  wrote:
> >
> > > On Thu, Sep 3, 2020 at 1:54 PM Ted Dunning wrote:
> > >
> > > > OK. Did it with the correct version this time. I saw no typing delays
> > > > in zkCli.sh.
> > > ...
> > > Hm, no idea - I tried the regular mac terminal (I use iterm2) and also
> > > tried launching from sh vs bash but no changes. Very odd.
> >
> >
> > I use the normal terminal on my mac so our environments are very similar.
> >
>


Re: Containerizing ZooKeeper with Twine: Powering container orchestration from within

2020-09-04 Thread Justin Ling Mao
Great blog post!!!
- Original Message -
From: Mohamed Jeelani 
To: "dev@zookeeper.apache.org" 
Cc: Christopher Bunn 
Subject: Containerizing ZooKeeper with Twine: Powering container orchestration 
from within
Date: 2020-09-01 03:43

Hi ZooKeepers!
Hello from your friends here at Facebook. Chris Bunn from our team just
published a blog post this morning about containerizing ZooKeeper and how
we also run that containerization platform on top of ZooKeeper. You can read
the blog post at https://engineering.fb.com/developer-tools/zookeeper-twine/
Chris will also be giving a talk this Wednesday at 11am PST, followed by a live
Q&A, as part of our Systems @Scale virtual event. You can register for the
event at https://systemsscalempk2020.splashthat.com/.
Send in your questions for Wednesday’s live Q&A to
systemsatsc...@fb.com


[jira] [Created] (ZOOKEEPER-3926) make the rc constant in the ClientCnxn

2020-09-04 Thread maoling (Jira)
maoling created ZOOKEEPER-3926:
--

 Summary: make the rc constant in the ClientCnxn
 Key: ZOOKEEPER-3926
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3926
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Reporter: maoling


Many of the result-code checks in the callback handling
(ClientCnxn.EventThread#processEvent) are hardcoded. For example:
{code:java}
} else if (p.response instanceof GetACLResponse) {
    ACLCallback cb = (ACLCallback) p.cb;
    GetACLResponse rsp = (GetACLResponse) p.response;
    if (rc == 0) {
        cb.processResult(rc, clientPath, p.ctx, rsp.getAcl(), rsp.getStat());
    } else {
        cb.processResult(rc, clientPath, p.ctx, null, null);
    }
}{code}
This makes the code difficult to maintain. What we want looks like this:
{code:java}
if (rc == Code.OK.intValue()) {
   
}
{code}
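
As a self-contained illustration of the change proposed above, the following sketch compares the result code against a named constant instead of the literal 0. The Code enum here is only a stand-in that mirrors two values of KeeperException.Code, and AclCallback and deliver are hypothetical, simplified names (the real ACLCallback also receives a Stat and a list of ACL objects); this is not the actual ClientCnxn code.

```java
import java.util.Arrays;
import java.util.List;

class ResultCodeSketch {
    // Stand-in for KeeperException.Code; only two of its values are reproduced here.
    enum Code {
        OK(0),
        CONNECTIONLOSS(-4);

        private final int value;
        Code(int value) { this.value = value; }
        int intValue() { return value; }
    }

    // Hypothetical, simplified callback (ACL entries are plain strings in this sketch).
    interface AclCallback {
        void processResult(int rc, String path, Object ctx, List<String> acl);
    }

    // Delivers the payload only on success, comparing against the named constant.
    static void deliver(int rc, String path, Object ctx, List<String> acl, AclCallback cb) {
        if (rc == Code.OK.intValue()) {
            cb.processResult(rc, path, ctx, acl);
        } else {
            cb.processResult(rc, path, ctx, null);
        }
    }

    public static void main(String[] args) {
        List<String> acl = Arrays.asList("world:anyone:cdrwa");
        AclCallback print = (rc, path, ctx, result) ->
                System.out.println(rc + " " + path + " " + result);
        deliver(Code.OK.intValue(), "/app", null, acl, print);
        deliver(Code.CONNECTIONLOSS.intValue(), "/app", null, acl, print);
    }
}
```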
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)