Re: Leader election

2018-12-10 Thread Ted Dunning
One very useful way to deal with this is the method used in MapR FS. The
idea is that ZK should be used only rarely, and that short periods of two
leaders must be tolerated, but the other data has to be written with
absolute consistency.

The method that we chose was to associate an epoch number with every write,
require all writes to go to all replicas, and require that replicas only
acknowledge writes that match their idea of the current epoch for an object.

What happens in the event of a partition is that we have a few possible
cases, but in any case where the replicas of an object are split by a
partition, writes will fail, triggering a new leader election. Only replicas
on the side of the new ZK quorum (which may be the old quorum) have a chance
of succeeding here. If the replicas are split away from the ZK quorum,
writes may not be possible until the partition heals. If a new leader is
elected, it will increment the epoch and form a replication chain out of the
replicas it can find, telling them about the new epoch. Writes can then
proceed. During partition healing, any pending writes from the old epoch
will be ignored by the current replicas. None of the writes in the new epoch
will be directed to the old replicas after partition healing, but such
writes should be ignored as well.
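
To make the outline concrete, here is a minimal Java sketch of epoch-fenced
writes. Every name in it (Replica, EpochChain, the methods) is a hypothetical
illustration of the description above, not MapR's actual code:

    import java.util.List;

    class Replica {
        private volatile long currentEpoch;

        // A newly elected leader announces its incremented epoch first.
        void startEpoch(long epoch) {
            currentEpoch = epoch;
        }

        // Acknowledge only writes stamped with our current epoch; a write
        // from a deposed leader carries an older epoch and is rejected.
        boolean write(long writeEpoch, byte[] data) {
            if (writeEpoch != currentEpoch) {
                return false; // NACK: the writer's chain is stale
            }
            // ... append data to local storage ...
            return true;
        }
    }

    class EpochChain {
        // A write succeeds only if every replica in the chain acks it; any
        // NACK (or timeout) fails the write and triggers a new election.
        static boolean replicate(List<Replica> chain, long epoch, byte[] data) {
            for (Replica r : chain) {
                if (!r.write(epoch, data)) {
                    return false;
                }
            }
            return true;
        }
    }

The point is that the fencing lives in the data path itself, so a stale
leader is harmless even before it notices it has been deposed.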

In a side process, replicas that have come back after a partition may be
updated with writes from the new replicas. If the partition lasts long
enough, a new replica should be formed from the members of the current
epoch. If a new replica is formed and an old one is resurrected, then the
old one should probably be deprecated, although data balancing
considerations may come into play.

In the actual implementation of MapR FS, there is a lot of sophistication
that goes into the details, of course, and there is actually one more level
of delegation that happens, but this outline is good enough for a lot of
systems.

The virtues of this system are multiple:

1) partition is detected exactly as soon as it affects a write. Detecting a
partition sooner than that doesn't serve much purpose, especially since the
time to recover from a failed write is comparable to the duration of a fair
number of partitions.

2) having an old master continue under false pretenses does no harm, since
it cannot write to a more recent replica chain. This is more important than
it might seem, since there are situations where clocks don't necessarily
advance at the expected rate, so what seems like a short time can actually
be much longer (Rip van Winkle failures).

3) forcing writes to all live replicas while allowing reorganization is
actually very fast, and as long as we retain one live replica we can
continue writing. This is in contrast to quorum systems, where dropping
below the quorum stops writes. This matters because the replicas of
different objects can be arranged so that the portion of the cluster with a
ZK quorum might not have a majority of replicas for some objects.

4) electing a new master of a replica chain can be done quite quickly, so
the duration of any degradation can be quite short (you can set write
timeouts fairly short, because an unnecessary election takes less time than
a long timeout).

Anyway, you probably already have a design in mind. If this helps in any
way, that's great.

On Mon, Dec 10, 2018 at 10:32 PM Michael Borokhovich 
wrote:

> Makes sense. Thanks, Ted. We will design our system to cope with the short
> periods where we might have two leaders.
>
> On Thu, Dec 6, 2018 at 11:03 PM Ted Dunning  wrote:
>
> > ZK is able to guarantee that there is only one leader for the purposes of
> > updating ZK data. That is because all commits have to originate with the
> > current quorum leader and then be acknowledged by a quorum of the current
> > cluster. If the leader can't get enough acks, then it has de facto lost
> > leadership.
> >
> > The problem comes when there is another system that depends on ZK data.
> > Such data might record which node is the leader for some other purposes.
> > That leader will only assume that they have become leader if they succeed
> > in writing to ZK, but if there is a partition, then the old leader may
> > not be notified that another leader has established themselves until
> > some time after it has happened. Of course, if the erstwhile leader
> > tried to validate its position with a write to ZK, that write would
> > fail, but you can't spend 100% of your time doing that.
> >
> > It all comes down to the fact that a ZK client determines that it is
> > connected to a ZK cluster member by pinging, and that cluster member
> > sees heartbeats from the leader. The fact is, though, that you can't
> > tune these pings to be faster than some level, because you start to see
> > lots of false positives for loss of connection. Remember that it isn't
> > just loss of connection here that is the point. Any kind of delay would
> > have the same effect. Getting these ping intervals below one second
> > makes for a very twitchy system.

Re: Leader election

2018-12-10 Thread Michael Borokhovich
Thanks, Maciej. That sounds good. We will try playing with the parameters
and have at least a known upper limit on the inconsistency interval.

On Fri, Dec 7, 2018 at 2:11 AM Maciej Smoleński  wrote:

> On Fri, Dec 7, 2018 at 3:03 AM Michael Borokhovich 
> wrote:
>
> > We are planning to run Zookeeper nodes embedded with the client nodes.
> > I.e., each client runs also a ZK node. So, network partition will
> > disconnect a ZK node and not only the client.
> > My concern is about the following statement from the ZK documentation:
> >
> > "Timeliness: The clients view of the system is guaranteed to be
> up-to-date
> > within a certain time bound. (*On the order of tens of seconds.*) Either
> > system changes will be seen by a client within this bound, or the client
> > will detect a service outage."
> >
>
> This is related to the fact that a ZooKeeper server handles reads from its
> local state, without communicating with the other ZooKeeper servers.
> This design ensures scalability for read-dominated workloads.
> In this approach a client might receive data which is not up to date (it
> might not yet contain updates committed by the quorum on other ZooKeeper
> servers).
> The 'syncLimit' parameter describes how often a ZooKeeper server
> synchronizes/updates its local state to the global state.
> A client read operation will retrieve data from state no older than
> described by 'syncLimit'.
>
> However, a ZooKeeper client can always force retrieval of data which is up
> to date: it needs to issue a 'sync' command to the ZooKeeper server before
> issuing the 'read'. With 'sync', the ZooKeeper server will synchronize its
> local state with the global state, and the subsequent 'read' will be
> handled from the updated state.
> The client should be careful here, so that it communicates with the same
> ZooKeeper server for both 'sync' and 'read'.
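
A minimal sketch of this sync-then-read pattern with the standard ZooKeeper
Java client (error handling trimmed; real code should also check the
callback's rc):

    import java.util.concurrent.CountDownLatch;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    // Ask the server this session is connected to to catch up with the
    // leader, then read. Both calls ride the same session, hence the same
    // server, which is exactly the caveat mentioned above.
    static byte[] syncedRead(ZooKeeper zk, String path) throws Exception {
        CountDownLatch synced = new CountDownLatch(1);
        // sync() is asynchronous: wait for its callback before reading
        zk.sync(path, (rc, p, ctx) -> synced.countDown(), null);
        synced.await();
        return zk.getData(path, false, new Stat());
    }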
>
>
> > What are these "*tens of seconds*"? Can we reduce this time by
> > configuring "syncLimit" and "tickTime" to let's say 5 seconds? Can we
> > have a strong guarantee on this time bound?
> >
>
> As described above, you might use 'sync'+'read' to avoid this problem.
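
For the tuning half of the question: taking the description of 'syncLimit'
above at face value, the staleness window scales roughly with syncLimit x
tickTime, so a hypothetical zoo.cfg such as the one below would target about
a 5-second bound. The values are illustrative, not recommendations, and
'sync'+'read' remains the reliable tool:

    tickTime=1000    # milliseconds per tick
    initLimit=10     # ticks a follower may take for its initial sync
    syncLimit=5      # ticks a follower may lag the leader, ~5s window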
>
>
> >
> >
> > On Thu, Dec 6, 2018 at 1:05 PM Jordan Zimmerman <
> > jor...@jordanzimmerman.com>
> > wrote:
> >
> > > > Old service leader will detect network partition max 15 seconds
> > > > after it happened.
> > >
> > > If the old service leader is in a very long GC it will not detect the
> > > partition. In the face of VM pauses, etc. it's not possible to avoid 2
> > > leaders for a short period of time.
> > >
> > > -JZ
> >
>


Re: Leader election

2018-12-10 Thread Michael Borokhovich
Yes, I agree, our system should be able to tolerate two leaders for a short
and bounded period of time.
Thank you for the help!

On Thu, Dec 6, 2018 at 11:09 AM Jordan Zimmerman 
wrote:

> > it seems like the
> > inconsistency may be caused by the partition of the Zookeeper cluster
> > itself
>
> Yes - there are many ways in which you can end up with 2 leaders. However,
> if properly tuned and configured, it will be for a few seconds at most.
> During a GC pause no work is being done anyway. But, this stuff is very
> tricky. Requiring an atomically unique leader is actually a design smell
> and you should reconsider your architecture.
>
> > Maybe we can use a more
> > lightweight Hazelcast for example?
>
> There is no distributed system that can guarantee a single leader. Instead
> you need to adjust your design and algorithms to deal with this (using
> optimistic locking, etc.).
>
> -Jordan
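
One concrete form of the optimistic locking Jordan mentions is ZooKeeper's
versioned setData, which fences a stale leader at the moment it tries to
write. A minimal sketch, assuming the leader recorded the version of a
status znode when it was elected (statePath is a placeholder):

    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;

    // Guard a leader-only write with the znode version observed at
    // election time; a stale leader loses this compare-and-set and stops.
    static void writeAsLeader(ZooKeeper zk, String statePath, byte[] newState,
                              int versionSeenAtElection) throws Exception {
        try {
            // succeeds only if nobody modified the znode since we read it
            zk.setData(statePath, newState, versionSeenAtElection);
        } catch (KeeperException.BadVersionException e) {
            // another leader has taken over: stand down instead of writing
            throw new IllegalStateException("lost leadership", e);
        }
    }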
>
> > On Dec 6, 2018, at 1:52 PM, Michael Borokhovich 
> wrote:
> >
> > Thanks Jordan,
> >
> > Yes, I will try Curator.
> > Also, beyond the problem described in the Tech Note, it seems like the
> > inconsistency may be caused by a partition of the Zookeeper cluster
> > itself. E.g., if a "leader" client is connected to the partitioned ZK
> > node, it may not be notified at the same time as the other clients
> > connected to the other ZK nodes. So, another client may take leadership
> > while the current leader is still unaware of the change. Is that true?
> >
> > Another follow-up question. If Zookeeper can guarantee a single leader,
> > is it worth using it just for leader election? Maybe we can use a more
> > lightweight Hazelcast, for example?
> >
> > Michael.
> >
> >
> > On Thu, Dec 6, 2018 at 4:50 AM Jordan Zimmerman <
> jor...@jordanzimmerman.com>
> > wrote:
> >
> >> It is not possible to achieve the level of consistency you're after in
> >> an eventually consistent system such as ZooKeeper. There will always be
> >> an edge case where two ZooKeeper clients will believe they are leaders
> >> (though for a short period of time). In terms of how it affects Apache
> >> Curator, we have this Tech Note on the subject:
> >> https://cwiki.apache.org/confluence/display/CURATOR/TN10 (the
> >> description is true for any ZooKeeper client, not just Curator clients).
> >> If you do still intend to use a ZooKeeper lock/leader, I suggest you try
> >> Apache Curator, as writing these "recipes" is not trivial and they have
> >> many gotchas that aren't obvious.
> >>
> >> -Jordan
> >>
> >> http://curator.apache.org 
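
For reference, a minimal leader-election sketch with Curator's LeaderLatch
recipe; the connection string and znode path are placeholders:

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.framework.recipes.leader.LeaderLatch;
    import org.apache.curator.retry.ExponentialBackoffRetry;

    public class LeaderExample {
        public static void main(String[] args) throws Exception {
            CuratorFramework client = CuratorFrameworkFactory.newClient(
                    "host1:2181,host2:2181,host3:2181",
                    new ExponentialBackoffRetry(1000, 3));
            client.start();

            LeaderLatch latch = new LeaderLatch(client, "/myapp/leader");
            latch.start();
            latch.await(); // blocks until this instance becomes leader

            // Per TN10, hasLeadership() can still be stale during a
            // partition or GC pause, so guard side effects with fencing.
            if (latch.hasLeadership()) {
                // ... leader-only work ...
            }

            latch.close();
            client.close();
        }
    }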
> >>
> >>
> >>> On Dec 5, 2018, at 6:20 PM, Michael Borokhovich 
> >> wrote:
> >>>
> >>> Hello,
> >>>
> >>> We have a service that runs on 3 hosts for high availability. However,
> >>> at any given time, exactly one instance must be active. So, we are
> >>> thinking of using leader election with Zookeeper.
> >>> To this end, on each service host we also start a ZK server, so we
> >>> have a 3-node ZK cluster, and each service instance is a client of its
> >>> dedicated ZK server.
> >>> Then, we implement a leader election on top of Zookeeper using a basic
> >>> recipe:
> >>> https://zookeeper.apache.org/doc/r3.1.2/recipes.html#sc_leaderElection.
> >>>
> >>> I have the following questions/doubts regarding the approach:
> >>>
> >>> 1. It seems like we can run into inconsistency issues when a network
> >>> partition occurs. Zookeeper documentation says that the inconsistency
> >>> period may last “tens of seconds”. Am I understanding correctly that
> >>> during this time we may have 0 or 2 leaders?
> >>> 2. Is it possible to reduce this inconsistency time (let's say to 3
> >>> seconds) by tweaking tickTime and syncLimit parameters?
> >>> 3. Is there a way to guarantee exactly one leader all the time? Should
> >>> we implement a more complex leader election algorithm than the one
> >>> suggested in the recipe (using ephemeral_sequential nodes)?
> >>>
> >>> Thanks,
> >>> Michael.
> >>
> >>
>
>


Re: Leader election

2018-12-10 Thread Michael Borokhovich
Makes sense. Thanks, Ted. We will design our system to cope with the short
periods where we might have two leaders.

On Thu, Dec 6, 2018 at 11:03 PM Ted Dunning  wrote:

> ZK is able to guarantee that there is only one leader for the purposes of
> updating ZK data. That is because all commits have to originate with the
> current quorum leader and then be acknowledged by a quorum of the current
> cluster. If the leader can't get enough acks, then it has de facto lost
> leadership.
>
> The problem comes when there is another system that depends on ZK data.
> Such data might record which node is the leader for some other purposes.
> That leader will only assume that they have become leader if they succeed
> in writing to ZK, but if there is a partition, then the old leader may not
> be notified that another leader has established themselves until some time
> after it has happened. Of course, if the erstwhile leader tried to validate
> its position with a write to ZK, that write would fail, but you can't spend
> 100% of your time doing that.
>
> It all comes down to the fact that a ZK client determines that it is
> connected to a ZK cluster member by pinging and that cluster member sees
> heartbeats from the leader. The fact is, though, that you can't tune these
> pings to be faster than some level because you start to see lots of false
> positives for loss of connection. Remember that it isn't just loss of
> connection here that is the point. Any kind of delay would have the same
> effect. Getting these ping intervals below one second makes for a very
> twitchy system.
>
>
>
> On Fri, Dec 7, 2018 at 11:03 AM Michael Borokhovich 
> wrote:
>
> > We are planning to run Zookeeper nodes embedded with the client nodes.
> > I.e., each client runs also a ZK node. So, network partition will
> > disconnect a ZK node and not only the client.
> > My concern is about the following statement from the ZK documentation:
> >
> > "Timeliness: The clients view of the system is guaranteed to be
> up-to-date
> > within a certain time bound. (*On the order of tens of seconds.*) Either
> > system changes will be seen by a client within this bound, or the client
> > will detect a service outage."
> >
> > What are these "*tens of seconds*"? Can we reduce this time by
> > configuring "syncLimit" and "tickTime" to let's say 5 seconds? Can we
> > have a strong guarantee on this time bound?
> >
> >
> > On Thu, Dec 6, 2018 at 1:05 PM Jordan Zimmerman <
> > jor...@jordanzimmerman.com>
> > wrote:
> >
> > > > Old service leader will detect network partition max 15 seconds
> > > > after it happened.
> > >
> > > If the old service leader is in a very long GC it will not detect the
> > > partition. In the face of VM pauses, etc. it's not possible to avoid 2
> > > leaders for a short period of time.
> > >
> > > -JZ
> >
>


[jira] [Created] (ZOOKEEPER-3211) Standalone ZooKeeper deployment on a CentOS 7.0 kernel occasionally hits a serious problem: all 60 default server-side connections turn to CLOSE_WAIT and are not cleared for a long time, leaving zk unable to serve requests

2018-12-10 Thread yeshuangshuang (JIRA)
yeshuangshuang created ZOOKEEPER-3211:
-

 Summary: Standalone ZooKeeper deployment on a CentOS 7.0 kernel
occasionally hits a serious problem: all 60 default server-side connections
turn to CLOSE_WAIT and are not cleared for a long time, leaving zk unable to
serve requests
 Key: ZOOKEEPER-3211
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3211
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.5
 Environment: 1. Deployment configuration
server.1=127.0.0.1:2902:2903
2. Deployed versions
Kernel: Linux localhost.localdomain 3.10.0-123.el7.x86_64 #1 SMP Tue Feb 12 19:44:50 
EST 2019 x86_64 x86_64 x86_64 GNU/Linux
JDK:
java version "1.7.0_181"
OpenJDK Runtime Environment (rhel-2.6.14.5.el7-x86_64 u181-b00)
OpenJDK 64-Bit Server VM (build 24.181-b00, mixed mode)
zk: 3.4.5
Reporter: yeshuangshuang
 Fix For: 3.4.5
 Attachments: 1.log, zklog.rar

1. Deployment configuration
server.1=127.0.0.1:2902:2903
2. Deployed versions
Kernel: Linux localhost.localdomain 3.10.0-123.el7.x86_64 #1 SMP Tue Feb 12 19:44:50 
EST 2019 x86_64 x86_64 x86_64 GNU/Linux
JDK:
java version "1.7.0_181"
OpenJDK Runtime Environment (rhel-2.6.14.5.el7-x86_64 u181-b00)
OpenJDK 64-Bit Server VM (build 24.181-b00, mixed mode)
zk: 3.4.5
3. Symptoms: not always reproducible, but the reproduction probability is extremely high. At first, reads and writes time out (taking about 6 s); a few minutes later, all connections (including long-lived ones) end up in the CLOSE_WAIT state.
4. Current workaround: once all connections are observed in CLOSE_WAIT, proactively restart the ZooKeeper server.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Cannot find doap file: http://zookeeper.apache.org/doap.rdf

2018-12-10 Thread Patrick Hunt
That's right, this is just before the recent change:
https://github.com/apache/zookeeper/tree/90210c653396d840611e41f09b342e0a702c025a/content
we need to re-introduce the file at that same point.
https://github.com/apache/zookeeper/blob/90210c653396d840611e41f09b342e0a702c025a/content/doap.rdf
probably a good time to review the contents and Apache guidelines to ensure
they are in sync.

Regards,

Patrick

On Mon, Dec 10, 2018 at 1:38 PM Andor Molnár  wrote:

> What's the original location where it needs to be restored?
> asf-site/content/..?
>
> Tamaas, doap.rdf is still at the root of 'website' branch. I think we
> have to adjust pom.xml to copy it to the right place.
>
>
> Regards,
>
> Andor
>
>
>
>
> On 12/10/18 18:48, Patrick Hunt wrote:
> > Andor I believe you modified the web site recently, could you take a
> look?
> >
> > Thanks for the report Sebb.
> >
> > Patrick
> >
> > On Sun, Dec 9, 2018 at 3:13 AM sebb  wrote:
> >
> >> Please can you fix this error?
> >>
> >> The DOAP has moved.
> >>
> >> Either change the location in the projects.xml file, or restore the
> >> DOAP to its former location.
> >>
> >> Thanks!
> >> -- Forwarded message -
> >> From: Projects 
> >> Date: Sun, 9 Dec 2018 at 02:01
> >> Subject: Cannot find doap file: http://zookeeper.apache.org/doap.rdf
> >> To: Site Development 
> >>
> >>
> >> URL: http://zookeeper.apache.org/doap.rdf
> >> HTTP Error 404: Not Found
> >> Source:
> >>
> https://svn.apache.org/repos/asf/comdev/projects.apache.org/trunk/data/projects.xml
> >>
>


ZooKeeper-trunk - Build # 304 - Failure

2018-12-10 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper-trunk/304/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 178.28 KB...]
[junit] Running org.apache.zookeeper.test.SessionInvalidationTest in thread 4
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.734 sec, Thread: 4, Class: org.apache.zookeeper.test.SessionInvalidationTest
[junit] Running org.apache.zookeeper.test.SessionTest in thread 4
[junit] Tests run: 106, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 338.787 sec, Thread: 2, Class: org.apache.zookeeper.test.NettyNettySuiteTest
[junit] Running org.apache.zookeeper.test.SessionTimeoutTest in thread 2
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.66 sec, Thread: 2, Class: org.apache.zookeeper.test.SessionTimeoutTest
[junit] Running org.apache.zookeeper.test.SessionTrackerCheckTest in thread 2
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.11 sec, Thread: 2, Class: org.apache.zookeeper.test.SessionTrackerCheckTest
[junit] Running org.apache.zookeeper.test.SessionUpgradeTest in thread 2
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 16.294 sec, Thread: 4, Class: org.apache.zookeeper.test.SessionTest
[junit] Running org.apache.zookeeper.test.StandaloneTest in thread 4
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.777 sec, Thread: 4, Class: org.apache.zookeeper.test.StandaloneTest
[junit] Running org.apache.zookeeper.test.StatTest in thread 4
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.18 sec, Thread: 4, Class: org.apache.zookeeper.test.StatTest
[junit] Running org.apache.zookeeper.test.StaticHostProviderTest in thread 4
[junit] Tests run: 26, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.839 sec, Thread: 4, Class: org.apache.zookeeper.test.StaticHostProviderTest
[junit] Running org.apache.zookeeper.test.StringUtilTest in thread 4
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.095 sec, Thread: 4, Class: org.apache.zookeeper.test.StringUtilTest
[junit] Running org.apache.zookeeper.test.SyncCallTest in thread 4
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.786 sec, Thread: 4, Class: org.apache.zookeeper.test.SyncCallTest
[junit] Running org.apache.zookeeper.test.TruncateTest in thread 4
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.133 sec, Thread: 4, Class: org.apache.zookeeper.test.TruncateTest
[junit] Running org.apache.zookeeper.test.WatchEventWhenAutoResetTest in thread 4
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 18.996 sec, Thread: 2, Class: org.apache.zookeeper.test.SessionUpgradeTest
[junit] Running org.apache.zookeeper.test.WatchedEventTest in thread 2
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.2 sec, Thread: 2, Class: org.apache.zookeeper.test.WatchedEventTest
[junit] Running org.apache.zookeeper.test.WatcherFuncTest in thread 2
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.381 sec, Thread: 2, Class: org.apache.zookeeper.test.WatcherFuncTest
[junit] Running org.apache.zookeeper.test.WatcherTest in thread 2
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 18.766 sec, Thread: 4, Class: org.apache.zookeeper.test.WatchEventWhenAutoResetTest
[junit] Running org.apache.zookeeper.test.X509AuthTest in thread 4
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.088 sec, Thread: 4, Class: org.apache.zookeeper.test.X509AuthTest
[junit] Running org.apache.zookeeper.test.ZkDatabaseCorruptionTest in thread 4
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.92 sec, Thread: 4, Class: org.apache.zookeeper.test.ZkDatabaseCorruptionTest
[junit] Running org.apache.zookeeper.test.ZooKeeperQuotaTest in thread 4
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.85 sec, Thread: 4, Class: org.apache.zookeeper.test.ZooKeeperQuotaTest
[junit] Running org.apache.zookeeper.util.PemReaderTest in thread 4
[junit] Tests run: 64, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.543 sec, Thread: 4, Class: org.apache.zookeeper.util.PemReaderTest
[junit] Running org.apache.jute.BinaryInputArchiveTest in thread 4
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.076 sec, Thread: 4, Class: org.apache.jute.BinaryInputArchiveTest
[junit] Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 28.817 sec, Thread: 2, Class: org.apache.zookeeper.test.WatcherTest
[junit] Tests run: 106, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 355.473 sec, Thread: 

[GitHub] zookeeper pull request #680: ZOOKEEPER-3174: Quorum TLS - support reloading ...

2018-12-10 Thread ivmaykov
Github user ivmaykov closed the pull request at:

https://github.com/apache/zookeeper/pull/680


---


[GitHub] zookeeper pull request #680: ZOOKEEPER-3174: Quorum TLS - support reloading ...

2018-12-10 Thread ivmaykov
GitHub user ivmaykov reopened a pull request:

https://github.com/apache/zookeeper/pull/680

ZOOKEEPER-3174: Quorum TLS - support reloading trust/key store

Allow reloading SSL trust stores and key stores from disk when the files on disk change.

## Added support for reloading key/trust stores when the file on disk changes
- new property `sslQuorumReloadCertFiles` which controls the behavior for reloading the key and trust store files for `QuorumX509Util`. Reloading of key and trust store for `ClientX509Util` is not in this PR but could be added easily
- this allows a ZK server to keep running on a machine that uses short-lived certs that refresh frequently, without having to restart the ZK process.
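
For context, a hypothetical zoo.cfg fragment showing where the new flag sits
alongside the documented quorum-TLS properties (paths and passwords are
placeholders; only `sslQuorumReloadCertFiles` itself comes from this PR):

    # quorum TLS, values illustrative
    sslQuorum=true
    serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
    ssl.quorum.keyStore.location=/etc/zk/quorum-keystore.jks
    ssl.quorum.keyStore.password=changeit
    ssl.quorum.trustStore.location=/etc/zk/quorum-truststore.jks
    ssl.quorum.trustStore.password=changeit
    # pick up new key/trust store contents when the files change on disk
    sslQuorumReloadCertFiles=true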

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ivmaykov/zookeeper ZOOKEEPER-3174

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/680.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #680


commit cc72c083c0b70409d78da11507ca5e80e726bb69
Author: Ilya Maykov 
Date:   2018-10-25T01:54:06Z

ZOOKEEPER-3174: Quorum TLS - support reloading trust/key store




---


Re: Cannot find doap file: http://zookeeper.apache.org/doap.rdf

2018-12-10 Thread Andor Molnár
What's the original location where it needs to be restored?
asf-site/content/..?

Tamaas, doap.rdf is still at the root of 'website' branch. I think we
have to adjust pom.xml to copy it to the right place.


Regards,

Andor




On 12/10/18 18:48, Patrick Hunt wrote:
> Andor I believe you modified the web site recently, could you take a look?
>
> Thanks for the report Sebb.
>
> Patrick
>
> On Sun, Dec 9, 2018 at 3:13 AM sebb  wrote:
>
>> Please can you fix this error?
>>
>> The DOAP has moved.
>>
>> Either change the location in the projects.xml file, or restore the
>> DOAP to its former location.
>>
>> Thanks!
>> -- Forwarded message -
>> From: Projects 
>> Date: Sun, 9 Dec 2018 at 02:01
>> Subject: Cannot find doap file: http://zookeeper.apache.org/doap.rdf
>> To: Site Development 
>>
>>
>> URL: http://zookeeper.apache.org/doap.rdf
>> HTTP Error 404: Not Found
>> Source:
>> https://svn.apache.org/repos/asf/comdev/projects.apache.org/trunk/data/projects.xml
>>


[jira] [Commented] (ZOOKEEPER-1636) c-client crash when zoo_amulti failed

2018-12-10 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715583#comment-16715583
 ] 

Hudson commented on ZOOKEEPER-1636:
---

SUCCESS: Integrated in Jenkins build ZooKeeper-trunk #303 (See 
[https://builds.apache.org/job/ZooKeeper-trunk/303/])
ZOOKEEPER-1636: cleanup completion list of a failed multi request (andor: rev 
b1fd480b2c8e0cc1429345ee04510d3849001c5c)
* (edit) zookeeper-client/zookeeper-client-c/src/zookeeper.c
* (edit) zookeeper-client/zookeeper-client-c/tests/TestMulti.cc


> c-client crash when zoo_amulti failed 
> --
>
> Key: ZOOKEEPER-1636
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1636
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client
>Affects Versions: 3.4.3
>Reporter: Thawan Kooburat
>Assignee: Michael K. Edwards
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.6.0, 3.5.5
>
> Attachments: ZOOKEEPER-1636.patch, ZOOKEEPER-1636.patch, 
> ZOOKEEPER-1636.patch, ZOOKEEPER-1636.patch, ZOOKEEPER-1636.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> deserialize_response for a multi operation doesn't handle the case where the
> server fails to send back a response (e.g. when the multi packet is too
> large). The c-client will try to process the completions of all sub-requests
> as if the operation were successful, which will eventually cause SIGSEGV.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-1636) c-client crash when zoo_amulti failed

2018-12-10 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715496#comment-16715496
 ] 

Hudson commented on ZOOKEEPER-1636:
---

SUCCESS: Integrated in Jenkins build Zookeeper-trunk-single-thread #142 (See 
[https://builds.apache.org/job/Zookeeper-trunk-single-thread/142/])
ZOOKEEPER-1636: cleanup completion list of a failed multi request (andor: rev 
b1fd480b2c8e0cc1429345ee04510d3849001c5c)
* (edit) zookeeper-client/zookeeper-client-c/tests/TestMulti.cc
* (edit) zookeeper-client/zookeeper-client-c/src/zookeeper.c


> c-client crash when zoo_amulti failed 
> --
>
> Key: ZOOKEEPER-1636
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1636
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client
>Affects Versions: 3.4.3
>Reporter: Thawan Kooburat
>Assignee: Michael K. Edwards
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.6.0, 3.5.5
>
> Attachments: ZOOKEEPER-1636.patch, ZOOKEEPER-1636.patch, 
> ZOOKEEPER-1636.patch, ZOOKEEPER-1636.patch, ZOOKEEPER-1636.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> deserialize_response for a multi operation doesn't handle the case where the
> server fails to send back a response (e.g. when the multi packet is too
> large). The c-client will try to process the completions of all sub-requests
> as if the operation were successful, which will eventually cause SIGSEGV.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Cannot find doap file: http://zookeeper.apache.org/doap.rdf

2018-12-10 Thread Patrick Hunt
Andor I believe you modified the web site recently, could you take a look?

Thanks for the report Sebb.

Patrick

On Sun, Dec 9, 2018 at 3:13 AM sebb  wrote:

> Please can you fix this error?
>
> The DOAP has moved.
>
> Either change the location in the projects.xml file, or restore the
> DOAP to its former location.
>
> Thanks!
> -- Forwarded message -
> From: Projects 
> Date: Sun, 9 Dec 2018 at 02:01
> Subject: Cannot find doap file: http://zookeeper.apache.org/doap.rdf
> To: Site Development 
>
>
> URL: http://zookeeper.apache.org/doap.rdf
> HTTP Error 404: Not Found
> Source:
> https://svn.apache.org/repos/asf/comdev/projects.apache.org/trunk/data/projects.xml
>


[GitHub] zookeeper issue #680: ZOOKEEPER-3174: Quorum TLS - support reloading trust/k...

2018-12-10 Thread anmolnar
Github user anmolnar commented on the issue:

https://github.com/apache/zookeeper/pull/680
  
@ivmaykov For the future, I think it's more convenient for reviewers if you 
submit your commits separately instead of squashing them. Especially when 
reviewers give feedback on a code review, it's hard to locate the new changes 
if everything is in a single patch. The commit script will squash them all 
eventually, so it doesn't really matter in PRs.


---


[GitHub] zookeeper issue #418: [zookeeper-2937] disallow client requests without comp...

2018-12-10 Thread sriramch
Github user sriramch commented on the issue:

https://github.com/apache/zookeeper/pull/418
  
@emonty - yes, i would love to have this merged back. since there was no 
traction on this, we have worked around the issue for now, but we would 
still love to see it merged.


---


[GitHub] zookeeper issue #418: [zookeeper-2937] disallow client requests without comp...

2018-12-10 Thread emonty
Github user emonty commented on the issue:

https://github.com/apache/zookeeper/pull/418
  
@afine @sriramch Just came across this while reading the associated Jira 
issue. It's in merge conflict and seems kind of abandoned for a year, but the 
issue itself seems important. Is it worth picking this back up? Or are there 
other thoughts on the topic?


---


[GitHub] zookeeper issue #717: ZOOKEEPER-1636: cleanup completion list of a failed mu...

2018-12-10 Thread anmolnar
Github user anmolnar commented on the issue:

https://github.com/apache/zookeeper/pull/717
  
Committed to master and 3.5 branches. Thanks @mkedwards !


---


[GitHub] zookeeper pull request #717: ZOOKEEPER-1636: cleanup completion list of a fa...

2018-12-10 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/zookeeper/pull/717


---


[GitHub] zookeeper issue #684: ZOOKEEPER-3180: Add response cache to improve the thro...

2018-12-10 Thread maoling
Github user maoling commented on the issue:

https://github.com/apache/zookeeper/pull/684
  
@enixon 
needs another rebase. can we move on?


---