RE: Can the leader of a Zookeeper be specifically selected at startup?

2022-06-20 Thread Kathryn Hogg
Jumping in to this conversation:

I know that's best practice.  In my case though, we have a situation where we
spread our zk nodes across two networks in different buildings connected with
high-speed, robust, redundant networks.  We also have some servers at a third
co-location facility that isn't quite as robust.  We generally use this
for quorum management, just to break ties when a netsplit happens.  For
example, for a mongodb cluster we run a non-data-bearing node in arbiter mode
here.

For Zookeeper, we exclude this node from the connection string as we never want 
our clients connecting to it.  In an ideal world, we would also like it to 
participate in leader elections but not be electable.  Bonus points if it only 
participated in leader elections and didn't have a copy of the znode data.


--
Kathryn Hogg
Principal Technology Architect

-Original Message-
From: Szalay-Bekő Máté [mailto:szalay.beko.m...@gmail.com] 
Sent: Monday, June 20, 2022 10:29 AM
To: UserZooKeeper 
Cc: DevZooKeeper ; Heller, George A III CTR (USA) 

Subject: Re: Can the leader of a Zookeeper be specifically selected at startup?

{External email message: This email is from an external source. Please exercise 
caution prior to opening attachments, clicking on links, or providing any 
sensitive information.}

I also don't really know why you would need a single host being "preferred"
as leader. I think the safest (and the best practice) is to make sure all your 
ZooKeeper servers are the same in terms of networking / performance / etc.

Not knowing your goals, maybe the Observer feature is also something you can 
take a look into:
https://zookeeper.apache.org/doc/r3.6.3/zookeeperObservers.html

Best regards,
Mate

On Mon, Jun 20, 2022 at 9:57 AM Enrico Olivelli  wrote:

> George,
> really, it should not matter which node is the leader. It is
> automatically chosen.
> Ideally, each node should be as powerful as its peers.
>
> Why do you need this "preferred leader"?
> I am afraid that you have some flaw in your design.
>
> Enrico
>
> Il giorno lun 20 giu 2022 alle ore 05:39 Kezhu Wang  
> ha scritto:
> >
> > Hi,
> >
> > I think this could be achieved with help from `reconfig`[1]:
> > * Configure all nodes with `standaloneEnabled=false`, `reconfigEnabled=true`.
> > * Start node-2 as the sole quorum participant.
> > * Now node-2 is the leader. You will see "No server failure will be
> > tolerated. You need at least 3 servers".
> > * Start node-1 and node-3 with all quorum members configured.
> > * `zkCli.sh config` shows only node-2 for now.
> > * `zkCli.sh reconfig -add node-1,node-3` will add both node-1 and
> > node-3 to the quorum.
> > * According to `Leader.tryToCommit`[2], node-2 will remain the leader
> > because it led the old quorum and is a voter in the new quorum.
> >
> > node-2 is the leader throughout the whole process.
> >
> > [1]: https://zookeeper.apache.org/doc/current/zookeeperReconfig.html
> > [2]: https://github.com/apache/zookeeper/blob/b4f9aab099880ba8ef08eaff697debe6cdeae057/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Leader.java#L950
> >
> > Best,
> > Kezhu Wang
> >
> > On June 19, 2022 at 23:00:59, Heller, George A III CTR (USA) (
> > george.a.heller2@mail.mil.invalid) wrote:
> >
> > We have 3 Zookeeper nodes and would like node 2 to always be the
> > leader unless node 2 goes down. If node 2 goes down, then either
> > node 1 or node 3 would be the leader.
> >
> >
> >
> > Can this be done? If so, how would this be done?
>


RE: Migrate from 3.4.x to 3.5.5

2019-09-09 Thread Kathryn Hogg
Unfortunately, it’s a one-person project and I believe the maintainer has
stated that he's not using .Net any more, so it doesn't get a lot of attention.
I've sent some bug fixes -- most notably the patch I submitted to Zookeeper for
the bug in the lock recipe -- but he's reluctant to include patches that aren't
in ZK core.  So I was maintaining an internal fork at work.

Also, my duties at work changed last winter and I'm no longer working on .Net 
things so I've had limited time to do a ZK 3.5.x upgrade myself or work on my 
Curator port to .Net.  

--
Kathryn Hogg
Senior Technology Architect


-Original Message-
From: Andor Molnar [mailto:an...@apache.org] 
Sent: Saturday, September 7, 2019 3:04 PM
To: user@zookeeper.apache.org
Subject: Re: Migrate from 3.4.x to 3.5.5

Hi Kathryn,

Are you a contributor of that .Net client? Is it officially supported by 
Microsoft?
Would it make sense to merge it into the main repository at some point?

Andor


-Original Message-
From: Kathryn Hogg 
Reply-To: user@zookeeper.apache.org
To: user@zookeeper.apache.org 
Subject: RE: Migrate from 3.4.x to 3.5.5
Date: Thu, 5 Sep 2019 13:13:15 +

Thanks!  That buys me some time from having to fork ZookeeperNetEx and do a 
3.5.x port myself.  Additionally, it should allow me to use Kafka with a 3.5.x 
zookeeper.

--
Kathryn Hogg
Senior Technology Architect

-Original Message-
From: Andor Molnar [mailto:an...@apache.org]
Sent: Thursday, September 5, 2019 12:42 AM
To: user@zookeeper.apache.org
Subject: Re: Migrate from 3.4.x to 3.5.5

Hi Kathryn,

That way should work without problems.

Andor

-Original Message-
From: Kathryn Hogg
Reply-To: user@zookeeper.apache.org
To: user@zookeeper.apache.org
Subject: RE: Migrate from 3.4.x to 3.5.5
Date: Wed, 4 Sep 2019 15:07:11 +

Question about the opposite:  We have some C# clients using ZookeeperNetEx
which hasn't released a 3.5 version yet.  Will 3.4 clients work with 3.5
servers?

--
Kathryn Hogg
Senior Technology Architect

-Original Message-
From: Andor Molnar [mailto:an...@apache.org]
Sent: Wednesday, September 4, 2019 9:52 AM
To: user@zookeeper.apache.org
Subject: Re: Migrate from 3.4.x to 3.5.5

Hi Zili,

"If so, it seems upgrade client side force user to upgrade server side also."

Yes, if client is upgraded _and_ user wants to use a new feature in 3.5, then
server side has to be upgraded too. ;)

Andor

> On 2019. Sep 3., at 13:20, Zili Chen  wrote:
> If so, it seems upgrade client side force user to upgrade server side
> also.




RE: Migrate from 3.4.x to 3.5.5

2019-09-05 Thread Kathryn Hogg
Thanks!  That buys me some time from having to fork ZookeeperNetEx and do a 
3.5.x port myself.  Additionally, it should allow me to use Kafka with a 3.5.x 
zookeeper.

--
Kathryn Hogg
Senior Technology Architect


-Original Message-
From: Andor Molnar [mailto:an...@apache.org] 
Sent: Thursday, September 5, 2019 12:42 AM
To: user@zookeeper.apache.org
Subject: Re: Migrate from 3.4.x to 3.5.5

Hi Kathryn,

That way should work without problems.

Andor


-Original Message-
From: Kathryn Hogg 
Reply-To: user@zookeeper.apache.org
To: user@zookeeper.apache.org 
Subject: RE: Migrate from 3.4.x to 3.5.5
Date: Wed, 4 Sep 2019 15:07:11 +

Question about the opposite:  We have some C# clients using ZookeeperNetEx 
which hasn't released a 3.5 version yet.  Will 3.4 clients work with 3.5 
servers?

--
Kathryn Hogg
Senior Technology Architect

-Original Message-
From: Andor Molnar [mailto:an...@apache.org]
Sent: Wednesday, September 4, 2019 9:52 AM
To: user@zookeeper.apache.org
Subject: Re: Migrate from 3.4.x to 3.5.5

Hi Zili,

"If so, it seems upgrade client side force user to upgrade server side also."

Yes, if client is upgraded _and_ user wants to use a new feature in 3.5, then
server side has to be upgraded too. ;)

Andor


> On 2019. Sep 3., at 13:20, Zili Chen  wrote:
> If so, it seems upgrade client side force user to upgrade server side 
> also.




RE: Migrate from 3.4.x to 3.5.5

2019-09-04 Thread Kathryn Hogg
Question about the opposite:  We have some C# clients using ZookeeperNetEx 
which hasn't released a 3.5 version yet.  Will 3.4 clients work with 3.5 
servers?

--
Kathryn Hogg
Senior Technology Architect


-Original Message-
From: Andor Molnar [mailto:an...@apache.org] 
Sent: Wednesday, September 4, 2019 9:52 AM
To: user@zookeeper.apache.org
Subject: Re: Migrate from 3.4.x to 3.5.5

Hi Zili,

"If so, it seems upgrade client side force user to upgrade server side also.”

Yes, if client is upgraded _and_ user wants to use a new feature in 3.5, then 
server side has to be upgraded too. ;)

Andor



> On 2019. Sep 3., at 13:20, Zili Chen  wrote:
> 
> If so, it seems upgrade client side force user to upgrade server side 
> also.



RE: About ZooKeeper Dynamic Reconfiguration

2019-08-21 Thread Kathryn Hogg
At my organization we solve that by running a 3rd site as mentioned in another 
email.  We run a 5 node ensemble with 2 nodes in each primary data center and 1 
node in the co-location facility.  We try to minimize usage of the 5th node so 
we explicitly exclude it from our clients' connection string.

This way, if there is a network partition between datacenters, whichever one
can still talk to the node at the 3rd datacenter will maintain quorum.
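
To make the tie-breaking arithmetic concrete, here is a minimal sketch in Java.
It is not ZooKeeper code: QuorumSketch, hasQuorum, votersAt, and the site
labels dcA, dcB, and colo are names invented for illustration, and it assumes
the default majority quorum, where a partition needs strictly more than half
of the voting members.

```java
import java.util.List;
import java.util.Map;

class QuorumSketch {
    /** True if the reachable voters form a strict majority of the ensemble. */
    static boolean hasQuorum(int reachableVoters, int ensembleSize) {
        return reachableVoters > ensembleSize / 2;
    }

    /** Counts the voters hosted at the given sites (hypothetical site labels). */
    static int votersAt(Map<String, Integer> votersPerSite, List<String> sites) {
        int total = 0;
        for (String site : sites) {
            total += votersPerSite.getOrDefault(site, 0);
        }
        return total;
    }

    public static void main(String[] args) {
        // 2 nodes in each primary data center, 1 in the co-location facility.
        Map<String, Integer> layout = Map.of("dcA", 2, "dcB", 2, "colo", 1);
        int ensembleSize = 5;

        // dcA can still reach the co-location node: 3 of 5 voters.
        System.out.println(hasQuorum(votersAt(layout, List.of("dcA", "colo")), ensembleSize));
        // dcB is cut off from everything else: 2 of 5 voters.
        System.out.println(hasQuorum(votersAt(layout, List.of("dcB")), ensembleSize));
    }
}
```

With the 2+2+1 layout, the side that still reaches the co-location node holds
3 of 5 votes and keeps serving; the isolated side holds 2 and does not.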

Ideally, if it was possible, we'd somehow like the node at the third datacenter 
to never be elected as the leader and even better if there was some way for it 
to be a voting member only and not bear any data (similar to mongodb's arbiter).


-Original Message-
From: Cee Tee [mailto:c.turks...@gmail.com] 
Sent: Wednesday, August 21, 2019 1:27 PM
To: Alexander Shraer 
Cc: user@zookeeper.apache.org
Subject: Re: About ZooKeeper Dynamic Reconfiguration


Yes, one side loses quorum and the other remains active. However we actively
control which side that is, because our main application is active/passive with
2 datacenters. We need Zookeeper to remain active in the application's active
datacenter.

On 21 August 2019 17:22:00 Alexander Shraer  wrote:
> That's great! Thanks for sharing.
>
>
>> Added benefit is that we can also control which data center gets the 
>> quorum in case of a network outage between the two.
>
>
> Can you explain how this works? In case of a network outage between 
> two DCs, one of them has a quorum of participants and the other doesn't.
> The participants in the smaller set should not be operational at this 
> time, since they can't get quorum. no ?
>
>
>
> Thanks,
> Alex
>
>
> On Wed, Aug 21, 2019 at 7:55 AM Cee Tee  wrote:
>
> We have solved this by implementing a 'zookeeper cluster balancer', it 
> calls the admin server api of each zookeeper to get the current status 
> and will issue dynamic reconfigure commands to change dead servers 
> into observers so the quorum is not in danger. Once the dead servers 
> reconnect, they take the observer role and are then reconfigured into 
> participants again.
>
> Added benefit is that we can also control which data center gets the 
> quorum in case of a network outage between the two.
> Regards
> Chris
>
> On 21 August 2019 16:42:37 Alexander Shraer  wrote:
>
>> Hi,
>>
>> Reconfiguration, as implemented, is not automatic. In your case, when 
>> failures happen, this doesn't change the ensemble membership.
>> When 2 of 5 fail, this is still a minority, so everything should work 
>> normally, you just won't be able to handle an additional failure. If 
>> you'd like to remove them from the ensemble, you need to issue an 
>> explicit reconfiguration command to do so.
>>
>> Please see details in the manual:
>> https://zookeeper.apache.org/doc/r3.5.5/zookeeperReconfig.html
>>
>> Alex
>>
>> On Wed, Aug 21, 2019 at 7:29 AM Gao,Wei  wrote:
>>
>>> Hi
>>>I encounter a problem which blocks my development of load balance 
>>> using ZooKeeper 3.5.5.
>>>Actually, I have a ZooKeeper cluster which comprises of five zk 
>>> servers. And the dynamic configuration file is as follows:
>>>
>>>   server.1=zk1:2888:3888:participant;0.0.0.0:2181
>>>   server.2=zk2:2888:3888:participant;0.0.0.0:2181
>>>   server.3=zk3:2888:3888:participant;0.0.0.0:2181
>>>   server.4=zk4:2888:3888:participant;0.0.0.0:2181
>>>   server.5=zk5:2888:3888:participant;0.0.0.0:2181
>>>
>>>   The zk cluster works fine if every member works normally.
>>> However, if, say, two of them suddenly go down without prior
>>> notice, the dynamic configuration file shown above will not
>>> be synchronized dynamically, which causes the zk cluster to fail to work
>>> normally.
>>>   I think this is a very common case which may happen at any time. 
>>> If so, how can we resolve it?
>>>   Really look forward to hearing from you!
>>> Thanks
>>>



RE: Please Register: ZooKeeper Meetup @ Facebook, Nov 8th 2018

2018-11-06 Thread Kathryn Hogg
It appears the stream will be at https://www.facebook.com/zkmeetup

--
Kathryn Hogg
Senior Manager Product Development
Phone: 763.201.2000
Fax: 763.201.5333
Open Access Technology International, Inc.
3660 Technology Drive NE, Minneapolis, MN 55418

-Original Message-
From: Ivan Serdyuk [mailto:local.tourist.k...@gmail.com] 
Sent: Tuesday, November 6, 2018 4:58 PM
To: user@zookeeper.apache.org
Subject: Re: Please Register: ZooKeeper Meetup @ Facebook, Nov 8th 2018

Sorry, just mentioned that it would be streamed via FB streaming.

Hope you are expecting to record one?

Ivan

On Wed, Nov 7, 2018 at 12:56 AM Ivan Serdyuk 
wrote:

> And where is the streaming link?
>
> On Tue, Nov 6, 2018 at 11:18 PM Norbert Kalmar 
>  wrote:
>
>> Yes, 2 days from now.
>>
>> Regards,
>> Norbert
>>
>> On Tue, Nov 6, 2018 at 9:07 PM Jeff Widman  wrote:
>>
>> > This is happening this week, correct?
>> >
>> > On Fri, Sep 14, 2018 at 8:54 AM Ivan Serdyuk <local.tourist.k...@gmail.com>
>> > wrote:
>> >
>> > > Awesome.
>> > >
>> > > I wonder if you are expecting to record your talk.
>> > >
>> > > Ivan
>> > >
>> > > On Fri, Sep 14, 2018 at 2:46 AM Mohamed Jeelani wrote:
>> > >
>> > > > Your ZooKeeper friends @ Facebook would like to invite you to share and
>> > > > learn what’s new with ZooKeeper.
>> > > >
>> > > > We will not only share what we at Facebook have been up to, but we have
>> > > > exciting talks from speakers from the ZooKeeper community lined up who are
>> > > > eager to share what they've been working on as well. And of course, we've
>> > > > got some cool swag for you :-)
>> > > >
>> > > > When: November 8th 2018, 5pm – 8pm (Talks: 5pm - 7pm; Networking & Happy
>> > > > Hour: 7pm - 8pm)
>> > > > Where: Facebook HQ - MPK 16, 1 Hacker Way, Menlo Park, CA
>> > > > We will have remote viewing locations in our Facebook Seattle office, and
>> > > > the event will also be live streamed. You can indicate how you'd like to
>> > > > attend on the registration page.
>> > > >
>> > > > Please register here - https://zookeeperatfb.splashthat.com/
>> > > >
>> > > > We look forward to seeing you soon!
>> > > >
>> > > > ZooKeeper Friends @ Facebook
>> > > >
>> > >
>> >
>> >
>> > --
>> >
>> > *Jeff Widman*
>> > jeffwidman.com <http://www.jeffwidman.com/> | 740-WIDMAN-J 
>> > (943-6265) <><
>> >
>>
>


RE: Non-incremental reconfig failing while trying to bind to same local client port

2018-05-10 Thread Kathryn Hogg
You can run lsof -i :2181 to see what process is using port 2181

-Original Message-
From: harish lohar [mailto:hklo...@gmail.com] 
Sent: Wednesday, May 09, 2018 8:01 PM
To: user@zookeeper.apache.org
Subject: Non-incremental reconfig failing while trying to bind to same local 
client port

Hi All,

Need help resolving below issue:

2018-05-10 00:59:16,584 [myid:1] - WARN [RecvWorker:3:QuorumCnxManager$RecvWorker@922] - Interrupting SendWorker
2018-05-10 00:59:16,584 [myid:1] - INFO [QuorumPeerListener:QuorumCnxManager$Listener@636] - My election bind port: /10.60.11.240:3888
2018-05-10 00:59:16,584 [myid:1] - INFO [QuorumPeer[myid=1](plain=/127.0.0.1:2181)(secure=disabled):NIOServerCnxnFactory@706] - binding to port localhost/127.0.0.1:2181
2018-05-10 00:59:16,585 [myid:1] - ERROR [QuorumPeer[myid=1](plain=/127.0.0.1:2181)(secure=disabled):NIOServerCnxnFactory@722] - Error reconfiguring client port to localhost/127.0.0.1:2181 Address already in use


RE: Bug in WriteLock recipe

2018-02-13 Thread Kathryn Hogg
Thanks Flavio,

I'm debugging this and found another issue in the code in findPrefixInChildren()

private void findPrefixInChildren(String prefix, ZooKeeper zookeeper, String dir)
        throws KeeperException, InterruptedException {
    List<String> names = zookeeper.getChildren(dir, false);
    for (String name : names) {
        if (name.startsWith(prefix)) {
            id = name;  /* THIS DOES NOT HAVE THE FULL PATH */
            if (LOG.isDebugEnabled()) {
                LOG.debug("Found id created last time: " + id);
            }
            break;
        }
    }
    if (id == null) {
        id = zookeeper.create(dir + "/" + prefix, data,
                getAcl(), EPHEMERAL_SEQUENTIAL);  /* THIS HAS THE FULL PATH */

        if (LOG.isDebugEnabled()) {
            LOG.debug("Created id: " + id);
        }
    }

}

If we find the node in the children, we set id to x-$session-$sequence.  If we
create the znode, id is set to $dir/x-$session-$sequence.

I believe the first case should be
  id = dir + "/" + name;
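
The proposed change can be sketched in isolation, without a ZooKeeper
connection. This is only an illustration under assumptions: PrefixLookupSketch
and findExistingId are hypothetical names, the child list stands in for the
result of getChildren() (which returns bare names rather than paths), and the
directory and node names are made up.

```java
import java.util.List;

class PrefixLookupSketch {
    /**
     * Returns the full path of the first child whose name starts with prefix,
     * or null if no such child exists.
     */
    static String findExistingId(String dir, String prefix, List<String> childNames) {
        for (String name : childNames) {
            if (name.startsWith(prefix)) {
                // Child names are relative, so prepend the parent path to get
                // the same form that create() would have returned.
                return dir + "/" + name;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> children = List.of("x-111-0000000001", "x-222-0000000002");
        // prints /locks/demo/x-222-0000000002
        System.out.println(findExistingId("/locks/demo", "x-222-", children));
    }
}
```

Either way the id is found, it then has the same $dir/x-$session-$sequence
form, so later comparisons and exists() calls see a consistent path.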

-Original Message-
From: Flavio Junqueira [mailto:f...@apache.org] 
Sent: Tuesday, February 13, 2018 3:58 PM
To: user@zookeeper.apache.org
Subject: Re: Bug in WriteLock recipe

You are right that there is a race between getting the children and checking 
whether the predecessor is there. If we fail to set a watcher, then we won't 
try locking again. I think setting id to null will work because it forces 
another iteration of the do/while loop, which checks whether there is still 
some predecessor, setting a watcher accordingly. The number of iterations 
should be finite because we eventually hit the owned znode.

It might be better to add a different condition for clarity, like
`while (id == null || watcherNotSet)`. In any case, I'd appreciate it if you could
chime in and contribute your changes to
https://issues.apache.org/jira/browse/ZOOKEEPER-645.

-Flavio

> On 13 Feb 2018, at 22:01, Kathryn Hogg <kathryn.h...@oati.net> wrote:
> 
> Hey Flavio,
> 
> FYI, I'm on 3.4.11 if that makes a difference.
> 
> The problem is
>  1. In order to get to the exists() call, we've already determined that
> id is not null.
>  2. After exists() returns a null stat, we log a message but do not reset id
> to null.
>  3. The while loop will terminate because id is not null.
>
> I think this can be fixed by setting id to null if stat is null. This is
> similar to what is done when lessThanMe is not Empty.
> 
> Here's an outline of the code I'm talking about (my suggested change is after 
> the LOG.warn()):
> 
> do {
>     if (id == null) {
>
>     }
>     if (id != null) {
>         if (names.isEmpty()) {
>             ...
>             id = null;
>         } else {
>
>             if (!lessThanMe.isEmpty()) {
>                 Stat stat = zookeeper.exists(lastChildId, new LockWatcher());
>                 if (stat != null) {
>                     return Boolean.FALSE;
>                 } else {
>                     LOG.warn("Could not find the" +
>                             " stats for less than me: " + lastChildName.getName());
>                     /* id = null; */  // id is not null here so the while loop will terminate
>                 }
>             } else {
>                 if (isOwner()) {
>                     if (callback != null) {
>                         callback.lockAcquired();
>                     }
>                     return Boolean.TRUE;
>                 }
>             }
>         }
>     }
> }
> while (id == null);
> 
> 
> -Original Message-
> From: Flavio Junqueira [mailto:f...@apache.org] 
> Sent: Tuesday, February 13, 2018 2:45 PM
> To: user@zookeeper.apache.org
> Subject: Re: Bug in WriteLock recipe
> 
> Hi Kathryn,
> 
> Every time that execute method is invoked, it will get children. From your 
> description, in the case the predecessor node is deleted and stat is null, 
> the next call will not contain that predecessor znode. Consequently, it won't 
> happen indefinitely. Makes sense?
> 
> There is actually an old iss

RE: Bug in WriteLock recipe

2018-02-13 Thread Kathryn Hogg
Hey Flavio,

FYI, I'm on 3.4.11 if that makes a difference.

The problem is
  1. In order to get to the exists() call, we've already determined that id
is not null.
  2. After exists() returns a null stat, we log a message but do not reset id
to null.
  3. The while loop will terminate because id is not null.

I think this can be fixed by setting id to null if stat is null. This is
similar to what is done when lessThanMe is not Empty.

Here's an outline of the code I'm talking about (my suggested change is after 
the LOG.warn()):

do {
    if (id == null) {

    }
    if (id != null) {
        if (names.isEmpty()) {
            ...
            id = null;
        } else {

            if (!lessThanMe.isEmpty()) {
                Stat stat = zookeeper.exists(lastChildId, new LockWatcher());
                if (stat != null) {
                    return Boolean.FALSE;
                } else {
                    LOG.warn("Could not find the" +
                            " stats for less than me: " + lastChildName.getName());
                    /* id = null; */  // id is not null here so the while loop will terminate
                }
            } else {
                if (isOwner()) {
                    if (callback != null) {
                        callback.lockAcquired();
                    }
                    return Boolean.TRUE;
                }
            }
        }
    }
}
while (id == null);
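
The intended effect of the change can be modelled without a live ensemble.
The sketch below is a simplified stand-in, not the recipe itself:
LockRetrySketch and tryAcquire are hypothetical names, the Supplier plays the
role of getChildren(), the Predicate plays the role of exists() with a watch,
and names are compared as plain strings instead of by sequence number. It
retries with a plain loop rather than by resetting id, which is assumed here
to be equivalent for the purpose of the illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.Supplier;

class LockRetrySketch {
    /** Returns true if we own the lock, false if we are waiting on a live predecessor. */
    static boolean tryAcquire(String myName,
                              Supplier<List<String>> listChildren,
                              Predicate<String> existsAndWatch) {
        while (true) {
            List<String> names = new ArrayList<>(listChildren.get());
            Collections.sort(names);
            String predecessor = null;
            for (String name : names) {
                if (name.compareTo(myName) < 0) {
                    predecessor = name; // ends up as the largest name below ours
                }
            }
            if (predecessor == null) {
                return true; // nothing ahead of us: we own the lock
            }
            if (existsAndWatch.test(predecessor)) {
                return false; // watch set on a live znode: wait for it to fire
            }
            // The predecessor vanished between the listing and the exists()
            // call, so list again instead of leaving the loop with no watch set.
        }
    }

    public static void main(String[] args) {
        List<List<String>> snapshots = List.of(
                List.of("x-0001", "x-0002"), // stale listing still shows x-0001
                List.of("x-0002"));          // fresh listing after the delete
        int[] calls = {0};
        boolean owner = tryAcquire("x-0002",
                () -> snapshots.get(Math.min(calls[0]++, snapshots.size() - 1)),
                name -> false);              // simulate: predecessor already gone
        System.out.println(owner);           // prints true
    }
}
```

Without the retry, the first pass would log the warning and exit with neither
the lock nor a watch that can ever fire, which matches the stuck behaviour
described above.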


-Original Message-
From: Flavio Junqueira [mailto:f...@apache.org] 
Sent: Tuesday, February 13, 2018 2:45 PM
To: user@zookeeper.apache.org
Subject: Re: Bug in WriteLock recipe

Hi Kathryn,

Every time that execute method is invoked, it will get children. From your 
description, in the case the predecessor node is deleted and stat is null, the 
next call will not contain that predecessor znode. Consequently, it won't 
happen indefinitely. Makes sense?

There is actually an old issue about WriteLock that you may want to have a look:

https://issues.apache.org/jira/browse/ZOOKEEPER-645

Thanks,
-Flavio


> On 13 Feb 2018, at 18:49, Kathryn Hogg <kathryn.h...@oati.net> wrote:
> 
> I'm actually using the WriteLock from the ZookeeperNetEx C# code but I've 
> verified that the same issue exists in the Java recipe.  On a busy system, 
> I'm fairly frequently seeing WriteLock that is never granted to client and 
> gets stuck.
> 
> What I believe is happening is the lock sets a watch on the request ahead of
> it via this code:
>
>     Stat stat = zookeeper.exists(lastChildId, new LockWatcher());
>     if (stat != null) {
>         return Boolean.FALSE;
>     } else {
>         LOG.warn("Could not find the" +
>                 " stats for less than me: " + lastChildName.getName());
>     }
> 
> The problem (as I see it and I'm still fairly new to Zookeeper) is that if
> the node represented by lastChildId has been deleted before the call to
> exists is made, exists will return null and the watch will only ever be
> invoked when the znode is created.  And of course that will never happen.
> 
> The message is appearing in my log and my watcher for the lock is never 
> invoked.
> 
> [2018-02-13 16:49:17.905 GMT WARNING WriteLock Could not find the stats for less than me: /token/SegmentProfileQueueToken/x-72057953399865370-000724]
>
> I'm not entirely sure of the proper way of fixing this but I think setting
>     id = null;
> when stat is null should work.
> 
> Can someone verify if my analysis is correct?
> 
> --
> Kathryn Hogg
> Senior Manager Product Development
> Phone: 763.201.2000
> Fax: 763.201.5333
> Open Access Technology International, Inc.
> 3660 Technology Drive NE, Minneapolis, MN 55418
> 
> CONFIDENTIAL INFORMATION: This email and any attachment(s) contain 
> confidential and/or proprietary information of Open Access Technology 
> International, Inc.  Do not copy or distribute without the prior written 
> consent of OATI.  If you are not named a recipient to the message, please 
> notify the sender immediately and do not retain the message in any form, 
> printed or electronic.
> 



Bug in WriteLock recipe

2018-02-13 Thread Kathryn Hogg
I'm actually using the WriteLock from the ZookeeperNetEx C# code but I've 
verified that the same issue exists in the Java recipe.  On a busy system, I'm 
fairly frequently seeing WriteLock that is never granted to client and gets 
stuck.

What I believe is happening is the lock sets a watch on the request ahead of it
via this code:

    Stat stat = zookeeper.exists(lastChildId, new LockWatcher());
    if (stat != null) {
        return Boolean.FALSE;
    } else {
        LOG.warn("Could not find the" +
                " stats for less than me: " + lastChildName.getName());
    }

The problem (as I see it and I'm still fairly new to Zookeeper) is that if the
node represented by lastChildId has been deleted before the call to exists is
made, exists will return null and the watch will only ever be invoked when the
znode is created.  And of course that will never happen.

The message is appearing in my log and my watcher for the lock is never invoked.

[2018-02-13 16:49:17.905 GMT WARNING WriteLock Could not find the stats for less than me: /token/SegmentProfileQueueToken/x-72057953399865370-000724]

I'm not entirely sure of the proper way of fixing this but I think setting
    id = null;
when stat is null should work.

Can someone verify if my analysis is correct?

--
Kathryn Hogg
Senior Manager Product Development
Phone: 763.201.2000
Fax: 763.201.5333
Open Access Technology International, Inc.
3660 Technology Drive NE, Minneapolis, MN 55418




transaction log directory on VM

2018-01-25 Thread Kathryn Hogg
If we are running zookeeper on virtual machines, what is the recommended best
practice with regard to the transaction log directory?  We can create a
dedicated virtual disk for the transaction logs, but what if that vdi/vmdk file
is sitting on the same drive on the host OS?

--
Kathryn Hogg
Open Access Technology International, Inc.
3660 Technology Drive NE, Minneapolis, MN 55418




RE: Zookeeper session expiration

2017-12-07 Thread Kathryn Hogg
I'm pretty new to zookeeper but have a fair amount of experience with virtual
synchrony going back many years.  Even though time is relative, it is possible,
if the clock suddenly jumps forward on the server, to prematurely declare
timeouts as expired.  I'm not sure how Zookeeper handles that but in Isis, if 2
consecutive calls to gettimeofday had too large a difference, Isis considered
it fishy.

Of course, this is why we use ntp with adjtime to avoid clocks going backwards
or making large jumps forward.
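
For comparison, a timeout tracked against a monotonic clock is not affected by
wall-clock adjustments at all. The sketch below is only an illustration of
that idea and is not ZooKeeper's session tracker: SessionDeadlineSketch, touch,
and isExpired are hypothetical names, and the injectable clock exists purely
so the behaviour can be exercised without waiting.

```java
import java.util.function.LongSupplier;

class SessionDeadlineSketch {
    private final LongSupplier nanoClock; // e.g. System::nanoTime
    private final long timeoutNanos;
    private long lastHeardNanos;

    SessionDeadlineSketch(LongSupplier nanoClock, long timeoutMillis) {
        this.nanoClock = nanoClock;
        this.timeoutNanos = timeoutMillis * 1_000_000L;
        this.lastHeardNanos = nanoClock.getAsLong();
    }

    /** Records that a heartbeat was just received. */
    void touch() {
        lastHeardNanos = nanoClock.getAsLong();
    }

    /** True once at least the full timeout has elapsed since the last heartbeat. */
    boolean isExpired() {
        // Compare differences, never raw values: nanoTime has an arbitrary
        // origin and may be negative.
        return nanoClock.getAsLong() - lastHeardNanos >= timeoutNanos;
    }

    public static void main(String[] args) {
        SessionDeadlineSketch session =
                new SessionDeadlineSketch(System::nanoTime, 30_000);
        System.out.println("expired right after creation: " + session.isExpired());
    }
}
```

Because System.nanoTime() is unrelated to the system date, stepping the wall
clock forward or backward does not move the deadline in this model.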

-Original Message-
From: Patrick Hunt [mailto:ph...@apache.org] 
Sent: Wednesday, December 06, 2017 5:18 PM
To: UserZooKeeper 
Subject: Re: Zookeeper session expiration

What Jordan said + time use is only in the relative sense, not the absolute. 
Session tracking (expiration) is relative to the start of leadership.

Patrick

On Mon, Dec 4, 2017 at 12:21 PM, Jordan Zimmerman < jor...@jordanzimmerman.com> 
wrote:

> ZooKeeper, indeed, does not use wall clock time. It uses 
> System.nanoTime() for most operations. Further, all operations go 
> through the Leader node so only the Leader's notion of time matters. 
> The Leader manages the session via a "SessionTracker" instance. The code is 
> in SessionTrackerImpl.java.
> There is a sessionExpiryQueue which is a kind of priority queue that 
> returns expired sessions based on System.nanoTime().
>
> -JZ
>
> > On Dec 4, 2017, at 12:09 PM, Abraham Fine  wrote:
> >
> > Hello Anthony and Shawn-
> >
> > To the best of my knowledge ZooKeeper does not use the "wall clock" 
> > time anywhere. So that should not be the problem.
> >
> > Please consider enabling debug logging, which should allow you to 
> > track the "pings".
> >
> > Thanks,
> > Abe
> >
> > On Mon, Dec 4, 2017, at 11:51, Anthony Shaya wrote:
> >> Thanks Shawn, should I message the developer mailing list for a 
> >> more definitive answer?
> >>
> >> Thanks again for the reply.
> >>
> >> -Original Message-
> >> From: Shawn Heisey [mailto:apa...@elyograg.org]
> >> Sent: Monday, December 4, 2017 2:49 PM
> >> To: user@zookeeper.apache.org
> >> Subject: Re: Zookeeper session expiration
> >>
> >> On 12/4/2017 8:22 AM, Anthony Shaya wrote:
> >>> My question is related to how session expiration works. I noticed that on
> >>> many of the client machines the times across these machines were all
> >>> off (by anywhere from 1 minute to 20 minutes - which was resolved
> >>> after discovery - haven't verified this completely yet). Can this
> >>> directly affect session expiration within the zookeeper cluster?
> >>>
> >>>   *   I read the following in
> >>> https://wiki.apache.org/hadoop/ZooKeeper/FAQ , "Expirations happens when
> >>> the cluster does not hear from the client within the specified session
> >>> timeout period (i.e. no heartbeat).". So in some cases it seems like, if
> >>> the times were wrong across the machines, it's possible one of the clients
> >>> could have effectively sent a heartbeat in the past (not sure about this
> >>> tbh) and then the cluster expires the session?
> >>
> >> I make these comments without any knowledge of what ZK code 
> >> actually does.  I am a member of this list because I'm a 
> >> representative of the Apache Solr project, which uses the ZK client 
> >> in order to maintain a cluster.
> >>
> >> IMHO, any software which makes actual decisions based on the
> >> timestamps in messages from another system is badly designed.  I
> >> would hope that the ZK designers know this, and always make any
> >> decisions related to time using the clock in the local system only.
> >>
> >> If ZK's designers did the right thing, then a session timeout would 
> >> indicate that quite literally no heartbeats were received in X 
> >> seconds, as measured by the local clock, and the local clock ONLY 
> >> ... NOT from timestamp information received from another system.
> >>
> >> Although such a lack of communication could be caused by any number 
> >> of things, including network hardware failure, one of the most 
> >> common reasons I have seen for problems like this is extreme java 
> >> garbage collection pauses in the client software.
> >>
> >> Situations where the heap is a little bit too small can cause a 
> >> java program to basically be doing garbage collection constantly, 
> >> so it doesn't have much time to do anything else, like send 
> >> heartbeats to ZK servers.
> >>
> >> Situations where the heap is HUGE and garbage collection is not 
> >> well tuned can lead to pauses of a minute or longer while Java does 
> >> a