Re: locking/leader election and dealing with session loss

2015-07-16 Thread Ivan Kelly
I've seen 40s+.

Also, if combined with a network partition, the gc pause only needs 1/3 of
session timeout for the same effect to occur.

On Thu, 16 Jul 2015 15:58 Camille Fournier cami...@apache.org wrote:

 They can and have happened in prod to people. I started talking about it
 after hearing enough people complain about just this situation on twitter.
 If you are relying on very large jvm memory footprints a 30s gc pause can
 and should be expected. In general I think most people don't need to worry
 about this most of the time but it's one of those things that happens and
 the developers are almost always shocked. I'm a fan of being clear about
 edge cases, even rare ones, so that devs can make the right tradeoffs for
 their env.
 Of course there are a myriad theoretical possibilities. But I don’t
 believe any of what you’ve mentioned will happen in production. For any
 reasonable case, you can be guaranteed that no two processes will consider
 themselves lock holders at the same instant in time.

 -Jordan


 On July 16, 2015 at 7:58:06 AM, Ivan Kelly (iv...@apache.org) wrote:

 On Thu, Jul 16, 2015 at 1:38 PM Jordan Zimmerman 
 jor...@jordanzimmerman.com
 wrote:

  Are you really seeing 30s gc pauses in production? If so, then of course
  this could happen. However, if your application can tolerate a 30s pause
  (which is hard to believe) then your session timeout is too low. The
 point
  of the session timeout is to have enough coverage. So, if your app has 30
  seconds allowable pauses your session timeout would have to be much
 longer.
 
 GC is just an example. There's other ways the same scenario could happen.
 The machine could swap out the process due to load. Someone could do
 something stupid in the zookeeper event thread and the session expired
 event is delayed. The state update could have hit the ip stack during
 network partition, and the process then got wedged. The state update packet
 could have hit the network and been routed via the moon. The clock could
 break.

 If you are relying on a timer on the zk client to maintain a guarantee,
 then you really aren't giving any guarantee because the zk client doesn't
 have control over all the things that could go wrong.

 -Ivan



Re: locking/leader election and dealing with session loss

2015-07-16 Thread Ivan Kelly
In case there's still doubt around this issue. I've written a demo app that
demonstrates the problem.

https://github.com/ivankelly/hanging-chad

-Ivan

On Wed, Jul 15, 2015 at 11:22 PM Alexander Shraer shra...@gmail.com wrote:

 I disagree, ZooKeeper itself actually doesn't rely on timing for safety -
 it won't get into an inconsistent state even if all timing assumptions fail
 (except for the sync operation, which is then not guaranteed to return the
 latest value, but that's a known issue that needs to be fixed).




 On Wed, Jul 15, 2015 at 2:13 PM, Jordan Zimmerman 
 jor...@jordanzimmerman.com wrote:

  This property may hold if you make a lot of timing/synchrony assumptions
 
  These assumptions and timing are intrinsic to using ZooKeeper. So, of
  course I’m making these assumptions.
 
  -Jordan
 
 
 
  On July 15, 2015 at 3:57:12 PM, Alexander Shraer (shra...@gmail.com)
  wrote:
 
  This property may hold if you make a lot of timing/synchrony assumptions
  -- agreeing on who holds the lock in an asynchronous distributed system
  with failures is impossible, this is the FLP impossibility.
 
  But even if it holds, this property is not very useful if the ZK client
  itself doesn't have the application data. So one has to consider whether
 it
  is possible that the application sees messages from two clients that both
  think they are the leader in an order which contradicts the lock acquisition
  order.
 
  On Wed, Jul 15, 2015 at 1:26 PM, Jordan Zimmerman 
  jor...@jordanzimmerman.com wrote:
 
   I think we may be talking past each other here. My contention (and the
  ZK docs agree BTW) is that, properly written and configured, “at any
  snapshot in time no two clients think they hold the same lock”. How your
  application acts on that fact is another thing. You might need sequence
  numbers, you might not.
 
  -Jordan
 
 
  On July 15, 2015 at 3:15:16 PM, Alexander Shraer (shra...@gmail.com)
  wrote:
 
   Jordan, as Camille suggested, please read Sec 2.4 in the Chubby paper:
  link
  
 
 http://static.googleusercontent.com/media/research.google.com/en//archive/chubby-osdi06.pdf
  
 
  it suggests 2 ways in which the storage can support lock generations and
  proposes an alternative for the case where the storage can't be made
 aware
  of lock generations.
 
  On Wed, Jul 15, 2015 at 1:08 PM, Jordan Zimmerman 
  jor...@jordanzimmerman.com wrote:
 
   Ivan, I just read the blog and I still don’t see how this can happen.
   Sorry if I’m being dense. I’d appreciate a discussion on this. In your
  blog
   you state: “when ZooKeeper tells you that you are leader, there's no
   guarantee that there isn't another node that 'thinks' it's the leader.”
   However, given a long enough session time — I usually recommend 30–60
   seconds, I don’t see how this can happen. The client itself determines
  that
   there is a network partition when there is no heartbeat success. The
   heartbeat is a fraction of the session timeout. Once the heartbeat
  fails,
   the client must assume it no longer has the lock. Another client
 cannot
   take over the lock until, at minimum, session timeout. So, how then
 can
   there be two leaders?
  
   -Jordan
  
   On July 15, 2015 at 2:23:12 PM, Ivan Kelly (iv...@apache.org) wrote:
  
   I blogged about this exact problem a couple of weeks ago [1]. I give
 an
   example of how split brain can happen in a resource under a zk lock
  (Hbase
   in this case). As Camille says, sequence numbers ftw. I'll add that
 the
   data store has to support them though, which not all do (in fact I've
  yet
   to see one in the wild that does). I've implemented a prototype that
  works
   with hbase[2] if you want to see what it looks like.
  
   -Ivan
  
   [1]
  
  
 
 https://medium.com/@ivankelly/reliable-table-writer-locks-for-hbase-731024295215
   [2] https://github.com/ivankelly/hbase-exclusive-writer
  
   On Wed, Jul 15, 2015 at 9:16 PM Vikas Mehta vikasme...@gmail.com
  wrote:
  
Jordan, I mean the client gives up the lock and stops working on the
   shared
resource. So when zookeeper is unavailable, no one is working on any
   shared
resource (because they cannot distinguish network partition from
   zookeeper
DEAD scenario).
   
   
   
--
View this message in context:
   
  
 
 http://zookeeper-user.578899.n2.nabble.com/locking-leader-election-and-dealing-with-session-loss-tp7581277p7581293.html
Sent from the zookeeper-user mailing list archive at Nabble.com.
   
  
 
 
 



Re: locking/leader election and dealing with session loss

2015-07-16 Thread Ivan Kelly
On Thu, Jul 16, 2015 at 1:38 PM Jordan Zimmerman jor...@jordanzimmerman.com
wrote:

 Are you really seeing 30s gc pauses in production? If so, then of course
 this could happen. However, if your application can tolerate a 30s pause
 (which is hard to believe) then your session timeout is too low. The point
 of the session timeout is to have enough coverage. So, if your app has 30
 seconds allowable pauses your session timeout would have to be much longer.

GC is just an example. There's other ways the same scenario could happen.
The machine could swap out the process due to load. Someone could do
something stupid in the zookeeper event thread and the session expired
event is delayed. The state update could have hit the ip stack during
network partition, and the process then got wedged. The state update packet
could have hit the network and been routed via the moon. The clock could
break.

If you are relying on a timer on the zk client to maintain a guarantee,
then you really aren't giving any guarantee because the zk client doesn't
have control over all the things that could go wrong.

-Ivan


Re: locking/leader election and dealing with session loss

2015-07-16 Thread Jordan Zimmerman
Of course there are a myriad theoretical possibilities. But I don’t believe any 
of what you’ve mentioned will happen in production. For any reasonable case, 
you can be guaranteed that no two processes will consider themselves lock 
holders at the same instant in time.

-Jordan


On July 16, 2015 at 7:58:06 AM, Ivan Kelly (iv...@apache.org) wrote:

On Thu, Jul 16, 2015 at 1:38 PM Jordan Zimmerman jor...@jordanzimmerman.com  
wrote:  

 Are you really seeing 30s gc pauses in production? If so, then of course  
 this could happen. However, if your application can tolerate a 30s pause  
 (which is hard to believe) then your session timeout is too low. The point  
 of the session timeout is to have enough coverage. So, if your app has 30  
 seconds allowable pauses your session timeout would have to be much longer.  
  
GC is just an example. There's other ways the same scenario could happen.  
The machine could swap out the process due to load. Someone could do  
something stupid in the zookeeper event thread and the session expired  
event is delayed. The state update could have hit the ip stack during  
network partition, and the process then got wedged. The state update packet  
could have hit the network and been routed via the moon. The clock could  
break.  

If you are relying on a timer on the zk client to maintain a guarantee,  
then you really aren't giving any guarantee because the zk client doesn't  
have control over all the things that could go wrong.  

-Ivan  


Re: locking/leader election and dealing with session loss

2015-07-16 Thread Camille Fournier
They can and have happened in prod to people. I started talking about it
after hearing enough people complain about just this situation on twitter.
If you are relying on very large jvm memory footprints a 30s gc pause can
and should be expected. In general I think most people don't need to worry
about this most of the time but it's one of those things that happens and
the developers are almost always shocked. I'm a fan of being clear about
edge cases, even rare ones, so that devs can make the right tradeoffs for
their env.
Of course there are a myriad theoretical possibilities. But I don’t believe
any of what you’ve mentioned will happen in production. For any reasonable
case, you can be guaranteed that no two processes will consider themselves
lock holders at the same instant in time.

-Jordan


On July 16, 2015 at 7:58:06 AM, Ivan Kelly (iv...@apache.org) wrote:

On Thu, Jul 16, 2015 at 1:38 PM Jordan Zimmerman jor...@jordanzimmerman.com

wrote:

 Are you really seeing 30s gc pauses in production? If so, then of course
 this could happen. However, if your application can tolerate a 30s pause
 (which is hard to believe) then your session timeout is too low. The point
 of the session timeout is to have enough coverage. So, if your app has 30
 seconds allowable pauses your session timeout would have to be much
longer.

GC is just an example. There's other ways the same scenario could happen.
The machine could swap out the process due to load. Someone could do
something stupid in the zookeeper event thread and the session expired
event is delayed. The state update could have hit the ip stack during
network partition, and the process then got wedged. The state update packet
could have hit the network and been routed via the moon. The clock could
break.

If you are relying on a timer on the zk client to maintain a guarantee,
then you really aren't giving any guarantee because the zk client doesn't
have control over all the things that could go wrong.

-Ivan


Re: locking/leader election and dealing with session loss

2015-07-16 Thread Jordan Zimmerman
And a new Curator Tech Note to match:

https://cwiki.apache.org/confluence/display/CURATOR/TN10

-JZ



On July 16, 2015 at 12:54:29 PM, Ivan Kelly (iv...@apache.org) wrote:

I've seen 40s+.  

Also, if combined with a network partition, the gc pause only needs 1/3 of  
session timeout for the same effect to occur.  

On Thu, 16 Jul 2015 15:58 Camille Fournier cami...@apache.org wrote:  

 They can and have happened in prod to people. I started talking about it
 after hearing enough people complain about just this situation on twitter.  
 If you are relying on very large jvm memory footprints a 30s gc pause can  
 and should be expected. In general I think most people don't need to worry  
 about this most of the time but it's one of those things that happens and  
 the developers are almost always shocked. I'm a fan of being clear about  
 edge cases, even rare ones, so that devs can make the right tradeoffs for  
 their env.  
 Of course there are a myriad theoretical possibilities. But I don’t  
 believe any of what you’ve mentioned will happen in production. For any  
 reasonable case, you can be guaranteed that no two processes will consider  
 themselves lock holders at the same instant in time.  
  
 -Jordan  
  
  
 On July 16, 2015 at 7:58:06 AM, Ivan Kelly (iv...@apache.org) wrote:  
  
 On Thu, Jul 16, 2015 at 1:38 PM Jordan Zimmerman   
 jor...@jordanzimmerman.com  
 wrote:  
  
  Are you really seeing 30s gc pauses in production? If so, then of course  
  this could happen. However, if your application can tolerate a 30s pause  
  (which is hard to believe) then your session timeout is too low. The  
 point  
  of the session timeout is to have enough coverage. So, if your app has 30  
  seconds allowable pauses your session timeout would have to be much  
 longer.  
   
 GC is just an example. There's other ways the same scenario could happen.  
 The machine could swap out the process due to load. Someone could do  
 something stupid in the zookeeper event thread and the session expired  
 event is delayed. The state update could have hit the ip stack during  
 network partition, and the process then got wedged. The state update packet  
 could have hit the network and been routed via the moon. The clock could  
 break.  
  
 If you are relying on a timer on the zk client to maintain a guarantee,  
 then you really aren't giving any guarantee because the zk client doesn't  
 have control over all the things that could go wrong.  
  
 -Ivan  
  


Re: locking/leader election and dealing with session loss

2015-07-16 Thread Ivan Kelly
On Thu, Jul 16, 2015 at 7:57 PM Jordan Zimmerman jor...@jordanzimmerman.com
wrote:

 I think this conversation is converging.

 Summary: there is always an edge case where VM pauses might exceed your
 client heartbeat and cause a client misperception about its state for a
 short period of time once the VM unpauses. In practice, a tuned VM that has
 been running within known bounds for a reasonable period will not exhibit
 this behavior. Session timeout must match these known bounds in order to
 have consistent client state.

This is assuming that the conditions won't change, because if they do, the
known bounds go out the window.

Ultimately, if you rely on timeouts you have to evaluate how much risk you
are willing to take with your data and how much work you are willing to do
to mitigate this risk. This risk goes away if the old leader cannot update any
state once a new leader appears. For this you need some sort of fencing
mechanism, such as sequence numbers. Personally, I would always go for the
latter option.
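
One common way to obtain such a sequence number from ZooKeeper itself is to use the creation zxid (czxid) of the client's lock znode as the fence and tag every write with it; the store then rejects anything older than the newest token it has seen. A rough sketch under those assumptions (the FencedStore hook and node path are illustrative only, not from this thread):

    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    // Sketch: derive a fencing token from the lock znode and tag writes with it.
    public class FencedWriter {

        /** Hypothetical store-side hook; returns false if the token is stale. */
        interface FencedStore {
            boolean write(long fencingToken, byte[] payload);
        }

        private final ZooKeeper zk;
        private final FencedStore store;
        private final String myLockNode; // e.g. the ephemeral node from the lock recipe

        FencedWriter(ZooKeeper zk, FencedStore store, String myLockNode) {
            this.zk = zk;
            this.store = store;
            this.myLockNode = myLockNode;
        }

        void writeWhileHoldingLock(byte[] payload) throws Exception {
            Stat stat = new Stat();
            zk.getData(myLockNode, false, stat); // throws if the node is already gone
            // czxids are assigned by the ZK leader in total order, so a later lock
            // holder's node always carries a larger czxid -- usable as a fence.
            long fence = stat.getCzxid();
            if (!store.write(fence, payload)) {
                throw new IllegalStateException("stale lock holder: a newer holder exists");
            }
        }
    }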

-Ivan


Re: locking/leader election and dealing with session loss

2015-07-15 Thread Vikas Mehta
Thanks for the quick response Camille. If client A owns the lock, gets
disconnected due to network partition, it will not see the SESSION_EXPIRED
event until it is too late, i.e. client B has acquired the lock and done the
damage. Problem here is that client cannot distinguish network partition
from zookeeper ensemble in leader election state.



--
View this message in context: 
http://zookeeper-user.578899.n2.nabble.com/locking-leader-election-and-dealing-with-session-loss-tp7581277p7581279.html
Sent from the zookeeper-user mailing list archive at Nabble.com.


Re: locking/leader election and dealing with session loss

2015-07-15 Thread Vikas Mehta
Jordan, I agree with your recommendation to guarantee single lock
owner/leader. I am trying to figure out how I can improve the availability
of the leader in a specific corner case, i.e. when zookeeper ensemble is
undergoing leader election or broken (happens often due to software push,
host replacement, etc). When the ensemble is broken, it means that the entire
application is DEAD (no leader for any shard/resource).



--
View this message in context: 
http://zookeeper-user.578899.n2.nabble.com/locking-leader-election-and-dealing-with-session-loss-tp7581277p7581281.html
Sent from the zookeeper-user mailing list archive at Nabble.com.


Re: locking/leader election and dealing with session loss

2015-07-15 Thread Camille Fournier
You should not commit suicide unless the session goes into SESSION_EXPIRED state.
The quorum won't delete the ephemeral node immediately; it will only delete
it when the session expires. So if your client for whatever reason
disconnects and reconnects before the session expires, it will be fine.
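
As a rough illustration of that distinction with the plain ZooKeeper client (the onSuspect/onExpired hooks are hypothetical application callbacks, not part of the ZooKeeper API):

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;

    // Sketch only: react to Disconnected vs Expired as described above.
    public class SessionWatcher implements Watcher {
        @Override
        public void process(WatchedEvent event) {
            if (event.getType() != Event.EventType.None) {
                return; // only connection-state events are of interest here
            }
            switch (event.getState()) {
                case Disconnected:
                    // Partition or ZK outage: the session may still be alive on the
                    // quorum, but we can't know. Conservatively stop using the lock.
                    onSuspect();
                    break;
                case Expired:
                    // The quorum expired the session; the ephemeral lock node is
                    // gone. Re-create the ZooKeeper handle and re-contend.
                    onExpired();
                    break;
                case SyncConnected:
                    // Reconnected before expiry: the session (and the lock) survived.
                    break;
                default:
                    break;
            }
        }

        private void onSuspect() { /* stop working on the shared resource */ }
        private void onExpired() { /* "commit suicide" or rejoin the election */ }
    }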

C

On Wed, Jul 15, 2015 at 1:54 PM, Vikas Mehta vikasme...@gmail.com wrote:

 When using zookeeper for leader election or distributed locking, my
 assumption is that as soon as the lock owner sees session transition to
 'CONNECTING' state, it should commit suicide to avoid the risk of multiple
 owners. Please correct my assumption if I am wrong or there is a better way
 to guarantee a single lock owner/leader.

 If above assumption is correct, I am trying to figure out how I can improve
 the availability of the application (leader/lock owner) when zookeeper
 ensemble is broken (eg. undergoing zookeeper leader election for prolonged
 period of time). Options I have considered:

 1/ use multiple ensembles for leader election/locking to avoid SPOF
 (complex
 to implement)
 2/ extend the zookeeper protocol to provide client more info on connection
 loss, like zookeeper leader election in progress so that client can decide
 when it is ok to not commit suicide and still guarantee a single
 application
 leader/lock owner. (haven't been able to prove that this will guarantee
 single application leader/lock owner).

 If this has been already answered or solved, please point me to the
 post/doc.



 --
 View this message in context:
 http://zookeeper-user.578899.n2.nabble.com/locking-leader-election-and-dealing-with-session-loss-tp7581277.html
 Sent from the zookeeper-user mailing list archive at Nabble.com.



Re: locking/leader election and dealing with session loss

2015-07-15 Thread Jordan Zimmerman
Once client A loses connection it must assume that it no longer has the lock 
(you could try to time the session but I think that’s a bad idea). Once you 
reconnect, you will know if your session is still active or not. When done 
correctly, there’s no chance that both A and B will think they own the lock at 
the same time.

-Jordan



On July 15, 2015 at 1:17:10 PM, Vikas Mehta (vikasme...@gmail.com) wrote:

Thanks for the quick response Camille. If client A owns the lock, gets  
disconnected due to network partition, it will not see the SESSION_EXPIRED  
event until it is too late, i.e. client B has acquired the lock and done the  
damage. Problem here is that client cannot distinguish network partition  
from zookeeper ensemble in leader election state.  



--  
View this message in context: 
http://zookeeper-user.578899.n2.nabble.com/locking-leader-election-and-dealing-with-session-loss-tp7581277p7581279.html
  
Sent from the zookeeper-user mailing list archive at Nabble.com.  


Re: locking/leader election and dealing with session loss

2015-07-15 Thread Camille Fournier
To expand on my point, if you want to be able to continue to attempt to
make progress when the ZK is down, the act of getting a lock should also
cause the lock owner to get a sequence number that it can use to identify
the period of operation it is in. Say, for example, you get sequence number 1
and tag all of your requests with 1. If for any reason you lose the lock and
don't know it, and server #2 gets the lock, it should get sequence #2. The
resource should then reject all requests with a sequence below 2, so if any
remaining requests tagged 1 are still lying around they will be rejected by
the resource.
And there you have it: you can continue to make safe forward progress while
in an uncertain state on the ZK side so long as the original lock holder is
available and the resource validates these things. If both ZK itself and the
original lock holder go down, you're still AWOL, presumably.
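
A minimal sketch of that store-side check, with illustrative names (the real validation has to live in whatever system actually applies the writes):

    // Sketch: a resource that rejects requests carrying a stale lock generation.
    public class FencedResource {
        private long highestGeneration = 0; // newest lock generation accepted so far

        /** Apply the write only if the caller's generation is current. */
        public synchronized boolean write(long lockGeneration, byte[] payload) {
            if (lockGeneration < highestGeneration) {
                // A newer lock holder has already written; this request comes from
                // a stale holder (e.g. client #1 waking up from a long GC pause).
                return false;
            }
            highestGeneration = lockGeneration;
            apply(payload); // actually mutate the shared state
            return true;
        }

        private void apply(byte[] payload) { /* ... */ }
    }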

C


On Wed, Jul 15, 2015 at 2:24 PM, Camille Fournier cami...@apache.org
wrote:

 I thought that the client itself had a notion of the session timeout
 internally that would conservatively let the client know that it was dead?
 If not, then that's my faulty memory.

 That being said, if you really care about the client not sending messages
 when it does not have the lock, the resource under contention needs to
 validate the messages it is receiving, though. You cannot guarantee that
 just because a client believes it is connected and sends a message to the
 locked resource that the message will be received while the sender still
 has the lock. If you don't care about this possibility then just assuming
 you lose the lock when you are in any state other than connected is
 adequate but just be aware that events such as long GC pauses and network
 issues can cause you to access the resource improperly.

 C

 On Wed, Jul 15, 2015 at 2:19 PM, Jordan Zimmerman 
 jor...@jordanzimmerman.com wrote:

 Once client A loses connection it must assume that it no longer has the
 lock (you could try to time the session but I think that’s a bad idea).
 Once you reconnect, you will know if your session is still active or not.
 When done correctly, there’s no chance that both A and B will think they
 own the lock at the same time.

 -Jordan



 On July 15, 2015 at 1:17:10 PM, Vikas Mehta (vikasme...@gmail.com) wrote:

 Thanks for the quick response Camille. If client A owns the lock, gets
 disconnected due to network partition, it will not see the SESSION_EXPIRED
 event until it is too late, i.e. client B has acquired the lock and done
 the
 damage. Problem here is that client cannot distinguish network partition
 from zookeeper ensemble in leader election state.



 --
 View this message in context:
 http://zookeeper-user.578899.n2.nabble.com/locking-leader-election-and-dealing-with-session-loss-tp7581277p7581279.html
 Sent from the zookeeper-user mailing list archive at Nabble.com.





Re: locking/leader election and dealing with session loss

2015-07-15 Thread Jordan Zimmerman
He’s talking about multiple writers. Given a reasonable session timeout, even a 
GC shouldn’t matter. If the GC causes a heartbeat miss the client will get 
SysDisconnected.

-Jordan



On July 15, 2015 at 2:05:41 PM, Camille Fournier (skami...@gmail.com) wrote:

If client a does a full gc immediately before sending a message that is  
long enough to lose the lock, it will send the message out of order. You  
cannot guarantee exclusive access without verification at the locked  
resource.  

C  
On Jul 15, 2015 3:02 PM, Jordan Zimmerman jor...@jordanzimmerman.com  
wrote:  

 I don’t see how there’s a chance of multiple writers. Assuming a  
 reasonable session timeout:  
  
 * Client A gets the lock  
 * Client B watches Client A’s lock node  
 * Client A gets a network partition  
 * Client A will get a SysDisconnected before the session times out  
 * Client A must immediately assume it no longer has the lock  
 * Client A’s session times out  
 * Client A’s ephemeral node is deleted  
 * Client B’s watch fires  
 * Client B takes the lock  
 * Client A reconnects and gets SESSION_EXPIRED  
  
 Where’s the problem? This is how everyone uses ZooKeeper. There is 0  
 chance of multiple writers in this scenario.  
  
  
  
 On July 15, 2015 at 1:56:37 PM, Vikas Mehta (vikasme...@gmail.com) wrote:  
  
 Camille, I don't have a central message store/processor that can guarantee  
 single writer (if I had one, it would reduce (still useful in reducing lock  
 contention, etc) the need/value of using zookeeper) and hence I am trying  
 to  
 minimize the chances of multiple writers (more or less trying to guarantee  
 this) while maximizing availability (not trying to solve CAP theorem), by  
 solving some specific issues that affect availability.  
  
  
  
 --  
 View this message in context:  
 http://zookeeper-user.578899.n2.nabble.com/locking-leader-election-and-dealing-with-session-loss-tp7581277p7581284.html
   
 Sent from the zookeeper-user mailing list archive at Nabble.com.  
  


Re: locking/leader election and dealing with session loss

2015-07-15 Thread Jordan Zimmerman
OK - then that’s my recommendation. Anything else is unsafe for reasons 
described by others. That’s always been my recommendation with Curator or any 
use of ZooKeeper.

-Jordan



On July 15, 2015 at 2:16:03 PM, Vikas Mehta (vikasme...@gmail.com) wrote:

Jordan, I mean the client gives up the lock and stops working on the shared  
resource. So when zookeeper is unavailable, no one is working on any shared  
resource (because they cannot distinguish network partition from zookeeper  
DEAD scenario).  



--  
View this message in context: 
http://zookeeper-user.578899.n2.nabble.com/locking-leader-election-and-dealing-with-session-loss-tp7581277p7581293.html
  
Sent from the zookeeper-user mailing list archive at Nabble.com.  


Re: locking/leader election and dealing with session loss

2015-07-15 Thread Vikas Mehta
Camille, I don't have a central message store/processor that can guarantee
single writer (if I had one, it would reduce (still useful in reducing lock
contention, etc) the need/value of using zookeeper) and hence I am trying to
minimize the chances of multiple writers (more or less trying to guarantee
this) while maximizing availability (not trying to solve CAP theorem), by
solving some specific issues that affect availability.



--
View this message in context: 
http://zookeeper-user.578899.n2.nabble.com/locking-leader-election-and-dealing-with-session-loss-tp7581277p7581284.html
Sent from the zookeeper-user mailing list archive at Nabble.com.


Re: locking/leader election and dealing with session loss

2015-07-15 Thread Camille Fournier
If client a does a full gc immediately before sending a message that is
long enough to lose the lock, it will send the message out of order. You
cannot guarantee exclusive access without verification at the locked
resource.

C
On Jul 15, 2015 3:02 PM, Jordan Zimmerman jor...@jordanzimmerman.com
wrote:

 I don’t see how there’s a chance of multiple writers. Assuming a
 reasonable session timeout:

 * Client A gets the lock
 * Client B watches Client A’s lock node
 * Client A gets a network partition
 * Client A will get a SysDisconnected before the session times out
 * Client A must immediately assume it no longer has the lock
 * Client A’s session times out
 * Client A’s ephemeral node is deleted
 * Client B’s watch fires
 * Client B takes the lock
 * Client A reconnects and gets SESSION_EXPIRED

 Where’s the problem? This is how everyone uses ZooKeeper. There is 0
 chance of multiple writers in this scenario.



 On July 15, 2015 at 1:56:37 PM, Vikas Mehta (vikasme...@gmail.com) wrote:

 Camille, I don't have a central message store/processor that can guarantee
 single writer (if I had one, it would reduce (still useful in reducing lock
 contention, etc) the need/value of using zookeeper) and hence I am trying
 to
 minimize the chances of multiple writers (more or less trying to guarantee
 this) while maximizing availability (not trying to solve CAP theorem), by
 solving some specific issues that affect availability.



 --
 View this message in context:
 http://zookeeper-user.578899.n2.nabble.com/locking-leader-election-and-dealing-with-session-loss-tp7581277p7581284.html
 Sent from the zookeeper-user mailing list archive at Nabble.com.



Re: locking/leader election and dealing with session loss

2015-07-15 Thread Vikas Mehta
Jordan, I mean the client gives up the lock and stops working on the shared
resource. So when zookeeper is unavailable, no one is working on any shared
resource (because they cannot distinguish network partition from zookeeper
DEAD scenario).



--
View this message in context: 
http://zookeeper-user.578899.n2.nabble.com/locking-leader-election-and-dealing-with-session-loss-tp7581277p7581293.html
Sent from the zookeeper-user mailing list archive at Nabble.com.


Re: locking/leader election and dealing with session loss

2015-07-15 Thread Jordan Zimmerman
I didn’t mean to imply you don’t. But, I really need to understand this as it 
changes many assumptions I have. Can anyone explain? If not, I’ll wait for your 
explanation. 



On July 15, 2015 at 2:23:33 PM, Camille Fournier (cami...@apache.org) wrote:

I'm in transit so I don't have time to explain this in detail but I promise  
you I know what I'm talking about so take some time to think through what  
I've said and work it out for yourself, it takes longer than five minutes  
to work through it.  

C  
Even if messages go out of order how does that change the algorithm? Can  
you explain? The session is maintained by means of a heartbeat. If a  
heartbeat is missed then the client goes to Disconnected. Message ordering  
in this case shouldn’t matter. Are you saying that message ordering can  
cause two clients to think they have the lowest sequence number?  

-Jordan  



On July 15, 2015 at 2:12:26 PM, Camille Fournier (skami...@gmail.com) wrote:  

I don't know what to tell you Jordan, but this is an observable phenomenon  
and it can happen. It's relatively unlikely and rare but not impossible. If  
you're interested in it I'd recommend reading the chubby paper out of  
Google, where they discuss it in some detail.  

C  
On Jul 15, 2015 3:09 PM, Jordan Zimmerman jor...@jordanzimmerman.com  
wrote:  

 He’s talking about multiple writers. Given a reasonable session timeout,  
 even a GC shouldn’t matter. If the GC causes a heartbeat miss the client  
 will get SysDisconnected.  
  
 -Jordan  
  
  
  
 On July 15, 2015 at 2:05:41 PM, Camille Fournier (skami...@gmail.com)  
 wrote:  
  
 If client a does a full gc immediately before sending a message that is  
 long enough to lose the lock, it will send the message out of order. You  
 cannot guarantee exclusive access without verification at the locked  
 resource.  
  
 C  
 On Jul 15, 2015 3:02 PM, Jordan Zimmerman jor...@jordanzimmerman.com  
 wrote:  
  
  I don’t see how there’s a chance of multiple writers. Assuming a  
  reasonable session timeout:  
   
  * Client A gets the lock  
  * Client B watches Client A’s lock node  
  * Client A gets a network partition  
  * Client A will get a SysDisconnected before the session times out  
  * Client A must immediately assume it no longer has the lock  
  * Client A’s session times out  
  * Client A’s ephemeral node is deleted  
  * Client B’s watch fires  
  * Client B takes the lock  
  * Client A reconnects and gets SESSION_EXPIRED  
   
  Where’s the problem? This is how everyone uses ZooKeeper. There is 0  
  chance of multiple writers in this scenario.  
   
   
   
  On July 15, 2015 at 1:56:37 PM, Vikas Mehta (vikasme...@gmail.com)  
 wrote:  
   
  Camille, I don't have a central message store/processor that can  
 guarantee  
  single writer (if I had one, it would reduce (still useful in reducing  
 lock  
  contention, etc) the need/value of using zookeeper) and hence I am trying  
  to  
  minimize the chances of multiple writers (more or less trying to  
 guarantee  
  this) while maximizing availability (not trying to solve CAP theorem), by  
  solving some specific issues that affect availability.  
   
   
   
  --  
  View this message in context:  
   
 http://zookeeper-user.578899.n2.nabble.com/locking-leader-election-and-dealing-with-session-loss-tp7581277p7581284.html
   
  Sent from the zookeeper-user mailing list archive at Nabble.com.  
   
  
  


Re: locking/leader election and dealing with session loss

2015-07-15 Thread Jordan Zimmerman
Ivan, I just read the blog and I still don’t see how this can happen. Sorry if 
I’m being dense. I’d appreciate a discussion on this. In your blog you state: 
“when ZooKeeper tells you that you are leader, there's no guarantee that there 
isn't another node that 'thinks' it's the leader.” However, given a long enough 
session time — I usually recommend 30–60 seconds, I don’t see how this can 
happen. The client itself determines that there is a network partition when 
there is no heartbeat success. The heartbeat is a fraction of the session 
timeout. Once the heartbeat fails, the client must assume it no longer has the 
lock. Another client cannot take over the lock until, at minimum, session 
timeout. So, how then can there be two leaders?

-Jordan

On July 15, 2015 at 2:23:12 PM, Ivan Kelly (iv...@apache.org) wrote:

I blogged about this exact problem a couple of weeks ago [1]. I give an  
example of how split brain can happen in a resource under a zk lock (Hbase  
in this case). As Camille says, sequence numbers ftw. I'll add that the  
data store has to support them though, which not all do (in fact I've yet  
to see one in the wild that does). I've implemented a prototype that works  
with hbase[2] if you want to see what it looks like.  

-Ivan  

[1]  
https://medium.com/@ivankelly/reliable-table-writer-locks-for-hbase-731024295215
  
[2] https://github.com/ivankelly/hbase-exclusive-writer  

On Wed, Jul 15, 2015 at 9:16 PM Vikas Mehta vikasme...@gmail.com wrote:  

 Jordan, I mean the client gives up the lock and stops working on the shared  
 resource. So when zookeeper is unavailable, no one is working on any shared  
 resource (because they cannot distinguish network partition from zookeeper  
 DEAD scenario).  
  
  
  
 --  
 View this message in context:  
 http://zookeeper-user.578899.n2.nabble.com/locking-leader-election-and-dealing-with-session-loss-tp7581277p7581293.html
   
 Sent from the zookeeper-user mailing list archive at Nabble.com.  
  


Re: locking/leader election and dealing with session loss

2015-07-15 Thread Camille Fournier
I don't know what to tell you Jordan, but this is an observable phenomenon
and it can happen. It's relatively unlikely and rare but not impossible. If
you're interested in it I'd recommend reading the chubby paper out of
Google, where they discuss it in some detail.

C
On Jul 15, 2015 3:09 PM, Jordan Zimmerman jor...@jordanzimmerman.com
wrote:

 He’s talking about multiple writers. Given a reasonable session timeout,
 even a GC shouldn’t matter. If the GC causes a heartbeat miss the client
 will get SysDisconnected.

 -Jordan



 On July 15, 2015 at 2:05:41 PM, Camille Fournier (skami...@gmail.com)
 wrote:

 If client a does a full gc immediately before sending a message that is
 long enough to lose the lock, it will send the message out of order. You
 cannot guarantee exclusive access without verification at the locked
 resource.

 C
 On Jul 15, 2015 3:02 PM, Jordan Zimmerman jor...@jordanzimmerman.com
 wrote:

  I don’t see how there’s a chance of multiple writers. Assuming a
  reasonable session timeout:
 
  * Client A gets the lock
  * Client B watches Client A’s lock node
  * Client A gets a network partition
  * Client A will get a SysDisconnected before the session times out
  * Client A must immediately assume it no longer has the lock
  * Client A’s session times out
  * Client A’s ephemeral node is deleted
  * Client B’s watch fires
  * Client B takes the lock
  * Client A reconnects and gets SESSION_EXPIRED
 
  Where’s the problem? This is how everyone uses ZooKeeper. There is 0
  chance of multiple writers in this scenario.
 
 
 
  On July 15, 2015 at 1:56:37 PM, Vikas Mehta (vikasme...@gmail.com)
 wrote:
 
  Camille, I don't have a central message store/processor that can
 guarantee
  single writer (if I had one, it would reduce (still useful in reducing
 lock
  contention, etc) the need/value of using zookeeper) and hence I am
 trying
  to
  minimize the chances of multiple writers (more or less trying to
 guarantee
  this) while maximizing availability (not trying to solve CAP theorem),
 by
  solving some specific issues that affect availability.
 
 
 
  --
  View this message in context:
 
 http://zookeeper-user.578899.n2.nabble.com/locking-leader-election-and-dealing-with-session-loss-tp7581277p7581284.html
  Sent from the zookeeper-user mailing list archive at Nabble.com.
 




Re: locking/leader election and dealing with session loss

2015-07-15 Thread Camille Fournier
I'm in transit so I don't have time to explain this in detail but I promise
you I know what I'm talking about so take some time to think through what
I've said and work it out for yourself, it takes longer than five minutes
to work through it.

C
Even if messages go out of order how does that change the algorithm? Can
you explain? The session is maintained by means of a heartbeat. If a
heartbeat is missed then the client goes to Disconnected. Message ordering
in this case shouldn’t matter. Are you saying that message ordering can
cause two clients to think they have the lowest sequence number?

-Jordan



On July 15, 2015 at 2:12:26 PM, Camille Fournier (skami...@gmail.com) wrote:

I don't know what to tell you Jordan, but this is an observable phenomenon
and it can happen. It's relatively unlikely and rare but not impossible. If
you're interested in it I'd recommend reading the chubby paper out of
Google, where they discuss it in some detail.

C
On Jul 15, 2015 3:09 PM, Jordan Zimmerman jor...@jordanzimmerman.com
wrote:

  He’s talking about multiple writers. Given a reasonable session timeout,
 even a GC shouldn’t matter. If the GC causes a heartbeat miss the client
 will get SysDisconnected.

  -Jordan



 On July 15, 2015 at 2:05:41 PM, Camille Fournier (skami...@gmail.com)
 wrote:

  If client a does a full gc immediately before sending a message that is
 long enough to lose the lock, it will send the message out of order. You
 cannot guarantee exclusive access without verification at the locked
 resource.

 C
 On Jul 15, 2015 3:02 PM, Jordan Zimmerman jor...@jordanzimmerman.com
 wrote:

  I don’t see how there’s a chance of multiple writers. Assuming a
  reasonable session timeout:
 
  * Client A gets the lock
  * Client B watches Client A’s lock node
  * Client A gets a network partition
  * Client A will get a SysDisconnected before the session times out
  * Client A must immediately assume it no longer has the lock
  * Client A’s session times out
  * Client A’s ephemeral node is deleted
  * Client B’s watch fires
  * Client B takes the lock
  * Client A reconnects and gets SESSION_EXPIRED
 
  Where’s the problem? This is how everyone uses ZooKeeper. There is 0
  chance of multiple writers in this scenario.
 
 
 
  On July 15, 2015 at 1:56:37 PM, Vikas Mehta (vikasme...@gmail.com)
 wrote:
 
  Camille, I don't have a central message store/processor that can
 guarantee
  single writer (if I had one, it would reduce (still useful in reducing
 lock
  contention, etc) the need/value of using zookeeper) and hence I am trying
  to
  minimize the chances of multiple writers (more or less trying to
 guarantee
  this) while maximizing availability (not trying to solve CAP theorem), by
  solving some specific issues that affect availability.
 
 
 
  --
  View this message in context:
 
 http://zookeeper-user.578899.n2.nabble.com/locking-leader-election-and-dealing-with-session-loss-tp7581277p7581284.html
  Sent from the zookeeper-user mailing list archive at Nabble.com.
 




Re: locking/leader election and dealing with session loss

2015-07-15 Thread Ivan Kelly
I blogged about this exact problem a couple of weeks ago [1]. I give an
example of how split brain can happen in a resource under a zk lock (Hbase
in this case). As Camille says, sequence numbers ftw. I'll add that the
data store has to support them though, which not all do (in fact I've yet
to see one in the wild that does). I've implemented a prototype that works
with hbase[2] if you want to see what it looks like.

-Ivan

[1]
https://medium.com/@ivankelly/reliable-table-writer-locks-for-hbase-731024295215
[2] https://github.com/ivankelly/hbase-exclusive-writer

On Wed, Jul 15, 2015 at 9:16 PM Vikas Mehta vikasme...@gmail.com wrote:

 Jordan, I mean the client gives up the lock and stops working on the shared
 resource. So when zookeeper is unavailable, no one is working on any shared
 resource (because they cannot distinguish network partition from zookeeper
 DEAD scenario).



 --
 View this message in context:
 http://zookeeper-user.578899.n2.nabble.com/locking-leader-election-and-dealing-with-session-loss-tp7581277p7581293.html
 Sent from the zookeeper-user mailing list archive at Nabble.com.



Re: locking/leader election and dealing with session loss

2015-07-15 Thread Jordan Zimmerman
I don’t see how there’s a chance of multiple writers. Assuming a reasonable 
session timeout:

* Client A gets the lock
* Client B watches Client A’s lock node
* Client A gets a network partition
* Client A will get a SysDisconnected before the session times out
* Client A must immediately assume it no longer has the lock
* Client A’s session times out
* Client A’s ephemeral node is deleted
* Client B’s watch fires
* Client B takes the lock
* Client A reconnects and gets SESSION_EXPIRED

Where’s the problem? This is how everyone uses ZooKeeper. There is 0 chance of 
multiple writers in this scenario.
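
For concreteness, the same flow with Curator's InterProcessMutex plus a ConnectionStateListener, treating SUSPENDED the way SysDisconnected is treated above (the connection string and the two application hooks are placeholders):

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.framework.recipes.locks.InterProcessMutex;
    import org.apache.curator.framework.state.ConnectionState;
    import org.apache.curator.retry.ExponentialBackoffRetry;

    public class LockHolderSketch {
        public static void main(String[] args) throws Exception {
            CuratorFramework client = CuratorFrameworkFactory.newClient(
                    "zk1:2181,zk2:2181,zk3:2181",            // placeholder ensemble
                    new ExponentialBackoffRetry(1000, 3));
            client.start();

            // SUSPENDED corresponds to the SysDisconnected step above: assume the
            // lock is gone as soon as a heartbeat is missed; don't wait for LOST.
            client.getConnectionStateListenable().addListener((c, newState) -> {
                if (newState == ConnectionState.SUSPENDED
                        || newState == ConnectionState.LOST) {
                    stopActingAsLockHolder();  // placeholder application hook
                }
            });

            InterProcessMutex lock = new InterProcessMutex(client, "/locks/resource-x");
            lock.acquire();                    // blocks until this client holds the lock
            try {
                workOnSharedResource();        // placeholder application work
            } finally {
                lock.release();
            }
        }

        private static void stopActingAsLockHolder() { /* give up the shared resource */ }
        private static void workOnSharedResource()   { /* ... */ }
    }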



On July 15, 2015 at 1:56:37 PM, Vikas Mehta (vikasme...@gmail.com) wrote:

Camille, I don't have a central message store/processor that can guarantee  
single writer (if I had one, it would reduce (still useful in reducing lock  
contention, etc) the need/value of using zookeeper) and hence I am trying to  
minimize the chances of multiple writers (more or less trying to guarantee  
this) while maximizing availability (not trying to solve CAP theorem), by  
solving some specific issues that affect availability.  



--  
View this message in context: 
http://zookeeper-user.578899.n2.nabble.com/locking-leader-election-and-dealing-with-session-loss-tp7581277p7581284.html
  
Sent from the zookeeper-user mailing list archive at Nabble.com.  


Re: locking/leader election and dealing with session loss

2015-07-15 Thread Vikas Mehta
Jordan, We are using it like you describe and don't have a multi-writer
problem. Issue is:

* Zookeeper ensemble goes into leader election
* All clients see session transition to 'CONNECTING' state (SysDisconnected
in your sequence)
* All clients give up all the locks they owned on the resources they were
working with, resulting in a GLOBAL outage of the application.



--
View this message in context: 
http://zookeeper-user.578899.n2.nabble.com/locking-leader-election-and-dealing-with-session-loss-tp7581277p7581287.html
Sent from the zookeeper-user mailing list archive at Nabble.com.


Re: locking/leader election and dealing with session loss

2015-07-15 Thread Alexander Shraer
+1 to what Camille is saying and the suggestion to use generations

On Wed, Jul 15, 2015 at 12:04 PM, Camille Fournier skami...@gmail.com
wrote:

 If client a does a full gc immediately before sending a message that is
 long enough to lose the lock, it will send the message out of order. You
 cannot guarantee exclusive access without verification at the locked
 resource.

 C
 On Jul 15, 2015 3:02 PM, Jordan Zimmerman jor...@jordanzimmerman.com
 wrote:

  I don’t see how there’s a chance of multiple writers. Assuming a
  reasonable session timeout:
 
  * Client A gets the lock
  * Client B watches Client A’s lock node
  * Client A gets a network partition
  * Client A will get a SysDisconnected before the session times out
  * Client A must immediately assume it no longer has the lock
  * Client A’s session times out
  * Client A’s ephemeral node is deleted
  * Client B’s watch fires
  * Client B takes the lock
  * Client A reconnects and gets SESSION_EXPIRED
 
  Where’s the problem? This is how everyone uses ZooKeeper. There is 0
  chance of multiple writers in this scenario.
 
 
 
  On July 15, 2015 at 1:56:37 PM, Vikas Mehta (vikasme...@gmail.com)
 wrote:
 
  Camille, I don't have a central message store/processor that can
 guarantee
  single writer (if I had one, it would reduce (still useful in reducing
 lock
  contention, etc) the need/value of using zookeeper) and hence I am trying
  to
  minimize the chances of multiple writers (more or less trying to
 guarantee
  this) while maximizing availability (not trying to solve CAP theorem), by
  solving some specific issues that affect availability.
 
 
 
  --
  View this message in context:
 
 http://zookeeper-user.578899.n2.nabble.com/locking-leader-election-and-dealing-with-session-loss-tp7581277p7581284.html
  Sent from the zookeeper-user mailing list archive at Nabble.com.
 



Re: locking/leader election and dealing with session loss

2015-07-15 Thread Vikas Mehta
Camille, Agreed, but fixing application level bugs is a different set of
problems.



--
View this message in context: 
http://zookeeper-user.578899.n2.nabble.com/locking-leader-election-and-dealing-with-session-loss-tp7581277p7581289.html
Sent from the zookeeper-user mailing list archive at Nabble.com.


Re: locking/leader election and dealing with session loss

2015-07-15 Thread Jordan Zimmerman
Even if messages go out of order how does that change the algorithm? Can you 
explain? The session is maintained by means of a heartbeat. If a heartbeat is 
missed then the client goes to Disconnected. Message ordering in this case 
shouldn’t matter. Are you saying that message ordering can cause two clients to 
think they have the lowest sequence number?

-Jordan



On July 15, 2015 at 2:12:26 PM, Camille Fournier (skami...@gmail.com) wrote:

I don't know what to tell you Jordan, but this is an observable phenomenon and 
it can happen. It's relatively unlikely and rare but not impossible. If you're 
interested in it I'd recommend reading the chubby paper out of Google, where 
they discuss it in some detail.

C

On Jul 15, 2015 3:09 PM, Jordan Zimmerman jor...@jordanzimmerman.com wrote:
He’s talking about multiple writers. Given a reasonable session timeout, even a 
GC shouldn’t matter. If the GC causes a heartbeat miss the client will get 
SysDisconnected.

-Jordan



On July 15, 2015 at 2:05:41 PM, Camille Fournier (skami...@gmail.com) wrote:

If client a does a full gc immediately before sending a message that is
long enough to lose the lock, it will send the message out of order. You
cannot guarantee exclusive access without verification at the locked
resource.

C
On Jul 15, 2015 3:02 PM, Jordan Zimmerman jor...@jordanzimmerman.com
wrote:

 I don’t see how there’s a chance of multiple writers. Assuming a
 reasonable session timeout:

 * Client A gets the lock
 * Client B watches Client A’s lock node
 * Client A gets a network partition
 * Client A will get a SysDisconnected before the session times out
 * Client A must immediately assume it no longer has the lock
 * Client A’s session times out
 * Client A’s ephemeral node is deleted
 * Client B’s watch fires
 * Client B takes the lock
 * Client A reconnects and gets SESSION_EXPIRED

 Where’s the problem? This is how everyone uses ZooKeeper. There is 0
 chance of multiple writers in this scenario.



 On July 15, 2015 at 1:56:37 PM, Vikas Mehta (vikasme...@gmail.com) wrote:

 Camille, I don't have a central message store/processor that can guarantee
 single writer (if I had one, it would reduce (still useful in reducing lock
 contention, etc) the need/value of using zookeeper) and hence I am trying
 to
 minimize the chances of multiple writers (more or less trying to guarantee
 this) while maximizing availability (not trying to solve CAP theorem), by
 solving some specific issues that affect availability.



 --
 View this message in context:
 http://zookeeper-user.578899.n2.nabble.com/locking-leader-election-and-dealing-with-session-loss-tp7581277p7581284.html
 Sent from the zookeeper-user mailing list archive at Nabble.com.



Re: locking/leader election and dealing with session loss

2015-07-15 Thread Vikas Mehta
Ivan, Thanks for the references. Great write up describing the problem and
the solution. I agree that for total ordering storage layer should provide
the guarantee.

However, in my scenario I don't have a global/central storage system (I am
trying to build a globally distributed storage system, with servers on
different continents). I am minimizing the chances of multiple writers with
longer timeouts and eliminating the downtime during planned maintenance, but I
don't have a way to deal with the zookeeper ensemble being DEAD (or in leader
election), which wipes out the entire application.

Currently, we twiddle our thumbs when zookeeper or network has issues and
would like to find a way to improve the application availability during
zookeeper service interruptions.

Using multiple ensembles seems like a promising path, but I wanted to see if
anyone has thought about extending the error handling between zookeeper
client/server to deal with this issue.



--
View this message in context: 
http://zookeeper-user.578899.n2.nabble.com/locking-leader-election-and-dealing-with-session-loss-tp7581277p7581299.html
Sent from the zookeeper-user mailing list archive at Nabble.com.


Re: locking/leader election and dealing with session loss

2015-07-15 Thread Camille Fournier
I thought that the client itself had a notion of the session timeout
internally that would conservatively let the client know that it was dead?
If not, then that's my faulty memory.

That being said, if you really care about the client not sending messages
when it does not have the lock, the resource under contention needs to
validate the messages it is receiving, though. You cannot guarantee that
just because a client believes it is connected and sends a message to the
locked resource that the message will be received while the sender still
has the lock. If you don't care about this possibility then just assuming
you lose the lock when you are in any state other than connected is
adequate but just be aware that events such as long GC pauses and network
issues can cause you to access the resource improperly.

C

On Wed, Jul 15, 2015 at 2:19 PM, Jordan Zimmerman 
jor...@jordanzimmerman.com wrote:

 Once client A loses connection it must assume that it no longer has the
 lock (you could try to time the session but I think that’s a bad idea).
 Once you reconnect, you will know if your session is still active or not.
 When done correctly, there’s no chance that both A and B will think they
 own the lock at the same time.

 -Jordan



 On July 15, 2015 at 1:17:10 PM, Vikas Mehta (vikasme...@gmail.com) wrote:

 Thanks for the quick response Camille. If client A owns the lock, gets
 disconnected due to network partition, it will not see the SESSION_EXPIRED
 event until it is too late, i.e. client B has acquired the lock and done
 the
 damage. Problem here is that client cannot distinguish network partition
 from zookeeper ensemble in leader election state.



 --
 View this message in context:
 http://zookeeper-user.578899.n2.nabble.com/locking-leader-election-and-dealing-with-session-loss-tp7581277p7581279.html
 Sent from the zookeeper-user mailing list archive at Nabble.com.



Re: locking/leader election and dealing with session loss

2015-07-15 Thread Alexander Shraer
This property may hold if you make a lot of timing/synchrony assumptions --
agreeing on who holds the lock in an asynchronous distributed system with
failures is impossible, this is the FLP impossibility.

But even if it holds, this property is not very useful if the ZK client
itself doesn't have the application data. So one has to consider whether it
is possible that the application sees messages from two clients that both
think they are the leader in an order which contradicts the lock acquisition
order.

On Wed, Jul 15, 2015 at 1:26 PM, Jordan Zimmerman 
jor...@jordanzimmerman.com wrote:

 I think we may be talking past each other here. My contention (and the ZK
 docs agree BTW) is that, properly written and configured, “at any
 snapshot in time no two clients think they hold the same lock”. How your
 application acts on that fact is another thing. You might need sequence
 numbers, you might not.

 -Jordan


 On July 15, 2015 at 3:15:16 PM, Alexander Shraer (shra...@gmail.com)
 wrote:

 Jordan, as Camille suggested, please read Sec 2.4 in the Chubby paper:
 link
 
 http://static.googleusercontent.com/media/research.google.com/en//archive/chubby-osdi06.pdf


 it suggests 2 ways in which the storage can support lock generations and
 proposes an alternative for the case where the storage can't be made aware
 of lock generations.

 On Wed, Jul 15, 2015 at 1:08 PM, Jordan Zimmerman 
 jor...@jordanzimmerman.com wrote:

  Ivan, I just read the blog and I still don’t see how this can happen.
  Sorry if I’m being dense. I’d appreciate a discussion on this. In your
 blog
  you state: “when ZooKeeper tells you that you are leader, there's no
  guarantee that there isn't another node that 'thinks' it's the leader.”
  However, given a long enough session time — I usually recommend 30–60
  seconds, I don’t see how this can happen. The client itself determines
 that
  there is a network partition when there is no heartbeat success. The
  heartbeat is a fraction of the session timeout. Once the heartbeat
 fails,
  the client must assume it no longer has the lock. Another client cannot
  take over the lock until, at minimum, session timeout. So, how then can
  there be two leaders?
 
  -Jordan
 
  On July 15, 2015 at 2:23:12 PM, Ivan Kelly (iv...@apache.org) wrote:
 
  I blogged about this exact problem a couple of weeks ago [1]. I give an
  example of how split brain can happen in a resource under a zk lock
 (Hbase
  in this case). As Camille says, sequence numbers ftw. I'll add that the
  data store has to support them though, which not all do (in fact I've
 yet
  to see one in the wild that does). I've implemented a prototype that
 works
  with hbase[2] if you want to see what it looks like.
 
  -Ivan
 
  [1]
 
 
 https://medium.com/@ivankelly/reliable-table-writer-locks-for-hbase-731024295215
  [2] https://github.com/ivankelly/hbase-exclusive-writer
 
  On Wed, Jul 15, 2015 at 9:16 PM Vikas Mehta vikasme...@gmail.com
 wrote:
 
   Jordan, I mean the client gives up the lock and stops working on the
  shared
   resource. So when zookeeper is unavailable, no one is working on any
  shared
   resource (because they cannot distinguish network partition from
  zookeeper
   DEAD scenario).
  
  
  
   --
   View this message in context:
  
 
 http://zookeeper-user.578899.n2.nabble.com/locking-leader-election-and-dealing-with-session-loss-tp7581277p7581293.html
   Sent from the zookeeper-user mailing list archive at Nabble.com.
  
 




Re: locking/leader election and dealing with session loss

2015-07-15 Thread Jordan Zimmerman
This property may hold if you make a lot of timing/synchrony assumptions
These assumptions and timing are intrinsic to using ZooKeeper. So, of course 
I’m making these assumptions.


-Jordan



On July 15, 2015 at 3:57:12 PM, Alexander Shraer (shra...@gmail.com) wrote:

This property may hold if you make a lot of timing/synchrony assumptions -- 
agreeing on who holds the lock in an asynchronous distributed system with 
failures is impossible, this is the FLP impossibility. 

But even if it holds, this property is not very useful if the ZK client itself 
doesn't have the application data. So one has to consider whether it is 
possible that the application sees messages from two clients that both think 
they are the leader in an order which contradicts the lock acquisition order.

On Wed, Jul 15, 2015 at 1:26 PM, Jordan Zimmerman jor...@jordanzimmerman.com 
wrote:
I think we may be talking past each other here. My contention (and the ZK docs 
agree BTW) is that, properly written and configured, “at any snapshot in time 
no two clients think they hold the same lock”. How your application acts on 
that fact is another thing. You might need sequence numbers, you might not. 

-Jordan


On July 15, 2015 at 3:15:16 PM, Alexander Shraer (shra...@gmail.com) wrote:

Jordan, as Camille suggested, please read Sec 2.4 in the Chubby paper:
link
http://static.googleusercontent.com/media/research.google.com/en//archive/chubby-osdi06.pdf

it suggests 2 ways in which the storage can support lock generations and
proposes an alternative for the case where the storage can't be made aware
of lock generations.

On Wed, Jul 15, 2015 at 1:08 PM, Jordan Zimmerman 
jor...@jordanzimmerman.com wrote:

 Ivan, I just read the blog and I still don’t see how this can happen.
 Sorry if I’m being dense. I’d appreciate a discussion on this. In your blog
 you state: “when ZooKeeper tells you that you are leader, there’s no
 guarantee that there isn’t another node that 'thinks' its the leader.”
 However, given a long enough session time — I usually recommend 30–60
 seconds, I don’t see how this can happen. The client itself determines that
 there is a network partition when there is no heartbeat success. The
 heartbeat is a fraction of the session timeout. Once the heartbeat fails,
 the client must assume it no longer has the lock. Another client cannot
 take over the lock until, at minimum, session timeout. So, how then can
 there be two leaders?

 -Jordan

 On July 15, 2015 at 2:23:12 PM, Ivan Kelly (iv...@apache.org) wrote:

 I blogged about this exact problem a couple of weeks ago [1]. I give an
 example of how split brain can happen in a resource under a zk lock (Hbase
 in this case). As Camille says, sequence numbers ftw. I'll add that the
 data store has to support them though, which not all do (in fact I've yet
 to see one in the wild that does). I've implemented a prototype that works
 with hbase[2] if you want to see what it looks like.

 -Ivan

 [1]

 https://medium.com/@ivankelly/reliable-table-writer-locks-for-hbase-731024295215
 [2] https://github.com/ivankelly/hbase-exclusive-writer

 On Wed, Jul 15, 2015 at 9:16 PM Vikas Mehta vikasme...@gmail.com wrote:

  Jordan, I mean the client gives up the lock and stops working on the
 shared
  resource. So when zookeeper is unavailable, no one is working on any
 shared
  resource (because they cannot distinguish network partition from
 zookeeper
  DEAD scenario).
 
 
 
 




Re: locking/leader election and dealing with session loss

2015-07-15 Thread Ivan Kelly
 “at any snapshot in time no two clients think they hold the same lock”
According to the ZK service. But communication between the service and
client takes time.

-Ivan

On Wed, Jul 15, 2015 at 10:54 PM Ivan Kelly iv...@apache.org wrote:

 Jordan, imagine you have a node which is leader using the hbase example. A
 client makes some request to the leader, which processes the request, lines
 up a write to the state in hbase, and promptly goes into a 30 second gc
 pause just before it flushes the socket. During the 30 second pause another
 node takes over as leader and starts writing to the state. Now, when the
 pause ends, what will stop the write from the first leader being flushed to
 the socket and then hitting hbase?

 -Ivan

 On Wed, Jul 15, 2015 at 10:26 PM Jordan Zimmerman 
 jor...@jordanzimmerman.com wrote:

 I think we may be talking past each other here. My contention (and the ZK
 docs agree BTW) is that, properly written and configured, “at any
 snapshot in time no two clients think they hold the same lock”. How your
 application acts on that fact is another thing. You might need sequence
 numbers, you might not.

 -Jordan


 On July 15, 2015 at 3:15:16 PM, Alexander Shraer (shra...@gmail.com)
 wrote:

 Jordan, as Camille suggested, please read Sec 2.4 in the Chubby paper:
 link
 
 http://static.googleusercontent.com/media/research.google.com/en//archive/chubby-osdi06.pdf


 it suggests 2 ways in which the storage can support lock generations and
 proposes an alternative for the case where the storage can't be made
 aware
 of lock generations.

 On Wed, Jul 15, 2015 at 1:08 PM, Jordan Zimmerman 
 jor...@jordanzimmerman.com wrote:

  Ivan, I just read the blog and I still don’t see how this can happen.
  Sorry if I’m being dense. I’d appreciate a discussion on this. In your
 blog
  you state: “when ZooKeeper tells you that you are leader, there’s no
  guarantee that there isn’t another node that 'thinks' its the leader.”
  However, given a long enough session time — I usually recommend 30–60
  seconds, I don’t see how this can happen. The client itself determines
 that
  there is a network partition when there is no heartbeat success. The
  heartbeat is a fraction of the session timeout. Once the heartbeat
 fails,
  the client must assume it no longer has the lock. Another client cannot
  take over the lock until, at minimum, session timeout. So, how then can
  there be two leaders?
 
  -Jordan
 
  On July 15, 2015 at 2:23:12 PM, Ivan Kelly (iv...@apache.org) wrote:
 
  I blogged about this exact problem a couple of weeks ago [1]. I give an
  example of how split brain can happen in a resource under a zk lock
 (Hbase
  in this case). As Camille says, sequence numbers ftw. I'll add that the
  data store has to support them though, which not all do (in fact I've
 yet
  to see one in the wild that does). I've implemented a prototype that
 works
  with hbase[2] if you want to see what it looks like.
 
  -Ivan
 
  [1]
 
 
 https://medium.com/@ivankelly/reliable-table-writer-locks-for-hbase-731024295215
  [2] https://github.com/ivankelly/hbase-exclusive-writer
 
  On Wed, Jul 15, 2015 at 9:16 PM Vikas Mehta vikasme...@gmail.com
 wrote:
 
   Jordan, I mean the client gives up the lock and stops working on the
  shared
   resource. So when zookeeper is unavailable, no one is working on any
  shared
   resource (because they cannot distinguish network partition from
  zookeeper
   DEAD scenario).
  
  
  
  
 




Re: locking/leader election and dealing with session loss

2015-07-15 Thread Jordan Zimmerman
It doesn’t matter that communication between the service and the client takes time. 
The client can determine that it is no longer the lock holder independently of 
the server.
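
Concretely, a minimal sketch of that client-side check with the plain ZooKeeper
Java client might look like the following (the LockDriver class and the
stopWork hook are made-up names, just for illustration):

import java.io.IOException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

// Sketch only: the client stops acting as lock holder the moment it can no
// longer confirm its session, without waiting to hear anything from the server.
public class LockDriver implements Watcher {
    private final ZooKeeper zk;

    public LockDriver(String connectString, int sessionTimeoutMs) throws IOException {
        this.zk = new ZooKeeper(connectString, sessionTimeoutMs, this);
    }

    @Override
    public void process(WatchedEvent event) {
        switch (event.getState()) {
            case Disconnected:
            case Expired:
                // Heartbeats are failing or the session is gone: assume the
                // lock is lost right away, well before the full session
                // timeout elapses and another client can take over.
                stopWork();
                break;
            case SyncConnected:
                // Connected (or reconnected); lock ownership still has to be
                // re-verified against the lock znode before resuming work.
                break;
            default:
                break;
        }
    }

    private void stopWork() {
        // Made-up hook: halt all writes to the shared resource here.
    }
}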

-Jordan


On July 15, 2015 at 3:55:40 PM, Ivan Kelly (iv...@apache.org) wrote:

 “at any snapshot in time no two clients think they hold the same lock”
According to the ZK service. But communication between the service and  
client takes time.  

-Ivan  

On Wed, Jul 15, 2015 at 10:54 PM Ivan Kelly iv...@apache.org wrote:  

 Jordan, imagine you have a node which is leader using the hbase example. A  
 client makes some request to the leader, which processes the request, lines  
 up a write to the state in hbase, and promptly goes into a 30 second gc  
 pause just before it flushes the socket. During the 30 second pause another  
 node takes over as leader and starts writing to the state. Now, when the  
 pause ends, what will stop the write from the first leader being flushed to  
 the socket and then hitting hbase?  
  
 -Ivan  
  
 On Wed, Jul 15, 2015 at 10:26 PM Jordan Zimmerman   
 jor...@jordanzimmerman.com wrote:  
  
 I think we may be talking past each other here. My contention (and the ZK  
 docs agree BTW) is that, properly written and configured, “at any
 snapshot in time no two clients think they hold the same lock”. How your  
 application acts on that fact is another thing. You might need sequence  
 numbers, you might not.  
  
 -Jordan  
  
  
 On July 15, 2015 at 3:15:16 PM, Alexander Shraer (shra...@gmail.com)  
 wrote:  
  
 Jordan, as Camille suggested, please read Sec 2.4 in the Chubby paper:  
 link  
   
 http://static.googleusercontent.com/media/research.google.com/en//archive/chubby-osdi06.pdf
   
  
  
 it suggests 2 ways in which the storage can support lock generations and  
 proposes an alternative for the case where the storage can't be made  
 aware  
 of lock generations.  
  
 On Wed, Jul 15, 2015 at 1:08 PM, Jordan Zimmerman   
 jor...@jordanzimmerman.com wrote:  
  
  Ivan, I just read the blog and I still don’t see how this can happen.  
  Sorry if I’m being dense. I’d appreciate a discussion on this. In your  
 blog  
  you state: “when ZooKeeper tells you that you are leader, there’s no
  guarantee that there isn’t another node that 'thinks' its the leader.”  
  However, given a long enough session time — I usually recommend 30–60  
  seconds, I don’t see how this can happen. The client itself determines  
 that  
  there is a network partition when there is no heartbeat success. The  
  heartbeat is a fraction of the session timeout. Once the heartbeat  
 fails,  
  the client must assume it no longer has the lock. Another client cannot  
  take over the lock until, at minimum, session timeout. So, how then can  
  there be two leaders?  
   
  -Jordan  
   
  On July 15, 2015 at 2:23:12 PM, Ivan Kelly (iv...@apache.org) wrote:  
   
  I blogged about this exact problem a couple of weeks ago [1]. I give an  
  example of how split brain can happen in a resource under a zk lock  
 (Hbase  
  in this case). As Camille says, sequence numbers ftw. I'll add that the  
  data store has to support them though, which not all do (in fact I've  
 yet  
  to see one in the wild that does). I've implemented a prototype that  
 works  
  with hbase[2] if you want to see what it looks like.  
   
  -Ivan  
   
  [1]  
   
   
 https://medium.com/@ivankelly/reliable-table-writer-locks-for-hbase-731024295215
   
  [2] https://github.com/ivankelly/hbase-exclusive-writer  
   
  On Wed, Jul 15, 2015 at 9:16 PM Vikas Mehta vikasme...@gmail.com  
 wrote:  
   
   Jordan, I mean the client gives up the lock and stops working on the  
  shared  
   resource. So when zookeeper is unavailable, no one is working on any  
  shared  
   resource (because they cannot distinguish network partition from  
  zookeeper  
   DEAD scenario).  




   
  
  


Re: locking/leader election and dealing with session loss

2015-07-15 Thread Alexander Shraer
I disagree, ZooKeeper itself actually doesn't rely on timing for safety -
it won't get into an inconsistent state even if all timing assumptions fail
(except for the sync operation, which is then not guaranteed to return the
latest value, but that's a known issue that needs to be fixed).
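
For reference, the sync-then-read pattern in question looks roughly like this
with the Java client (a sketch only; and as noted, sync is exactly the
operation whose guarantee is imperfect today):

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.zookeeper.AsyncCallback;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public final class SyncedRead {
    // Rough sketch: sync() asks the server we are connected to to catch up with
    // the ZooKeeper leader before we read, so the subsequent getData() observes
    // at least everything that was committed before the sync was issued.
    public static byte[] readLatest(ZooKeeper zk, String path) throws Exception {
        final CountDownLatch done = new CountDownLatch(1);
        final AtomicInteger syncRc = new AtomicInteger();
        zk.sync(path, new AsyncCallback.VoidCallback() {
            @Override
            public void processResult(int rc, String p, Object ctx) {
                syncRc.set(rc);
                done.countDown();
            }
        }, null);
        done.await(); // wait for the sync to complete (no timeout; sketch only)
        if (syncRc.get() != KeeperException.Code.OK.intValue()) {
            throw KeeperException.create(KeeperException.Code.get(syncRc.get()), path);
        }
        Stat stat = new Stat();
        return zk.getData(path, false, stat); // synchronous read after the sync
    }
}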




On Wed, Jul 15, 2015 at 2:13 PM, Jordan Zimmerman 
jor...@jordanzimmerman.com wrote:

 This property may hold if you make a lot of timing/synchrony assumptions

 These assumptions and timing are intrinsic to using ZooKeeper. So, of
 course I’m making these assumptions.

 -Jordan



 On July 15, 2015 at 3:57:12 PM, Alexander Shraer (shra...@gmail.com)
 wrote:

 This property may hold if you make a lot of timing/synchrony assumptions
 -- agreeing on who holds the lock in an asynchronous distributed system
 with failures is impossible, this is the FLP impossibility.

 But even if it holds, this property is not very useful if the ZK client
 itself doesn't have the application data. So one has to consider whether it
 is possible that the application sees messages from two clients that both
 think they are the leader in an order which contradicts the lock acquisition
 order.

 On Wed, Jul 15, 2015 at 1:26 PM, Jordan Zimmerman 
 jor...@jordanzimmerman.com wrote:

  I think we may be talking past each other here. My contention (and the
 ZK docs agree BTW) is that, properly written and configured, “at any
 snapshot in time no two clients think they hold the same lock”. How your
 application acts on that fact is another thing. You might need sequence
 numbers, you might not.

 -Jordan


 On July 15, 2015 at 3:15:16 PM, Alexander Shraer (shra...@gmail.com)
 wrote:

  Jordan, as Camille suggested, please read Sec 2.4 in the Chubby paper:
 link
 
 http://static.googleusercontent.com/media/research.google.com/en//archive/chubby-osdi06.pdf
 

 it suggests 2 ways in which the storage can support lock generations and
 proposes an alternative for the case where the storage can't be made aware
 of lock generations.

 On Wed, Jul 15, 2015 at 1:08 PM, Jordan Zimmerman 
 jor...@jordanzimmerman.com wrote:

  Ivan, I just read the blog and I still don’t see how this can happen.
  Sorry if I’m being dense. I’d appreciate a discussion on this. In your
 blog
  you state: “when ZooKeeper tells you that you are leader, there’s no
  guarantee that there isn’t another node that 'thinks' its the leader.”
  However, given a long enough session time — I usually recommend 30–60
  seconds, I don’t see how this can happen. The client itself determines
 that
  there is a network partition when there is no heartbeat success. The
  heartbeat is a fraction of the session timeout. Once the heartbeat
 fails,
  the client must assume it no longer has the lock. Another client cannot
  take over the lock until, at minimum, session timeout. So, how then can
  there be two leaders?
 
  -Jordan
 
  On July 15, 2015 at 2:23:12 PM, Ivan Kelly (iv...@apache.org) wrote:
 
  I blogged about this exact problem a couple of weeks ago [1]. I give an
  example of how split brain can happen in a resource under a zk lock
 (Hbase
  in this case). As Camille says, sequence numbers ftw. I'll add that the
  data store has to support them though, which not all do (in fact I've
 yet
  to see one in the wild that does). I've implemented a prototype that
 works
  with hbase[2] if you want to see what it looks like.
 
  -Ivan
 
  [1]
 
 
 https://medium.com/@ivankelly/reliable-table-writer-locks-for-hbase-731024295215
  [2] https://github.com/ivankelly/hbase-exclusive-writer
 
  On Wed, Jul 15, 2015 at 9:16 PM Vikas Mehta vikasme...@gmail.com
 wrote:
 
   Jordan, I mean the client gives up the lock and stops working on the
  shared
   resource. So when zookeeper is unavailable, no one is working on any
  shared
   resource (because they cannot distinguish network partition from
  zookeeper
   DEAD scenario).
  
  
  
  
 





Re: locking/leader election and dealing with session loss

2015-07-15 Thread Alexander Shraer
Jordan, as Camille suggested, please read Sec 2.4 in the Chubby paper:
link
http://static.googleusercontent.com/media/research.google.com/en//archive/chubby-osdi06.pdf

it suggests 2 ways in which the storage can support lock generations and
proposes an alternative for the case where the storage can't be made aware
of lock generations.
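
For the first approach, ZooKeeper can already hand out a usable generation
number: with a lock recipe that creates one ephemeral znode per holder, the
creation zxid (czxid) of the lock node increases monotonically across
acquisitions, so it can serve as the generation the storage checks. A rough
sketch, using a deliberately simplified single-znode lock (the helper name is
made up):

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public final class LockGenerations {
    // Illustrative sketch: after winning the lock, read back the lock znode's
    // creation zxid. Zxids increase monotonically, so a later lock holder always
    // sees a larger generation than an earlier one, which is the property a
    // Chubby-style sequencer relies on.
    public static long acquireAndGetGeneration(ZooKeeper zk, String lockPath, byte[] owner)
            throws Exception {
        // A NodeExistsException here would mean somebody else holds the lock.
        zk.create(lockPath, owner, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
        Stat stat = zk.exists(lockPath, false);
        return stat.getCzxid(); // hand this generation to the storage with every write
    }
}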

On Wed, Jul 15, 2015 at 1:08 PM, Jordan Zimmerman 
jor...@jordanzimmerman.com wrote:

 Ivan, I just read the blog and I still don’t see how this can happen.
 Sorry if I’m being dense. I’d appreciate a discussion on this. In your blog
 you state: “when ZooKeeper tells you that you are leader, there’s no
 guarantee that there isn’t another node that 'thinks' its the leader.”
 However, given a long enough session time — I usually recommend 30–60
 seconds, I don’t see how this can happen. The client itself determines that
 there is a network partition when there is no heartbeat success. The
 heartbeat is a fraction of the session timeout. Once the heartbeat fails,
 the client must assume it no longer has the lock. Another client cannot
 take over the lock until, at minimum, session timeout. So, how then can
 there be two leaders?

 -Jordan

 On July 15, 2015 at 2:23:12 PM, Ivan Kelly (iv...@apache.org) wrote:

 I blogged about this exact problem a couple of weeks ago [1]. I give an
 example of how split brain can happen in a resource under a zk lock (Hbase
 in this case). As Camille says, sequence numbers ftw. I'll add that the
 data store has to support them though, which not all do (in fact I've yet
 to see one in the wild that does). I've implemented a prototype that works
 with hbase[2] if you want to see what it looks like.

 -Ivan

 [1]

 https://medium.com/@ivankelly/reliable-table-writer-locks-for-hbase-731024295215
 [2] https://github.com/ivankelly/hbase-exclusive-writer

 On Wed, Jul 15, 2015 at 9:16 PM Vikas Mehta vikasme...@gmail.com wrote:

  Jordan, I mean the client gives up the lock and stops working on the
 shared
  resource. So when zookeeper is unavailable, no one is working on any
 shared
  resource (because they cannot distinguish network partition from
 zookeeper
  DEAD scenario).
 
 
 
 



Re: locking/leader election and dealing with session loss

2015-07-15 Thread Jordan Zimmerman
I think we may be talking past each other here. My contention (and the ZK docs 
agree BTW) is that, properly written and configured, “at any snapshot in time 
no two clients think they hold the same lock”. How your application acts on 
that fact is another thing. You might need sequence numbers, you might not. 

-Jordan


On July 15, 2015 at 3:15:16 PM, Alexander Shraer (shra...@gmail.com) wrote:

Jordan, as Camille suggested, please read Sec 2.4 in the Chubby paper:  
link  
http://static.googleusercontent.com/media/research.google.com/en//archive/chubby-osdi06.pdf
  

it suggests 2 ways in which the storage can support lock generations and  
proposes an alternative for the case where the storage can't be made aware  
of lock generations.  

On Wed, Jul 15, 2015 at 1:08 PM, Jordan Zimmerman   
jor...@jordanzimmerman.com wrote:  

 Ivan, I just read the blog and I still don’t see how this can happen.  
 Sorry if I’m being dense. I’d appreciate a discussion on this. In your blog  
 you state: “when ZooKeeper tells you that you are leader, there’s no
 guarantee that there isn’t another node that 'thinks' its the leader.”  
 However, given a long enough session time — I usually recommend 30–60  
 seconds, I don’t see how this can happen. The client itself determines that  
 there is a network partition when there is no heartbeat success. The  
 heartbeat is a fraction of the session timeout. Once the heartbeat fails,  
 the client must assume it no longer has the lock. Another client cannot  
 take over the lock until, at minimum, session timeout. So, how then can  
 there be two leaders?  
  
 -Jordan  
  
 On July 15, 2015 at 2:23:12 PM, Ivan Kelly (iv...@apache.org) wrote:  
  
 I blogged about this exact problem a couple of weeks ago [1]. I give an  
 example of how split brain can happen in a resource under a zk lock (Hbase  
 in this case). As Camille says, sequence numbers ftw. I'll add that the  
 data store has to support them though, which not all do (in fact I've yet  
 to see one in the wild that does). I've implemented a prototype that works  
 with hbase[2] if you want to see what it looks like.  
  
 -Ivan  
  
 [1]  
  
 https://medium.com/@ivankelly/reliable-table-writer-locks-for-hbase-731024295215
   
 [2] https://github.com/ivankelly/hbase-exclusive-writer  
  
 On Wed, Jul 15, 2015 at 9:16 PM Vikas Mehta vikasme...@gmail.com wrote:  
  
  Jordan, I mean the client gives up the lock and stops working on the  
 shared  
  resource. So when zookeeper is unavailable, no one is working on any  
 shared  
  resource (because they cannot distinguish network partition from  
 zookeeper  
  DEAD scenario).  
   
   
   
   
  


Re: locking/leader election and dealing with session loss

2015-07-15 Thread Vikas Mehta
To expand a bit on the zookeeper client/server handshake part, one possible
option is to improve the situation for planned maintenance, i.e. before the
leader is taken down (it knows it is going down), it tells the clients to
continue until the zookeeper service is restored (there are a lot of gotchas
and edge conditions to think about here, and being a zookeeper noob I haven't
been able to prove whether this will work or not).





Re: locking/leader election and dealing with session loss

2015-07-15 Thread Ivan Kelly
Jordan, imagine you have a node which is leader using the hbase example. A
client makes some request to the leader, which processes the request, lines
up a write to the state in hbase, and promptly goes into a 30 second gc
pause just before it flushes the socket. During the 30 second pause another
node takes over as leader and starts writing to the state. Now, when the
pause ends, what will stop the write from the first leader being flushed to
the socket and then hitting hbase?
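
A store-side fencing check along the lines of the sequence numbers discussed
above would reject that late write: the store remembers the highest lock
generation it has accepted and drops anything older. A hypothetical sketch
(FencedStore is made up, not the hbase-exclusive-writer code):

// Hypothetical sketch of storage-side fencing: every write carries the lock
// generation of the leader that produced it, and the store refuses any write
// whose generation is older than the highest one it has already accepted.
public final class FencedStore {
    private long highestGeneration = -1;

    /** Returns true if the write was applied, false if it was fenced off. */
    public synchronized boolean write(long generation, byte[] key, byte[] value) {
        if (generation < highestGeneration) {
            return false; // stale leader: its pause outlasted its lock
        }
        highestGeneration = generation;
        apply(key, value);
        return true;
    }

    private void apply(byte[] key, byte[] value) {
        // Persist the key/value pair here (omitted in this sketch).
    }
}

The stalled leader's write arrives carrying the generation it held before the
pause, which by then is smaller than the new leader's, so it is refused.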

-Ivan

On Wed, Jul 15, 2015 at 10:26 PM Jordan Zimmerman 
jor...@jordanzimmerman.com wrote:

 I think we may be talking past each other here. My contention (and the ZK
 docs agree BTW) is that, properly written and configured, “at any
 snapshot in time no two clients think they hold the same lock”. How your
 application acts on that fact is another thing. You might need sequence
 numbers, you might not.

 -Jordan


 On July 15, 2015 at 3:15:16 PM, Alexander Shraer (shra...@gmail.com)
 wrote:

 Jordan, as Camille suggested, please read Sec 2.4 in the Chubby paper:
 link
 
 http://static.googleusercontent.com/media/research.google.com/en//archive/chubby-osdi06.pdf


 it suggests 2 ways in which the storage can support lock generations and
 proposes an alternative for the case where the storage can't be made aware
 of lock generations.

 On Wed, Jul 15, 2015 at 1:08 PM, Jordan Zimmerman 
 jor...@jordanzimmerman.com wrote:

  Ivan, I just read the blog and I still don’t see how this can happen.
  Sorry if I’m being dense. I’d appreciate a discussion on this. In your
 blog
  you state: “when ZooKeeper tells you that you are leader, there’s no
  guarantee that there isn’t another node that 'thinks' its the leader.”
  However, given a long enough session time — I usually recommend 30–60
  seconds, I don’t see how this can happen. The client itself determines
 that
  there is a network partition when there is no heartbeat success. The
  heartbeat is a fraction of the session timeout. Once the heartbeat
 fails,
  the client must assume it no longer has the lock. Another client cannot
  take over the lock until, at minimum, session timeout. So, how then can
  there be two leaders?
 
  -Jordan
 
  On July 15, 2015 at 2:23:12 PM, Ivan Kelly (iv...@apache.org) wrote:
 
  I blogged about this exact problem a couple of weeks ago [1]. I give an
  example of how split brain can happen in a resource under a zk lock
 (Hbase
  in this case). As Camille says, sequence numbers ftw. I'll add that the
  data store has to support them though, which not all do (in fact I've
 yet
  to see one in the wild that does). I've implemented a prototype that
 works
  with hbase[2] if you want to see what it looks like.
 
  -Ivan
 
  [1]
 
 
 https://medium.com/@ivankelly/reliable-table-writer-locks-for-hbase-731024295215
  [2] https://github.com/ivankelly/hbase-exclusive-writer
 
  On Wed, Jul 15, 2015 at 9:16 PM Vikas Mehta vikasme...@gmail.com
 wrote:
 
   Jordan, I mean the client gives up the lock and stops working on the
  shared
   resource. So when zookeeper is unavailable, no one is working on any
  shared
   resource (because they cannot distinguish network partition from
  zookeeper
   DEAD scenario).
  
  
  
  
 




Re: locking/leader election and dealing with session loss

2015-07-15 Thread Ivan Kelly
On Wed, Jul 15, 2015 at 11:14 PM Jordan Zimmerman 
jor...@jordanzimmerman.com wrote:

 This property may hold if you make a lot of timing/synchrony assumptions

 These assumptions and timing are intrinsic to using ZooKeeper. So, of
 course I’m making these assumptions.

Zookeeper does not rely on timing for correctness. Sure, it uses timeouts
to make sure it can make progress, but these can be arbitrary.

-Ivan