On 11 July 2014 10:31, Miller, Austin <[email protected]>
wrote:

> Hello,
>
> I'm looking for a way to get access to ClientCnxnSocket.lastSend without
> trying to break ZooKeeper encapsulation.  Broadly, my goal is to use this in
> a Java process in order to increase confidence in transactions.
>
> There are resource starved situations where the ClientCnxn.SendThread may
> not be scheduled for greater than negotiatedSessionTimeout.  My
> understanding is this will lead to session loss because even though the
> connection might still be considered alive (the OS could be sending ACKs to
> packets ZK sends), the ZK server requires the client to be sending the
> packets.
>

Pings from an idle client actually need to go out every 1/3 of
negotiatedSessionTimeout.

>
> So, assuming
> ...a JVM process was connected to a ZK ensemble
> ...the JVM process is performing transactions
> ...ZK is being used for distributed locking with coarse granularity
> ...a reliable low-latency network connection to a healthy low-latency
> ensemble
> ...a rare event causes the machine hosting the JVM to be resource starved
> ...none of the JVM threads are scheduled for a window twice the length of
> the negotiatedSessionTimeout
>

A stall of 1/3 of negotiatedSessionTimeout will already cause a
ConnectionLoss...


> ...during this window, the process has lost the coarse lock on the
> ensemble (it was an ephemeral node)
>
> Then the ensemble should have agreed that the session is dead, correct?
>  Even though the connection may be considered alive at a TCP/IP transport
> level.


No, the ZK server *will* RST the client if it hasn't pinged within 1/3 of
negotiatedSessionTimeout.


>  What is more, just as the threads come out of the unscheduled state,
> there is a race condition between the ZK threads firing the session-death
> event and the transaction threads committing transactions.  As I write
> this, I realize I'm not entirely sure what events ZK would send and in
> what order, since that depends on what was done before the freeze and
> where it was frozen.
>
> Back to the broad goal, I want to increase confidence in this situation
> that the process still owns the ZK lock without firing off network events
> before committing every transaction.  Obviously, fine granular locks would
> solve this problem, but that comes with an unacceptable performance trade
> off.
>
> Now, let's say I could do something like "long
> org.apache.zookeeper.ZooKeeper.getLastSent()".  Well, I don't know if the
> ZK server actually received the packet; assuming it did receive the
> packet, I don't know when it received it, and I don't know when the OS
> received the ack.  However, it does assert that the SendThread was
> scheduled and able to call System.nanoTime() in ClientCnxnSocket.  This
> increases the likelihood that the process was sending heartbeats.  In
> addition to this, if I haven't received a push notification from the ZK
> event thread implying I've lost the lock, I have higher confidence that the
> session hasn't been lost and that I still have the coarse lock, which
> satisfies my broad goal somewhat better than the current state.
>
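A minimal sketch of that guard, assuming a hypothetical getLastSent()-style
accessor (ZooKeeper exposes no such method today, so LockConfidence and its
methods are invented names, and the timestamp is modeled with a plain long):

```java
// Hypothetical sketch: models the proposed getLastSent() check locally.
// lastSentMs stands in for the value a real ZooKeeper.getLastSent() would
// return; nothing here touches the network.
class LockConfidence {
    private final long sessionTimeoutMs;    // negotiatedSessionTimeout
    private volatile long lastSentMs;       // last packet handed to the socket
    private volatile boolean lossSignalled; // set from the ZK event thread

    LockConfidence(long sessionTimeoutMs, long nowMs) {
        this.sessionTimeoutMs = sessionTimeoutMs;
        this.lastSentMs = nowMs;
    }

    void recordHeartbeat(long whenMs) { lastSentMs = whenMs; }
    void signalLoss()                 { lossSignalled = true; }

    // Cheap local check before committing a transaction: the SendThread was
    // scheduled within the last 1/3 of the session timeout and no loss
    // event has arrived.  Evidence of ownership, not proof.
    boolean likelyHoldsLock(long nowMs) {
        return !lossSignalled && (nowMs - lastSentMs) < sessionTimeoutMs / 3;
    }
}
```

Passing timestamps in explicitly keeps the sketch testable; in practice the
caller would supply a monotonic-clock reading.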

You could just release the lock as soon as you receive ConnectionLoss
(i.e.: without waiting for SessionExpired, which you'll only get upon
reconnecting to a ZK server.. which could take longer, given a partition or
loaded network). But the case you are describing is conflated with the
pathological scenario of a JVM instance starving its threads... if that's
a risk, you might as well have an external health-check process that kills
your JVM entirely once it's likely that the ZK thread might be starving
(hence, losing your lock being more likely).
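That pessimistic policy can be sketched as follows (a self-contained model:
the real types are org.apache.zookeeper.Watcher and Watcher.Event.KeeperState,
mirrored here by a plain enum, and releaseLock is a hook you would supply):

```java
// Stand-in for Watcher.Event.KeeperState, so the sketch compiles alone.
enum KeeperState { SyncConnected, Disconnected, Expired }

// Drops the lock on the first Disconnected (i.e. ConnectionLoss) instead
// of waiting for Expired, which only arrives after reconnecting.
class PessimisticLockWatcher {
    private final Runnable releaseLock; // assumed: your lock-release hook
    private volatile boolean holdsLock = true;

    PessimisticLockWatcher(Runnable releaseLock) {
        this.releaseLock = releaseLock;
    }

    boolean holdsLock() { return holdsLock; }

    // Mirrors Watcher.process(WatchedEvent).
    void process(KeeperState state) {
        if (state == KeeperState.Disconnected || state == KeeperState.Expired) {
            if (holdsLock) {
                holdsLock = false;
                releaseLock.run();
            }
        }
    }
}
```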


-rgs
