On 11 July 2014 10:31, Miller, Austin <[email protected]> wrote:
> Hello,
>
> I'm looking for a way to get access to ClientCnxnSocket.lastSend without
> trying to break ZooKeeper encapsulation. Broadly, my goal is to use this in
> a Java process in order to increase confidence in transactions.
>
> There are resource-starved situations where the ClientCnxn.SendThread may
> not be scheduled for greater than negotiatedSessionTimeout. My
> understanding is that this will lead to session loss because, even though the
> connection might still be considered alive (the OS could be sending ACKs to
> packets ZK sends), the ZK server requires the client to be sending the
> packets.

Pings, from an idle client, actually need to go out every 1/3 of
negotiatedSessionTimeout.

> So, assuming
> ...a JVM process was connected to a ZK ensemble
> ...the JVM process is performing transactions
> ...ZK is being used for distributed locking with coarse granularity
> ...a reliable low-latency network connection to a healthy low-latency
> ensemble
> ...a rare event causes the machine hosting the JVM to be resource starved
> ...none of the JVM threads are scheduled for a window twice the length of
> the negotiatedSessionTimeout

1/3 of negotiatedSessionTimeout will already cause a ConnectionLoss...

> ...during this window, the process has lost the coarse lock on the
> ensemble (it was an ephemeral node)
>
> Then the ensemble should have agreed that the session is dead, correct?
> Even though the connection may be considered alive at a TCP/IP transport
> level.

No, the ZK server *will* RST the client if it hasn't pinged in 1/3 of
negotiatedSessionTimeout.

> What is more, just coming out of the state where the threads are
> scheduled again, there is a race condition between the ZK threads firing the
> session-death event and the transaction threads committing transactions. As
> I write this, I realize I'm not entirely sure what events ZK would send and
> in what order, depending on what was done before the freeze and where it
> was frozen.
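The 1/3-of-timeout ping rule above can be sketched as a small staleness check. This is not the actual ZooKeeper API — `HeartbeatStaleness`, `recordSend`, and `sessionSuspect` are hypothetical names standing in for what a `getLastSent()` accessor over `ClientCnxnSocket.lastSend` would let you build: if more than a third of the negotiated session timeout has passed since the client last wrote anything, the next ping is already overdue and the session should be treated as suspect.

```java
// Hypothetical sketch, NOT the ZooKeeper client API: models the rule that an
// idle client must send a ping every negotiatedSessionTimeout / 3.
public class HeartbeatStaleness {
    private final long sessionTimeoutMs;
    private volatile long lastSendNanos; // what a getLastSent() would expose

    public HeartbeatStaleness(long sessionTimeoutMs) {
        this.sessionTimeoutMs = sessionTimeoutMs;
        this.lastSendNanos = System.nanoTime();
    }

    /** Called whenever the send thread actually writes a packet. */
    public void recordSend() {
        lastSendNanos = System.nanoTime();
    }

    /** True if the client has been quiet longer than the ping interval. */
    public boolean sessionSuspect() {
        long elapsedMs = (System.nanoTime() - lastSendNanos) / 1_000_000L;
        return elapsedMs > sessionTimeoutMs / 3;
    }

    public static void main(String[] args) throws InterruptedException {
        HeartbeatStaleness h = new HeartbeatStaleness(300); // 300 ms timeout
        System.out.println(h.sessionSuspect()); // just sent -> false
        Thread.sleep(200);                      // quiet for > 100 ms (1/3)
        System.out.println(h.sessionSuspect()); // ping overdue -> true
    }
}
```

Note that a check like this only asserts the send thread was scheduled locally; as the original mail points out, it says nothing about whether the server actually received the ping.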
> Back to the broad goal, I want to increase confidence in this situation
> that the process still owns the ZK lock without firing off network events
> before committing every transaction. Obviously, fine-grained locks would
> solve this problem, but that comes with an unacceptable performance
> trade-off.
>
> Now, let's say I could do something like "long
> org.apache.zookeeper.ZooKeeper.getLastSent()". Well, I don't know if the
> ZK server actually received the packet; assuming it did receive the packet,
> I don't know when it received it, and I don't know when the OS received the
> ack. However, it does assert that the SendThread was scheduled and able to
> call System.nanoTime() in ClientCnxnSocket, which increases the likelihood
> that the process was sending heartbeats. In addition to this, if I haven't
> received a push notification from the ZK event thread implying I've lost
> the lock, I have higher confidence that the session hasn't been lost and
> that I still have the coarse lock, which satisfies my broad goal somewhat
> better than the current state.

You could just release the lock as soon as you receive ConnectionLoss
(i.e., without waiting for SessionExpired, which you'll only get upon
reconnecting to a ZK server... which could take longer, given a partition
or a loaded network). But the case you are exposing is conflated with the
pathological scenario of a JVM instance starving its threads... if that's a
risk, you might as well have an external health-check process that kills
your JVM entirely once it's likely that the ZK thread might be starving
(hence, losing your lock being more likely).

-rgs
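The pessimistic policy suggested above (forfeit the lock on ConnectionLoss rather than waiting for SessionExpired) can be sketched as follows. This is a stand-alone illustration, not ZooKeeper code: the `State` enum is a stand-in for `org.apache.zookeeper.Watcher.Event.KeeperState`, and `PessimisticLockHolder` is a hypothetical name.

```java
// Sketch of the "release on ConnectionLoss" policy with a stand-in event
// enum instead of the real Watcher.Event.KeeperState.
import java.util.concurrent.atomic.AtomicBoolean;

public class PessimisticLockHolder {
    enum State { SYNC_CONNECTED, DISCONNECTED, EXPIRED } // stand-in states

    private final AtomicBoolean holdsLock = new AtomicBoolean(true);

    /** Watcher-style callback: any loss of connectivity forfeits the lock. */
    public void process(State state) {
        switch (state) {
            case DISCONNECTED: // ConnectionLoss: be pessimistic, drop the lock
            case EXPIRED:      // session is definitely gone
                holdsLock.set(false);
                break;
            case SYNC_CONNECTED:
                // Reconnected within the session timeout: the ephemeral lock
                // node may still exist, but re-verify it before trusting it.
                break;
        }
    }

    /** Transaction threads consult this before committing. */
    public boolean mayCommit() {
        return holdsLock.get();
    }

    public static void main(String[] args) {
        PessimisticLockHolder holder = new PessimisticLockHolder();
        System.out.println(holder.mayCommit()); // true: still connected
        holder.process(State.DISCONNECTED);
        System.out.println(holder.mayCommit()); // false: lock forfeited early
    }
}
```

The trade-off is availability: every transient disconnect forces lock re-acquisition, but no commit can race a session that the ensemble has already expired.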
