BTW: We've had to introduce an external cron script that kills processes spending too much time in GC. That helped in a similar situation.
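A minimal sketch of the kind of check such a watchdog could make, not the actual script. The `GcSample` type, the function names, and the 50% threshold are all made up for illustration. How the samples are collected from the JVM and how the process is killed are left out.

```python
from dataclasses import dataclass


@dataclass
class GcSample:
    """Hypothetical sample: wall-clock time and cumulative GC time, in seconds."""
    wall_clock_s: float
    gc_time_s: float


def gc_fraction(prev: GcSample, curr: GcSample) -> float:
    """Fraction of the interval between two samples that was spent in GC."""
    elapsed = curr.wall_clock_s - prev.wall_clock_s
    if elapsed <= 0:
        # No usable interval, so report no GC pressure rather than divide by zero.
        return 0.0
    return max(0.0, curr.gc_time_s - prev.gc_time_s) / elapsed


def should_kill(prev: GcSample, curr: GcSample, threshold: float = 0.5) -> bool:
    """Decide whether the process spent too much of the interval in GC.

    The 0.5 default is an arbitrary illustrative threshold, not a recommendation.
    """
    return gc_fraction(prev, curr) >= threshold
```

A cron job would take one sample per run, compare it with the previous one, and kill the process when `should_kill` returns true. This limits how long a stuck process lingers, but it does not by itself stop a paused leader from acting once it wakes up.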
2013/1/16 Benjamin Reed <[email protected]>

> > Does this mean that session_expired event may be triggered all by
> > zk-client-library itself ?(by something like a built-in client-local
> > timer, without notification from zk server? )
> >
> > (I am digging into the source code, but in case of misunderstanding of
> > the code, I need your confirmation please)
>
> our general rule with zookeeper is to either give the correct answer
> or say "i don't know". connection loss is the zookeeper version of i
> don't know. you will only get the session expired event when the
> client gets confirmation from a server that the session is really
> gone. this means that even if a client is disconnected for days, you
> will not get the session expired until client connects to the server
> and the server tells it that the connection has expired.
>
> >> HBase region servers went into gc for many minutes and then woke up
> >> still thinking they are the leader
> >
> > Could this happen if I just follow(correctly and without a
> > client-local timer or external fencing resources) the recipe for
> > distributed clock?
>
> this did happen with hbase. client-local timers don't help. there are
> multiple problems going on: if a process is the leader and the gc
> freezes time (or the process gets swapped out or the hypervisor
> suspends the vm) right before the instruction that can only be
> executed by a leader (send database update for example), when time
> unfreezes, the rest of the system knows the leader has changed,
> another thread in the process might be figuring it out, local timers
> might be ready to fire, but the "leader instruction" will execute
> since that is the next instruction for the CPU to execute.
>
> ben

--
Best regards,
Vitalii Tymchyshyn
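To illustrate the "external fencing resources" mentioned in the thread: since the paused leader cannot be trusted to notice that it has been replaced, the check has to live on the side that receives the "leader instruction". Below is a rough sketch under that assumption. `FencedStore`, `StaleLeaderError`, and the integer epoch are hypothetical names, not part of ZooKeeper or HBase. The epoch is assumed to be some number that grows with every election, for example one derived from the sequence number of the leader's election znode.

```python
class StaleLeaderError(Exception):
    """Raised when a write carries an epoch older than one already seen."""


class FencedStore:
    """Hypothetical storage service that checks a fencing epoch on every write."""

    def __init__(self):
        self.highest_epoch = 0  # highest leader epoch accepted so far
        self.data = {}

    def write(self, epoch, key, value):
        # Reject writes from any leader older than the newest one seen.
        if epoch < self.highest_epoch:
            raise StaleLeaderError(
                f"epoch {epoch} is older than {self.highest_epoch}"
            )
        self.highest_epoch = epoch
        self.data[key] = value


if __name__ == "__main__":
    store = FencedStore()
    store.write(1, "row", "from old leader")   # leader with epoch 1 is active
    # ... old leader freezes in GC; a new leader is elected with epoch 2 ...
    store.write(2, "row", "from new leader")
    try:
        # Old leader wakes up and executes its next instruction.
        store.write(1, "row", "stale update")
    except StaleLeaderError as err:
        print("rejected:", err)
```

The frozen process still sends its update when it wakes up, exactly as described above, but the store refuses it because a newer epoch has already been seen. This only works if every resource the leader touches performs the same check.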
