On Fri, Mar 25, 2011 at 1:56 AM, Mohit wrote:
> Why not reconnect to ZooKeeper (try at least once, and abort if
> unsuccessful) and reset the trackers/watchers, instead of aborting/killing
> the HMaster/HRegionServers, just as is done in one of the Abortable
> implementations, HConnectionImplementation, in HConnectionManager?
Hello Mohit:

The ZooKeeper client is doing what you describe, sort of. On session timeout, it reconnects to the ensemble to ask whether its session has indeed expired. If it has, the client logs that the session expired. The regionserver kills itself on loss of its session because it's likely that the data it was hosting has already been assumed by another server.

The retry you refer to, IIRC, is something different: it happens before session setup? Please cite it if you'd like me to explain.

Do you think the session timed out because of a long GC pause? If you are on 0.90.1, there may be some things you can do. See http://hbase.apache.org/book/performance.html#jvm

Yours,
St.Ack
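To illustrate the distinction drawn above between a transient disconnect (the client keeps retrying the ensemble) and a confirmed session expiry (the regionserver aborts), here is a toy Java model. It is a hedged sketch, not HBase or ZooKeeper code: the class `SessionExpirySketch`, the enums `ConnectionState` and `Action`, and the method `onStateChange` are all hypothetical. The real ZooKeeper client reports these states through `org.apache.zookeeper.Watcher.Event.KeeperState` (`SyncConnected`, `Disconnected`, `Expired`), and HBase's actual handling is more involved.

```java
/**
 * Toy model (hypothetical names, not HBase/ZooKeeper code) of how a region
 * server might react to ZooKeeper connection-state changes.
 */
public class SessionExpirySketch {

    // Hypothetical stand-in for ZooKeeper's KeeperState values.
    public enum ConnectionState { SYNC_CONNECTED, DISCONNECTED, EXPIRED }

    // Hypothetical decisions the region server could make.
    public enum Action { CONTINUE, WAIT_FOR_RECONNECT, ABORT }

    public static Action onStateChange(ConnectionState state) {
        switch (state) {
            case SYNC_CONNECTED:
                // Still on the same session; its ephemeral znodes are intact.
                return Action.CONTINUE;
            case DISCONNECTED:
                // The client library keeps retrying the ensemble and the
                // session may still be alive, so do not give up yet.
                return Action.WAIT_FOR_RECONNECT;
            case EXPIRED:
                // The ensemble confirmed expiry: this server's ephemeral znode
                // is gone and the master may already have reassigned its
                // regions, so continuing to serve them is unsafe.
                return Action.ABORT;
            default:
                throw new IllegalArgumentException("unknown state: " + state);
        }
    }

    public static void main(String[] args) {
        // Print the decision for each state.
        for (ConnectionState s : ConnectionState.values()) {
            System.out.println(s + " -> " + onStateChange(s));
        }
    }
}
```

Under these assumptions, simply reconnecting after `EXPIRED` would not be enough: a new session does not bring back the old ephemeral znode, and by then another server may already be hosting the regions.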
