Hanno, thanks for your feedback. I have a better understanding of the problem now.
I am not using a dedicated disk for the transaction log or a dedicated machine for ZooKeeper. I will seriously consider the latter first (which will automatically solve the former issue). Meanwhile, I have increased the session timeout as a workaround. I am able to do that because my sundry clients do not communicate with ZooKeeper directly but instead go through a proxy process. Thus, it is possible to increase the session timeout for what is essentially a single ZooKeeper client.

I am also going to look at client-side retries along with tuning of GC parameters to further alleviate the problem.

Thanks
Deepinder

On 8/10/13 12:02 PM, "Hanno Schlichting" <[email protected]> wrote:

>Hi.
>
>On Fri, Aug 9, 2013, at 18:13, Deepinder Singh Setia wrote:
>> Aug 9 07:07:20 a2s1 python[2085]: OperationTimeoutException: operation
>> timeout
>
>That's one of the "retryable exceptions" in Kazoo. So if you'd use
>client.retry, you could tolerate one or more instances of this error.
>
>> zookeeper logs around the error time:
>>
>> 2013-08-09 07:07:06,580 [myid:] - WARN [SyncThread:0:FileTxnLog@321] -
>> fsync-ing the write ahead log in SyncThread:0 took 2291ms which will
>> adversely effect operation latency. See the ZooKeeper troubleshooting
>> guide
>
>More than 2 seconds of fsync stall is quite long. And with that or GC
>pauses, it's more than likely that you exceed the session timeout
>limits.
>
>Did you follow the recommendations in
>http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html? Especially
>around using dedicated disks for the transaction log and using a
>dedicated machine for Zookeeper to avoid other processes stalling it?
>
>> Could the client (Kazoo) be timing out because of fsync delay? What
>> parameter would control duration for OperationTimeoutException that I
>> can perhaps increase to verify? There is only ZooKeeper client and the
>> load isn't much - 1 read/sec and 2 writes/sec roughly. Zookeeper
>> configuration is default. Kazoo client params are also default.
>
>In the admin guide, look at tickTime and syncLimit. In a default config
>the session timeout is ~4 seconds. While you can increase this value,
>you thereby also increase the minimum time it takes Zookeeper to
>consider an actual client to be dead. Depending on what you use ZK for,
>you might prefer failing fast and thus low session timeout values.
>
>Hanno
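
As an illustration of the client-side retry idea discussed in this thread, here is a minimal, generic sketch of retrying an operation with exponential backoff. It is not Kazoo's own implementation (Hanno points to Kazoo's client.retry for that); the names `OperationTimeout`, `retry`, and `flaky_read`, and all default values, are hypothetical and chosen only for this example.

```python
import random
import time


class OperationTimeout(Exception):
    """Hypothetical stand-in for a retryable client error."""


def retry(func, *args, max_tries=3, delay=0.1, backoff=2.0, max_jitter=0.0,
          retryable=(OperationTimeout,), sleep=time.sleep, **kwargs):
    """Call func, retrying on retryable errors with exponential backoff.

    The wait starts at `delay` seconds and is multiplied by `backoff`
    after every failed attempt. Up to `max_jitter` seconds of random
    jitter is added to each wait.
    """
    wait = delay
    for attempt in range(1, max_tries + 1):
        try:
            return func(*args, **kwargs)
        except retryable:
            if attempt == max_tries:
                raise  # out of attempts: surface the last error
            sleep(wait + random.uniform(0, max_jitter))
            wait *= backoff


if __name__ == "__main__":
    calls = []

    def flaky_read():
        # Fails twice with a timeout, then succeeds.
        calls.append(1)
        if len(calls) < 3:
            raise OperationTimeout("operation timeout")
        return b"value"

    print(retry(flaky_read, max_tries=5, delay=0.01))
```

Retries only hide short stalls. If the fsync or GC pause lasts longer than the session timeout, the session can still expire, so the retry budget and the session timeout need to be tuned together.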
