Hi,

I acknowledge up front that what I'm asking goes against best practices as described here and in the ZooKeeper guides. I was wondering what the possible consequences are of setting forceSync=no in zoo.cfg in stand-alone installations where a single machine hosts Accumulo, ZooKeeper, Hadoop, etc.
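
For concreteness, the change I have in mind would look roughly like the sketch below. Only the forceSync line is the setting in question; the tickTime, dataDir and clientPort values are illustrative placeholders, not taken from an actual install. This also assumes ZooKeeper picks up forceSync from zoo.cfg and treats it as the zookeeper.forceSync system property.

```
# zoo.cfg for a single-machine demo (placeholder values, not a real install)
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181

# Do not wait for the transaction log to be fsync-ed to disk before
# acknowledging an update. Recent updates can be lost on a crash or
# power failure, so this is for demo setups only.
forceSync=no
```
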
This sort of configuration is obviously not for production. It is used only when a client is interested in seeing a demo of an Accumulo-based application but has only a single machine available at the time, often with just a single drive serving all mounted file systems. As one might expect in this sort of setup, the ZooKeeper log starts to fill with:

zookeeper.log.9:2014-01-21 19:19:38,885 [myid:] - WARN [SyncThread:0:FileTxnLog@321] - fsync-ing the write ahead log in SyncThread:0 took 5898ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide

Eventually Accumulo times out with a ConnectionLoss and the master process goes down.

Is Accumulo's use of ZooKeeper primarily for cluster-wide synchronization at run time, or is there persistent, stateful data that must be kept in sync with the contents of the walogs and/or table files in HDFS? If the former, then I imagine (in a stand-alone setup) that ZooKeeper corruption due to incomplete syncs during a power failure or the like could be remedied by restarting the stack, which would recover a prior ZooKeeper snapshot. If the latter, then I can see things getting a bit messy.

Thanks in advance.

Frans
