To follow up, we continued to experience ZooKeeper ConnectionLossExceptions even after applying Josh's advice to our cluster. After running some diagnostics, we found that our VMs were under intermittently heavy load, which we could not control.
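As background on the timeout settings discussed in this thread: a ZooKeeper server clamps the session timeout a client requests to the range between minSessionTimeout and maxSessionTimeout, which default to 2x and 20x tickTime. The sketch below illustrates that clamping. It is not ZooKeeper's own code; the function names are made up for illustration, and tickTime=2000 is an assumption (the value in the sample zoo.cfg), since our actual tickTime is not stated here.

```python
# Illustrative sketch only; these helper names are hypothetical and are
# not part of ZooKeeper or Accumulo.
# Assumption: tickTime=2000 ms, the value from ZooKeeper's sample config.

def negotiated_session_timeout_ms(requested_ms, tick_time_ms=2000,
                                  min_session_timeout_ms=None,
                                  max_session_timeout_ms=None):
    """Clamp a client's requested session timeout to the server's bounds."""
    # When unset, the bounds default to 2x and 20x tickTime.
    lower = 2 * tick_time_ms if min_session_timeout_ms is None else min_session_timeout_ms
    upper = 20 * tick_time_ms if max_session_timeout_ms is None else max_session_timeout_ms
    return max(lower, min(requested_ms, upper))


def ticks_to_ms(ticks, tick_time_ms=2000):
    """initLimit and syncLimit are expressed in ticks, not milliseconds."""
    return ticks * tick_time_ms


if __name__ == "__main__":
    # A 120s client request against default server bounds is capped.
    print(negotiated_session_timeout_ms(120_000))
    # With maxSessionTimeout raised to 120000, the full request is granted.
    print(negotiated_session_timeout_ms(120_000, max_session_timeout_ms=120_000))
    # initLimit=20 and syncLimit=10 expressed in milliseconds.
    print(ticks_to_ms(20), ticks_to_ms(10))
```

Under that tickTime assumption, a client asking for 120s would be capped at 40s unless maxSessionTimeout is raised on the server as well, so the client-side and server-side timeouts need to be changed together.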
Instead of continuing to optimize our resource usage, we simply increased the following settings:

zoo.cfg:
  # 2 minutes
  maxSessionTimeout=120000
  initLimit=20
  syncLimit=10

accumulo-site.xml
  instance.zookeeper.timeout=120s

Since then, we haven't seen a single ConnectionLossException on our cluster, despite a known network hiccup in our VM environment of ~5 minutes. We don't know what the long term impact on our cluster will be, but we're optimistic that our "pessimistic" cluster will stay up!

On Tue, Dec 10, 2013 at 1:08 PM, Joe Gresock <[email protected]> wrote:
> I'm forwarding this email chain to the user group, since it was so helpful
> to our Accumulo cluster setup. The original post is at the bottom.
>
> Thanks to Josh Elser!
>
>
