Hello,

I have an 8-node cluster. Under heavy load a tserver goes down. We have a systemd
unit file that restarts it automatically, but the tablets then stay unassigned for an hour.

In the log of the restarted tserver I see:
WARN: Saw (possibly) transient exception communicating with zookeeper
and then the error:
KeeperErrorCode = ConnectionLoss for /accumulo/<instance >/xxx
KeeperErrorCode = ConnectionLoss
    at KeeperException.create(KeeperException.java:102)
    at KeeperException.create(KeeperException.java:54)
    at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:2736)
    at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:2762)
    at org.apache.accumulo.fate.zookeeper.ZooReader.getChildren(ZooReader.java:159)
xxxxx

Any suggestions?

-S
