I increased all of the servers to 32GB of memory and confirmed that the flags you mentioned are in the env file. Unfortunately, within a day I lost one of the tservers. In the tserver logs, looking at the timestamps leading up to the event, I see:

02:00:03,835 [cache.LruBlockCache]
02:00:51,580 [tabletserver.TabletServer] DEBUG: MultiScanSess
02:01:02,267 [tabletserver.TabletServer] FATAL: Lost tablet server lock (reason = LOCK_DELETED), exiting.
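To make the spacing between those tserver log lines easier to see, here is a minimal Python sketch that computes the gap between consecutive timestamps. The timestamps are copied from the excerpt above; the names `TSERVER_EVENTS`, `parse_ts`, and `gaps_seconds` are hypothetical helpers for illustration and are not part of Accumulo. It only does arithmetic on the timestamps and assumes all three lines are from the same day; it says nothing about why the lock was lost.

```python
from datetime import datetime

# Timestamps copied from the tserver log excerpt (time of day only).
# The labels are shortened descriptions, not full log messages.
TSERVER_EVENTS = [
    ("02:00:03,835", "cache.LruBlockCache"),
    ("02:00:51,580", "TabletServer DEBUG: MultiScanSess"),
    ("02:01:02,267", "TabletServer FATAL: Lost tablet server lock"),
]


def parse_ts(text):
    """Parse an HH:MM:SS,mmm log timestamp (hypothetical helper)."""
    return datetime.strptime(text, "%H:%M:%S,%f")


def gaps_seconds(events):
    """Return the gap in seconds between each pair of consecutive events."""
    times = [parse_ts(ts) for ts, _ in events]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    # Pair each gap with the event that ends it.
    for (ts, label), gap in zip(TSERVER_EVENTS[1:], gaps_seconds(TSERVER_EVENTS)):
        print(f"{gap:7.3f}s of silence before {ts} {label}")
```

The same helper could be pointed at a longer slice of the log to look for other unusually long silences before the FATAL line.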
What's interesting on this one is that the master log file has no error message at that time. What I do see is this:

02:01:02,168 [master.Master] DEBUG: Finished gathering information from 2 servers in 0.01 seconds

That would mean the tserver killed itself about 100 milliseconds (02:01:02,168 vs. 02:01:02,267) after the master got the information successfully. Any thoughts on this one?
