Was there actually an 11 second delay in the tserver's debug log (2:00:51 to 2:01:02) or did you omit some log statements?

The log messages in your original email also showed MultiScanSession(s) immediately before the ZK lock was lost.

Can you give us any information about the type of query workload you're servicing here? A MultiScanSession is the equivalent of a "piece" of a BatchScanner running against a tserver. Are you doing any sort of heavy work in a SortedKeyValueIterator running on these tservers?
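For context, the gap I'm asking about comes straight from the two timestamps in your excerpt; a quick sketch of that arithmetic (the class and method names here are just for illustration, not anything from Accumulo):

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

public class LogGap {
    // log4j's default layout uses a comma before the milliseconds
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("HH:mm:ss,SSS");

    static long gapMs(String earlier, String later) {
        return Duration.between(LocalTime.parse(earlier, FMT),
                                LocalTime.parse(later, FMT)).toMillis();
    }

    public static void main(String[] args) {
        // Timestamps copied from your tserver log excerpt
        System.out.println(gapMs("02:00:51,580", "02:01:02,267") + " ms");
        // prints "10687 ms" -- just under 11 s of silence before the FATAL
    }
}
```

A pause that long is in the range where a JVM garbage-collection stall could plausibly blow a ZooKeeper session timeout, which is why the question matters.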

On 11/12/13, 9:36 AM, buttercream wrote:
I increased all of the servers to 32GB of memory and confirmed that I have
the flags you mentioned in the env file. Unfortunately, within a day I
lost one of the tservers. In the tserver log, looking at the timestamps
leading up to the event, I see:
02:00:03,835 [cache.LruBlockCache]
02:00:51,580 [tabletserver.TabletServer] DEBUG: MultiScanSess
02:01:02,267 [tabletserver.TabletServer] FATAL: Lost tablet server lock
(reason = LOCK_DELETED), exiting.

What's interesting on this one is that in the master log file, there is no
error message at that time. What I do see is this:
02:01:02,168 [master.Master] DEBUG: Finished gathering information from 2
servers in 0.01 seconds

That would mean the tserver killed itself within milliseconds of the master
getting the information successfully. Any thoughts on this one?



--
View this message in context: 
http://apache-accumulo.1065345.n5.nabble.com/Tserver-kills-themselves-from-lost-Zookeeper-locks-tp6125p6360.html
Sent from the Users mailing list archive at Nabble.com.
