Re: Lost tablet server lock..SESSION_EXPIRED

Jeff Kubina Fri, 14 Oct 2016 08:15:24 -0700

I have not tried the G1 GC yet, but according to Oracle it is production
ready.


You can use jstat to monitor GC on a tserver and see whether GC really is
the cause of the pauses.
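
As a rough sketch of that jstat check (the function name and the pgrep
pattern in the usage comment are made up for illustration, and it assumes
jstat from the JDK is on the PATH and is run as the same user as the tserver):

```shell
# Sketch: print GC utilization samples for a running tserver JVM.
# $1 = pid of the tserver JVM
# $2 = sampling interval in milliseconds (defaults to 5000)
tserver_gc_watch() {
    jstat -gcutil "$1" "${2:-5000}"
}

# Usage (the pgrep pattern is a guess; adjust it to match your install):
#   tserver_gc_watch "$(pgrep -f tserver | head -n 1)"
```

In the output, the YGCT, FGCT and GCT columns are cumulative GC times in
seconds. A large jump between two samples around the time the lock is lost
would point at a GC pause.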

My usual GC-related options for tservers are:

-XX:NewSize=2G
-XX:MaxNewSize=2G
-XX:MaxPermSize=512m
-XX:CMSInitiatingOccupancyFraction=50
-XX:+UseParNewGC
-XX:SurvivorRatio=6
-XX:ParallelGCThreads=16
-XX:ConcGCThreads=8
-XX:+UseCondCardMark
-XX:+UnlockDiagnosticVMOptions
-XX:ParGCCardsPerStrideChunk=4096
-XX:+UseConcMarkSweepGC
-XX:+CMSClassUnloadingEnabled
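
One way to apply options like these is to append them to
ACCUMULO_TSERVER_OPTS in accumulo-env.sh. The fragment below is only a
sketch: the TSERVER_GC_OPTS helper variable is made up for readability, the
flags use the standard HotSpot spellings, and the sizes and thread counts
should be adjusted to your heap and core count.

```shell
# accumulo-env.sh fragment (sketch, not a drop-in file).
# TSERVER_GC_OPTS is a hypothetical helper variable used only in this example.
TSERVER_GC_OPTS="-XX:+UseConcMarkSweepGC -XX:+UseParNewGC"
TSERVER_GC_OPTS="$TSERVER_GC_OPTS -XX:NewSize=2G -XX:MaxNewSize=2G"
TSERVER_GC_OPTS="$TSERVER_GC_OPTS -XX:MaxPermSize=512m -XX:SurvivorRatio=6"
TSERVER_GC_OPTS="$TSERVER_GC_OPTS -XX:CMSInitiatingOccupancyFraction=50"
TSERVER_GC_OPTS="$TSERVER_GC_OPTS -XX:+CMSClassUnloadingEnabled"
TSERVER_GC_OPTS="$TSERVER_GC_OPTS -XX:ParallelGCThreads=16 -XX:ConcGCThreads=8"
TSERVER_GC_OPTS="$TSERVER_GC_OPTS -XX:+UseCondCardMark"
# The diagnostic unlock flag has to come before the diagnostic option it enables.
TSERVER_GC_OPTS="$TSERVER_GC_OPTS -XX:+UnlockDiagnosticVMOptions"
TSERVER_GC_OPTS="$TSERVER_GC_OPTS -XX:ParGCCardsPerStrideChunk=4096"

# Append to whatever tserver options are already set (heap size and so on).
export ACCUMULO_TSERVER_OPTS="${ACCUMULO_TSERVER_OPTS:-} $TSERVER_GC_OPTS"
```

The tservers need a restart to pick up the change.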

If you are doing a lot of ingest via batch writes (which the UpSess
messages imply), you might consider increasing tserver.walog.max.size to 2G
instead of 1G (but doing so will cause the loss of more data if a tserver
dies).

The troubleshooting documentation that ships with Accumulo
<https://github.com/apache/accumulo/blob/master/docs/src/main/asciidoc/chapters/troubleshooting.txt>
is also helpful for tracking down latency issues.



-- 
Jeff Kubina
410-988-4436


On Thu, Oct 13, 2016 at 10:49 AM, Noe Detore <[email protected]> wrote:

> Yes, seeing a lot of DEBUG:Upsess. Also seeing  
> [server.GarbageCollectionLogger]
> DEBUG: gc ParNew=64.69(+1.24) secs ConcurrentMarkSweep=102.51(+0.06) secs
> freemem=4,844,821,808(-20,292,780,896) totalmem=25,525,551,104
> 2016-10-13 11:22:17,963 [zookeeper.ZooLock] DEBUG: event null None
> Disconnected
>
> During a hotspot it seems like a Java GC pause is causing the ZooKeeper
> heartbeat to be missed and the session to expire. Are there recommended
> Java GC configurations? We are using native memory. Would trying the G1 GC
> be advised?
>
> Thank you
>
> On Fri, Oct 7, 2016 at 8:23 PM, Jeff Kubina <[email protected]> wrote:
>
>> Noe,
>>
>> Do you have a lot (1000s) of "[tserver.TabletServer] DEBUG: UpSess ..."
>> messages in your tserver logs prior to the FATAL or "ERROR: Lost tablet
>> server lock" error message?
>>
>> Jeff
>>
>>
>> --
>> Jeff Kubina
>> 410-988-4436
>>
>>
>> On Fri, Oct 7, 2016 at 10:34 AM, Noe Detore <[email protected]>
>> wrote:
>>
>>> Any updates on this issue
>>> https://issues.apache.org/jira/browse/ACCUMULO-3336 ? I am seeing this
>>> behavior using 1.7.2 on one of our clusters. Not seeing on other
>>> clusters, but what could be some causes? Swap on server looks good as
>>> there is none. Are there particular configurations to adjust?
>>>
>>> org.apache.zookeeper.KeeperException$SessionExpiredException:
>>> KeeperErrorCode = Session expired ...
>>> 2016-10-06 23:22:30,633 [zookeeper.DistributedWorkQueue] INFO : Got
>>> unexpected zookeeper event: None for ...
>>> 2016-10-06 23:22:30,679 [tserver.TabletServer] ERROR: Lost tablet server
>>> lock (reason = SESSION_EXPIRED), exiting
>>>
>>> Thanks
>>> Noe
>>>
>>
>>
>
