Re: high latency on one node after replacement

Mike Torra Tue, 27 Mar 2018 12:59:27 -0700

Thanks for pointing that out, I just found it too :) I had overlooked this.

On Tue, Mar 27, 2018 at 3:44 PM, Voytek Jarnot <voytek.jar...@gmail.com>
wrote:


> Have you ruled out EBS snapshot initialization issues (
> https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-initialize.html)?
>
> On Tue, Mar 27, 2018 at 2:24 PM, Mike Torra <mto...@salesforce.com> wrote:
>
>> Hi There -
>>
>> I have noticed an issue where I consistently see high p999 read latency
>> on a node for a few hours after replacing the node. Before replacing the
>> node, the p999 read latency is ~30ms, but afterwards it increases to 1-5s. I am
>> running C* 3.11.2 in EC2.
>>
>> I am testing out using EBS snapshots of the /data disk as a backup, so
>> that I can replace nodes without having to fully bootstrap the replacement.
>> This seems to work ok, except for the latency issue. Some things I have
>> noticed:
>>
>> - `nodetool netstats` doesn't show any 'Completed' Large Messages, only
>> 'Dropped', while this is going on. There are only a few of these.
>> - the logs show warnings like this:
>>
>> WARN  [PERIODIC-COMMIT-LOG-SYNCER] 2018-03-27 18:57:15,655
>> NoSpamLogger.java:94 - Out of 84 commit log syncs over the past 297.28s
>> with average duration of 235.88ms, 86 have exceeded the configured commit
>> interval by an average of 113.66ms
>> - I can also see some slow queries in debug.log, but I can't figure out
>> what is causing them
>> - gc seems normal
>>
>> Could this have something to do with starting the node with the EBS
>> snapshot of the /data directory? My first thought was that this is related
>> to the EBS volumes, but it seems too consistent to be actually caused by
>> that. The problem is consistent across multiple replacements, and multiple
>> EC2 regions.
>>
>> I appreciate any suggestions!
>>
>> - Mike
>>
>
>
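
Note for readers of this thread: the EBS initialization described on the linked AWS page amounts to reading every block of a volume restored from a snapshot once, so that later reads do not pay the first-access penalty. The AWS page does this with dd or fio. The Python below is only a rough sketch of the same idea, not the method used by anyone in this thread. The function name read_all_blocks, the 1 MiB chunk size, and any device path passed to it (for example /dev/xvdf) are assumptions for illustration.

```python
import sys


def read_all_blocks(path, chunk_size=1024 * 1024):
    """Sequentially read everything at `path` and return the number of bytes read.

    Hypothetical helper: `path` would be the block device of the restored
    volume (reading a raw device normally requires root).
    """
    total = 0
    # buffering=0 disables Python-level buffering; the data is discarded,
    # the point is only to touch every block once.
    with open(path, "rb", buffering=0) as dev:
        while True:
            chunk = dev.read(chunk_size)
            if not chunk:  # empty read means end of device/file
                break
            total += len(chunk)
    return total


if __name__ == "__main__":
    # Example invocation (device path is an assumption):
    #   sudo python3 init_volume.py /dev/xvdf
    if len(sys.argv) == 2:
        print(read_all_blocks(sys.argv[1]))
```

A single sequential reader like this is likely slower than the parallel fio job in the AWS documentation, so on large volumes the documented tools are probably the better choice. The sketch is here only to show what "initializing" the volume means.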
