Hi Ted.

How long are the latency spikes when they occur?  Have you investigated
compactions (nodetool compactionstats) during the spike?

Are you also seeing large latency spikes in the p95 (95th percentile)
metrics? p99 catches outliers, which aren't necessarily cause for alarm.
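
To illustrate why p99 can jump while p95 stays flat, here is a minimal
sketch. The sample latencies are made up for illustration (they are not
Ted's data), and percentile() is a simple nearest-rank helper, not the
estimator Cassandra's metrics library uses.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100.

    Returns the smallest sample such that at least pct% of all samples
    are less than or equal to it.
    """
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, to avoid float rounding surprises.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Hypothetical latencies in microseconds: 98 normal requests, 2 slow ones.
latencies_us = [1000] * 98 + [3_000_000] * 2

p95 = percentile(latencies_us, 95)
p99 = percentile(latencies_us, 99)

print("p95:", p95, "us")
print("p99:", p99, "us")
```

With only 2% of requests being slow, p99 reports the slow value while p95
still reports the normal one. If p95 spikes as well, a much larger share of
requests is affected.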

Are the nodes showing any other signs of stress (CPU, GC, etc.)? Is there
anything pending in nodetool tpstats?
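
If it helps to check this repeatedly during a spike, the sketch below picks
out thread pools with a non-zero Pending count from tpstats text. It assumes
the column layout "Pool Name / Active / Pending / Completed / Blocked / All
time blocked"; please verify that against the output of your version. The
sample text and the pending_pools() helper are hypothetical, for
illustration only.

```python
# Hypothetical sample in the assumed tpstats layout; not real cluster output.
SAMPLE_TPSTATS = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     0         0        1234567         0                 0
ReadStage                         2        17         765432         0                 0
CompactionExecutor                1         0           4321         0                 0

Message type           Dropped
READ                         0
MUTATION                     0
"""


def pending_pools(tpstats_text):
    """Return {pool name: pending count} for pools with pending tasks."""
    result = {}
    for line in tpstats_text.splitlines():
        parts = line.split()
        # Assumed pool rows: a one-word name followed by five integer columns.
        if len(parts) == 6 and all(p.isdigit() for p in parts[1:]):
            pending = int(parts[2])
            if pending > 0:
                result[parts[0]] = pending
    return result


print(pending_pools(SAMPLE_TPSTATS))
```

On a node, the text could come from running nodetool tpstats and passing its
standard output to the helper.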

Regarding the read repairs, have you tested writing at a higher consistency
level to see whether that changes the number of read repairs occurring?


*Brooke Jensen*
VP Technical Operations & Customer Services
www.instaclustr.com | support.instaclustr.com
<https://support.instaclustr.com/hc/en-us>

This email has been sent on behalf of Instaclustr Limited (Australia) and
Instaclustr Inc (USA). This email and any attachments may contain
confidential and legally privileged information.  If you are not the
intended recipient, do not copy or disclose its content, but please reply
to this email immediately and highlight the error to the sender and then
immediately delete the message.

On 18 January 2017 at 02:11, <sean_r_dur...@homedepot.com> wrote:

> Is this Java 8 with the G1 garbage collector or CMS? With Java 7 and CMS,
> garbage collection can cause delays like you are seeing. I haven’t seen
> that problem with G1, but garbage collection is where I would start looking.
>
>
> Sean Durity
>
> *From:* Ted Pearson [mailto:t...@tedpearson.com]
> *Sent:* Thursday, January 05, 2017 2:34 PM
> *To:* user@cassandra.apache.org
> *Subject:* Troubleshooting random node latency spikes
>
>
>
> Greetings!
>
> I'm working on setting up a new Cassandra cluster with a write-heavy
> workload (50% writes), and I've run into a strange spiky latency problem.
> My application metrics showed random latency spikes. I tracked the latency
> back to spikes on individual Cassandra nodes.
> ClientRequest.Latency.Read/Write.p99 is occasionally jumping on one node
> at a time to several seconds, instead of its normal value of around 1000
> microseconds. I also noticed that ReadRepair.RepairedBackground.m1_rate
> goes from zero to a non-zero value (around 1-2/sec) during the spike on
> that node. I'm lost as to why these spikes are happening, and I hope
> someone can give me ideas.
>
> I attempted to test whether the ReadRepair metric is causally linked to
> the latency spikes. However, even after I changed
> dclocal_read_repair_chance to 0 on my tables, and the metrics showed no
> ReadRepair.Attempted, the ReadRepair.RepairedBackground metric still went
> up during latency spikes. Am I misunderstanding what this metric tracks? I
> don't understand why it went up if I turned off read repair.
>
> I'm currently running 2.2.6 in a dual-datacenter setup. It's patched to
> allow metrics to be recency-biased instead of tracking latency over the
> entire lifetime of the Java process. I'm using STCS. There is a large amount
> of data per node, about 500GB currently. I expect each row to be less than
> 10KB. It's currently running on way overpowered hardware - 512GB/RAID 0 on
> NVMe/44 cores on 2 sockets. All of my queries (reads and writes) are
> LOCAL_ONE and I'm using r=3.
>
>
>
> Thanks,
>
> Ted
>
> ------------------------------
>
> The information in this Internet Email is confidential and may be legally
> privileged. It is intended solely for the addressee. Access to this Email
> by anyone else is unauthorized. If you are not the intended recipient, any
> disclosure, copying, distribution or any action taken or omitted to be
> taken in reliance on it, is prohibited and may be unlawful. When addressed
> to our clients any opinions or advice contained in this Email are subject
> to the terms and conditions expressed in any applicable governing The Home
> Depot terms of business or client engagement letter. The Home Depot
> disclaims all responsibility and liability for the accuracy and content of
> this attachment and for any damages or losses arising from any
> inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other
> items of a destructive nature, which may be contained in this attachment
> and shall not be liable for direct, indirect, consequential or special
> damages in connection with this e-mail message or its attachment.
>
