Here are a couple of iostat snapshots showing the spikes in disk queue size
(in these cases correlating with spikes in w/s and %util)
Device:  rrqm/s  wrqm/s  r/s   w/s   rsec/s  wsec/s  avgrq-sz  avgqu-sz  await  svctm  %util
sda      0.00    5.63    0.00  2.33  0.00
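For reference, snapshots like the one above come from `iostat -x <interval>`, and the columns can be pulled apart programmatically when eyeballing spikes gets tedious. A minimal sketch (the field order assumes the sysstat extended format shown in the header; the sample numbers are made up, not from the snapshot above):

```python
# Parse an extended iostat device line into named fields.
# Field order matches the header above (sysstat extended output).
FIELDS = ["rrqm/s", "wrqm/s", "r/s", "w/s", "rsec/s", "wsec/s",
          "avgrq-sz", "avgqu-sz", "await", "svctm", "%util"]

def parse_iostat_line(line):
    parts = line.split()
    device = parts[0]
    values = [float(v) for v in parts[1:]]
    return device, dict(zip(FIELDS, values))

# Illustrative values only:
dev, stats = parse_iostat_line(
    "sda 0.00 5.63 0.00 2.33 0.00 63.68 27.33 0.15 62.67 5.33 1.24")
print(dev, stats["avgqu-sz"], stats["%util"])
```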
Using cassandra-stress with the out-of-the-box schema I am seeing around
140k rows/second of throughput using one client on each of 3 client machines.
On the servers:
- CPU utilization: 43% usr/20% sys, 55%/28%, 70%/10% (the last number is
the older box)
- Inbound network traffic: 174 Mbps,
Shoot - I didn't see that one. I subscribe to the digest but was focusing
on the direct replies and accidentally missed Patrick and Jeff Jirsa's
messages. Sorry about that...
I've been using a combination of cassandra-stress, cqlsh COPY FROM and a
custom C++ application for my ingestion
Did you try adding more client stress nodes as Patrick recommended?
On Tue, Jun 13, 2017 at 9:31 PM Eric Pederson wrote:
> Scratch that theory - the flamegraphs show that GC is only 3-4% of two
> newer machine's overall processing, compared to 18% on the slow machine.
>
> I
Scratch that theory - the flamegraphs show that GC is only 3-4% of two
newer machine's overall processing, compared to 18% on the slow machine.
I took that machine out of the cluster completely and recreated the
keyspaces. The ingest tests now run slightly faster (!). I would have
expected a
Hi all - I wanted to follow up on this. I'm happy with the throughput
we're getting but I'm still curious about the bottleneck.
The big thing that sticks out is that one of the nodes is logging frequent
GCInspector messages: 350-500ms every 3-6 seconds. All three nodes in the
cluster have identical
Due to a cut-and-paste error, those flame graphs were a recording of the
whole system, not just Cassandra. Throughput is approximately 30k
rows/sec.
Here are the graphs with just the Cassandra PID:
-
http://sourcedelica.com/wordpress/wp-content/uploads/2017/05/flamegraph_ultva01_sars2.svg
Totally understood :)
I forgot to mention - I set the /proc/irq/*/smp_affinity mask to include
all of the CPUs. Actually most of them were set that way already - it
might be because irqbalance is running.
But for some reason the interrupts are all being handled
You shouldn't need a kernel recompile. Check out the section "Simple
solution for the problem" in
http://www.alexonlinux.com/smp-affinity-and-proper-interrupt-handling-in-linux.
You can balance your requests across up to 8 CPUs.
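For what it's worth, the value you echo into /proc/irq/<n>/smp_affinity is just a hex bitmask of the CPUs allowed to service that IRQ. A quick sketch of computing one (the IRQ number in the comment is hypothetical):

```python
def cpu_mask(cpus):
    """Build the hex bitmask that /proc/irq/<n>/smp_affinity expects
    from a list of CPU indices (bit i set = CPU i allowed)."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")

# Spread interrupts across CPUs 1-7, leaving CPU 0 free:
print(cpu_mask(range(1, 8)))  # -> fe
# then, e.g.:  echo fe > /proc/irq/30/smp_affinity   (IRQ 30 is hypothetical)
```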
I'll check out the flame graphs in a little bit - in the middle of
Hi Jonathan -
It looks like these machines are configured to use CPU 0 for all I/O
interrupts. I don't think I'm going to get the OK to compile a new kernel
for them to balance the interrupts across CPUs, but to mitigate the problem
I taskset the Cassandra process to run on all CPUs except 0. It
When you are running a stress test, a 1-1 client-to-server match won't
saturate a cluster. I would go closer to 3-5 clients per server, so 10-15
clients against your 3-node cluster.
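Concretely, the extra clients can just be the same cassandra-stress invocation run from more machines. A sketch (the host names and operation count are placeholders, not from this thread):

```shell
# Run from each of the 10-15 client machines against the 3-node cluster;
# node names and op count below are placeholders.
cassandra-stress write n=10000000 -rate threads=200 \
    -node cas1,cas2,cas3
```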
Patrick
On Tue, May 23, 2017 at 4:18 PM, Jeff Jirsa wrote:
>
> Are the 3 sending clients maxed
Are the 3 sending clients maxed out?
Are you seeing JVM GC pauses?
On 2017-05-22 14:02 (-0700), Eric Pederson wrote:
> Hi all:
>
> I'm new to Cassandra and I'm doing some performance testing. One of things
> that I'm testing is ingestion throughput. My server setup is:
Hi,
Change to durable_writes = false
And please post the results.
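For anyone following along, durable_writes is a per-keyspace setting. A sketch in CQL (the keyspace name is a placeholder):

```sql
-- Placeholder keyspace name; durable_writes = false bypasses the commit
-- log, so any data not yet flushed to SSTables is lost on a crash.
ALTER KEYSPACE ingest_test WITH durable_writes = false;
```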
Thanks.
On 05/22/2017 10:08 PM, Jonathan Haddad wrote:
> How many CPUs are you using for interrupts?
>
> http://www.alexonlinux.com/smp-affinity-and-proper-interrupt-handling-in-linux
>
> Have you tried making a flame
How many CPUs are you using for interrupts?
http://www.alexonlinux.com/smp-affinity-and-proper-interrupt-handling-in-linux
Have you tried making a flame graph to see where Cassandra is spending its
time? http://www.brendangregg.com/blog/2014-06-12/java-flame-graphs.html
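The rough recipe from that post, for reference (paths are illustrative, and JVM symbol resolution additionally needs perf-map-agent):

```shell
# Sketch of the Java flame graph recipe; script paths are illustrative.
perf record -F 99 -a -g -- sleep 30
perf script > out.stacks
./stackcollapse-perf.pl out.stacks | ./flamegraph.pl > flamegraph.svg
```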
Are you tracking GC