Re: underutilized servers

2021-03-06 Thread Attila Wind

Thanks Bowen,

 * "How do you split?"
   challenging to answer short, but let me try: physical host has cores
   from idx 0 - 11 (6 physical and 6 virtual in pairs - they are in
   pairs as 0,6 belongs together, then 1,7 and then 2,8 and so on)
   What we do is that in the virt-install command we use --cpu
   host-passthrough --cpuset={{virtinst_cpu_set}} --vcpus=6
   where {{virtinst_cpu_set}} is
   - 0,6,1,7,2,8 - for CassandraVM
   - 3,9,4,10,5,11 - for the other VM
   (we split the physical host into 2 VMs)

 * "do you expose physical disks to the VM or use disk image files"
   no images, physical host has 2 spinning disks and 1 SSD drive
   CassandraVM gets assigned explicitly 1 of the spinning disks and she
   also gets assigned a partition of the SSD (which is used for commit
   logs only so that is separated from the data)

 * "A 50-70% utilization of a 1 Gbps network interface on average
   doesn't sound good at all."
   Yes, this is weird... Especially because e.g. if we bring down a
   node, the other 2 nodes (we go with RF=2) are producing ~600Mb hints
   files / minute
   And assuming hint files is basicall the saved "network traffic"
   until node is down this would still just give 10Mb/sec ...
   OK, these are just the replicated updates, there is also read and of
   course App layer is also reading but even with that in mind it does
   not add up... So we will try to do further analysis here
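
To make the split concrete, the two invocations look roughly like this
(the CPU flags are the real ones from above; the names and memory values
are just placeholders, and the disk/network/install options are omitted):

    # CassandraVM - pinned to the sibling pairs 0/6, 1/7, 2/8
    virt-install --name cassandra-vm --memory 32768 --vcpus=6 \
        --cpu host-passthrough --cpuset=0,6,1,7,2,8

    # the other VM - pinned to the sibling pairs 3/9, 4/10, 5/11
    virt-install --name other-vm --memory 32768 --vcpus=6 \
        --cpu host-passthrough --cpuset=3,9,4,10,5,11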

Thanks also for the article regarding the counter tables!
Actually, we have known for a while that there are "interesting" things 
going on around the counter tables - it is surprising how difficult it 
is to find information on this topic...
I personally have looked around several times and always end up finding 
the same information in post after post...


Moving away from counters would not be bad, especially because of the 
difficulties around DELETEing them (we feel those too), however I do 
not see any obvious migration strategy here...
But let me maybe ask that in a separate question. It might make more 
sense... :-)


Thanks again - and thanks to others as well

It looks like mastering "nodetool tpstats" and the Cassandra thread pools 
would be worth some time... :-)



Attila Wind

http://www.linkedin.com/in/attilaw 
Mobile: +49 176 43556932



Re: underutilized servers

2021-03-06 Thread Bowen Song

Hi Attila,


Addressing your data modelling issue is definitely important, and this 
alone may be enough to solve all the issues you have with Cassandra.


 * "Since these are VMs, is there any chance they are competing for
   resources on the same physical host?"
   We are splitting the physical hardware into 2 VMs - and resources
   (CPU cores, disks, RAM) are all assigned to the VMs in a dedicated
   fashion, without intersection

How do you split? Having the number of cores across all VMs sum to the 
total number of physical CPU cores is not enough, because context switches 
and possible thread contention will waste CPU cycles. Since you have also 
said 8-12% of CPU time is spent in sys mode, I think it warrants an 
investigation.
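
A rough way to check this, both inside the VM and on the host (mpstat and 
pidstat come from the sysstat package; the VM name is whatever was used 
in virt-install):

    # inside the Cassandra VM: per-CPU user vs sys time, 5-second samples
    mpstat -P ALL 5
    # context switches of the Cassandra process (voluntary vs involuntary)
    pidstat -w -p "$(pgrep -f CassandraDaemon)" 5
    # on the physical host: confirm the vCPUs are actually pinned where
    # you expect them to be
    virsh vcpupin cassandra-vm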


Also, do you expose physical disks to the VM or use disk image files? 
Disk image files can be slow, especially for high IOPS random reads.
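
If you want to put a number on the disk itself, a short fio run from inside 
the VM is usually enough. Something like this, assuming the default 
package-install data path (adjust it, and delete the scratch file afterwards):

    # 4k random reads for 60 seconds against a 2 GB scratch file
    fio --name=randread --filename=/var/lib/cassandra/data/fio-test.tmp \
        --size=2G --rw=randread --bs=4k --iodepth=32 --numjobs=1 \
        --ioengine=libaio --direct=1 --runtime=60 --time_based \
        --group_reporting
    rm /var/lib/cassandra/data/fio-test.tmp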


Personally, I wouldn't recommend running a database on a VM other than for 
dev/testing/etc. purposes. If possible, you should try to add a node 
running on a bare metal server of a similar spec to the VMs, and see if 
there's any noticeable performance difference between this bare metal 
node and the VM nodes.



 * The bandwidth limit is 1 Gbit/sec (so ~120 MB/sec) BUT it is the limit
   of the physical host - so our 2 VMs are competing here. It is possible
   that the Cassandra VM has ~50-70% of it...

A 50-70% utilization of a 1 Gbps network interface on average doesn't 
sound good at all. That's over 60 MB/s of network traffic, constantly. Can 
you investigate why this is happening? Do you really read/write that much? 
Or is it something else?
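
Something along these lines should show where the bytes are going (sar is 
from the sysstat package; replace eth0 with your actual interface name):

    # per-interface throughput, 5-second samples
    sar -n DEV 5
    # which connections are moving the bytes (internode traffic is on port
    # 7000, client traffic on 9042 by default)
    iftop -i eth0 -P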



 * "nodetool tpstats"
   whooa I never used it, we definitely need some learning here to even
   understand the output... :-) But I copy that here to the bottom ...
   maybe clearly shows something to someone who can read it...

I noticed that you are using counters in Cassandra. I have to say that I 
haven't had a good experience with Cassandra counters. An article which I 
read recently may convince you to get rid of them. I also don't think 
counters are something the Cassandra developers are focused on, because 
things like CASSANDRA-6506 have been sitting there for many years.


Use your database software for its strengths, not its weaknesses. 
You have Cassandra, but you don't have to use every feature in 
Cassandra. Sometimes another technology may be more suitable for 
something that Cassandra can do but isn't very good at.



Cheers,

Bowen


Re: underutilized servers

2021-03-06 Thread Bowen Song

Hi Erick,


Please allow me to disagree on this. A node dropping reads and writes 
doesn't always mean the disk is the bottleneck. I have seen the same 
behaviour when a node had excessive STW GCs and a lot of timeouts, and 
I have also seen writes get dropped because the size of the mutation 
exceeded half of the commit log segment size. I'd like to keep an open 
mind until it's supported by evidence, so we don't end up wasting time 
(and money) trying to fix an issue that doesn't exist in the first place.
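
Both of those alternatives are quick to rule in or out, for example (the 
log path below is the Debian/Ubuntu package default):

    # long stop-the-world pauses are logged by GCInspector
    grep GCInspector /var/log/cassandra/system.log | tail -20
    # dropped-message counters straight from the node
    nodetool tpstats | sed -n '/Message type/,$p'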



Cheers,

Bowen



Re: underutilized servers

2021-03-05 Thread Erick Ramirez
The tpstats you posted show that the node is dropping reads and writes,
which means that your disk can't keep up with the load, meaning your disk is
the bottleneck. If you haven't already, place data and commitlog on
separate disks so they're not competing for the same IO bandwidth. Note
that it's OK to have them on the same disk/volume if you have NVMe SSDs,
since it's a lot more difficult to saturate them.
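
A quick way to confirm how things are laid out right now, and whether the 
device saturates under load (the cassandra.yaml path assumes a package 
install):

    # where do the data files and the commitlog live right now?
    grep -E -A2 '^(data_file_directories|commitlog_directory)' \
        /etc/cassandra/cassandra.yaml
    # per-device utilisation and latency, 5-second samples
    iostat -x 5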

The challenge with monitoring is that typically it's only checking disk
stats every 5 minutes (for example). But your app traffic is bursty in
nature, so stats averaged out over a period of time are irrelevant, because
the only thing that matters is what the disk IO is at the time you hit
peak loads.

The dropped reads and mutations tell you the node is overloaded. Provided
your nodes are configured correctly, the only way out of this situation is
to correctly size your cluster and add more nodes -- your cluster needs to
be sized for peak loads, not average throughput. Cheers!


Re: underutilized servers

2021-03-05 Thread daemeon reiydelle
You did not specify read and write consistency levels; the default would be
to hit two nodes (one for data, one for a digest) with every query. Network
load of 50% is not too helpful. 1 Gbit? 10 Gbit? 50% of each direction, or an
average of both?
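
If you want to sanity-check the level a session actually uses, cqlsh can
show and override it (the drivers have an equivalent setting in code):

    cqlsh> CONSISTENCY;              -- show the current level
    cqlsh> CONSISTENCY LOCAL_ONE;    -- set it explicitly for a test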

Iowait is not great for a system of this size, assuming that you have 3
VMs on THREE SEPARATE physical systems and WITHOUT network attached storage
...


*Daemeon Reiydelle*
*email: daeme...@gmail.com *
*LI: https://www.linkedin.com/in/daemeonreiydelle/
*
*San Francisco 1.415.501.0198/Skype daemeon.c.m.reiydelle*

"Life should not be a journey to the grave with the intention of arriving
safely in a pretty and well preserved body, but rather to skid in broadside
in a cloud of smoke, thoroughly used up, totally worn out, and loudly
proclaiming “Wow! What a Ride!" - Hunter S. Thompson




Re: underutilized servers

2021-03-05 Thread Attila Wind

Thanks for the answers @Sean and @Bowen !!!

First of all, this article describes something very similar to what we 
experience - let me share it:

https://www.senticore.com/overcoming-cassandra-write-performance-problems/
We are studying it now.

Furthermore

 * yes, we have some level of unbalanced data which needs to be
   improved - this is on our backlog, so it should get done
 * and yes, we clearly see that this unbalanced data is slowing down
   everything in Cassandra (there is proof of it in our
   Prometheus+Grafana based monitoring)
 * we will definitely do this optimization now (luckily we already have
   a plan)

@Sean:

 * "Since these are VMs, is there any chance they are competing for
   resources on the same physical host?"
   We are splitting the physical hardware into 2 VMs - and resources
   (cpu cores, disks, ram) all assigned in a dedicated fashion to the
   VMs without intersection
   BUT!!
   You are right... There is one thing we are sharing: network
   bandwidth... and actually that one does not come up in the "iowait"
   part for sure. We will further analyze into this direction
   definitely because from the monitoring as far as I see yeppp, we
   might hit the wall here
 * consistency level: we are using LOCAL_ONE
 * "Does the app use prepared statements that are only prepared once
   per app invocation?"
   Yes and yes :-)
 * "Any LWT/”if exists” in your code?"
   No. We go with RF=2 so we even can not use this (as LWT goes with
   QUORUM and in our case this would mean we could not tolerate losing
   a node... not good... so no)

@Bowen:

 * The bandwidth limit is 1 Gbit/sec (so ~120 MB/sec) BUT it is the limit
   of the physical host - so our 2 VMs are competing here. It is possible
   that the Cassandra VM has ~50-70% of it...
 * The CPU's "system" value shows 8-12%
 * "nodetool tpstats"
   whooa, I never used it, we definitely need some learning here to even
   understand the output... :-) But I copy it here at the bottom...
   maybe it clearly shows something to someone who can read it...

so, "nodetool tpstats" from one of the nodes

Pool Name Active Pending  Completed   
Blocked  All time blocked
ReadStage  0 0 6248406 
0 0
CompactionExecutor 0 0 168525 
0 0
MutationStage  0 0 25116817 
0 0
MemtableReclaimMemory  0 0 17636 
0 0
PendingRangeCalculator 0 0 7 
0 0
GossipStage    0 0 324388 
0 0
SecondaryIndexManagement   0 0 0 
0 0
HintsDispatcher    1 0 75 
0 0
Repair-Task    0 0 1 
0 0
RequestResponseStage   0 0 31186150 
0 0
Native-Transport-Requests  0 0 22827219 
0 0
CounterMutationStage   0 0 12560992 
0 0
MemtablePostFlush  0 0 19259 
0 0
PerDiskMemtableFlushWriter_0   0 0 17636 
0 0
ValidationExecutor 0 0 48 
0 0
Sampler    0 0 0 
0 0
ViewBuildExecutor  0 0 0 
0 0
MemtableFlushWriter    0 0 17636 
0 0
InternalResponseStage  0 0 44658 
0 0
AntiEntropyStage   0 0 161 
0 0
CacheCleanupExecutor   0 0 0 
0 0


Message type   Dropped  Latency waiting in queue 
(micros)
 50% 95%   
99%   Max
READ_RSP    18   1629.72 8409.01 
155469.30 386857.37
RANGE_REQ    0  0.00 0.00  
0.00  0.00
PING_REQ 0  0.00 0.00  
0.00  0.00
_SAMPLE  0  0.00 0.00  
0.00  0.00
VALIDATION_RSP   0  0.00 0.00  
0.00  0.00
SCHEMA_PULL_RSP  0  0.00 0.00  
0.00  0.00
SYNC_RSP 0  0.00 0.00  
0.00  0.00
SCHEMA_VERSION_REQ   0  0.00 0.00  
0.00  0.00
HINT_RSP 0    943.13 3379.39   
5839.59  52066.35
BATCH_REMOVE_RSP 0  

Re: underutilized servers

2021-03-05 Thread Bowen Song
Based on my personal experience, the combination of slow read queries 
and low CPU usage is often an indicator of bad table schema design 
(e.g. large partitions) or bad queries (e.g. without a partition key). 
Check the Cassandra logs first: is there any long stop-the-world GC? 
Tombstone warnings? Anything else that's out of the ordinary? Check the 
output from "nodetool tpstats": are there any pending or blocked tasks? 
Which thread pool(s) are they in? Is there a high number of dropped 
messages? If you can't find anything useful from the Cassandra server 
logs and "nodetool tpstats", try to get a few slow queries from your 
application's log, and run them manually in cqlsh. Are the results 
very large? How long do they take?
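
Roughly in this order (the log path assumes a package install; the node 
address and the query are placeholders):

    # 1. GC pauses and tombstone warnings in the server log
    grep -E 'GCInspector|tombstone' /var/log/cassandra/system.log | tail -50
    # 2. pending/blocked tasks and dropped messages per thread pool
    nodetool tpstats
    # 3. replay one of the slow queries with tracing enabled
    cqlsh <node-address>
    cqlsh> TRACING ON;
    cqlsh> -- paste the slow SELECT here and read the trace that follows
    cqlsh> TRACING OFF;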



Regarding some of your observations:

> CPU load is around 20-25% - so we have lots of spare capacity

Is it a case of a few threads each using nearly 100% of a CPU core? If so, 
what are those threads? (I find the ttop command from the sjk tool very 
helpful for this)
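
For reference, the invocation I normally use - from memory, so double-check 
it against the sjk documentation (sjk.jar needs to be downloaded separately):

    # top-like view of the busiest threads inside the Cassandra JVM,
    # sorted by CPU usage, limited to 20 threads
    java -jar sjk.jar ttop -p "$(pgrep -f CassandraDaemon)" -o CPU -n 20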


> network load is around 50% of the full available bandwidth

This sounds alarming to me. May I ask what the full available bandwidth 
is? Do you have a lot of CPU time spent in sys (vs user) mode?







RE: underutilized servers

2021-03-05 Thread Durity, Sean R
Are there specific queries that are slow? Partition-key queries should have 
read latencies in the single digits of ms (or faster). If that is not what you 
are seeing, I would first review the data model and queries to make sure that 
the data is modeled properly for Cassandra. Without metrics, I would start at 
16-20 GB of RAM for Cassandra on each node (or 31 GB if you can get 64 GB per 
host).
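
A quick way to see whether a given table is in that single-digit range is 
the per-table histogram (keyspace and table names below are placeholders):

    nodetool tablehistograms my_keyspace my_table
    # the Read Latency column is reported in microseconds; p95/p99 values
    # in the tens of thousands usually point at the data model or the disks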

Since these are VMs, is there any chance they are competing for resources on 
the same physical host? In my (limited) VM experience, VMs can be 10x slower 
than physical hosts with local SSDs. (They don't have to be slower, but it can 
be harder to get visibility into the actual bottlenecks.)

I would also look to see what consistency level is being used with the queries. 
In most cases LOCAL_QUORUM or LOCAL_ONE is preferred.

Does the app use prepared statements that are only prepared once per app 
invocation? Any LWT/"if exists" in your code?


Sean Durity




underutilized servers

2021-03-05 Thread Attila Wind

Hi guys,

I have a DevOps related question - hope someone here could give some 
ideas/pointers...


We are running a 3-node Cassandra cluster.
Recently we realized we have performance issues, and based on the 
investigation we did, it seems our bottleneck is the Cassandra cluster. 
The application layer is waiting a lot for Cassandra operations. So 
queries are running slowly on the Cassandra side, yet according to our 
monitoring it looks like the Cassandra servers still have lots of free 
resources...


The Cassandra machines are virtual machines (we own the physical hosts 
too) built with KVM - with 6 CPU cores (3 physical) and 32 GB RAM 
dedicated to each.
We are using the Ubuntu Linux 18.04 distro - the same version everywhere 
(on the physical and virtual hosts).

We are running Cassandra 4.0-alpha4.

What we see is

 * CPU load is around 20-25% - so we have lots of spare capacity
 * iowait is around 2-5% - so disk bandwidth should be fine
 * network load is around 50% of the full available bandwidth
 * loadavg is at most around 4-4.5 but typically around 3 (with 6 CPUs,
   a load of 6 would represent 100%)

and still, query performance is slow... and we do not understand what 
could be holding Cassandra back from fully utilizing the server resources...


We are clearly missing something!
Anyone any idea / tip?

thanks!

--
Attila Wind

http://www.linkedin.com/in/attilaw 
Mobile: +49 176 43556932