Re: Cassandra p95 latencies

2023-08-25 Thread Andrew Weaver
Do you have the SSTables per read metric for before and after you increased
the key cache size? If it was high before, that may have been the culprit,
meaning compaction tuning is in order.

On Fri, Aug 25, 2023, 12:35 PM Shaurya Gupta  wrote:

> Thanks everyone.
> Updating this thread -
> We increased the key cache size from 100 MB to 200 MB and we believe that
> has brought down the latency from 40 ms p95 to 6 ms p95. I think there is
> still scope for improvement as both writes and reads are presently at p95 6
> ms. I would expect writes to be lower. But we are good with 6 ms for now at
> least.
>
> On Mon, Aug 14, 2023 at 11:56 AM Elliott Sims via user <
> user@cassandra.apache.org> wrote:
>
>> 1.  Check for Nagle/delayed-ack, but probably nodelay is getting set by
>> the driver so it shouldn't be a problem.
>> 2.  Check for network latency (just regular old ping among hosts, during
>> traffic)
>> 3.  Check your GC metrics and see if garbage collections line up with
>> outliers.  Some tuning can help there, depending on the pattern, but 40ms
>> p99 at least would be fairly normal for G1GC.
>> 4.  Check actual local write times, and I/O times with iostat.  If you
>> have spinning drives 40ms is fairly expected.  It's high but not totally
>> unexpected for consumer-grade SSDs.  For enterprise-grade SSDs commit times
>> that long would be very unusual.  What are your commitlog_sync settings?
>>
>> On Mon, Aug 14, 2023 at 8:43 AM Josh McKenzie 
>> wrote:
>>
>>> The queries are rightly designed
>>>
>>> Data modeling in Cassandra is 100% gray space; there unfortunately is no
>>> right or wrong design. You'll need to share basic shapes / contours of your
>>> data model for other folks to help you; seemingly innocuous things in a
>>> data model can cause unexpected issues w/C*'s storage engine paradigm
>>> thanks to the partitioning and data storage happening under the hood.
>>>
>>> If you were seeing single digit ms on 3.0.X or 3.11.X and 40ms p95 on
>>> 4.0 I'd immediately look to the DB as being the culprit. For all other
>>> cases, you should be seeing single digit ms as queries in C* generally boil
>>> down to key/value lookups (partition key) to a list of rows you either
>>> point query (key/value #2) or range scan via clustering keys and pull back
>>> out.
>>>
>>> There's also paging to take into consideration (whether you're using it
>>> or not, what your page size is) and the data itself (do you have thousands
>>> of columns? Multi-MB blobs you're pulling back out? etc). All can play into
>>> this.
>>>
>>> On Fri, Aug 11, 2023, at 3:40 PM, Jeff Jirsa wrote:
>>>
>>> You’re going to have to help us help you
>>>
>>> 4.0 is pretty widely deployed. I’m not aware of a perf regression
>>>
>>> Can you give us a schema (anonymized) and queries and show us a trace ?
>>>
>>>
>>> On Aug 10, 2023, at 10:18 PM, Shaurya Gupta 
>>> wrote:
>>>
>>> 
>>> The queries are rightly designed as I already explained. 40 ms is way
>>> too high as compared to what I seen with other DBs and many a times with
>>> Cassandra 3.x versions.
>>> CPU consumed as I mentioned is not high, it is around 20%.
>>>
>>> On Thu, Aug 10, 2023 at 5:14 PM MyWorld  wrote:
>>>
>>> Hi,
>>> P95 should not be a problem if rightly designed. Levelled compaction
>>> strategy further reduces this, however it consume some resources. For read,
>>> caching is also helpful.
>>> Can you check your cpu iowait as it could be the reason for delay
>>>
>>> Regards,
>>> Ashish
>>>
>>> On Fri, 11 Aug, 2023, 04:58 Shaurya Gupta, 
>>> wrote:
>>>
>>> Hi community
>>>
>>> What is the expected P95 latency for Cassandra Read and Write queries
>>> executed with Local_Quorum over a table with 3 replicas ? The queries are
>>> done using the partition + clustering key and row size in bytes is not too
>>> much, maybe 1-2 KB maximum.
>>> Assuming CPU is not a crunch ?
>>>
>>> We observe those to be 40 ms P95 Reads and same for Writes. This looks
>>> very high as compared to what we expected. We are using Cassandra 4.0.
>>>
>>> Any documentation / numbers will be helpful.
>>>
>>> Thanks
>>> --
>>> Shaurya Gupta
>>>
>>>
>>>
>>> --
>>> Shaurya Gupta
>>>
>>>
>>>
>>
>
>
> --
> Shaurya Gupta
>
>
>


Re: Cassandra p95 latencies

2023-08-25 Thread Shaurya Gupta
Thanks everyone.
Updating this thread -
We increased the key cache size from 100 MB to 200 MB and we believe that
has brought down the latency from 40 ms p95 to 6 ms p95. I think there is
still scope for improvement as both writes and reads are presently at p95 6
ms. I would expect writes to be lower. But we are good with 6 ms for now at
least.
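
For anyone comparing before/after numbers like these, here is a minimal sketch of how a p95 can be computed from raw per-request latencies with the nearest-rank method. The sample values are made up for illustration and are not measurements from this cluster. As far as I know, Cassandra's own metrics are histogram-based estimates, so values reported by the server may differ slightly from a calculation like this.

```python
import math


def percentile_nearest_rank(samples_ms, pct):
    """Return the pct-th percentile (0 < pct <= 100) using the nearest-rank method."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    # 1-based rank of the smallest value that covers pct percent of the samples.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: nine fast requests and one slow one.
latencies_ms = [2, 3, 3, 4, 4, 5, 5, 6, 6, 40]
print(percentile_nearest_rank(latencies_ms, 50))
print(percentile_nearest_rank(latencies_ms, 95))
```

With only ten samples a single slow request decides the p95, which is one reason to compare percentiles over a large window.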

On Mon, Aug 14, 2023 at 11:56 AM Elliott Sims via user <
user@cassandra.apache.org> wrote:

> 1.  Check for Nagle/delayed-ack, but probably nodelay is getting set by
> the driver so it shouldn't be a problem.
> 2.  Check for network latency (just regular old ping among hosts, during
> traffic)
> 3.  Check your GC metrics and see if garbage collections line up with
> outliers.  Some tuning can help there, depending on the pattern, but 40ms
> p99 at least would be fairly normal for G1GC.
> 4.  Check actual local write times, and I/O times with iostat.  If you
> have spinning drives 40ms is fairly expected.  It's high but not totally
> unexpected for consumer-grade SSDs.  For enterprise-grade SSDs commit times
> that long would be very unusual.  What are your commitlog_sync settings?
>
> On Mon, Aug 14, 2023 at 8:43 AM Josh McKenzie 
> wrote:
>
>> The queries are rightly designed
>>
>> Data modeling in Cassandra is 100% gray space; there unfortunately is no
>> right or wrong design. You'll need to share basic shapes / contours of your
>> data model for other folks to help you; seemingly innocuous things in a
>> data model can cause unexpected issues w/C*'s storage engine paradigm
>> thanks to the partitioning and data storage happening under the hood.
>>
>> If you were seeing single digit ms on 3.0.X or 3.11.X and 40ms p95 on 4.0
>> I'd immediately look to the DB as being the culprit. For all other cases,
>> you should be seeing single digit ms as queries in C* generally boil down
>> to key/value lookups (partition key) to a list of rows you either point
>> query (key/value #2) or range scan via clustering keys and pull back out.
>>
>> There's also paging to take into consideration (whether you're using it
>> or not, what your page size is) and the data itself (do you have thousands
>> of columns? Multi-MB blobs you're pulling back out? etc). All can play into
>> this.
>>
>> On Fri, Aug 11, 2023, at 3:40 PM, Jeff Jirsa wrote:
>>
>> You’re going to have to help us help you
>>
>> 4.0 is pretty widely deployed. I’m not aware of a perf regression
>>
>> Can you give us a schema (anonymized) and queries and show us a trace ?
>>
>>
>> On Aug 10, 2023, at 10:18 PM, Shaurya Gupta 
>> wrote:
>>
>> 
>> The queries are rightly designed as I already explained. 40 ms is way too
>> high as compared to what I seen with other DBs and many a times with
>> Cassandra 3.x versions.
>> CPU consumed as I mentioned is not high, it is around 20%.
>>
>> On Thu, Aug 10, 2023 at 5:14 PM MyWorld  wrote:
>>
>> Hi,
>> P95 should not be a problem if rightly designed. Levelled compaction
>> strategy further reduces this, however it consume some resources. For read,
>> caching is also helpful.
>> Can you check your cpu iowait as it could be the reason for delay
>>
>> Regards,
>> Ashish
>>
>> On Fri, 11 Aug, 2023, 04:58 Shaurya Gupta, 
>> wrote:
>>
>> Hi community
>>
>> What is the expected P95 latency for Cassandra Read and Write queries
>> executed with Local_Quorum over a table with 3 replicas ? The queries are
>> done using the partition + clustering key and row size in bytes is not too
>> much, maybe 1-2 KB maximum.
>> Assuming CPU is not a crunch ?
>>
>> We observe those to be 40 ms P95 Reads and same for Writes. This looks
>> very high as compared to what we expected. We are using Cassandra 4.0.
>>
>> Any documentation / numbers will be helpful.
>>
>> Thanks
>> --
>> Shaurya Gupta
>>
>>
>>
>> --
>> Shaurya Gupta
>>
>>
>>
>


-- 
Shaurya Gupta


Re: Cassandra p95 latencies

2023-08-14 Thread Elliott Sims via user
1.  Check for Nagle/delayed-ack, but probably nodelay is getting set by the
driver so it shouldn't be a problem.
2.  Check for network latency (just regular old ping among hosts, during
traffic)
3.  Check your GC metrics and see if garbage collections line up with
outliers.  Some tuning can help there, depending on the pattern, but 40ms
p99 at least would be fairly normal for G1GC.
4.  Check actual local write times, and I/O times with iostat.  If you have
spinning drives 40ms is fairly expected.  It's high but not totally
unexpected for consumer-grade SSDs.  For enterprise-grade SSDs commit times
that long would be very unusual.  What are your commitlog_sync settings?
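
Item 3 above (checking whether garbage collections line up with outliers) can be sketched roughly as follows. This is only an illustration: the function name, the tuple layout, and the sample data are assumptions, and in practice the request timings and GC pauses would have to be extracted from your own client metrics and GC logs.

```python
def outliers_during_gc(requests, gc_pauses, threshold_ms):
    """Split slow requests into those overlapping a GC pause and those that do not.

    requests:  list of (start_ms, latency_ms) tuples
    gc_pauses: list of (pause_start_ms, pause_duration_ms) tuples
    """
    overlapping, other = [], []
    for start, latency in requests:
        if latency < threshold_ms:
            continue  # not an outlier
        end = start + latency
        # Two intervals overlap when each one starts before the other ends.
        hit = any(start < p_start + p_len and p_start < end
                  for p_start, p_len in gc_pauses)
        (overlapping if hit else other).append((start, latency))
    return overlapping, other


# Hypothetical data: two slow requests, one GC pause from t=110 ms to t=140 ms.
requests = [(0, 3), (100, 42), (200, 5), (300, 41), (400, 4)]
gc_pauses = [(110, 30)]
in_gc, not_in_gc = outliers_during_gc(requests, gc_pauses, threshold_ms=40)
print(in_gc)
print(not_in_gc)
```

If most outliers land in the first list, GC tuning is worth a look; if they land in the second, the network and disk checks in the list are the more likely leads.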

On Mon, Aug 14, 2023 at 8:43 AM Josh McKenzie  wrote:

> The queries are rightly designed
>
> Data modeling in Cassandra is 100% gray space; there unfortunately is no
> right or wrong design. You'll need to share basic shapes / contours of your
> data model for other folks to help you; seemingly innocuous things in a
> data model can cause unexpected issues w/C*'s storage engine paradigm
> thanks to the partitioning and data storage happening under the hood.
>
> If you were seeing single digit ms on 3.0.X or 3.11.X and 40ms p95 on 4.0
> I'd immediately look to the DB as being the culprit. For all other cases,
> you should be seeing single digit ms as queries in C* generally boil down
> to key/value lookups (partition key) to a list of rows you either point
> query (key/value #2) or range scan via clustering keys and pull back out.
>
> There's also paging to take into consideration (whether you're using it or
> not, what your page size is) and the data itself (do you have thousands of
> columns? Multi-MB blobs you're pulling back out? etc). All can play into
> this.
>
> On Fri, Aug 11, 2023, at 3:40 PM, Jeff Jirsa wrote:
>
> You’re going to have to help us help you
>
> 4.0 is pretty widely deployed. I’m not aware of a perf regression
>
> Can you give us a schema (anonymized) and queries and show us a trace ?
>
>
> On Aug 10, 2023, at 10:18 PM, Shaurya Gupta 
> wrote:
>
> 
> The queries are rightly designed as I already explained. 40 ms is way too
> high as compared to what I seen with other DBs and many a times with
> Cassandra 3.x versions.
> CPU consumed as I mentioned is not high, it is around 20%.
>
> On Thu, Aug 10, 2023 at 5:14 PM MyWorld  wrote:
>
> Hi,
> P95 should not be a problem if rightly designed. Levelled compaction
> strategy further reduces this, however it consume some resources. For read,
> caching is also helpful.
> Can you check your cpu iowait as it could be the reason for delay
>
> Regards,
> Ashish
>
> On Fri, 11 Aug, 2023, 04:58 Shaurya Gupta,  wrote:
>
> Hi community
>
> What is the expected P95 latency for Cassandra Read and Write queries
> executed with Local_Quorum over a table with 3 replicas ? The queries are
> done using the partition + clustering key and row size in bytes is not too
> much, maybe 1-2 KB maximum.
> Assuming CPU is not a crunch ?
>
> We observe those to be 40 ms P95 Reads and same for Writes. This looks
> very high as compared to what we expected. We are using Cassandra 4.0.
>
> Any documentation / numbers will be helpful.
>
> Thanks
> --
> Shaurya Gupta
>
>
>
> --
> Shaurya Gupta
>
>
>

-- 
This email, including its contents and any attachment(s), may contain 
confidential and/or proprietary information and is solely for the review 
and use of the intended recipient(s). If you have received this email in 
error, please notify the sender and permanently delete this email, its 
content, and any attachment(s).  Any disclosure, copying, or taking of any 
action in reliance on an email received in error is strictly prohibited.


Re: Cassandra p95 latencies

2023-08-14 Thread Josh McKenzie
> The queries are rightly designed
Data modeling in Cassandra is 100% gray space; there unfortunately is no right 
or wrong design. You'll need to share basic shapes / contours of your data 
model for other folks to help you; seemingly innocuous things in a data model 
can cause unexpected issues w/C*'s storage engine paradigm thanks to the 
partitioning and data storage happening under the hood.

If you were seeing single digit ms on 3.0.X or 3.11.X and 40ms p95 on 4.0 I'd 
immediately look to the DB as being the culprit. For all other cases, you 
should be seeing single digit ms as queries in C* generally boil down to 
key/value lookups (partition key) to a list of rows you either point query 
(key/value #2) or range scan via clustering keys and pull back out.

There's also paging to take into consideration (whether you're using it or not, 
what your page size is) and the data itself (do you have thousands of columns? 
Multi-MB blobs you're pulling back out? etc). All can play into this.
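
To make the "key/value lookup, then point query or range scan" description concrete, here is a toy in-memory model. It is a sketch of the access pattern only, not of how Cassandra's storage engine is implemented; the class and method names are invented for illustration.

```python
import bisect


class ToyTable:
    """Toy model: partition key -> rows kept sorted by clustering key."""

    def __init__(self):
        self._partitions = {}  # pk -> (sorted clustering keys, values)

    def insert(self, pk, ck, value):
        keys, values = self._partitions.setdefault(pk, ([], []))
        i = bisect.bisect_left(keys, ck)
        if i < len(keys) and keys[i] == ck:
            values[i] = value  # same primary key: overwrite (upsert)
        else:
            keys.insert(i, ck)
            values.insert(i, value)

    def point_query(self, pk, ck):
        """Lookup #1 finds the partition, lookup #2 finds the row inside it."""
        keys, values = self._partitions.get(pk, ([], []))
        i = bisect.bisect_left(keys, ck)
        return values[i] if i < len(keys) and keys[i] == ck else None

    def range_scan(self, pk, ck_from, ck_to):
        """Return rows with ck_from <= clustering key < ck_to, in key order."""
        keys, values = self._partitions.get(pk, ([], []))
        lo = bisect.bisect_left(keys, ck_from)
        hi = bisect.bisect_left(keys, ck_to)
        return list(zip(keys[lo:hi], values[lo:hi]))


table = ToyTable()
for ck, value in [(3, "c"), (1, "a"), (2, "b")]:
    table.insert("user1", ck, value)
print(table.point_query("user1", 2))
print(table.range_scan("user1", 1, 3))
```

Both operations touch a single partition, which is why single-digit millisecond latencies are the usual expectation for this query shape.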

On Fri, Aug 11, 2023, at 3:40 PM, Jeff Jirsa wrote:
> You’re going to have to help us help you 
> 
> 4.0 is pretty widely deployed. I’m not aware of a perf regression 
> 
> Can you give us a schema (anonymized) and queries and show us a trace ? 
> 
> 
>> On Aug 10, 2023, at 10:18 PM, Shaurya Gupta  wrote:
>> 
>> The queries are rightly designed as I already explained. 40 ms is way too 
>> high as compared to what I seen with other DBs and many a times with 
>> Cassandra 3.x versions.
>> CPU consumed as I mentioned is not high, it is around 20%.
>> 
>> On Thu, Aug 10, 2023 at 5:14 PM MyWorld  wrote:
>>> Hi,
>>> P95 should not be a problem if rightly designed. Levelled compaction 
>>> strategy further reduces this, however it consume some resources. For read, 
>>> caching is also helpful. 
>>> Can you check your cpu iowait as it could be the reason for delay 
>>> 
>>> Regards,
>>> Ashish
>>> 
>>> On Fri, 11 Aug, 2023, 04:58 Shaurya Gupta,  wrote:
>>>> Hi community
>>>>
>>>> What is the expected P95 latency for Cassandra Read and Write queries
>>>> executed with Local_Quorum over a table with 3 replicas ? The queries are
>>>> done using the partition + clustering key and row size in bytes is not too
>>>> much, maybe 1-2 KB maximum.
>>>> Assuming CPU is not a crunch ?
>>>>
>>>> We observe those to be 40 ms P95 Reads and same for Writes. This looks
>>>> very high as compared to what we expected. We are using Cassandra 4.0.
>>>>
>>>> Any documentation / numbers will be helpful.
>>>>
>>>> Thanks
>>>> --
>>>> Shaurya Gupta
>>>>
>> 
>> 
>> --
>> Shaurya Gupta
>> 


Re: Cassandra p95 latencies

2023-08-11 Thread Jeff Jirsa
You’re going to have to help us help you

4.0 is pretty widely deployed. I’m not aware of a perf regression

Can you give us a schema (anonymized) and queries and show us a trace ?

> On Aug 10, 2023, at 10:18 PM, Shaurya Gupta  wrote:
>
> The queries are rightly designed as I already explained. 40 ms is way too
> high as compared to what I seen with other DBs and many a times with
> Cassandra 3.x versions.
> CPU consumed as I mentioned is not high, it is around 20%.
>
> On Thu, Aug 10, 2023 at 5:14 PM MyWorld  wrote:
>
>> Hi,
>> P95 should not be a problem if rightly designed. Levelled compaction
>> strategy further reduces this, however it consume some resources. For read,
>> caching is also helpful.
>> Can you check your cpu iowait as it could be the reason for delay
>>
>> Regards,
>> Ashish
>>
>> On Fri, 11 Aug, 2023, 04:58 Shaurya Gupta,  wrote:
>>
>>> Hi community
>>>
>>> What is the expected P95 latency for Cassandra Read and Write queries
>>> executed with Local_Quorum over a table with 3 replicas ? The queries are
>>> done using the partition + clustering key and row size in bytes is not too
>>> much, maybe 1-2 KB maximum.
>>> Assuming CPU is not a crunch ?
>>>
>>> We observe those to be 40 ms P95 Reads and same for Writes. This looks
>>> very high as compared to what we expected. We are using Cassandra 4.0.
>>>
>>> Any documentation / numbers will be helpful.
>>>
>>> Thanks
>>> --
>>> Shaurya Gupta
>>>
>
> --
> Shaurya Gupta


RE: Cassandra p95 latencies

2023-08-11 Thread Durity, Sean R via user
I would expect single digit ms latency on reads and writes. However, we have 
not done any performance testing on Apache Cassandra 4.x.

Sean R. Durity


INTERNAL USE
From: Shaurya Gupta 
Sent: Friday, August 11, 2023 1:16 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Cassandra p95 latencies

The queries are rightly designed as I already explained. 40 ms is way too high 
as compared to what I seen with other DBs and many a times with Cassandra 3.x 
versions.
CPU consumed as I mentioned is not high, it is around 20%.

On Thu, Aug 10, 2023 at 5:14 PM MyWorld <timeplus.1...@gmail.com> wrote:
Hi,
P95 should not be a problem if rightly designed. Levelled compaction strategy 
further reduces this, however it consume some resources. For read, caching is 
also helpful.
Can you check your cpu iowait as it could be the reason for delay

Regards,
Ashish

On Fri, 11 Aug, 2023, 04:58 Shaurya Gupta <shaurya.n...@gmail.com> wrote:
Hi community

What is the expected P95 latency for Cassandra Read and Write queries executed 
with Local_Quorum over a table with 3 replicas ? The queries are done using the 
partition + clustering key and row size in bytes is not too much, maybe 1-2 KB 
maximum.
Assuming CPU is not a crunch ?

We observe those to be 40 ms P95 Reads and same for Writes. This looks very 
high as compared to what we expected. We are using Cassandra 4.0.

Any documentation / numbers will be helpful.

Thanks
--
Shaurya Gupta



--
Shaurya Gupta




The information in this Internet Email is confidential and may be legally 
privileged. It is intended solely for the addressee. Access to this Email by 
anyone else is unauthorized. If you are not the intended recipient, any 
disclosure, copying, distribution or any action taken or omitted to be taken in 
reliance on it, is prohibited and may be unlawful. When addressed to our 
clients any opinions or advice contained in this Email are subject to the terms 
and conditions expressed in any applicable governing The Home Depot terms of 
business or client engagement letter. The Home Depot disclaims all 
responsibility and liability for the accuracy and content of this attachment 
and for any damages or losses arising from any inaccuracies, errors, viruses, 
e.g., worms, trojan horses, etc., or other items of a destructive nature, which 
may be contained in this attachment and shall not be liable for direct, 
indirect, consequential or special damages in connection with this e-mail 
message or its attachment.


Re: Cassandra p95 latencies

2023-08-10 Thread Shaurya Gupta
The queries are designed correctly, as I already explained. 40 ms is far too
high compared to what I have seen with other DBs and, many times, with
Cassandra 3.x versions.
CPU usage, as I mentioned, is not high; it is around 20%.

On Thu, Aug 10, 2023 at 5:14 PM MyWorld  wrote:

> Hi,
> P95 should not be a problem if rightly designed. Levelled compaction
> strategy further reduces this, however it consume some resources. For read,
> caching is also helpful.
> Can you check your cpu iowait as it could be the reason for delay
>
> Regards,
> Ashish
>
> On Fri, 11 Aug, 2023, 04:58 Shaurya Gupta,  wrote:
>
>> Hi community
>>
>> What is the expected P95 latency for Cassandra Read and Write queries
>> executed with Local_Quorum over a table with 3 replicas ? The queries are
>> done using the partition + clustering key and row size in bytes is not too
>> much, maybe 1-2 KB maximum.
>> Assuming CPU is not a crunch ?
>>
>> We observe those to be 40 ms P95 Reads and same for Writes. This looks
>> very high as compared to what we expected. We are using Cassandra 4.0.
>>
>> Any documentation / numbers will be helpful.
>>
>> Thanks
>> --
>> Shaurya Gupta
>>
>>
>>

-- 
Shaurya Gupta


Re: Cassandra p95 latencies

2023-08-10 Thread Abe Ratnofsky
40ms is definitely higher than expected. Have you run your queries with
TRACING enabled to see where the latency is coming from?

https://docs.datastax.com/en/cql-oss/3.3/cql/cql_reference/cqlshTracing.html

40ms is also a fairly specific duration:
https://eklitzke.org/the-caveats-of-tcp-nodelay

> On Linux this can cause up to a 40 ms delay when acking packets

--
Abe

> On Aug 10, 2023, at 17:13, MyWorld  wrote:
>
> Hi,
> P95 should not be a problem if rightly designed. Levelled compaction
> strategy further reduces this, however it consume some resources. For read,
> caching is also helpful.
> Can you check your cpu iowait as it could be the reason for delay
>
> Regards,
> Ashish
>
> On Fri, 11 Aug, 2023, 04:58 Shaurya Gupta,  wrote:
>
>> Hi community
>>
>> What is the expected P95 latency for Cassandra Read and Write queries
>> executed with Local_Quorum over a table with 3 replicas ? The queries are
>> done using the partition + clustering key and row size in bytes is not too
>> much, maybe 1-2 KB maximum.
>> Assuming CPU is not a crunch ?
>>
>> We observe those to be 40 ms P95 Reads and same for Writes. This looks
>> very high as compared to what we expected. We are using Cassandra 4.0.
>>
>> Any documentation / numbers will be helpful.
>>
>> Thanks
>> --
>> Shaurya Gupta
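
For reference, the socket option behind the 40 ms note is TCP_NODELAY, which disables Nagle's algorithm. The sketch below only shows what the option looks like at the socket level using Python's standard library. It is not how any Cassandra driver is implemented, and the helper names are made up; drivers normally set this option themselves.

```python
import socket


def make_nodelay_socket():
    """Create a TCP socket with Nagle's algorithm disabled (TCP_NODELAY set)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def nodelay_enabled(sock):
    """Return True if TCP_NODELAY is currently set on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


sock = make_nodelay_socket()
print(nodelay_enabled(sock))
sock.close()
```

To verify what a real client does, check the driver's socket options in its configuration rather than relying on a sketch like this.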



Re: Cassandra p95 latencies

2023-08-10 Thread MyWorld
Hi,
P95 should not be a problem if the data model is designed correctly. Leveled
compaction strategy reduces it further, although it consumes some resources.
For reads, caching is also helpful.
Can you check your CPU iowait? It could be the reason for the delay.

Regards,
Ashish

On Fri, 11 Aug, 2023, 04:58 Shaurya Gupta,  wrote:

> Hi community
>
> What is the expected P95 latency for Cassandra Read and Write queries
> executed with Local_Quorum over a table with 3 replicas ? The queries are
> done using the partition + clustering key and row size in bytes is not too
> much, maybe 1-2 KB maximum.
> Assuming CPU is not a crunch ?
>
> We observe those to be 40 ms P95 Reads and same for Writes. This looks
> very high as compared to what we expected. We are using Cassandra 4.0.
>
> Any documentation / numbers will be helpful.
>
> Thanks
> --
> Shaurya Gupta
>
>
>


Cassandra p95 latencies

2023-08-10 Thread Shaurya Gupta
Hi community

What is the expected P95 latency for Cassandra read and write queries
executed with LOCAL_QUORUM over a table with 3 replicas? The queries
use the partition key plus clustering key, and rows are small, maybe
1-2 KB at most.
Assume CPU is not a bottleneck.

We observe 40 ms P95 for reads and the same for writes. This looks very
high compared to what we expected. We are using Cassandra 4.0.

Any documentation / numbers will be helpful.

Thanks
-- 
Shaurya Gupta
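
As background for the question, a quorum with 3 replicas is 2, so at LOCAL_QUORUM the coordinator can respond once the two fastest replicas in the local datacenter have replied. A minimal sketch of that arithmetic follows. It is a simplification that ignores coordinator overhead, network hops, and retries, and the replica latencies are made-up numbers.

```python
def quorum(replication_factor):
    """Number of replicas that must respond for a quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def coordinator_latency(replica_latencies_ms, required):
    """Time until `required` replicas have replied (the required-th fastest)."""
    if required > len(replica_latencies_ms):
        raise ValueError("not enough replicas to satisfy the consistency level")
    return sorted(replica_latencies_ms)[required - 1]


rf = 3
need = quorum(rf)
print(need)
# Hypothetical per-replica latencies in ms; one replica is slow.
print(coordinator_latency([2.0, 3.5, 40.0], need))
```

In this model one slow replica out of three does not affect the request, so a sustained 40 ms p95 suggests that at least two replicas, or something shared by the whole path, is slow.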