[jira] [Commented] (CASSANDRA-16078) Performance regression for queries accessing multiple rows

2021-01-27 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17273005#comment-17273005
 ] 

Brandon Williams commented on CASSANDRA-16078:
--

I'm good with closing.

> Performance regression for queries accessing multiple rows
> --
>
> Key: CASSANDRA-16078
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16078
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination
>Reporter: David Capwell
>Assignee: Caleb Rackliffe
>Priority: Normal
> Fix For: 4.0-beta
>
> Attachments: ClusteringSlicing.kt, async-profile-3.0.19-3.svg, 
> async-profile-4.0.0-3.svg, image.png, latency_selects_3_4.png
>
>
> This is spin off from CASSANDRA-16036.
> In testing 4.0 relative to 3.0* I found that queries which accessed multiple 
> rows to have a noticeable performance decrease; two queries were used in the 
> test (more may be impacted, others might not): query partition (table has 
> clustering keys) with LIMIT, and query clustering keys using IN clause.
> In the below graphs the green line is 3.0 and the other lines are 4.0 (with 
> and without chunk cache)
> Partition with LIMIT
> !https://issues.apache.org/jira/secure/attachment/13009751/clustering-slice_latency_selects_baseline.png!
> !https://issues.apache.org/jira/secure/attachment/13009750/clustering-slice_latency_under90_selects_baseline.png!
> Cluster with IN clause
> !https://issues.apache.org/jira/secure/attachment/13009749/clustering-in-clause_latency_selects_baseline.png!
> !https://issues.apache.org/jira/secure/attachment/13009748/clustering-in-clause_latency_under90_selects_baseline.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16078) Performance regression for queries accessing multiple rows

2021-01-26 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17272538#comment-17272538
 ] 

Caleb Rackliffe commented on CASSANDRA-16078:
-

I ramped up the 6-node tests to something closer to 40% (rather than 5%) CPU 
usage, and the story hasn't changed much, if at all. 4.0/trunk is better at the 
50th, 90th, and 99th percentiles for reads, writes, and deletes.

[~dcapwell] [~brandon.williams] Any objections to closing at this point?

> Performance regression for queries accessing multiple rows
> --
>
> Key: CASSANDRA-16078
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16078
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination
>Reporter: David Capwell
>Assignee: Caleb Rackliffe
>Priority: Normal
> Fix For: 4.0-beta
>
> Attachments: ClusteringSlicing.kt, async-profile-3.0.19-3.svg, 
> async-profile-4.0.0-3.svg, image.png, latency_selects_3_4.png
>
>
> This is spin off from CASSANDRA-16036.
> In testing 4.0 relative to 3.0* I found that queries which accessed multiple 
> rows to have a noticeable performance decrease; two queries were used in the 
> test (more may be impacted, others might not): query partition (table has 
> clustering keys) with LIMIT, and query clustering keys using IN clause.
> In the below graphs the green line is 3.0 and the other lines are 4.0 (with 
> and without chunk cache)
> Partition with LIMIT
> !https://issues.apache.org/jira/secure/attachment/13009751/clustering-slice_latency_selects_baseline.png!
> !https://issues.apache.org/jira/secure/attachment/13009750/clustering-slice_latency_under90_selects_baseline.png!
> Cluster with IN clause
> !https://issues.apache.org/jira/secure/attachment/13009749/clustering-in-clause_latency_selects_baseline.png!
> !https://issues.apache.org/jira/secure/attachment/13009748/clustering-in-clause_latency_under90_selects_baseline.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16078) Performance regression for queries accessing multiple rows

2021-01-26 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17272446#comment-17272446
 ] 

Caleb Rackliffe commented on CASSANDRA-16078:
-

I took the scenario [~dcapwell] ran and made a couple small adjustments:

1.) The {{disk_access_mode}} in the original test was {{mmap}}, and that 
probably doesn't make sense with a 64 KB chunk size. That was changed to 
{{mmap_index_only}}.
2.) The original test flushed the Memtable every 500 milliseconds, and I've 
removed that behavior.

Also, for clarity, the {{tlp-stress}} run does pre-populate 500 (1000-row) 
partitions before starting its warmup. Finally, the test outlined above was 
running with the stress argument {{--rate 800}}, which I've either continued to 
use or scaled down to 400 for single-node tests.

I was not able to find any meaningful regression around select latency between 
3.0/3.11 and 4.0/trunk. For both single-node and six-node tests (w/ RF=3), 
4.0/trunk actually produces a better mean latency and latencies up to at least 
the 99.9th percentile. In the example below, {{cie-pushbutton-123}} is a 3.0 
cluster (w/ CASSANDRA-8180 back-ported from 3.x) and {{cie-pushbutton-122}} is 
a 4.0/trunk cluster:

 !latency_selects_3_4.png! 

This chart corresponds to the 6-node cluster comparison, but the single-node 
case looks almost exactly the same, so I've omitted it for now. Flamegraphs 
sampled from individual nodes are very similar between the versions.

 [^async-profile-3.0.19-3.svg] 
 [^async-profile-4.0.0-3.svg] 

The only minor oddity is 4.0/trunk spending a bit more time in {{SEPWorker}} 
wait spins. Looking at the CPU usage, things are mostly idle, so before I move 
to close this, a test with more realistic load might be appropriate...

> Performance regression for queries accessing multiple rows
> --
>
> Key: CASSANDRA-16078
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16078
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination
>Reporter: David Capwell
>Assignee: Caleb Rackliffe
>Priority: Normal
> Fix For: 4.0-beta
>
> Attachments: ClusteringSlicing.kt, async-profile-3.0.19-3.svg, 
> async-profile-4.0.0-3.svg, image.png, latency_selects_3_4.png
>
>
> This is spin off from CASSANDRA-16036.
> In testing 4.0 relative to 3.0* I found that queries which accessed multiple 
> rows to have a noticeable performance decrease; two queries were used in the 
> test (more may be impacted, others might not): query partition (table has 
> clustering keys) with LIMIT, and query clustering keys using IN clause.
> In the below graphs the green line is 3.0 and the other lines are 4.0 (with 
> and without chunk cache)
> Partition with LIMIT
> !https://issues.apache.org/jira/secure/attachment/13009751/clustering-slice_latency_selects_baseline.png!
> !https://issues.apache.org/jira/secure/attachment/13009750/clustering-slice_latency_under90_selects_baseline.png!
> Cluster with IN clause
> !https://issues.apache.org/jira/secure/attachment/13009749/clustering-in-clause_latency_selects_baseline.png!
> !https://issues.apache.org/jira/secure/attachment/13009748/clustering-in-clause_latency_under90_selects_baseline.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16078) Performance regression for queries accessing multiple rows

2021-01-14 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17265130#comment-17265130
 ] 

Caleb Rackliffe commented on CASSANDRA-16078:
-

I'm planning on taking a look at this starting early next week at the latest.

Given that everything here seems partition-restricted, I'm inclined to strip 
things down to some single node read comparisons, and I have David's original 
test scripts at my disposal...

> Performance regression for queries accessing multiple rows
> --
>
> Key: CASSANDRA-16078
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16078
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination
>Reporter: David Capwell
>Assignee: Caleb Rackliffe
>Priority: Normal
> Fix For: 4.0-beta
>
> Attachments: ClusteringSlicing.kt, image.png
>
>
> This is spin off from CASSANDRA-16036.
> In testing 4.0 relative to 3.0* I found that queries which accessed multiple 
> rows to have a noticeable performance decrease; two queries were used in the 
> test (more may be impacted, others might not): query partition (table has 
> clustering keys) with LIMIT, and query clustering keys using IN clause.
> In the below graphs the green line is 3.0 and the other lines are 4.0 (with 
> and without chunk cache)
> Partition with LIMIT
> !https://issues.apache.org/jira/secure/attachment/13009751/clustering-slice_latency_selects_baseline.png!
> !https://issues.apache.org/jira/secure/attachment/13009750/clustering-slice_latency_under90_selects_baseline.png!
> Cluster with IN clause
> !https://issues.apache.org/jira/secure/attachment/13009749/clustering-in-clause_latency_selects_baseline.png!
> !https://issues.apache.org/jira/secure/attachment/13009748/clustering-in-clause_latency_under90_selects_baseline.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16078) Performance regression for queries accessing multiple rows

2021-01-12 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17263369#comment-17263369
 ] 

Brandon Williams commented on CASSANDRA-16078:
--

I've been unable to find the cause of this, my lab environment just doesn't 
have enough resolution on this issue to be able to accomplish a trustworthy 
bisect.

Attaching the tlp-stress class I've been using.  I've done/do a few things to 
make this easier to test: removed the large cumbersome bliobs, do a warmup run, 
and then a run of pure reads (readrate 1) to eliminate any noise caused by 
mutations/deletes for a few minutes (since with only 1k records, an hour seems 
like overkill.)

[~dcapwell] could you perhaps try this class and confirm that it reproduces so 
at least we'll know if I was totally off base?

> Performance regression for queries accessing multiple rows
> --
>
> Key: CASSANDRA-16078
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16078
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination
>Reporter: David Capwell
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: 4.0-beta
>
> Attachments: ClusteringSlicing.kt, image.png
>
>
> This is spin off from CASSANDRA-16036.
> In testing 4.0 relative to 3.0* I found that queries which accessed multiple 
> rows to have a noticeable performance decrease; two queries were used in the 
> test (more may be impacted, others might not): query partition (table has 
> clustering keys) with LIMIT, and query clustering keys using IN clause.
> In the below graphs the green line is 3.0 and the other lines are 4.0 (with 
> and without chunk cache)
> Partition with LIMIT
> !https://issues.apache.org/jira/secure/attachment/13009751/clustering-slice_latency_selects_baseline.png!
> !https://issues.apache.org/jira/secure/attachment/13009750/clustering-slice_latency_under90_selects_baseline.png!
> Cluster with IN clause
> !https://issues.apache.org/jira/secure/attachment/13009749/clustering-in-clause_latency_selects_baseline.png!
> !https://issues.apache.org/jira/secure/attachment/13009748/clustering-in-clause_latency_under90_selects_baseline.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16078) Performance regression for queries accessing multiple rows

2020-12-02 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17242752#comment-17242752
 ] 

David Capwell commented on CASSANDRA-16078:
---

used tlp-stress but can't share the test code, though can explain it enough to 
reproduce.

// create
{code}
CREATE TABLE IF NOT EXISTS clusteringslicing (
  key text,
  column1 bigint,
  value   blob,
  PRIMARY KEY(key, column1)
) WITH CLUSTERING ORDER BY (column1 DESC)
{code}

// select
{code}
SELECT key, column1
 FROM clusteringslicing
 WHERE key = ?
   AND column1 >= ?
   AND column1 <= ?
 LIMIT ?
{code}

// test configs
{code}
limit == 100
max_blob_length=102400
rows_per_partition=1000
use_range_tombstones=false
LeveledCompactionStrategy
num_partitions=1000
duration="60m"
delete_rate=0.11
read_rate=0.11
{code}

when selecting the clustering key bounds, the test uses a uniform distribution 
between 0 and rows_per_partition

Hope this helps!

Its been a while since I ran the test, if desired I can rerun again to see if 
anything has changed.

> Performance regression for queries accessing multiple rows
> --
>
> Key: CASSANDRA-16078
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16078
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination
>Reporter: David Capwell
>Priority: Normal
> Fix For: 4.0-beta
>
> Attachments: image.png
>
>
> This is spin off from CASSANDRA-16036.
> In testing 4.0 relative to 3.0* I found that queries which accessed multiple 
> rows to have a noticeable performance decrease; two queries were used in the 
> test (more may be impacted, others might not): query partition (table has 
> clustering keys) with LIMIT, and query clustering keys using IN clause.
> In the below graphs the green line is 3.0 and the other lines are 4.0 (with 
> and without chunk cache)
> Partition with LIMIT
> !https://issues.apache.org/jira/secure/attachment/13009751/clustering-slice_latency_selects_baseline.png!
> !https://issues.apache.org/jira/secure/attachment/13009750/clustering-slice_latency_under90_selects_baseline.png!
> Cluster with IN clause
> !https://issues.apache.org/jira/secure/attachment/13009749/clustering-in-clause_latency_selects_baseline.png!
> !https://issues.apache.org/jira/secure/attachment/13009748/clustering-in-clause_latency_under90_selects_baseline.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16078) Performance regression for queries accessing multiple rows

2020-12-02 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17242434#comment-17242434
 ] 

Ekaterina Dimitrova commented on CASSANDRA-16078:
-

Hey [~dcapwell], can you, please, share your test to reproduce the issue? 
I am not sure which benchmark you are talking about.

> Performance regression for queries accessing multiple rows
> --
>
> Key: CASSANDRA-16078
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16078
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination
>Reporter: David Capwell
>Priority: Normal
> Fix For: 4.0-beta
>
> Attachments: image.png
>
>
> This is spin off from CASSANDRA-16036.
> In testing 4.0 relative to 3.0* I found that queries which accessed multiple 
> rows to have a noticeable performance decrease; two queries were used in the 
> test (more may be impacted, others might not): query partition (table has 
> clustering keys) with LIMIT, and query clustering keys using IN clause.
> In the below graphs the green line is 3.0 and the other lines are 4.0 (with 
> and without chunk cache)
> Partition with LIMIT
> !https://issues.apache.org/jira/secure/attachment/13009751/clustering-slice_latency_selects_baseline.png!
> !https://issues.apache.org/jira/secure/attachment/13009750/clustering-slice_latency_under90_selects_baseline.png!
> Cluster with IN clause
> !https://issues.apache.org/jira/secure/attachment/13009749/clustering-in-clause_latency_selects_baseline.png!
> !https://issues.apache.org/jira/secure/attachment/13009748/clustering-in-clause_latency_under90_selects_baseline.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16078) Performance regression for queries accessing multiple rows

2020-11-16 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17232802#comment-17232802
 ] 

Ekaterina Dimitrova commented on CASSANDRA-16078:
-

A few things are already back on the table today and  I am off next week so I 
am unassigning it as not actively working on it, in case anyone has more time 
now to take it sooner than me. 

> Performance regression for queries accessing multiple rows
> --
>
> Key: CASSANDRA-16078
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16078
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination
>Reporter: David Capwell
>Priority: Normal
> Fix For: 4.0-beta
>
> Attachments: image.png
>
>
> This is spin off from CASSANDRA-16036.
> In testing 4.0 relative to 3.0* I found that queries which accessed multiple 
> rows to have a noticeable performance decrease; two queries were used in the 
> test (more may be impacted, others might not): query partition (table has 
> clustering keys) with LIMIT, and query clustering keys using IN clause.
> In the below graphs the green line is 3.0 and the other lines are 4.0 (with 
> and without chunk cache)
> Partition with LIMIT
> !https://issues.apache.org/jira/secure/attachment/13009751/clustering-slice_latency_selects_baseline.png!
> !https://issues.apache.org/jira/secure/attachment/13009750/clustering-slice_latency_under90_selects_baseline.png!
> Cluster with IN clause
> !https://issues.apache.org/jira/secure/attachment/13009749/clustering-in-clause_latency_selects_baseline.png!
> !https://issues.apache.org/jira/secure/attachment/13009748/clustering-in-clause_latency_under90_selects_baseline.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16078) Performance regression for queries accessing multiple rows

2020-11-10 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229275#comment-17229275
 ] 

Ekaterina Dimitrova commented on CASSANDRA-16078:
-

I am interested to look into this, assigning and I can start looking into it in 
the next days

> Performance regression for queries accessing multiple rows
> --
>
> Key: CASSANDRA-16078
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16078
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination
>Reporter: David Capwell
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0-beta
>
> Attachments: image.png
>
>
> This is spin off from CASSANDRA-16036.
> In testing 4.0 relative to 3.0* I found that queries which accessed multiple 
> rows to have a noticeable performance decrease; two queries were used in the 
> test (more may be impacted, others might not): query partition (table has 
> clustering keys) with LIMIT, and query clustering keys using IN clause.
> In the below graphs the green line is 3.0 and the other lines are 4.0 (with 
> and without chunk cache)
> Partition with LIMIT
> !https://issues.apache.org/jira/secure/attachment/13009751/clustering-slice_latency_selects_baseline.png!
> !https://issues.apache.org/jira/secure/attachment/13009750/clustering-slice_latency_under90_selects_baseline.png!
> Cluster with IN clause
> !https://issues.apache.org/jira/secure/attachment/13009749/clustering-in-clause_latency_selects_baseline.png!
> !https://issues.apache.org/jira/secure/attachment/13009748/clustering-in-clause_latency_under90_selects_baseline.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org