[jira] [Comment Edited] (CASSANDRA-15066) Improvements to Internode Messaging

2019-04-26 Thread Dinesh Joshi (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827478#comment-16827478
 ] 

Dinesh Joshi edited comment on CASSANDRA-15066 at 4/27/19 6:30 AM:
---

[~ifesdjeen] thank you for the notes. They're very useful. I'm going over the 
code, primarily for my own curiosity.

[~jolynch] [~vinaykumarcse] thanks for testing the patch and sharing the 
results. Is there a jira for the read perf issue? It would be great to have it 
linked here, assuming we want to fix it as part of this patch.


was (Author: djoshi3):
[~ifesdjeen] thank you for the notes. They're very useful. I'm going over the 
code, primarily for my own curiosity.

[~jolynch] [~vinaykumarcse] thanks for testing the patch and sharing the 
results. Is there a jira for the read perf issue? It would be great to have it 
linked here for posterity, assuming we want to fix it as part of this patch.

> Improvements to Internode Messaging
> ---
>
> Key: CASSANDRA-15066
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15066
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Messaging/Internode
>Reporter: Benedict
>Assignee: Benedict
>Priority: High
> Fix For: 4.0
>
> Attachments: 20k_backfill.png, 60k_RPS.png, 
> 60k_RPS_CPU_bottleneck.png, backfill_cass_perf_ft_msg_tst.svg, 
> baseline_patch_vs_30x.png, increasing_reads_latency.png, 
> many_reads_cass_perf_ft_msg_tst.svg
>
>
> CASSANDRA-8457 introduced asynchronous networking to internode messaging, but 
> there have been several follow-up endeavours to improve some semantic issues. 
>  CASSANDRA-14503 and CASSANDRA-13630 are the latest such efforts, and were 
> combined some months ago into a single overarching refactor of the original 
> work, to address some of the issues that have been discovered.  Given the 
> criticality of this work to the project, we wanted to bring some more eyes to 
> bear to ensure the release goes ahead smoothly.  In doing so, we uncovered a 
> number of issues with messaging, some of them long-standing, that we felt 
> needed to be addressed.  This patch widens the scope of CASSANDRA-14503 and 
> CASSANDRA-13630 in an effort to close the book on the messaging service, at 
> least for the foreseeable future.
> The patch includes a number of clarifying refactors that touch outside of the 
> {{net.async}} package, and a number of semantic changes to the {{net.async}} 
> package itself.  We believe it clarifies the intent and behaviour of the 
> code while improving system stability, which we will outline in comments 
> below.
> https://github.com/belliottsmith/cassandra/tree/messaging-improvements






[jira] [Commented] (CASSANDRA-15066) Improvements to Internode Messaging

2019-04-26 Thread Dinesh Joshi (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827478#comment-16827478
 ] 

Dinesh Joshi commented on CASSANDRA-15066:
--

[~ifesdjeen] thank you for the notes. They're very useful. I'm going over the 
code, primarily for my own curiosity.

[~jolynch] [~vinaykumarcse] thanks for testing the patch and sharing the 
results. Is there a jira for the read perf issue? It would be great to have it 
linked here for posterity, assuming we want to fix it as part of this patch.

> Improvements to Internode Messaging
> ---
>
> Key: CASSANDRA-15066
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15066
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Messaging/Internode
>Reporter: Benedict
>Assignee: Benedict
>Priority: High
> Fix For: 4.0
>
> Attachments: 20k_backfill.png, 60k_RPS.png, 
> 60k_RPS_CPU_bottleneck.png, backfill_cass_perf_ft_msg_tst.svg, 
> baseline_patch_vs_30x.png, increasing_reads_latency.png, 
> many_reads_cass_perf_ft_msg_tst.svg
>
>
> CASSANDRA-8457 introduced asynchronous networking to internode messaging, but 
> there have been several follow-up endeavours to improve some semantic issues. 
>  CASSANDRA-14503 and CASSANDRA-13630 are the latest such efforts, and were 
> combined some months ago into a single overarching refactor of the original 
> work, to address some of the issues that have been discovered.  Given the 
> criticality of this work to the project, we wanted to bring some more eyes to 
> bear to ensure the release goes ahead smoothly.  In doing so, we uncovered a 
> number of issues with messaging, some of them long-standing, that we felt 
> needed to be addressed.  This patch widens the scope of CASSANDRA-14503 and 
> CASSANDRA-13630 in an effort to close the book on the messaging service, at 
> least for the foreseeable future.
> The patch includes a number of clarifying refactors that touch outside of the 
> {{net.async}} package, and a number of semantic changes to the {{net.async}} 
> package itself.  We believe it clarifies the intent and behaviour of the 
> code while improving system stability, which we will outline in comments 
> below.
> https://github.com/belliottsmith/cassandra/tree/messaging-improvements






[jira] [Comment Edited] (CASSANDRA-15066) Improvements to Internode Messaging

2019-04-26 Thread Joseph Lynch (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827395#comment-16827395
 ] 

Joseph Lynch edited comment on CASSANDRA-15066 at 4/27/19 12:54 AM:


Note: this is not a comparative analysis, nor do we have root causes for all 
findings. [~vinaykumarcse]'s and my goal today was to kick the tires of this 
patch and see if there were any serious issues. We threw 
[{{9b0b814add}}|https://github.com/apache/cassandra/commit/9b0b814add8da5a66c12a87a7bfebb015667f293]
 on a smallish cluster and punished it with read and write load.

*Test setup*
 * Two datacenters, approximately 70ms apart
 * 6 {{i3.2xlarge}} nodes per datacenter (4 physical cores with 8 threads, 
60GiB of memory, 1.8 TB NVMe dedicated drive, 2 Gbps network)
 * 3-node NDBench cluster generating {{LOCAL_ONE}} random writes and full 
partition reads of ~4 KB partitions consisting of 2 rows of 10 columns each. 
Total dataset per node was about 180 GiB, and reads were uniformly distributed 
across the partition space. The table was mostly defaults (RF=3) except it used 
Leveled Compaction Strategy and no compression (since the data is random).

*First test, bootstrap (aka punish with writes)*

In this test we used NDBench's backfill feature to punish the cluster with 
writes.
 * Backfilling the dataset achieved sustained write throughput of 20k 
coordinator level WPS easily, with average latencies staying below 1ms
 * The limiting factor appeared to be compaction throughput
 * Flamegraphs are attached

There were no observed hints or dropped messages, and data sizes in both 
datacenters looked reasonably consistent. I think this went very well.

[^20k_backfill.png] 
[^backfill_cass_perf_ft_msg_tst.svg] 

*Second test, establish baseline*

Next, we sent a reasonably modest 1200 coordinator RPS and 600 WPS, which is 
very light load, and compared this patch to our production 30x branch.
 * Writes are ~20% faster, like we saw previously in netty trunk vs 30x
 * Reads are *~500%* slower; this is new since our last tests, and from the 
flamegraph [~benedict] suspects (and I agree) that it is likely related to some 
of the TR cleanup
 * Checked the virtual table metrics and they seem reasonable, also spot 
checked some of the new jmx per channel metrics

Summary: The read latency is concerning, but I think Benedict may already have 
the fix.

[^baseline_patch_vs_30x.png] 

*Third test, punish with reads*

Due to the poor baseline read performance, we attempted to push the reads as 
far as they would go while acquiring a flamegraph for debugging where we are 
spending time.
 * We were able to push the cluster to 60,000 coordinator RPS before we started 
seeing CPU queuing.
 * The limiting factor appeared to be CPU time (~80% saturated) and random 4k 
IOPS (although we were only ~30% saturated there)
 * Flamegraphs are attached

tpstats showed relatively little queueing or QOS issues, and local read 
latencies remained fast, so we believe that there is a different issue at play 
in the read path. Flamegraphs are attached for debugging.

[^increasing_reads_latency.png]  
[^60k_RPS_CPU_bottleneck.png]
[^60k_RPS.png]
[^many_reads_cass_perf_ft_msg_tst.svg]

*Fourth test, punish with reads and writes*

We're currently attempting a mixed mode test where we do many reads and writes 
and see how they interact. Results will be posted shortly. I think we'll need 
to bump our branch to pick up the latest changes.

*Summary*

So far this patch looks to be doing a great job. We have some issues to figure 
out with the reads and many more tests to run, but it didn't explode, so that 
is good.

 


was (Author: jolynch):
Note: this is not a comparative analysis, nor do we have root causes for all 
findings. [~vinaykumarcse]'s and my goal today was to kick the tires of this 
patch and see if there were any serious issues. We threw 
[{{9b0b814add}}|https://github.com/apache/cassandra/commit/9b0b814add8da5a66c12a87a7bfebb015667f293]
 on a smallish cluster and punished it with read and write load.

*Test setup*
 * Two datacenters, approximately 70ms apart
 * 6 {{i3.2xlarge}} nodes per datacenter (4 physical cores with 8 threads, 
60GiB of memory, 1.8 TB NVMe dedicated drive, 2 Gbps network)
 * 3-node NDBench cluster generating {{LOCAL_ONE}} random writes and full 
partition reads of ~4 KB partitions consisting of 2 rows of 10 columns each. 
Total dataset per node was about 180 GiB, and reads were uniformly distributed 
across the partition space. The table was mostly defaults (RF=3) except it used 
Leveled Compaction Strategy and no compression (since the data is random).

*First test, bootstrap (aka punish with writes)*

In this test we used NDBench's backfill feature to punish the cluster with 
writes.
 * Backfilling the dataset achieved sustained write throughput of 20k 
coordinator level WPS easily, with average latencies staying below 1ms

[jira] [Comment Edited] (CASSANDRA-15066) Improvements to Internode Messaging

2019-04-26 Thread Joseph Lynch (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827395#comment-16827395
 ] 

Joseph Lynch edited comment on CASSANDRA-15066 at 4/27/19 12:50 AM:


Note: this is not a comparative analysis, nor do we have root causes for all 
findings. [~vinaykumarcse]'s and my goal today was to kick the tires of this 
patch and see if there were any serious issues. We threw 
[{{9b0b814add}}|https://github.com/apache/cassandra/commit/9b0b814add8da5a66c12a87a7bfebb015667f293]
 on a smallish cluster and punished it with read and write load.

*Test setup*
 * Two datacenters, approximately 70ms apart
 * 6 {{i3.2xlarge}} nodes per datacenter (4 physical cores with 8 threads, 
60GiB of memory, 1.8 TB NVMe dedicated drive, 2 Gbps network)
 * 3-node NDBench cluster generating {{LOCAL_ONE}} random writes and full 
partition reads of ~4 KB partitions consisting of 2 rows of 10 columns each. 
Total dataset per node was about 180 GiB, and reads were uniformly distributed 
across the partition space. The table was mostly defaults (RF=3) except it used 
Leveled Compaction Strategy and no compression (since the data is random).

*First test, bootstrap (aka punish with writes)*

In this test we used NDBench's backfill feature to punish the cluster with 
writes.
 * Backfilling the dataset achieved sustained write throughput of 20k 
coordinator level WPS easily, with average latencies staying below 1ms
 * The limiting factor appeared to be compaction throughput
 * Flamegraphs are attached

There were no observed hints or dropped messages, and data sizes in both 
datacenters looked reasonably consistent. I think this went very well.

!20k_backfill.png|thumbnail!

*Second test, establish baseline*

Next, we sent a reasonably modest 1200 coordinator RPS and 600 WPS, which is 
very light load, and compared this patch to our production 30x branch.
 * Writes are ~20% faster, like we saw previously in netty trunk vs 30x
 * Reads are *~500%* slower; this is new since our last tests, and from the 
flamegraph [~benedict] suspects (and I agree) that it is likely related to some 
of the TR cleanup
 * Checked the virtual table metrics and they seem reasonable, also spot 
checked some of the new jmx per channel metrics

Summary: The read latency is concerning, but I think Benedict may already have 
the fix.

*Third test, punish with reads*

Due to the poor baseline read performance, we attempted to push the reads as 
far as they would go while acquiring a flamegraph for debugging where we are 
spending time.
 * We were able to push the cluster to 60,000 coordinator RPS before we started 
seeing CPU queuing.
 * The limiting factor appeared to be CPU time (~80% saturated) and random 4k 
IOPS (although we were only ~30% saturated there)
 * Flamegraphs are attached

tpstats showed relatively little queueing or QOS issues, and local read 
latencies remained fast, so we believe that there is a different issue at play 
in the read path. Flamegraphs are attached for debugging.

*Fourth test, punish with reads and writes*

We're currently attempting a mixed mode test where we do many reads and writes 
and see how they interact. Results will be posted shortly. I think we'll need 
to bump our branch to pick up the latest changes.

*Summary*

So far this patch looks to be doing a great job. We have some issues to figure 
out with the reads and many more tests to run, but it didn't explode, so that 
is good.

 


was (Author: jolynch):
Note: this is not a comparative analysis, nor do we have root causes for all 
findings. [~vinaykumarcse]'s and my goal today was to kick the tires of this 
patch and see if there were any serious issues. We threw 
[{{9b0b814add}}|https://github.com/apache/cassandra/commit/9b0b814add8da5a66c12a87a7bfebb015667f293]
 on a smallish cluster and punished it with read and write load.

*Test setup*
 * Two datacenters, approximately 70ms apart
 * 6 {{i3.2xlarge}} nodes per datacenter (4 physical cores with 8 threads, 
60GiB of memory, 1.8 TB NVMe dedicated drive, 2 Gbps network)
 * 3-node NDBench cluster generating {{LOCAL_ONE}} random writes and full 
partition reads of ~4 KB partitions consisting of 2 rows of 10 columns each. 
Total dataset per node was about 180 GiB, and reads were uniformly distributed 
across the partition space. The table was mostly defaults (RF=3) except it used 
Leveled Compaction Strategy and no compression (since the data is random).

*First test, bootstrap (aka punish with writes)*

In this test we used NDBench's backfill feature to punish the cluster with 
writes.
 * Backfilling the dataset achieved sustained write throughput of 20k 
coordinator level WPS easily, with average latencies staying below 1ms
 * The limiting factor appeared to be compaction throughput
 * Flamegraphs are attached

There were no observed hints or dropped messages, and data sizes in both 
datacenters looked reasonably consistent. I think this went very well.

[jira] [Comment Edited] (CASSANDRA-15066) Improvements to Internode Messaging

2019-04-26 Thread Joseph Lynch (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827395#comment-16827395
 ] 

Joseph Lynch edited comment on CASSANDRA-15066 at 4/27/19 12:47 AM:


Note: this is not a comparative analysis, nor do we have root causes for all 
findings. [~vinaykumarcse]'s and my goal today was to kick the tires of this 
patch and see if there were any serious issues. We threw 
[{{9b0b814add}}|https://github.com/apache/cassandra/commit/9b0b814add8da5a66c12a87a7bfebb015667f293]
 on a smallish cluster and punished it with read and write load.

*Test setup*
 * Two datacenters, approximately 70ms apart
 * 6 {{i3.2xlarge}} nodes per datacenter (4 physical cores with 8 threads, 
60GiB of memory, 1.8 TB NVMe dedicated drive, 2 Gbps network)
 * 3-node NDBench cluster generating {{LOCAL_ONE}} random writes and full 
partition reads of ~4 KB partitions consisting of 2 rows of 10 columns each. 
Total dataset per node was about 180 GiB, and reads were uniformly distributed 
across the partition space. The table was mostly defaults (RF=3) except it used 
Leveled Compaction Strategy and no compression (since the data is random).

*First test, bootstrap (aka punish with writes)*

In this test we used NDBench's backfill feature to punish the cluster with 
writes.
 * Backfilling the dataset achieved sustained write throughput of 20k 
coordinator level WPS easily, with average latencies staying below 1ms
 * The limiting factor appeared to be compaction throughput
 * Flamegraphs are attached

There were no observed hints or dropped messages, and data sizes in both 
datacenters looked reasonably consistent. I think this went very well.

*Second test, establish baseline*

Next, we sent a reasonably modest 1200 coordinator RPS and 600 WPS, which is 
very light load, and compared this patch to our production 30x branch.
 * Writes are ~20% faster, like we saw previously in netty trunk vs 30x
 * Reads are *~500%* slower; this is new since our last tests, and from the 
flamegraph [~benedict] suspects (and I agree) that it is likely related to some 
of the TR cleanup
 * Checked the virtual table metrics and they seem reasonable, also spot 
checked some of the new jmx per channel metrics

Summary: The read latency is concerning, but I think Benedict may already have 
the fix.

*Third test, punish with reads*

Due to the poor baseline read performance, we attempted to push the reads as 
far as they would go while acquiring a flamegraph for debugging where we are 
spending time.
 * We were able to push the cluster to 60,000 coordinator RPS before we started 
seeing CPU queuing.
 * The limiting factor appeared to be CPU time (~80% saturated) and random 4k 
IOPS (although we were only ~30% saturated there)
 * Flamegraphs are attached

tpstats showed relatively little queueing or QOS issues, and local read 
latencies remained fast, so we believe that there is a different issue at play 
in the read path. Flamegraphs are attached for debugging.

*Fourth test, punish with reads and writes*

We're currently attempting a mixed mode test where we do many reads and writes 
and see how they interact. Results will be posted shortly. I think we'll need 
to bump our branch to pick up the latest changes.

*Summary*

So far this patch looks to be doing a great job. We have some issues to figure 
out with the reads and many more tests to run, but it didn't explode, so that 
is good.


was (Author: jolynch):
Note: this is not a comparative analysis, nor do we have root causes for all 
findings. [~vinaykumarcse]'s and my goal today was to kick the tires of this 
patch and see if there were any serious issues. We threw 
[{{9b0b814add}}|https://github.com/apache/cassandra/commit/9b0b814add8da5a66c12a87a7bfebb015667f293]
 on a smallish cluster and punished it with read and write load.

*Test setup*
 * Two datacenters, approximately 70ms apart
 * 6 {{i3.2xlarge}} nodes per datacenter (4 physical cores with 8 threads, 
60GiB of memory, 1.8 TB NVMe dedicated drive, 2 Gbps network)
 * 3-node NDBench cluster generating {{LOCAL_ONE}} random writes and full 
partition reads of ~4 KB partitions consisting of 2 rows of 10 columns each. 
Total dataset per node was about 180 GiB, and reads were uniformly distributed 
across the partition space. The table was mostly defaults (RF=3) except it used 
Leveled Compaction Strategy and no compression (since the data is random).

*First test, bootstrap (aka punish with writes)*

In this test we used NDBench's backfill feature to punish the cluster with 
writes.
 * Backfilling the dataset achieved sustained write throughput of 20k 
coordinator level WPS easily, with average latencies staying below 1ms
 * The limiting factor appeared to be compaction throughput
 * Flamegraphs are attached

There were no observed hints or dropped messages, and data sizes in both 
datacenters looked reasonably consistent. I think this went very well.

[jira] [Updated] (CASSANDRA-15066) Improvements to Internode Messaging

2019-04-26 Thread Joseph Lynch (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Lynch updated CASSANDRA-15066:
-
Attachment: increasing_reads_latency.png
60k_RPS_CPU_bottleneck.png
20k_backfill.png
baseline_patch_vs_30x.png
60k_RPS.png

> Improvements to Internode Messaging
> ---
>
> Key: CASSANDRA-15066
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15066
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Messaging/Internode
>Reporter: Benedict
>Assignee: Benedict
>Priority: High
> Fix For: 4.0
>
> Attachments: 20k_backfill.png, 60k_RPS.png, 
> 60k_RPS_CPU_bottleneck.png, backfill_cass_perf_ft_msg_tst.svg, 
> baseline_patch_vs_30x.png, increasing_reads_latency.png, 
> many_reads_cass_perf_ft_msg_tst.svg
>
>
> CASSANDRA-8457 introduced asynchronous networking to internode messaging, but 
> there have been several follow-up endeavours to improve some semantic issues. 
>  CASSANDRA-14503 and CASSANDRA-13630 are the latest such efforts, and were 
> combined some months ago into a single overarching refactor of the original 
> work, to address some of the issues that have been discovered.  Given the 
> criticality of this work to the project, we wanted to bring some more eyes to 
> bear to ensure the release goes ahead smoothly.  In doing so, we uncovered a 
> number of issues with messaging, some of them long-standing, that we felt 
> needed to be addressed.  This patch widens the scope of CASSANDRA-14503 and 
> CASSANDRA-13630 in an effort to close the book on the messaging service, at 
> least for the foreseeable future.
> The patch includes a number of clarifying refactors that touch outside of the 
> {{net.async}} package, and a number of semantic changes to the {{net.async}} 
> package itself.  We believe it clarifies the intent and behaviour of the 
> code while improving system stability, which we will outline in comments 
> below.
> https://github.com/belliottsmith/cassandra/tree/messaging-improvements






[jira] [Updated] (CASSANDRA-15066) Improvements to Internode Messaging

2019-04-26 Thread Joseph Lynch (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Lynch updated CASSANDRA-15066:
-
Attachment: backfill_cass_perf_ft_msg_tst.svg
many_reads_cass_perf_ft_msg_tst.svg

> Improvements to Internode Messaging
> ---
>
> Key: CASSANDRA-15066
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15066
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Messaging/Internode
>Reporter: Benedict
>Assignee: Benedict
>Priority: High
> Fix For: 4.0
>
> Attachments: backfill_cass_perf_ft_msg_tst.svg, 
> many_reads_cass_perf_ft_msg_tst.svg
>
>
> CASSANDRA-8457 introduced asynchronous networking to internode messaging, but 
> there have been several follow-up endeavours to improve some semantic issues. 
>  CASSANDRA-14503 and CASSANDRA-13630 are the latest such efforts, and were 
> combined some months ago into a single overarching refactor of the original 
> work, to address some of the issues that have been discovered.  Given the 
> criticality of this work to the project, we wanted to bring some more eyes to 
> bear to ensure the release goes ahead smoothly.  In doing so, we uncovered a 
> number of issues with messaging, some of them long-standing, that we felt 
> needed to be addressed.  This patch widens the scope of CASSANDRA-14503 and 
> CASSANDRA-13630 in an effort to close the book on the messaging service, at 
> least for the foreseeable future.
> The patch includes a number of clarifying refactors that touch outside of the 
> {{net.async}} package, and a number of semantic changes to the {{net.async}} 
> package itself.  We believe it clarifies the intent and behaviour of the 
> code while improving system stability, which we will outline in comments 
> below.
> https://github.com/belliottsmith/cassandra/tree/messaging-improvements






[jira] [Comment Edited] (CASSANDRA-15066) Improvements to Internode Messaging

2019-04-26 Thread Joseph Lynch (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827395#comment-16827395
 ] 

Joseph Lynch edited comment on CASSANDRA-15066 at 4/27/19 12:43 AM:


Note: this is not a comparative analysis, nor do we have root causes for all 
findings. [~vinaykumarcse]'s and my goal today was to kick the tires of this 
patch and see if there were any serious issues. We threw 
[{{9b0b814add}}|https://github.com/apache/cassandra/commit/9b0b814add8da5a66c12a87a7bfebb015667f293]
 on a smallish cluster and punished it with read and write load.

*Test setup*
 * Two datacenters, approximately 70ms apart
 * 6 {{i3.2xlarge}} nodes per datacenter (4 physical cores with 8 threads, 
60GiB of memory, 1.8 TB NVMe dedicated drive, 2 Gbps network)
 * 3-node NDBench cluster generating {{LOCAL_ONE}} random writes and full 
partition reads of ~4 KB partitions consisting of 2 rows of 10 columns each. 
Total dataset per node was about 180 GiB, and reads were uniformly distributed 
across the partition space. The table was mostly defaults (RF=3) except it used 
Leveled Compaction Strategy and no compression (since the data is random).

*First test, bootstrap (aka punish with writes)*

In this test we used NDBench's backfill feature to punish the cluster with 
writes.
 * Backfilling the dataset achieved sustained write throughput of 20k 
coordinator level WPS easily, with average latencies staying below 1ms
 * The limiting factor appeared to be compaction throughput
 * Flamegraphs are attached

There were no observed hints or dropped messages, and data sizes in both 
datacenters looked reasonably consistent. I think this went very well.

*Second test, establish baseline*

Next, we sent a reasonably modest 1200 coordinator RPS and 600 WPS, which is 
very light load, and compared this patch to our production 30x branch.
 * Writes are ~20% faster, like we saw previously in netty trunk vs 30x
 * Reads are *~500%* slower; this is new since our last tests, and from the 
flamegraph [~benedict] suspects (and I agree) that it is likely related to some 
of the TR cleanup
 * Checked the virtual table metrics and they seem reasonable, also spot 
checked some of the new jmx per channel metrics

Summary: The read latency is concerning, but I think Benedict may already have 
the fix.

*Third test, punish with reads*

Due to the poor baseline read performance, we attempted to push the reads as 
far as they would go while acquiring a flamegraph for debugging where we are 
spending time.
 * We were able to push the cluster to 60,000 coordinator RPS before we started 
seeing CPU queuing.
 * Flamegraphs are attached

tpstats showed relatively little queueing or QOS issues, and local read 
latencies remained fast, so we believe that there is a different issue at play 
in the read path. Flamegraphs are attached for debugging.

*Fourth test, punish with reads and writes*

We're currently attempting a mixed mode test where we do many reads and writes 
and see how they interact. Results will be posted shortly. I think we'll need 
to bump our branch to pick up the latest changes.

*Summary*

So far this patch looks to be doing a great job. We have some issues to figure 
out with the reads and many more tests to run, but it didn't explode, so that 
is good.


was (Author: jolynch):
Note: this is not a comparative analysis, nor do we have root causes for all 
findings. [~vinaykumarcse]'s and my goal today was to kick the tires of this 
patch and see if there were any serious issues. We threw 
[{{9b0b814add}}|https://github.com/apache/cassandra/commit/9b0b814add8da5a66c12a87a7bfebb015667f293]
 on a smallish cluster and punished it with read and write load.

*Test setup*
 * Two datacenters, approximately 70ms apart
 * 6 {{i3.2xlarge}} nodes per datacenter (4 physical cores with 8 threads, 
60GiB of memory, 1.8 TB NVMe dedicated drive, 2 Gbps network)
 * 3-node NDBench cluster generating {{LOCAL_ONE}} random writes and full 
partition reads of ~4 KB partitions consisting of 2 rows of 10 columns each. 
Total dataset per node was about 180 GiB, and reads were uniformly distributed 
across the partition space. The table was mostly defaults (RF=3) except it used 
Leveled Compaction Strategy and no compression (since the data is random).

*First test, bootstrap (aka punish with writes)*

In this test we used NDBench's backfill feature to punish the cluster with 
writes.
 * Backfilling the dataset achieved sustained write throughput of 20k 
coordinator level WPS easily, with average latencies staying below 1ms
 * The limiting factor appeared to be compaction throughput
 * Flamegraphs are attached

There were no observed hints or dropped messages and datasizes in both 
datacenters looked reasonably consistent. I think this went very well.

*Second test, establish baseline*

Next, we sent a reasonably modest 1200 coordinator RPS and 600 WPS, which is 
very light load, and compared this patch to our production 30x branch.

[jira] [Commented] (CASSANDRA-15066) Improvements to Internode Messaging

2019-04-26 Thread Joseph Lynch (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827395#comment-16827395
 ] 

Joseph Lynch commented on CASSANDRA-15066:
--

Note: this is not a comparative analysis, nor do we have root causes for all 
findings. [~vinaykumarcse]'s and my goal today was to kick the tires of this 
patch and see if there were any serious issues. We threw 
[{{9b0b814add}}|https://github.com/apache/cassandra/commit/9b0b814add8da5a66c12a87a7bfebb015667f293]
 on a smallish cluster and punished it with read and write load.

*Test setup*
 * Two datacenters, approximately 70ms apart
 * 6 {{i3.2xlarge}} nodes per datacenter (4 physical cores with 8 threads, 
60GiB of memory, 1.8 TB NVMe dedicated drive, 2 Gbps network)
 * 3-node NDBench cluster generating {{LOCAL_ONE}} random writes and full 
partition reads of ~4 KB partitions consisting of 2 rows of 10 columns each. 
Total dataset per node was about 180 GiB, and reads were uniformly distributed 
across the partition space. The table was mostly defaults (RF=3) except it used 
Leveled Compaction Strategy and no compression (since the data is random).

*First test, bootstrap (aka punish with writes)*

In this test we used NDBench's backfill feature to punish the cluster with 
writes.
 * Backfilling the dataset achieved sustained write throughput of 20k 
coordinator level WPS easily, with average latencies staying below 1ms
 * The limiting factor appeared to be compaction throughput
 * Flamegraphs are attached

There were no observed hints or dropped messages, and data sizes in both 
datacenters looked reasonably consistent. I think this went very well.

*Second test, establish baseline*

Next, we sent a reasonably modest 1200 coordinator RPS and 600 WPS, which is 
very light load, and compared this patch to our production 30x branch.
 * Writes are ~20% faster, like we saw previously in netty trunk vs 30x
 * Reads are *~500%* slower; this is new since our last tests, and from the 
flamegraph [~benedict] suspects (and I agree) that it is likely related to some 
of the TR cleanup
 * Checked the virtual table metrics and they seem reasonable, also spot 
checked some of the new jmx per channel metrics

Summary: The read latency is concerning, but I think Benedict may already have 
the fix.

*Third test, punish with reads*

Due to the poor baseline read performance, we attempted to push the reads as 
far as they would go while acquiring a flamegraph for debugging where we are 
spending time.
 * We were able to push the cluster to 60,000 coordinator RPS before we started 
seeing CPU queuing.
 * Flamegraphs are attached

tpstats showed relatively little queueing or QOS issues, and local read 
latencies remained fast, so we believe that there is a different issue at play 
in the read path. Flamegraphs are attached for debugging.

*Fourth test, punish with reads and writes*

We're currently attempting a mixed mode test where we do many reads and 
writes and see how they interact. Results will be posted shortly. I think we'll 
need to bump our branch to pick up the latest changes.

*Summary*

So far this patch looks to be doing a great job. We have some issues to figure 
out with the reads and many more tests to run, but it didn't explode, so that 
is good.

> Improvements to Internode Messaging
> ---
>
> Key: CASSANDRA-15066
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15066
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Messaging/Internode
>Reporter: Benedict
>Assignee: Benedict
>Priority: High
> Fix For: 4.0
>
>
> CASSANDRA-8457 introduced asynchronous networking to internode messaging, but 
> there have been several follow-up endeavours to improve some semantic issues. 
>  CASSANDRA-14503 and CASSANDRA-13630 are the latest such efforts, and were 
> combined some months ago into a single overarching refactor of the original 
> work, to address some of the issues that have been discovered.  Given the 
> criticality of this work to the project, we wanted to bring some more eyes to 
> bear to ensure the release goes ahead smoothly.  In doing so, we uncovered a 
> number of issues with messaging, some of them long-standing, that we felt 
> needed to be addressed.  This patch widens the scope of CASSANDRA-14503 and 
> CASSANDRA-13630 in an effort to close the book on the messaging service, at 
> least for the foreseeable future.
> The patch includes a number of clarifying refactors that touch outside of the 
> {{net.async}} package, and a number of semantic changes to the {{net.async}} 
> package itself.  We believe it clarifies the intent and behaviour of the 
> code while improving system stability, which we will outline in comments 
> below.
> https://github.com/belliottsmith/cassandra/tree/messaging-improvements

[jira] [Updated] (CASSANDRA-13485) Better handle IO errors on 3.0+ flat files

2019-04-26 Thread Jeff Jirsa (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Jirsa updated CASSANDRA-13485:
---
Resolution: Duplicate
Status: Resolved  (was: Open)

> Better handle IO errors on 3.0+ flat files 
> ---
>
> Key: CASSANDRA-13485
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13485
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Streaming and Messaging, Local/Compaction, 
> Local/Startup and Shutdown
>Reporter: Jeff Jirsa
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> In 3.0, hints and compaction transaction data both move into flat files. Like 
> every other part of Cassandra, we can have IO errors either reading or 
> writing those files, and should properly handle IO exceptions on those files 
> (including respecting the disk failure policies).
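
A minimal sketch of the kind of handling the ticket asks for, assuming a 
simplified disk failure policy; the enum, class names and behaviour below are 
illustrative stand-ins, not Cassandra's actual failure-policy machinery:

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch: IO errors on hint / compaction-transaction flat files are routed
// through a configured disk failure policy instead of escaping as raw
// exceptions. All names here are hypothetical.
final class FlatFileIoSketch
{
    enum DiskFailurePolicy { IGNORE, STOP, DIE }

    private final DiskFailurePolicy policy;

    FlatFileIoSketch(DiskFailurePolicy policy)
    {
        this.policy = policy;
    }

    byte[] readOrHandle(Path file)
    {
        try
        {
            return Files.readAllBytes(file);
        }
        catch (IOException e)
        {
            switch (policy)
            {
                case DIE:
                    throw new RuntimeException("Unrecoverable IO error on " + file, e);
                case STOP:
                    System.err.println("Stopping transports after IO error on " + file);
                    return null; // a real node would also disable gossip / native transport
                default: // IGNORE
                    System.err.println("Ignoring IO error on " + file + ": " + e);
                    return null;
            }
        }
    }

    public static void main(String[] args)
    {
        byte[] bytes = new FlatFileIoSketch(DiskFailurePolicy.IGNORE)
                           .readOrHandle(Paths.get("/tmp/hints/example.hints"));
        System.out.println(bytes == null ? "read failed (handled)" : bytes.length + " bytes");
    }
}
{code}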






[jira] [Comment Edited] (CASSANDRA-14654) Reduce heap pressure during compactions

2019-04-26 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827042#comment-16827042
 ] 

Benedict edited comment on CASSANDRA-14654 at 4/26/19 3:17 PM:
---

I personally like it if the name of the thing affected comes first, so that the 
properties are easily discoverable, but otherwise that sounds good.  So maybe 
{{key_cache_maintain_during_compaction}}?

(and, tbh, +1 any name that has the {{key_cache_}} prefix)


was (Author: benedict):
I personally like it if the name of the thing affected comes first, so that the 
properties are easily discoverable, but otherwise that sounds good.  So maybe 
{{key_cache_maintain_during_compaction}}?

> Reduce heap pressure during compactions
> ---
>
> Key: CASSANDRA-14654
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14654
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Compaction
>Reporter: Chris Lohfink
>Assignee: Chris Lohfink
>Priority: Normal
>  Labels: Performance, pull-request-available
> Fix For: 4.x
>
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Small partition compactions are painfully slow, with a lot of overhead per 
> partition. There also tends to be an excess of objects created (i.e. 
> 200-700 MB/s) per compaction thread.
> EncodingStats walks through all the partitions, and with mergeWith it will 
> create a new instance per partition as it walks the potentially millions of 
> partitions. In a test scenario of about 600-byte partitions and a couple 
> hundred MB of data, this accounted for ~16% of the heap pressure. Changing 
> this to instead mutably track the min values and create one instance in an 
> EncodingStats.Collector brought this down considerably (but not 100%, since 
> UnfilteredRowIterator.stats() still creates one per partition).
> The KeyCacheKey makes a full copy of the underlying byte array in 
> ByteBufferUtil.getArray in its constructor. This is the dominating heap 
> pressure as there are more sstables. Changing this to just keep the original 
> completely eliminates the current dominator of the compactions and also 
> improves read performance.
> A minor tweak is also included for operators whose compactions fall behind 
> on low-read clusters: make the preemptive opening setting a hot property.
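
To make the allocation pattern concrete, here is a minimal sketch of the 
mutable-collector idea described above: track running minima in one object and 
materialise a single stats instance at the end, instead of building a new 
immutable object per merged partition. The names are hypothetical, not the 
actual EncodingStats API.

{code:java}
// Illustrative sketch of the "mutable collector" pattern: one collector per
// merge, updated in place, producing a single immutable result at the end.
final class StatsCollectorSketch
{
    private long minTimestamp = Long.MAX_VALUE;
    private int minLocalDeletionTime = Integer.MAX_VALUE;
    private int minTTL = Integer.MAX_VALUE;

    // Called once per row/partition being merged; no allocation here.
    void update(long timestamp, int localDeletionTime, int ttl)
    {
        minTimestamp = Math.min(minTimestamp, timestamp);
        minLocalDeletionTime = Math.min(minLocalDeletionTime, localDeletionTime);
        minTTL = Math.min(minTTL, ttl);
    }

    // Single allocation at the end of the merge.
    ImmutableStats finish()
    {
        return new ImmutableStats(minTimestamp, minLocalDeletionTime, minTTL);
    }

    static final class ImmutableStats
    {
        final long minTimestamp;
        final int minLocalDeletionTime;
        final int minTTL;

        ImmutableStats(long minTimestamp, int minLocalDeletionTime, int minTTL)
        {
            this.minTimestamp = minTimestamp;
            this.minLocalDeletionTime = minLocalDeletionTime;
            this.minTTL = minTTL;
        }
    }
}
{code}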






[jira] [Updated] (CASSANDRA-14654) Reduce heap pressure during compactions

2019-04-26 Thread Benedict (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-14654:
-
Status: Ready to Commit  (was: Review In Progress)

> Reduce heap pressure during compactions
> ---
>
> Key: CASSANDRA-14654
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14654
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Compaction
>Reporter: Chris Lohfink
>Assignee: Chris Lohfink
>Priority: Normal
>  Labels: Performance, pull-request-available
> Fix For: 4.x
>
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Small partition compactions are painfully slow, with a lot of overhead per 
> partition. There also tends to be an excess of objects created (i.e. 
> 200-700 MB/s) per compaction thread.
> EncodingStats walks through all the partitions, and with mergeWith it will 
> create a new instance per partition as it walks the potentially millions of 
> partitions. In a test scenario of about 600-byte partitions and a couple 
> hundred MB of data, this accounted for ~16% of the heap pressure. Changing 
> this to instead mutably track the min values and create one instance in an 
> EncodingStats.Collector brought this down considerably (but not 100%, since 
> UnfilteredRowIterator.stats() still creates one per partition).
> The KeyCacheKey makes a full copy of the underlying byte array in 
> ByteBufferUtil.getArray in its constructor. This is the dominating heap 
> pressure as there are more sstables. Changing this to just keep the original 
> completely eliminates the current dominator of the compactions and also 
> improves read performance.
> A minor tweak is also included for operators whose compactions fall behind 
> on low-read clusters: make the preemptive opening setting a hot property.






[jira] [Commented] (CASSANDRA-14654) Reduce heap pressure during compactions

2019-04-26 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827042#comment-16827042
 ] 

Benedict commented on CASSANDRA-14654:
--

I personally like it if the name of the thing affected comes first, so that the 
properties are easily discoverable, but otherwise that sounds good.  So maybe 
{{key_cache_maintain_during_compaction}}?

> Reduce heap pressure during compactions
> ---
>
> Key: CASSANDRA-14654
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14654
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Compaction
>Reporter: Chris Lohfink
>Assignee: Chris Lohfink
>Priority: Normal
>  Labels: Performance, pull-request-available
> Fix For: 4.x
>
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Small partition compactions are painfully slow, with a lot of overhead per 
> partition. There also tends to be an excess of objects created (i.e. 
> 200-700 MB/s) per compaction thread.
> EncodingStats walks through all the partitions, and with mergeWith it will 
> create a new instance per partition as it walks the potentially millions of 
> partitions. In a test scenario of about 600-byte partitions and a couple 
> hundred MB of data, this accounted for ~16% of the heap pressure. Changing 
> this to instead mutably track the min values and create one instance in an 
> EncodingStats.Collector brought this down considerably (but not 100%, since 
> UnfilteredRowIterator.stats() still creates one per partition).
> The KeyCacheKey makes a full copy of the underlying byte array in 
> ByteBufferUtil.getArray in its constructor. This is the dominating heap 
> pressure as there are more sstables. Changing this to just keep the original 
> completely eliminates the current dominator of the compactions and also 
> improves read performance.
> A minor tweak is also included for operators whose compactions fall behind 
> on low-read clusters: make the preemptive opening setting a hot property.






[jira] [Comment Edited] (CASSANDRA-14654) Reduce heap pressure during compactions

2019-04-26 Thread Chris Lohfink (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827024#comment-16827024
 ] 

Chris Lohfink edited comment on CASSANDRA-14654 at 4/26/19 2:52 PM:


I liked the changes (applied to branch); perhaps 
{{maintain_keycache_during_compaction}} for a better name?


was (Author: cnlwsu):
I liked the changes; perhaps {{maintain_keycache_during_compaction}} for a 
better name?

> Reduce heap pressure during compactions
> ---
>
> Key: CASSANDRA-14654
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14654
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Compaction
>Reporter: Chris Lohfink
>Assignee: Chris Lohfink
>Priority: Normal
>  Labels: Performance, pull-request-available
> Fix For: 4.x
>
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Small partition compactions are painfully slow, with a lot of overhead per 
> partition. There also tends to be an excess of objects created (i.e. 
> 200-700 MB/s) per compaction thread.
> EncodingStats walks through all the partitions, and with mergeWith it will 
> create a new instance per partition as it walks the potentially millions of 
> partitions. In a test scenario of about 600-byte partitions and a couple 
> hundred MB of data, this accounted for ~16% of the heap pressure. Changing 
> this to instead mutably track the min values and create one instance in an 
> EncodingStats.Collector brought this down considerably (but not 100%, since 
> UnfilteredRowIterator.stats() still creates one per partition).
> The KeyCacheKey makes a full copy of the underlying byte array in 
> ByteBufferUtil.getArray in its constructor. This is the dominating heap 
> pressure as there are more sstables. Changing this to just keep the original 
> completely eliminates the current dominator of the compactions and also 
> improves read performance.
> A minor tweak is also included for operators whose compactions fall behind 
> on low-read clusters: make the preemptive opening setting a hot property.






[jira] [Updated] (CASSANDRA-14654) Reduce heap pressure during compactions

2019-04-26 Thread Chris Lohfink (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Lohfink updated CASSANDRA-14654:
--
Status: Review In Progress  (was: Changes Suggested)

> Reduce heap pressure during compactions
> ---
>
> Key: CASSANDRA-14654
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14654
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Compaction
>Reporter: Chris Lohfink
>Assignee: Chris Lohfink
>Priority: Normal
>  Labels: Performance, pull-request-available
> Fix For: 4.x
>
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Small partition compactions are painfully slow, with a lot of overhead per 
> partition. There also tends to be an excess of objects created (i.e. 
> 200-700 MB/s) per compaction thread.
> EncodingStats walks through all the partitions, and with mergeWith it will 
> create a new instance per partition as it walks the potentially millions of 
> partitions. In a test scenario of about 600-byte partitions and a couple 
> hundred MB of data, this accounted for ~16% of the heap pressure. Changing 
> this to instead mutably track the min values and create one instance in an 
> EncodingStats.Collector brought this down considerably (but not 100%, since 
> UnfilteredRowIterator.stats() still creates one per partition).
> The KeyCacheKey makes a full copy of the underlying byte array in 
> ByteBufferUtil.getArray in its constructor. This is the dominating heap 
> pressure as there are more sstables. Changing this to just keep the original 
> completely eliminates the current dominator of the compactions and also 
> improves read performance.
> A minor tweak is also included for operators whose compactions fall behind 
> on low-read clusters: make the preemptive opening setting a hot property.






[jira] [Commented] (CASSANDRA-14654) Reduce heap pressure during compactions

2019-04-26 Thread Chris Lohfink (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827024#comment-16827024
 ] 

Chris Lohfink commented on CASSANDRA-14654:
---

I liked the changes; perhaps {{maintain_keycache_during_compaction}} for a 
better name?

> Reduce heap pressure during compactions
> ---
>
> Key: CASSANDRA-14654
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14654
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Compaction
>Reporter: Chris Lohfink
>Assignee: Chris Lohfink
>Priority: Normal
>  Labels: Performance, pull-request-available
> Fix For: 4.x
>
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Small partition compactions are painfully slow, with a lot of overhead per 
> partition. There also tends to be an excess of objects created (i.e. 
> 200-700 MB/s) per compaction thread.
> EncodingStats walks through all the partitions, and with mergeWith it will 
> create a new instance per partition as it walks the potentially millions of 
> partitions. In a test scenario of about 600-byte partitions and a couple 
> hundred MB of data, this accounted for ~16% of the heap pressure. Changing 
> this to instead mutably track the min values and create one instance in an 
> EncodingStats.Collector brought this down considerably (but not 100%, since 
> UnfilteredRowIterator.stats() still creates one per partition).
> The KeyCacheKey makes a full copy of the underlying byte array in 
> ByteBufferUtil.getArray in its constructor. This is the dominating heap 
> pressure as there are more sstables. Changing this to just keep the original 
> completely eliminates the current dominator of the compactions and also 
> improves read performance.
> A minor tweak is also included for operators whose compactions fall behind 
> on low-read clusters: make the preemptive opening setting a hot property.






[jira] [Commented] (CASSANDRA-15066) Improvements to Internode Messaging

2019-04-26 Thread Alex Petrov (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826823#comment-16826823
 ] 

Alex Petrov commented on CASSANDRA-15066:
-

As was discussed on the mailing list, I'm publishing my reviewer notes; they 
might be helpful for anyone who would like to get an overview of the patch and 
wants some guidance to save time.

First of all, there are many changes that are easy to review. These are either 
renames, moves, extracted classes, or relocated functionality:

  * Some request classes were renamed, such as {{Truncation -> 
TruncateRequest}}, {{EncodedHintMessage -> HintMessage}} etc.
  * The {{Message}} class now carries an {{id}}, so we do not have to pass an 
additional {{id}} param everywhere. It also combines {{MessageIn}} and 
{{MessageOut}}, which is also quite a large change.
  * Quite a few changes are related to adding {{TimeUnit}}, which are all 
cosmetic but still important
  * One of the largest changes is splitting {{Verb}}, the {{MBean}} methods and 
fields, and several other classes out of {{MessagingService}}
  * Metrics and virtual tables (also quite large but easy to review and verify)

Now, moving to some important changes. Most of them are in the {{net.async}} 
package.

Two entrypoint classes that handle incoming and outgoing messages and 
connections are {{OutboundConnection(s)}} and {{InboundMessageHandler(s)}}.

{{InboundMessageHandlers}} is a class that holds handler instances for 
{{small}}, {{large}}, {{urgent}} and {{counter}} messages. Inbound connections 
are created and initialised using {{InboundConnectionInitiator}}, which handles 
Netty pipeline creation in {{#initChannel}} and is responsible for creating the 
appropriate pipeline (either a messaging or a streaming one).

The {{InboundConnectionInitiator#Handler}} inner class is responsible for 
accepting a message, decoding it, and initiating the actual connection, which 
is done by receiving a handshake message ({{HandshakeProtocol.Initiate}}). 
After this message is received, we check whether or not the remote node's 
version is compatible with the current one and proceed with setting up an 
appropriate pipeline. {{Handler#decode}} contains the handshake logic for 
current and older versions. The {{HandshakeProtocol}} class contains a complete 
description of the new and old handshake protocols, message formats and states.
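
For readers less familiar with Netty, here is a rough, generic sketch of the 
shape described above: install a small handshake decoder first, then swap in 
the real framing pipeline once the peer is accepted. The class names, magic 
number, message size and version check are all invented for illustration; this 
is not Cassandra's actual InboundConnectionInitiator.

{code:java}
import java.util.List;

import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.socket.SocketChannel;
import io.netty.handler.codec.ByteToMessageDecoder;
import io.netty.handler.codec.LengthFieldBasedFrameDecoder;

// Generic inbound initializer: handshake first, real pipeline afterwards.
final class InboundInitializerSketch extends ChannelInitializer<SocketChannel>
{
    @Override
    protected void initChannel(SocketChannel channel)
    {
        channel.pipeline().addLast("handshake", new HandshakeDecoderSketch());
    }

    static final class HandshakeDecoderSketch extends ByteToMessageDecoder
    {
        @Override
        protected void decode(ChannelHandlerContext ctx, ByteBuf in, List<Object> out)
        {
            if (in.readableBytes() < 8)
                return; // wait for the full (hypothetical) 8-byte initiate message

            int magic = in.readInt();
            int peerVersion = in.readInt();
            if (magic != 0xCA550000 || peerVersion < 10)
            {
                ctx.close(); // incompatible peer: refuse the connection
                return;
            }
            // Handshake accepted: replace this handler with the framing stage.
            ctx.pipeline().replace(this, "frames",
                                   new LengthFieldBasedFrameDecoder(1 << 20, 0, 4, 0, 4));
        }
    }
}
{code}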

{{InboundMessageHandler}} also contains the logic for handling large messages. 
If the message size exceeds the large message threshold, {{LargeCoprocessor}} 
comes into play. It starts a side thread that feeds frame contents into 
{{AsyncMessagingInputPlus}}, which calls {{#onBufConsumed}} as soon as it is 
done. You can find a complete description in the class javadoc.
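
The large-message handoff can be pictured as a plain producer/consumer pair. 
The sketch below is only a generic illustration of that shape (a bounded queue 
feeding a side thread, with a callback on consumption), not the actual 
LargeCoprocessor / AsyncMessagingInputPlus code; all names are hypothetical.

{code:java}
import java.nio.ByteBuffer;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Frames of a large message are handed to a dedicated consumer thread through
// a bounded queue; a callback fires as each buffer is consumed.
final class LargeMessageHandoffSketch
{
    interface OnConsumed { void onBufConsumed(int bytes); }

    private final BlockingQueue<ByteBuffer> frames = new LinkedBlockingQueue<>(64);
    private final Thread consumer;

    LargeMessageHandoffSketch(OnConsumed callback)
    {
        consumer = new Thread(() -> {
            try
            {
                while (true)
                {
                    ByteBuffer frame = frames.take();
                    if (!frame.hasRemaining())
                        break; // empty buffer used as an end-of-message marker
                    int consumed = frame.remaining();
                    frame.position(frame.limit()); // "process" the payload
                    callback.onBufConsumed(consumed);
                }
            }
            catch (InterruptedException e)
            {
                Thread.currentThread().interrupt();
            }
        }, "large-message-consumer");
        consumer.start();
    }

    /** Called from the event loop with each decoded frame of the large message. */
    void supply(ByteBuffer frame) throws InterruptedException
    {
        frames.put(frame);
    }

    /** Signal that the last frame has been supplied. */
    void close() throws InterruptedException
    {
        frames.put(ByteBuffer.allocate(0));
    }
}
{code}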

Lastly, {{InboundMessageHandler}} handles resource limits and message dropping. 
You can find the acquisition logic in {{processOneContainedMessage}}, 
expiration handling in {{callbacks.onArrivedExpired}}, and the wait logic in 
{{WaitQueue#schedule}}.

{{OutboundConnections}} has many similarities with the inbound path. It also 
groups small, large and urgent messages into their own {{OutboundConnection}} 
instances. {{OutboundConnectionInitiator}} creates and initialises outgoing 
(messaging and streaming) connections, taking care of Netty specifics, setting 
up pipelines and creating handlers. Similar to the incoming path, it uses 
{{HandshakeProtocol}}. The {{OutboundConnections#Connect}} class handles 
retries and the scheduling of auxiliary tasks (for example, pruning expired 
messages; see {{#scheduleMaintenanceWhileDisconnected}}).

{{OutboundConnection}} handles capacity limits (see {{#enqueue}} and 
{{#acquireCapacity}}) and puts messages onto the {{OutboundMessageQueue}} if 
resource limits allow.
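
The enqueue/acquireCapacity interplay boils down to reserving bytes against a 
limit before queueing. A minimal sketch of that idea, with invented names and a 
simplistic rejection policy (not Cassandra's actual accounting):

{code:java}
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

// Reserve bytes against a capacity limit before queueing; refuse the message
// when the reservation fails, and release the reservation after delivery.
final class BoundedOutboundQueueSketch
{
    private final long capacityBytes;
    private final AtomicLong usedBytes = new AtomicLong();
    private final Queue<byte[]> queue = new ConcurrentLinkedQueue<>();

    BoundedOutboundQueueSketch(long capacityBytes)
    {
        this.capacityBytes = capacityBytes;
    }

    /** Try to reserve {@code size} bytes; fail if the limit would be exceeded. */
    private boolean acquireCapacity(long size)
    {
        while (true)
        {
            long current = usedBytes.get();
            if (current + size > capacityBytes)
                return false;
            if (usedBytes.compareAndSet(current, current + size))
                return true;
        }
    }

    /** Enqueue if capacity allows; the caller decides what to do on rejection. */
    boolean enqueue(byte[] serializedMessage)
    {
        if (!acquireCapacity(serializedMessage.length))
            return false;
        queue.add(serializedMessage);
        return true;
    }

    /** Called by the delivery loop after a message is flushed or expires. */
    void releaseCapacity(long size)
    {
        usedBytes.addAndGet(-size);
    }
}
{code}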

Two subclasses of {{Delivery}} ({{EventLoop|LargeMessageDelivery}}) are 
responsible for polling the queue (see {{#doRun}}). {{LargeMessageDelivery}} 
uses {{AsyncMessagingOutputPlus}}, which buffers data, encodes it into frames 
and asynchronously flushes them when the buffer fills up.

Both paths have configurations, located in 
{{Inbound|OutboundConnectionSettings}}.

There is already some documentation in place in the javadoc, and by the time 
the patch is committed all important pieces will be documented. If you think 
documentation is missing for some piece of functionality, please share your 
thoughts.

> Improvements to Internode Messaging
> ---
>
> Key: CASSANDRA-15066
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15066
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Messaging/Internode
>Reporter: Benedict
>Assignee: Benedict
>Priority: High
> Fix For: 4.0
>
>
> CASSANDRA-8457 introduced asynchronous networking to internode messaging, but 
> there have been several follow-up endeavours to improve some semantic issues. 
> CASSANDRA-14503 and CASSANDRA-13630 are the latest such efforts, and were 
> combined some months ago into a single overarching refactor of the original 
> work, to address some of the issues that have been discovered.

[jira] [Commented] (CASSANDRA-10190) Python 3 support for cqlsh

2019-04-26 Thread Dinesh Joshi (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-10190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826742#comment-16826742
 ] 

Dinesh Joshi commented on CASSANDRA-10190:
--

[~ptbannister] I found a few issues and fixed them. Here's a branch with the 
changes, rebased on the current trunk: 
https://github.com/dineshjoshi/cassandra/tree/10190-review

> Python 3 support for cqlsh
> --
>
> Key: CASSANDRA-10190
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10190
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Tools
>Reporter: Andrew Pennebaker
>Assignee: Patrick Bannister
>Priority: Normal
>  Labels: cqlsh
> Attachments: coverage_notes.txt
>
>
> Users who operate in a Python 3 environment may have trouble launching cqlsh. 
> Could we please update cqlsh's syntax to run in Python 3?
> As a workaround, users can set up pyenv and cd to a directory with a 
> .python-version containing "2.7". But it would be nice if cqlsh supported 
> modern Python versions out of the box.






[jira] [Updated] (CASSANDRA-15100) Improve no-op cleanup performance

2019-04-26 Thread Marcus Eriksson (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-15100:

Test and Documentation Plan: adds a unit test to make sure we skip sstables 
correctly
 Status: Patch Available  (was: Open)

Patch to filter the sstables earlier

||Branch||Tests||
|[trunk|https://github.com/apache/cassandra/compare/trunk...krummas:marcuse/15100-trunk]|[cci|https://circleci.com/gh/krummas/workflows/cassandra/tree/marcuse%2F15100-trunk]|
|[3.11|https://github.com/apache/cassandra/compare/cassandra-3.11...krummas:marcuse/15100-3.11]|[cci|https://circleci.com/gh/krummas/workflows/cassandra/tree/marcuse%2F15100-3.11]|
|[3.0|https://github.com/apache/cassandra/compare/cassandra-3.0...krummas:marcuse/15100-3.0]|[cci|https://circleci.com/gh/krummas/workflows/cassandra/tree/marcuse%2F15100-3.0]|

> Improve no-op cleanup performance
> -
>
> Key: CASSANDRA-15100
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15100
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Compaction
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Low
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> We should filter sstables in `OneSSTableOperation#filterSSTables` instead of 
> in the cleanup method to avoid creating unnecessary single-sstable 
> transactions for sstables fully contained in the owned ranges.
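
A minimal sketch of the filtering idea, using simplified stand-in types rather 
than Cassandra's Range/SSTableReader (no wrap-around handling); only sstables 
that spill outside the owned ranges would go on to get a cleanup transaction:

{code:java}
import java.util.ArrayList;
import java.util.List;

// Drop sstables that are fully contained in the locally owned token ranges
// before any per-sstable transaction is created, so cleanup is a no-op for them.
final class CleanupFilterSketch
{
    static final class TokenRange
    {
        final long left, right;
        TokenRange(long left, long right) { this.left = left; this.right = right; }
        boolean contains(long first, long last) { return left <= first && last <= right; }
    }

    static final class SSTable
    {
        final String name;
        final long firstToken, lastToken;
        SSTable(String name, long firstToken, long lastToken)
        {
            this.name = name; this.firstToken = firstToken; this.lastToken = lastToken;
        }
    }

    /** Keep only sstables that spill outside the owned ranges and so need cleanup. */
    static List<SSTable> needsCleanup(List<SSTable> candidates, List<TokenRange> owned)
    {
        List<SSTable> result = new ArrayList<>();
        for (SSTable sstable : candidates)
        {
            boolean fullyOwned = false;
            for (TokenRange range : owned)
            {
                if (range.contains(sstable.firstToken, sstable.lastToken))
                {
                    fullyOwned = true;
                    break;
                }
            }
            if (!fullyOwned)
                result.add(sstable); // only these get a single-sstable cleanup transaction
        }
        return result;
    }
}
{code}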


