[jira] [Comment Edited] (CASSANDRA-15066) Improvements to Internode Messaging
[ https://issues.apache.org/jira/browse/CASSANDRA-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827478#comment-16827478 ] Dinesh Joshi edited comment on CASSANDRA-15066 at 4/27/19 6:30 AM:

[~ifesdjeen] thank you for the notes; they're very useful. I'm going over the code, primarily for my own curiosity. [~jolynch] [~vinaykumarcse] thanks for testing the patch and sharing the results. Is there a JIRA for the read performance issue? It would be great to have it linked here, assuming we want to fix it as part of this patch.

> Improvements to Internode Messaging
> ---
>
> Key: CASSANDRA-15066
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15066
> Project: Cassandra
> Issue Type: Improvement
> Components: Messaging/Internode
> Reporter: Benedict
> Assignee: Benedict
> Priority: High
> Fix For: 4.0
>
> Attachments: 20k_backfill.png, 60k_RPS.png, 60k_RPS_CPU_bottleneck.png, backfill_cass_perf_ft_msg_tst.svg, baseline_patch_vs_30x.png, increasing_reads_latency.png, many_reads_cass_perf_ft_msg_tst.svg
>
>
> CASSANDRA-8457 introduced asynchronous networking to internode messaging, but there have been several follow-up endeavours to improve some semantic issues. CASSANDRA-14503 and CASSANDRA-13630 are the latest such efforts, and were combined some months ago into a single overarching refactor of the original work, to address some of the issues that have been discovered. Given the criticality of this work to the project, we wanted to bring some more eyes to bear to ensure the release goes ahead smoothly. In doing so, we uncovered a number of issues with messaging, some of them long-standing, that we felt needed to be addressed. This patch widens the scope of CASSANDRA-14503 and CASSANDRA-13630 in an effort to close the book on the messaging service, at least for the foreseeable future.
> The patch includes a number of clarifying refactors that touch outside the {{net.async}} package, and a number of semantic changes to the {{net.async}} package itself. We believe it clarifies the intent and behaviour of the code while improving system stability, which we will outline in comments below.
> https://github.com/belliottsmith/cassandra/tree/messaging-improvements

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-15066) Improvements to Internode Messaging
[ https://issues.apache.org/jira/browse/CASSANDRA-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827395#comment-16827395 ] Joseph Lynch edited comment on CASSANDRA-15066 at 4/27/19 12:54 AM:

Note: this is not a comparative analysis, nor do we have root causes for all findings. [~vinaykumarcse]'s and my goal today was to kick the tires of this patch and see if there were any serious issues. We threw [{{9b0b814add}}|https://github.com/apache/cassandra/commit/9b0b814add8da5a66c12a87a7bfebb015667f293] on a smallish cluster and punished it with read and write load.

*Test setup*
* Two datacenters, approximately 70ms apart
* 6 {{i3.2xlarge}} nodes per datacenter (4 physical cores with 8 threads, 60 GiB of memory, 1.8 TB dedicated NVMe drive, 2 Gbps network)
* 3-node NDBench cluster generating {{LOCAL_ONE}} random writes and full-partition reads of ~4 KiB partitions, each consisting of 2 rows of 10 columns. The total dataset per node was ~180 GiB, and reads were uniformly distributed across the partition space. The table was mostly defaults (RF=3), except that it used Leveled Compaction Strategy and no compression (since the data is random).

*First test, bootstrap (aka punish with writes)*
In this test we used NDBench's backfill feature to punish the cluster with writes.
* Backfilling the dataset easily achieved sustained write throughput of 20k coordinator-level WPS, with average latencies staying below 1ms
* The limiting factor appeared to be compaction throughput
* Flamegraphs are attached

There were no observed hints or dropped messages, and data sizes in both datacenters looked reasonably consistent. I think this went very well.

[^20k_backfill.png] [^backfill_cass_perf_ft_msg_tst.svg]

*Second test, establish baseline*
Next, we sent a reasonably modest 1200 coordinator RPS and 600 WPS, which is very light load, and compared this patch to our production 30x branch.
* Writes are ~20% faster, as we saw previously in netty trunk vs 30x
* Reads are *~500%* slower; this is new since our last tests, and based on the flamegraph [~benedict] suspects (and I agree) that it is likely related to some of the TR cleanup
* Checked the virtual table metrics and they seem reasonable; also spot-checked some of the new per-channel JMX metrics

Summary: the read latency is concerning, but I think Benedict may already have the fix.

[^baseline_patch_vs_30x.png]

*Third test, punish with reads*
Due to the poor baseline read performance, we attempted to push the reads as far as they would go while acquiring a flamegraph to debug where we are spending time.
* We were able to push the cluster to 60,000 coordinator RPS before we started seeing CPU queuing
* The limiting factor appeared to be CPU time (~80% saturated) and random 4k IOPS (although we were only ~30% saturated there)
* Flamegraphs are attached

tpstats showed relatively little queueing or QOS issues, and local read latencies remained fast, so we believe a different issue is at play in the read path. Flamegraphs are attached for debugging.

[^increasing_reads_latency.png] [^60k_RPS_CPU_bottleneck.png] [^60k_RPS.png] [^many_reads_cass_perf_ft_msg_tst.svg]

*Fourth test, punish with reads and writes*
We're currently attempting a mixed-mode test with many concurrent reads and writes to see how they interact. Results will be posted shortly. I think we'll need to bump our branch to pick up the latest changes.

*Summary*
So far this patch looks to be doing a great job. We have some issues to figure out with the reads and many more tests to run, but it didn't explode, so that is good.
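For anyone reproducing the setup above, the test table might look roughly like the following CQL. This is a hypothetical sketch: the keyspace, table, column, and datacenter names are assumptions (the real NDBench schema may differ, and the "10 columns per row" are collapsed here into a single blob payload); only the RF=3, Leveled Compaction Strategy, and disabled-compression settings come from the description.

```sql
-- Hypothetical reconstruction of the test table; names are illustrative.
-- Two datacenters at RF=3, as described in the test setup.
CREATE KEYSPACE IF NOT EXISTS ndbench
    WITH replication = {'class': 'NetworkTopologyStrategy',
                        'dc1': 3, 'dc2': 3};

-- ~4 KiB partitions of 2 rows each; LCS, and compression disabled
-- because the random payloads would not compress anyway.
CREATE TABLE IF NOT EXISTS ndbench.test_data (
    key     text,
    row_id  int,
    payload blob,
    PRIMARY KEY ((key), row_id)
) WITH compaction  = {'class': 'LeveledCompactionStrategy'}
   AND compression = {'enabled': 'false'};
```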
[jira] [Updated] (CASSANDRA-15066) Improvements to Internode Messaging
[ https://issues.apache.org/jira/browse/CASSANDRA-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Lynch updated CASSANDRA-15066:
-
Attachment: increasing_reads_latency.png
            60k_RPS_CPU_bottleneck.png
            20k_backfill.png
            baseline_patch_vs_30x.png
            60k_RPS.png
[jira] [Updated] (CASSANDRA-15066) Improvements to Internode Messaging
[ https://issues.apache.org/jira/browse/CASSANDRA-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Lynch updated CASSANDRA-15066:
-
Attachment: backfill_cass_perf_ft_msg_tst.svg
            many_reads_cass_perf_ft_msg_tst.svg
[jira] [Commented] (CASSANDRA-15066) Improvements to Internode Messaging
[ https://issues.apache.org/jira/browse/CASSANDRA-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827395#comment-16827395 ] Joseph Lynch commented on CASSANDRA-15066: -- Note, this is not a comparative analysis nor do we have root causes for all findings. [~vinaykumarcse] and my goal today was to kick the tires of this patch and see if there were any serious issues. We threw [{{9b0b814add}}|https://github.com/apache/cassandra/commit/9b0b814add8da5a66c12a87a7bfebb015667f293] on a smallish cluster and punished it with read and write load. *Test setup* * Two datacenters, approximately 70ms apart * 6 {{{i3.2xlarge}} nodes per datacenter (4 physical cores with 8 threads, 60GiB of memory, 1.8 TB NVMe dedicated drive, 2 Gbps network) * 3 node NDBench cluster generating {{LOCAL_ONE }}random writes and full partition reads of ~4kb partitions consisting of 2 rows of 10 columns each. Total dataset per node was about ~180GiB and reads were uniformly distributed across the partition space. The table was mostly defaults (RF=3) except it used Leveled Compaction Strategy and no compression (since the data is random). *First test, bootstrap (aka punish with writes)* **In this test we used NDBench's backfill feature to punish the cluster with writes. * Backfilling the dataset achieved sustained write throughput of 20k coordinator level WPS easily, with average latencies staying below 1ms * The limiting factor appeared to be compaction throughput * Flamegraphs are attached There were no observed hints or dropped messages and datasizes in both datacenters looked reasonably consistent. I think this went very well. *Second test, establish baseline* **Next, we sent a reasonably modest 1200 coordinator RPS and 600 WPS, which is very light load, and compared this patch to our production 30x production branch. 
* Writes are ~20%faster, like we saw previously in netty trunk vs 30x * Reads are *~500%* slower, this is new since our last tests and from the flamegraph [~benedict] suspects and I agree that it was likely related to some of the TR cleanup * Checked the virtual table metrics and they seem reasonable, also spot checked some of the new jmx per channel metrics Summary: The read latency is concerning, but I think Benedict may already have the fix. *Third test, punish with reads* Due to the poor baseline read performance, we attempted to push the reads as far as they would go while acquiring a flamegraph for debugging where we are spending time. * We were able to push the cluster to 60,000 coordinator RPS before we started seeing CPU queuing. * Flamegraphs are attached tpstats showed relatively little queueing or QOS issues, and local read latencies remained fast, so we believe that there is a different issue at play in the read path. Flamegraphs are attached for debugging. *Fourth test, punish with reads and writes* **We're currently attempting a mixed mode test where we do many reads and writes and see how they interact. Results will be posted shortly. I think we'll need to bump our branch to pickup the latest changes. *Summary* So far this patch looks to be doing a great job, we have some issues to figure out with the reads and many more tests to run, but it didn't explode so that is good heh. > Improvements to Internode Messaging > --- > > Key: CASSANDRA-15066 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15066 > Project: Cassandra > Issue Type: Improvement > Components: Messaging/Internode >Reporter: Benedict >Assignee: Benedict >Priority: High > Fix For: 4.0 > > > CASSANDRA-8457 introduced asynchronous networking to internode messaging, but > there have been several follow-up endeavours to improve some semantic issues. 
> CASSANDRA-14503 and CASSANDRA-13630 are the latest such efforts, and were > combined some months ago into a single overarching refactor of the original > work, to address some of the issues that have been discovered. Given the > criticality of this work to the project, we wanted to bring some more eyes to > bear to ensure the release goes ahead smoothly. In doing so, we uncovered a > number of issues with messaging, some of which are long-standing, that we felt > needed to be addressed. This patch widens the scope of CASSANDRA-14503 and > CASSANDRA-13630 in an effort to close the book on the messaging service, at > least for the foreseeable future. > The patch includes a number of clarifying refactors that touch outside of the > {{net.async}} package, and a number of semantic changes to the {{net.async}} > package itself. We believe it clarifies the intent and behaviour of the > code while improving system stability, which we will outline in comments > below. > https://github.com/belliottsmith/c
[jira] [Updated] (CASSANDRA-13485) Better handle IO errors on 3.0+ flat files
[ https://issues.apache.org/jira/browse/CASSANDRA-13485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Jirsa updated CASSANDRA-13485: --- Resolution: Duplicate Status: Resolved (was: Open) > Better handle IO errors on 3.0+ flat files > --- > > Key: CASSANDRA-13485 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13485 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Streaming and Messaging, Local/Compaction, > Local/Startup and Shutdown >Reporter: Jeff Jirsa >Priority: Normal > Fix For: 3.0.x, 3.11.x, 4.x > > > In 3.0, hints and compaction transaction data both move into flat files. Like > every other part of cassandra, we can have IO errors either reading or > writing those files, and should properly handle IO exceptions on those files > (including respecting the disk failure policies). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-14654) Reduce heap pressure during compactions
[ https://issues.apache.org/jira/browse/CASSANDRA-14654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827042#comment-16827042 ] Benedict edited comment on CASSANDRA-14654 at 4/26/19 3:17 PM: --- I personally like it if the name of the thing affected comes first, so that the properties are easily discoverable, but otherwise that sounds good. So maybe {{key_cache_maintain_during_compaction}}? (and, tbh, +1 any name that has the {{key_cache_}} prefix) was (Author: benedict): I personally like it if the name of the thing affected comes first, so that the properties are easily discoverable, but otherwise that sounds good. So maybe {{key_cache_maintain_during_compaction}}? > Reduce heap pressure during compactions > --- > > Key: CASSANDRA-14654 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14654 > Project: Cassandra > Issue Type: Improvement > Components: Local/Compaction >Reporter: Chris Lohfink >Assignee: Chris Lohfink >Priority: Normal > Labels: Performance, pull-request-available > Fix For: 4.x > > Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, > screenshot-4.png > > Time Spent: 40m > Remaining Estimate: 0h > > Small partition compactions are painfully slow with a lot of overhead per > partition. There also tends to be an excess of objects created (ie > 200-700mb/s) per compaction thread. > The EncoderStats walks through all the partitions and with mergeWith it will > create a new one per partition as it walks the potentially millions of > partitions. In a test scenario of about 600byte partitions and a couple 100mb > of data this consumed ~16% of the heap pressure. Changing this to instead > mutably track the min values and create one in a EncodingStats.Collector > brought this down considerably (but not 100% since the > UnfilteredRowIterator.stats() still creates 1 per partition). > The KeyCacheKey makes a full copy of the underlying byte array in > ByteBufferUtil.getArray in its constructor. 
This is the dominating heap > pressure as there are more sstables. By changing this to just keeping the > original it completely eliminates the current dominator of the compactions > and also improves read performance. > Minor tweak included for this as well for operators when compactions are > behind on low read clusters is to make the preemptive opening setting a > hotprop. 
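The {{mergeWith}} allocation problem described above boils down to a classic pattern: replace per-element creation of immutable stats objects with one mutable accumulator that tracks minima as it walks the partitions, building a single result at the end. A minimal sketch of that idea, with illustrative names rather than Cassandra's actual {{EncodingStats}} API:

```java
// Illustrative mutable-accumulator sketch; StatsCollector is a stand-in
// name, not Cassandra's real EncodingStats.Collector.
public class StatsCollector {
    // Track running minima mutably instead of allocating a merged
    // immutable stats object per partition.
    private long minTimestamp = Long.MAX_VALUE;
    private int minTtl = Integer.MAX_VALUE;

    /** Fold one partition's stats into the running minima (no allocation). */
    public void update(long timestamp, int ttl) {
        if (timestamp < minTimestamp) minTimestamp = timestamp;
        if (ttl < minTtl) minTtl = ttl;
    }

    public long minTimestamp() { return minTimestamp; }
    public int minTtl() { return minTtl; }
}
```

The immutable-merge style allocates O(partitions) short-lived objects; the collector style allocates one, which is the difference the flamegraphs in this ticket attribute to {{EncodingStats}}.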
[jira] [Updated] (CASSANDRA-14654) Reduce heap pressure during compactions
[ https://issues.apache.org/jira/browse/CASSANDRA-14654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-14654: - Status: Ready to Commit (was: Review In Progress) > Reduce heap pressure during compactions > --- > > Key: CASSANDRA-14654 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14654 > Project: Cassandra > Issue Type: Improvement > Components: Local/Compaction >Reporter: Chris Lohfink >Assignee: Chris Lohfink >Priority: Normal > Labels: Performance, pull-request-available > Fix For: 4.x > > Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, > screenshot-4.png > > Time Spent: 40m > Remaining Estimate: 0h > > Small partition compactions are painfully slow with a lot of overhead per > partition. There also tends to be an excess of objects created (ie > 200-700mb/s) per compaction thread. > The EncoderStats walks through all the partitions and with mergeWith it will > create a new one per partition as it walks the potentially millions of > partitions. In a test scenario of about 600byte partitions and a couple 100mb > of data this consumed ~16% of the heap pressure. Changing this to instead > mutably track the min values and create one in a EncodingStats.Collector > brought this down considerably (but not 100% since the > UnfilteredRowIterator.stats() still creates 1 per partition). > The KeyCacheKey makes a full copy of the underlying byte array in > ByteBufferUtil.getArray in its constructor. This is the dominating heap > pressure as there are more sstables. By changing this to just keeping the > original it completely eliminates the current dominator of the compactions > and also improves read performance. > Minor tweak included for this as well for operators when compactions are > behind on low read clusters is to make the preemptive opening setting a > hotprop. 
[jira] [Commented] (CASSANDRA-14654) Reduce heap pressure during compactions
[ https://issues.apache.org/jira/browse/CASSANDRA-14654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827042#comment-16827042 ] Benedict commented on CASSANDRA-14654: -- I personally like it if the name of the thing affected comes first, so that the properties are easily discoverable, but otherwise that sounds good. So maybe {{key_cache_maintain_during_compaction}}? > Reduce heap pressure during compactions > --- > > Key: CASSANDRA-14654 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14654 > Project: Cassandra > Issue Type: Improvement > Components: Local/Compaction >Reporter: Chris Lohfink >Assignee: Chris Lohfink >Priority: Normal > Labels: Performance, pull-request-available > Fix For: 4.x > > Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, > screenshot-4.png > > Time Spent: 40m > Remaining Estimate: 0h > > Small partition compactions are painfully slow with a lot of overhead per > partition. There also tends to be an excess of objects created (ie > 200-700mb/s) per compaction thread. > The EncoderStats walks through all the partitions and with mergeWith it will > create a new one per partition as it walks the potentially millions of > partitions. In a test scenario of about 600byte partitions and a couple 100mb > of data this consumed ~16% of the heap pressure. Changing this to instead > mutably track the min values and create one in a EncodingStats.Collector > brought this down considerably (but not 100% since the > UnfilteredRowIterator.stats() still creates 1 per partition). > The KeyCacheKey makes a full copy of the underlying byte array in > ByteBufferUtil.getArray in its constructor. This is the dominating heap > pressure as there are more sstables. By changing this to just keeping the > original it completely eliminates the current dominator of the compactions > and also improves read performance. 
> Minor tweak included for this as well for operators when compactions are > behind on low read clusters is to make the preemptive opening setting a > hotprop. 
[jira] [Comment Edited] (CASSANDRA-14654) Reduce heap pressure during compactions
[ https://issues.apache.org/jira/browse/CASSANDRA-14654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827024#comment-16827024 ] Chris Lohfink edited comment on CASSANDRA-14654 at 4/26/19 2:52 PM: i liked changes (applied to branch), perhaps {{maintain_keycache_during_compaction}} for better name? was (Author: cnlwsu): i liked changes, perhaps {{maintain_keycache_during_compaction}} for better name? > Reduce heap pressure during compactions > --- > > Key: CASSANDRA-14654 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14654 > Project: Cassandra > Issue Type: Improvement > Components: Local/Compaction >Reporter: Chris Lohfink >Assignee: Chris Lohfink >Priority: Normal > Labels: Performance, pull-request-available > Fix For: 4.x > > Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, > screenshot-4.png > > Time Spent: 40m > Remaining Estimate: 0h > > Small partition compactions are painfully slow with a lot of overhead per > partition. There also tends to be an excess of objects created (ie > 200-700mb/s) per compaction thread. > The EncoderStats walks through all the partitions and with mergeWith it will > create a new one per partition as it walks the potentially millions of > partitions. In a test scenario of about 600byte partitions and a couple 100mb > of data this consumed ~16% of the heap pressure. Changing this to instead > mutably track the min values and create one in a EncodingStats.Collector > brought this down considerably (but not 100% since the > UnfilteredRowIterator.stats() still creates 1 per partition). > The KeyCacheKey makes a full copy of the underlying byte array in > ByteBufferUtil.getArray in its constructor. This is the dominating heap > pressure as there are more sstables. By changing this to just keeping the > original it completely eliminates the current dominator of the compactions > and also improves read performance. 
> Minor tweak included for this as well for operators when compactions are > behind on low read clusters is to make the preemptive opening setting a > hotprop. 
[jira] [Updated] (CASSANDRA-14654) Reduce heap pressure during compactions
[ https://issues.apache.org/jira/browse/CASSANDRA-14654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Lohfink updated CASSANDRA-14654: -- Status: Review In Progress (was: Changes Suggested) > Reduce heap pressure during compactions > --- > > Key: CASSANDRA-14654 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14654 > Project: Cassandra > Issue Type: Improvement > Components: Local/Compaction >Reporter: Chris Lohfink >Assignee: Chris Lohfink >Priority: Normal > Labels: Performance, pull-request-available > Fix For: 4.x > > Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, > screenshot-4.png > > Time Spent: 40m > Remaining Estimate: 0h > > Small partition compactions are painfully slow with a lot of overhead per > partition. There also tends to be an excess of objects created (ie > 200-700mb/s) per compaction thread. > The EncoderStats walks through all the partitions and with mergeWith it will > create a new one per partition as it walks the potentially millions of > partitions. In a test scenario of about 600byte partitions and a couple 100mb > of data this consumed ~16% of the heap pressure. Changing this to instead > mutably track the min values and create one in a EncodingStats.Collector > brought this down considerably (but not 100% since the > UnfilteredRowIterator.stats() still creates 1 per partition). > The KeyCacheKey makes a full copy of the underlying byte array in > ByteBufferUtil.getArray in its constructor. This is the dominating heap > pressure as there are more sstables. By changing this to just keeping the > original it completely eliminates the current dominator of the compactions > and also improves read performance. > Minor tweak included for this as well for operators when compactions are > behind on low read clusters is to make the preemptive opening setting a > hotprop. 
[jira] [Commented] (CASSANDRA-14654) Reduce heap pressure during compactions
[ https://issues.apache.org/jira/browse/CASSANDRA-14654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827024#comment-16827024 ] Chris Lohfink commented on CASSANDRA-14654: --- i liked changes, perhaps {{maintain_keycache_during_compaction}} for better name? > Reduce heap pressure during compactions > --- > > Key: CASSANDRA-14654 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14654 > Project: Cassandra > Issue Type: Improvement > Components: Local/Compaction >Reporter: Chris Lohfink >Assignee: Chris Lohfink >Priority: Normal > Labels: Performance, pull-request-available > Fix For: 4.x > > Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, > screenshot-4.png > > Time Spent: 40m > Remaining Estimate: 0h > > Small partition compactions are painfully slow with a lot of overhead per > partition. There also tends to be an excess of objects created (ie > 200-700mb/s) per compaction thread. > The EncoderStats walks through all the partitions and with mergeWith it will > create a new one per partition as it walks the potentially millions of > partitions. In a test scenario of about 600byte partitions and a couple 100mb > of data this consumed ~16% of the heap pressure. Changing this to instead > mutably track the min values and create one in a EncodingStats.Collector > brought this down considerably (but not 100% since the > UnfilteredRowIterator.stats() still creates 1 per partition). > The KeyCacheKey makes a full copy of the underlying byte array in > ByteBufferUtil.getArray in its constructor. This is the dominating heap > pressure as there are more sstables. By changing this to just keeping the > original it completely eliminates the current dominator of the compactions > and also improves read performance. > Minor tweak included for this as well for operators when compactions are > behind on low read clusters is to make the preemptive opening setting a > hotprop. 
[jira] [Commented] (CASSANDRA-15066) Improvements to Internode Messaging
[ https://issues.apache.org/jira/browse/CASSANDRA-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826823#comment-16826823 ] Alex Petrov commented on CASSANDRA-15066: - As was discussed on the mailing list, I'm publishing my reviewer notes, which might be helpful for anyone who would like to get an overview of the patch and some guidance to save time. First of all, there are many changes that are easy to review. These are renames, moves, extracted classes, or relocated functionality: * Some request classes were renamed, such as {{Truncation -> TruncateRequest}}, {{EncodedHintMessage -> HintMessage}}, etc. * The {{Message}} class now carries its {{id}}, so we do not have to pass an additional {{id}} param everywhere. It also combines {{MessageIn}} and {{MessageOut}}, which is also quite a large change. * Quite a few changes are related to adding {{TimeUnit}}; these are all cosmetic but still important * One of the largest changes is splitting {{Verb}}, the {{MBean}} methods and fields, and several other classes out of {{MessagingService}} * Metrics and virtual tables (also quite large, but easy to review and verify) Now, moving on to some important changes, most of which are in the {{net.async}} package. The two entrypoint classes that handle incoming and outgoing messages and connections are {{OutboundConnection(s)}} and {{InboundMessageHandler(s)}}. {{InboundMessageHandlers}} is a class that holds handler instances for {{small}}, {{large}}, {{urgent}} and {{counter}} messages. Inbound connections are created and initialised using {{InboundConnectionInitiator}}, which handles netty pipeline creation in {{#initChannel}}. It is responsible for creating an appropriate pipeline (either a messaging or a streaming one). The {{InboundConnectionInitiator#Handler}} inner class is responsible for accepting a message, decoding it, and initiating the actual connection, which is done by receiving a handshake message ({{HandshakeProtocol.Initiate}}). 
After this message is received, we check whether the remote node's version is compatible with the current one and proceed with setting up an appropriate pipeline. {{Handler#decode}} contains handshake logic for current and older versions. The {{HandshakeProtocol}} class contains a complete description of the new and old handshake protocols, message formats, and states. {{InboundMessageHandler}} also contains logic for handling large messages. If the message size exceeds the large-message threshold, {{LargeCoprocessor}} comes into play. It starts a side thread that feeds frame contents into {{AsyncMessagingInputPlus}}, which calls {{#onBufConsumed}} as soon as it is done. You can find a complete description in the class javadoc. Lastly, {{InboundMessageHandler}} handles resource limits and message dropping. You can find the acquisition logic in {{processOneContainedMessage}}, expiration in {{callbacks.onArrivedExpired}}, and the {{WaitQueue#schedule}} logic. {{OutboundConnections}} has many similarities with the inbound path. It also groups small, large and urgent messages into their own {{OutboundConnection}} instances. {{OutboundConnectionInitiator}} creates and initialises outgoing (messaging and streaming) connections, taking care of Netty specifics, setting up pipelines, and creating handlers. Similar to the incoming path, it uses {{HandshakeProtocol}}. The {{OutboundConnections#Connect}} class handles retries and the scheduling of auxiliary tasks (for example, pruning expired messages; see {{#scheduleMaintenanceWhileDisconnected}}). {{OutboundConnection}} handles capacity limits (see {{#enqueue}} and {{#acquireCapacity}}) and puts messages into {{OutboundMessageQueue}} if resource limits allow. Two subclasses of {{Delivery}} ({{EventLoop|LargeMessageDelivery}}) are responsible for polling the queue (see {{#doRun}}). {{LargeMessageDelivery}} uses {{AsyncMessagingOutputPlus}}, which buffers data, encodes it into frames, and asynchronously flushes them when the buffer fills up. 
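The capacity-limit idea above ({{#enqueue}} acquiring capacity before a message may be queued) is essentially a reserve-before-enqueue byte budget. The following is a minimal illustrative sketch of that technique, not Cassandra's actual implementation; the class and method names are made up for the example:

```java
import java.util.concurrent.atomic.AtomicLong;

// Reserve-before-enqueue sketch: a sender must acquire bytes against a
// fixed budget before queueing a message, and release them once the
// message is delivered or expires. CAS loop keeps it lock-free.
public class CapacityLimiter {
    private final long limitBytes;
    private final AtomicLong usedBytes = new AtomicLong();

    public CapacityLimiter(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    /** Try to reserve size bytes; false means reject (or park for retry). */
    public boolean tryAcquire(long size) {
        while (true) {
            long current = usedBytes.get();
            if (current + size > limitBytes)
                return false;                       // over budget: do not enqueue
            if (usedBytes.compareAndSet(current, current + size))
                return true;                        // reservation succeeded
        }
    }

    /** Return capacity after delivery or expiration. */
    public void release(long size) {
        usedBytes.addAndGet(-size);
    }
}
```

Expired messages that are pruned from the queue release their reservation the same way as delivered ones, which is why expiration callbacks matter for keeping the budget accurate.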
Both paths have configurations, located in {{Inbound|OutboundConnectionSettings}}. There is already some documentation in place in the javadoc, and by the time the patch is committed all important pieces will be documented. If you think documentation is missing for some piece of functionality, please share your thoughts. > Improvements to Internode Messaging > --- > > Key: CASSANDRA-15066 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15066 > Project: Cassandra > Issue Type: Improvement > Components: Messaging/Internode >Reporter: Benedict >Assignee: Benedict >Priority: High > Fix For: 4.0 > > > CASSANDRA-8457 introduced asynchronous networking to internode messaging, but > there have been several follow-up endeavours to improve some semantic issues. > CASSANDRA-14503 and CASSANDRA-13630 are th
[jira] [Commented] (CASSANDRA-10190) Python 3 support for cqlsh
[ https://issues.apache.org/jira/browse/CASSANDRA-10190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826742#comment-16826742 ] Dinesh Joshi commented on CASSANDRA-10190: -- [~ptbannister] I found a few issues and fixed them. Here's a branch with the changes, rebased on the current trunk: https://github.com/dineshjoshi/cassandra/tree/10190-review > Python 3 support for cqlsh > -- > > Key: CASSANDRA-10190 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10190 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Tools >Reporter: Andrew Pennebaker >Assignee: Patrick Bannister >Priority: Normal > Labels: cqlsh > Attachments: coverage_notes.txt > > > Users who operate in a Python 3 environment may have trouble launching cqlsh. > Could we please update cqlsh's syntax to run in Python 3? > As a workaround, users can setup pyenv, and cd to a directory with a > .python-version containing "2.7". But it would be nice if cqlsh supported > modern Python versions out of the box. 
[jira] [Updated] (CASSANDRA-15100) Improve no-op cleanup performance
[ https://issues.apache.org/jira/browse/CASSANDRA-15100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcus Eriksson updated CASSANDRA-15100: Test and Documentation Plan: adds a unit test to make sure we skip sstables correctly Status: Patch Available (was: Open) Patch to filter the sstables earlier ||Branch||Tests|| |[trunk|https://github.com/apache/cassandra/compare/trunk...krummas:marcuse/15100-trunk]|[cci|https://circleci.com/gh/krummas/workflows/cassandra/tree/marcuse%2F15100-trunk]| |[3.11|https://github.com/apache/cassandra/compare/cassandra-3.11...krummas:marcuse/15100-3.11]|[cci|https://circleci.com/gh/krummas/workflows/cassandra/tree/marcuse%2F15100-3.11]| |[3.0|https://github.com/apache/cassandra/compare/cassandra-3.0...krummas:marcuse/15100-3.0]|[cci|https://circleci.com/gh/krummas/workflows/cassandra/tree/marcuse%2F15100-3.0]| > Improve no-op cleanup performance > - > > Key: CASSANDRA-15100 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15100 > Project: Cassandra > Issue Type: Improvement > Components: Local/Compaction >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Low > Fix For: 3.0.x, 3.11.x, 4.x > > > We should filter sstables in `OneSSTableOperation#filterSSTables` instead of > in the cleanup method to avoid creating unnecessary single-sstable > transactions for sstables fully contained in the owned ranges. 
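The filtering described in the ticket amounts to a containment check done once, up front, over the candidate list, so that fully-owned sstables never reach the transaction-creating code. A simplified sketch of the idea, using stand-in types rather than Cassandra's actual {{Range}}/{{SSTableReader}}:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of early candidate filtering: an sstable whose
// token span is fully contained in a locally owned span holds no data
// to clean up, so it is dropped before any per-sstable transaction
// would be created. Span is a simplified stand-in type.
public class CleanupFilter {
    /** A closed token span [left, right]. */
    public static class Span {
        final long left, right;
        public Span(long left, long right) { this.left = left; this.right = right; }
        boolean contains(Span other) { return left <= other.left && other.right <= right; }
    }

    /** Keep only sstables that may hold data outside the owned spans. */
    public static List<Span> needingCleanup(List<Span> sstables, List<Span> owned) {
        List<Span> result = new ArrayList<>();
        for (Span sstable : sstables) {
            boolean covered = false;
            for (Span o : owned) {
                if (o.contains(sstable)) { covered = true; break; }
            }
            if (!covered)
                result.add(sstable); // this one still needs a cleanup transaction
        }
        return result;
    }
}
```

With the check done here, a cluster whose sstables all fall inside the owned ranges creates zero single-sstable transactions, which is the no-op fast path the ticket is after.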