[jira] [Commented] (CASSANDRA-15379) Make it possible to flush with a different compression strategy than we compact with
[ https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091240#comment-17091240 ]

Joey Lynch commented on CASSANDRA-15379:

Final commit with some quick fixes to the docs to make them a little clearer, test runs linked below.

||trunk||
|[063811c44|https://github.com/jolynch/cassandra/commit/063811c44f41996ee4903c92a95aa108e7ff7ad4]|
|[branch|https://github.com/apache/cassandra/compare/trunk...jolynch:CASSANDRA-15379-final]|
|[!https://circleci.com/gh/jolynch/cassandra/tree/CASSANDRA-15379-final.png?circle-token= 1102a59698d04899ec971dd36e925928f7b521f5!|https://circleci.com/gh/jolynch/cassandra/tree/CASSANDRA-15379-final]|

All unit tests and in-jvm dtests passed, a few dtest flakes on java8 and java11 that I'm pretty sure are unrelated (a transient replication dtest and two nodetool dtests).

* test_refresh_size_estimates_clears_invalid_entries - nodetool_test.TestNodetool
* test_optimized_primary_range_repair - transient_replication_test.TestTransientReplication
* test_repaired_tracking_with_mismatching_replicas - repair_tests.incremental_repair_test.TestIncRepair

All appear to be unrelated failures.
> Make it possible to flush with a different compression strategy than we compact with
>
> Key: CASSANDRA-15379
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15379
> Project: Cassandra
> Issue Type: Improvement
> Components: Local/Compaction, Local/Config, Local/Memtable
> Reporter: Joey Lynch
> Assignee: Joey Lynch
> Priority: Normal
> Fix For: 4.0-alpha
>
> Attachments: 15379_backfill_drops_zstd_level10.png, 15379_backfill_duration_zstd_level10.png, 15379_backfill_queueing_zstd_level10.png, 15379_backfill_zstd_level10.png, 15379_baseline_flush_trace.png, 15379_candidate_flush_trace.png, 15379_concurrent_flushes_zstd_level10.png, 15379_coordinator_defaults.png, 15379_coordinator_zstd_defaults.png, 15379_coordinator_zstd_level10.png, 15379_flush_flamegraph_zstd_level10.png, 15379_message_drops_zstd_level10.png, 15379_replica_defaults.png, 15379_replica_zstd_defaults.png, 15379_request_queueing_zstd_level10.png, 15379_system_defaults.png, 15379_system_zstd_defaults.png
>
> [~josnyder] and I have been testing out CASSANDRA-14482 (Zstd compression) on some of our most dense clusters and have been observing close to 50% reduction in footprint with Zstd on some of our workloads! Unfortunately though we have been running into an issue where the flush might take so long (Zstd is slower to compress than LZ4) that we can actually block the next flush and cause instability.
>
> Internally we are working around this with a very simple patch which flushes SSTables as the default compression strategy (LZ4) regardless of the table params. This is a simple solution but I think the ideal solution though might be for the flush compression strategy to be configurable separately from the table compression strategy (while defaulting to the same thing). Instead of adding yet another compression option to the yaml (like hints and commitlog) I was thinking of just adding it to the table parameters and then adding a {{default_table_parameters}} yaml option like:
>
> {noformat}
> # Default table properties to apply on freshly created tables. The currently supported defaults are:
> # * compression       : How are SSTables compressed in general (flush, compaction, etc ...)
> # * flush_compression : How are SSTables compressed as they flush
> # supported
> default_table_parameters:
>   compression:
>     class_name: 'LZ4Compressor'
>     parameters:
>       chunk_length_in_kb: 16
>   flush_compression:
>     class_name: 'LZ4Compressor'
>     parameters:
>       chunk_length_in_kb: 4
> {noformat}
>
> This would have the nice effect as well of giving our configuration a path forward to providing user specified defaults for table creation (so e.g. if a particular user wanted to use a different default chunk_length_in_kb they can do that).
>
> So the proposed (~mandatory) scope is:
> * Flush with a faster compression strategy
>
> I'd like to implement the following at the same time:
> * Per table flush compression configuration
> * Ability to default the table flush and compaction compression in the yaml.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15379) Make it possible to flush with a different compression strategy than we compact with
[ https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17089243#comment-17089243 ]

Joey Lynch commented on CASSANDRA-15379:

*Zstd Write Mostly Read Rarely Benchmark*:

In this test I configured Zstd the way we do in production for our write mostly read rarely (e.g. trace) datasets where Zstd really shines at getting the footprint down significantly (up to 50% in some cases). This benchmark simulates our production workloads for Zstd most accurately so far.

* Load pattern: 3.6K wps and 1.2k rps at LOCAL_ONE consistency with a random load pattern.
* Data sizing: ~50 million partitions with 2 rows each of 10 columns, total size per partition of about 4 KiB of random data. ~300 GiB per node data size (replicated 6 ways)
* Compaction settings: STCS with min=8, max=32
* Compression: Zstd level 10 with 256 KiB block size

{noformat}
compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '8'}
compression = {'chunk_length_in_kb': '256', 'class': 'org.apache.cassandra.io.compress.ZstdCompressor', 'compression_level': '10'}
{noformat}

*Zstd Write Mostly Read Rarely Benchmark Results*:

The candidate branch did significantly better in all aspects. Most importantly, the baseline cluster started falling infinitely behind and queueing/dropping mutations while candidate deferred the expensive work to compaction. Flamegraphs confirmed that the vast majority of our flusher thread on-cpu time was spent in zstd compression.
Some data to support this conclusion:

[^15379_request_queueing_zstd_level10.png]
[^15379_message_drops_zstd_level10.png]
[^15379_coordinator_zstd_level10.png]
[^15379_flush_flamegraph_zstd_level10.png]
[^15379_concurrent_flushes_zstd_level10.png]
[^15379_backfill_duration_zstd_level10.png]
[^15379_backfill_drops_zstd_level10.png]
[^15379_backfill_queueing_zstd_level10.png]
[^15379_backfill_zstd_level10.png]

This data clearly shows that the baseline, which used Zstd on the flush, flushed so slowly that it was unstable, like we observed in production at Netflix. The candidate version, which flushed the data in LZ4 and then amortized the expensive compression to compaction instead, fared significantly better and remained relatively stable.
[jira] [Commented] (CASSANDRA-15379) Make it possible to flush with a different compression strategy than we compact with
[ https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17088266#comment-17088266 ]

Joey Lynch commented on CASSANDRA-15379:

*Defaults Benchmark:*
* Load pattern: 1.2K wps and 1.2k rps at LOCAL_ONE consistency with a random load pattern.
* Data sizing: ~100 million partitions with 2 rows each of 10 columns, total size per partition of about 4 KiB of random data. ~120 GiB per node data size (replicated 6 ways)
* Compaction settings: LCS with size=320MiB, fanout=20
* Compression: Zstd with 16 KiB block size

I had to tweak some settings so that compaction made up less of the overall trace (it was 50% or more of the traces), since it was hiding the flush behavior. Specifically I increased the size of the memtable before flush by increasing the {{memtable_cleanup_threshold}} setting from 0.11 to 0.5, which allowed flushes to get up to 1.4 GiB, and by setting compaction to defer as long as we can before doing the L0 -> L1 transition:

{noformat}
compaction = {'class': 'LeveledCompactionStrategy', 'fanout_size': '20', 'max_threshold': '128', 'min_threshold': '32', 'sstable_size_in_mb': '320'}
compression = {'chunk_length_in_kb': '16', 'class': 'org.apache.cassandra.io.compress.ZstdCompressor'}
{noformat}

I would prefer to up fanout_size even more to defer compactions further, but with the increase in memtable size and increase in sstable size and fanout I was able to reduce the compaction load to where the cluster was stable (pending compactions not growing without bound) on both baseline and candidate.

*Zstd Defaults Benchmark Results*:

Candidate flushes were spaced about 4 minutes apart and took about 8 seconds to flush 1.4 GiB. Flamegraphs show 50% of on-cpu time in flush writer and ~45% in compression.

[^15379_candidate_flush_trace.png]

Baseline flushes were spaced about 4 minutes apart and took about 22 seconds to flush 1.4 GiB. Flamegraphs show 20% of on-cpu time in flush writer and ~75% in compression.
[^15379_baseline_flush_trace.png]

No significant change in coordinator level, replica level latency or system metrics. Some latencies were better on candidate, some worse.

[^15379_system_zstd_defaults.png]
[^15379_coordinator_zstd_defaults.png]
[^15379_replica_zstd_defaults.png]

I think the main finding here is that already, with the cheapest zstd level, we are running closer to the flush interval than I'd like (if a flush takes longer than the interval until the next flush, it's bad news bears for the cluster), and this is with a relatively small number of writes per second (~400 coordinator writes per second per node).

*Next steps:*

I've published a final squashed commit to:

||trunk||
|[657c39d4|https://github.com/jolynch/cassandra/commit/657c39d4aba0888c6db6a46d1b1febf899de9578]|
|[branch|https://github.com/apache/cassandra/compare/trunk...jolynch:CASSANDRA-15379-final]|
|[!https://circleci.com/gh/jolynch/cassandra/tree/CASSANDRA-15379-final.png?circle-token= 1102a59698d04899ec971dd36e925928f7b521f5!|https://circleci.com/gh/jolynch/cassandra/tree/CASSANDRA-15379-final]|

There appear to be a lot of failures in java8 runs that I'm pretty sure are unrelated to my change (unit tests and in-jvm dtests passed, along with long unit tests). I'll look into all the failures and make sure they're unrelated (on a related note I'm :( that trunk is so red again).

I am now running a test with Zstd compression set to a block size of 256 KiB and level 10, which is how we typically run it in production for write mostly read rarely datasets such as trace data (for the significant reduction in disk space).
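To put the reported flush timings in perspective, here is a minimal arithmetic sketch. It uses only the figures quoted in this comment (1.4 GiB per flush, about 8 s on candidate, about 22 s on baseline, flushes about 4 minutes apart). The class and method names are made up for illustration, and it assumes a single flush at a time, which may not match the actual flush writer configuration.

```java
// Hypothetical helper, not part of Cassandra: back-of-the-envelope flush math.
final class FlushBudget {
    // Effective flush throughput in MiB/s for a flush of the given size and duration.
    static double throughputMiBPerSec(double flushGiB, double flushSeconds) {
        return flushGiB * 1024.0 / flushSeconds;
    }

    // Fraction of the flush interval spent flushing; values near 1.0 mean the
    // next flush would start before the current one finishes.
    static double fractionOfInterval(double flushSeconds, double intervalSeconds) {
        return flushSeconds / intervalSeconds;
    }

    public static void main(String[] args) {
        double flushGiB = 1.4;          // flush size reported in the benchmark
        double intervalSeconds = 240.0; // "about 4 minutes apart"
        for (double seconds : new double[] {8.0, 22.0}) { // candidate, baseline
            System.out.printf("flush of %.0f s: %.1f MiB/s, %.1f%% of the interval%n",
                    seconds,
                    throughputMiBPerSec(flushGiB, seconds),
                    100.0 * fractionOfInterval(seconds, intervalSeconds));
        }
    }
}
```

Under these assumptions the margin shrinks if the write rate rises (shorter interval) or the compressor gets more expensive (longer flush), which is the concern raised in the comment.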
[jira] [Commented] (CASSANDRA-15379) Make it possible to flush with a different compression strategy than we compact with
[ https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17087187#comment-17087187 ]

Joey Lynch commented on CASSANDRA-15379:

Alright, finally fixed our internal trunk build so we can do performance validations again. I ran the following performance benchmark and the results are essentially identical for the default configuration (so testing _just_ the addition of the NoopCompressor on the megamorphic call sites).

*Experimental Setup:*

A baseline and candidate cluster of EC2 machines running the following:
* C* cluster: 3x3 (us-east-1 and eu-west-1) i3.2xlarge
* Load cluster: 3 m5.2xlarge nodes running ndbench in us-east-1, generating a consistent load against the cluster
* Baseline C* version: Latest trunk (b05fe7ab)
* Candidate C* version: The proposed patch applied to the same version of trunk
* Relevant system configuration: Ubuntu xenial running Linux 4.15, with kyber io scheduler (vs noop), 32 KiB readahead (vs 128), and tc-fq network qdisc (vs pfifo_fast)

In all cases load is applied and then we wait for metrics to settle, especially things like pending compactions, read/write latencies, p99 latencies, etc ...

*Defaults Benchmark:*
* Load pattern: 1.2K wps and 1.2k rps at LOCAL_ONE consistency with a random load pattern.
* Data sizing: 2 rows of 10 columns, total size per partition of about 10 KiB of random data. ~100 GiB per node data size (replicated 6 ways)
* Compaction settings: LCS with size=256MiB, fanout=20
* Compression: LZ4 with 16 KiB block size

*Defaults Benchmark Results:*

We do not have data to support the hypothesis that the megamorphic call sites have become more expensive due to the addition of the NoopCompressor.

1. No significant change at the coordinator level (least relevant metric): [^15379_coordinator_defaults.png]
2. No significant change at the replica level (most relevant metric): [^15379_replica_defaults.png]
3. No significant change at the system resource level (second most relevant metrics): [^15379_system_defaults.png]

Our external flamegraph exports appear to be broken, but I looked at them and they also show no noticeable difference (I'll work with our performance team to fix exports so I can share the data here).

*Next steps for me:*
* Squash, rebase, and re-run unit and dtests with latest trunk in preparation for commit
* Run a benchmark of `ZstdCompressor` with and without the patch; we expect to see reduced CPU usage due to flushes. I will likely have to reduce the read/write throughput due to compactions taking a crazy amount of our on-CPU time with this configuration.
[jira] [Commented] (CASSANDRA-15379) Make it possible to flush with a different compression strategy than we compact with
[ https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17064428#comment-17064428 ]

Benedict Elliott Smith commented on CASSANDRA-15379:

There are definitely situations where we might care, but compression is perhaps the archetypal "doesn't matter much" scenario: we are dispatching large costly operations, so if we're a few instructions slower in the process, it should be a rounding error.
[jira] [Commented] (CASSANDRA-15379) Make it possible to flush with a different compression strategy than we compact with
[ https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17064330#comment-17064330 ]

Josh McKenzie commented on CASSANDRA-15379:

{quote}This is mainly due to Java JIT's inability to optimize megamorphic call sites. However, I think this is just a theory and we should try and validate it using an actual performance test.
{quote}
Going off what [Shipilev|https://shipilev.net/jvm/anatomy-quarks/16-megamorphic-virtual-calls/] has to say [on the topic|https://shipilev.net/blog/2015/black-magic-method-dispatch/#_conclusion], seems like something we probably shouldn't lose too much sleep over, and definitely would want to benchmark if we were concerned.
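For readers unfamiliar with the term, a call site is "megamorphic" when it observes three or more receiver types at runtime, which limits how aggressively the JIT can inline it. The sketch below only illustrates that shape: the {{Codec}} interface and its implementations are hypothetical stand-ins, not Cassandra classes, and the timing loop is not a rigorous benchmark (a harness such as JMH would be needed for real measurements).

```java
// Hypothetical illustration of a monomorphic vs. megamorphic call site.
final class MegamorphicDemo {
    interface Codec { int apply(int x); }

    static final class A implements Codec { public int apply(int x) { return x + 1; } }
    static final class B implements Codec { public int apply(int x) { return x + 2; } }
    static final class C implements Codec { public int apply(int x) { return x + 3; } }

    // The single codec.apply(...) call site below sees as many receiver types
    // as there are distinct classes in the array.
    static long run(Codec[] codecs, int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += codecs[i % codecs.length].apply(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        Codec[] mono = { new A() };
        Codec[] mega = { new A(), new B(), new C() };
        int iterations = 50_000_000;
        // Note: both runs share one call site, so type-profile pollution from the
        // first run can affect the second; treat the timings as indicative only.
        for (Codec[] codecs : new Codec[][] { mono, mega }) {
            long start = System.nanoTime();
            long sum = run(codecs, iterations);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println(codecs.length + " receiver type(s): sum=" + sum + " in " + micros + " us");
        }
    }
}
```

As the comments in this thread note, any such per-call overhead is expected to be small next to the cost of compressing a whole chunk.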
[jira] [Commented] (CASSANDRA-15379) Make it possible to flush with a different compression strategy than we compact with
[ https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17064118#comment-17064118 ]

Dinesh Joshi commented on CASSANDRA-15379:

My main concern is the addition of the Noop compressor. So Noop vs No Compressor would be the minimal test case.
[jira] [Commented] (CASSANDRA-15379) Make it possible to flush with a different compression strategy than we compact with
[ https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17059531#comment-17059531 ]

Joey Lynch commented on CASSANDRA-15379:

Cool, took your changes and [rebased on trunk with a few fixups|https://github.com/apache/cassandra/compare/trunk...jolynch:CASSANDRA-15379]. Tests are running now.

I am having some trouble with our performance integration suite for trunk right now, but should hopefully be able to run those performance tests on Monday.
[jira] [Commented] (CASSANDRA-15379) Make it possible to flush with a different compression strategy than we compact with
[ https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16971010#comment-16971010 ]

Joey Lynch commented on CASSANDRA-15379:

[~djoshi] per your feedback in Slack I've added the ability for the user to control the flush via a yaml option while doing the right thing by default.

In order to implement the "don't compress during the flush" [option you suggested|https://the-asf.slack.com/archives/CK23JSY2K/p1572905922120300?thread_ts=1572905763.117000=CK23JSY2K] I figured that the easiest way was to just implement the simple [NoopCompressor|https://github.com/apache/cassandra/commit/9030d8abcf593c06e85f549947ad41621d4776d1] everyone has been mentioning for years. I was having a hard time turning off compression at the level of abstraction BigTableWriter operates at, since it doesn't control whether, e.g., the compression offset file gets written. This way, even if you select "none", your flush is still protected by block level checksums. Separately, I think it gives us a good path forward for mitigating CASSANDRA-12682 and CASSANDRA-9264 if we want it to.

> Make it possible to flush with a different compression strategy than we compact with
>
> Key: CASSANDRA-15379
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15379
> Project: Cassandra
> Issue Type: Improvement
> Components: Local/Compaction, Local/Config, Local/Memtable
> Reporter: Joey Lynch
> Assignee: Joey Lynch
> Priority: Normal
> Fix For: 4.0-alpha
>
> [~josnyder] and I have been testing out CASSANDRA-14482 (Zstd compression) on some of our most dense clusters and have been observing close to 50% reduction in footprint with Zstd on some of our workloads! Unfortunately though we have been running into an issue where the flush might take so long (Zstd is slower to compress than LZ4) that we can actually block the next flush and cause instability.
>
> Internally we are working around this with a very simple patch which flushes SSTables as the default compression strategy (LZ4) regardless of the table params. This is a simple solution, but I think the ideal solution might be for the flush compression strategy to be configurable separately from the table compression strategy (while defaulting to the same thing). Instead of adding yet another compression option to the yaml (like hints and commitlog) I was thinking of just adding it to the table parameters and then adding a {{default_table_parameters}} yaml option like:
> {noformat}
> # Default table properties to apply on freshly created tables. The currently supported defaults are:
> # * compression       : How are SSTables compressed in general (flush, compaction, etc ...)
> # * flush_compression : How are SSTables compressed as they flush
> # supported
> default_table_parameters:
>   compression:
>     class_name: 'LZ4Compressor'
>     parameters:
>       chunk_length_in_kb: 16
>   flush_compression:
>     class_name: 'LZ4Compressor'
>     parameters:
>       chunk_length_in_kb: 4
> {noformat}
> This would have the nice effect as well of giving our configuration a path forward to providing user specified defaults for table creation (so e.g. if a particular user wanted to use a different default chunk_length_in_kb they can do that).
>
> So the proposed (~mandatory) scope is:
> * Flush with a faster compression strategy
>
> I'd like to implement the following at the same time:
> * Per table flush compression configuration
> * Ability to default the table flush and compaction compression in the yaml.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
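The point made in comment-16971010, that a no-op compressor keeps the normal write path and therefore keeps block level checksums, can be illustrated with a small sketch. Everything here is an assumption for illustration: `ChunkCompressor`, `NoopChunkCompressor`, `StoredChunk` and `NoopFlushSketch` are hypothetical names, the interface is far simpler than Cassandra's real `ICompressor`, and CRC32 is just a convenient stand-in for whatever checksum the writer actually uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;

// Hypothetical, simplified stand-in for a chunk compressor.
// This is NOT Cassandra's ICompressor interface.
interface ChunkCompressor {
    byte[] compress(byte[] chunk);
    byte[] uncompress(byte[] stored);
}

// "Compresses" by copying the input unchanged.
final class NoopChunkCompressor implements ChunkCompressor {
    @Override public byte[] compress(byte[] chunk) { return chunk.clone(); }
    @Override public byte[] uncompress(byte[] stored) { return stored.clone(); }
}

// A chunk as written to disk: the stored bytes plus a CRC32 over them.
final class StoredChunk {
    final byte[] payload;
    final long checksum;

    StoredChunk(byte[] payload, long checksum) {
        this.payload = payload;
        this.checksum = checksum;
    }
}

final class NoopFlushSketch {
    static long crc32(byte[] bytes) {
        CRC32 crc = new CRC32();
        crc.update(bytes, 0, bytes.length);
        return crc.getValue();
    }

    // The write path is the same for every compressor, so the checksum
    // is still computed when the compressor does nothing.
    static StoredChunk write(ChunkCompressor compressor, byte[] chunk) {
        byte[] stored = compressor.compress(chunk);
        return new StoredChunk(stored, crc32(stored));
    }

    // Verify the checksum before handing the bytes back.
    static byte[] read(ChunkCompressor compressor, StoredChunk chunk) {
        if (crc32(chunk.payload) != chunk.checksum)
            throw new IllegalStateException("checksum mismatch: chunk is corrupt");
        return compressor.uncompress(chunk.payload);
    }

    public static void main(String[] args) {
        ChunkCompressor noop = new NoopChunkCompressor();
        byte[] data = "partition bytes".getBytes(StandardCharsets.UTF_8);

        StoredChunk chunk = write(noop, data);
        System.out.println("round trip ok: " + Arrays.equals(data, read(noop, chunk)));

        chunk.payload[0] ^= 1; // simulate a flipped bit on disk
        try {
            read(noop, chunk);
        } catch (IllegalStateException e) {
            System.out.println("detected: " + e.getMessage());
        }
    }
}
```

The design choice this mirrors is that "no compression" is expressed as one more compressor rather than as a separate uncompressed writer, so nothing downstream of the compressor has to change.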
[jira] [Commented] (CASSANDRA-15379) Make it possible to flush with a different compression strategy than we compact with
[ https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966938#comment-16966938 ]

Joey Lynch commented on CASSANDRA-15379:

My rationale for the {{EnumSet}} over a boolean member function is:
# Versus the boolean function idea, it doesn't break the ICompressor abstraction by letting compressors know that flushes exist. As in, it is very easy for an ICompressor author to claim to be good at {{FAST_COMPRESSION}}, but they probably can't make the call on whether that should be used in flushes or other situations. I could have an {{isFastCompressor}} boolean function, but given that {{ICompressor}} is a public API interface I think sets of capabilities will be more maintainable than a collection of boolean functions going forward, especially if we start adding more capabilities (see #2).
# If we go down the path of _not_ making more knobs and just try to have the database figure out the best way to compress data for users, this is easier to maintain long term since compressors can offer multiple types of hints to the database. For example, the database might refuse to use slow compressors in flushes, commitlogs, etc., or compaction strategies might opt into higher ratio compression strategies in higher "levels". If we do go down this path there are fewer interface changes (instead of adding and removing functions we just add ICompressor.Uses hints).
# Versus the set of strings idea, it has compile time checks that are useful (which is the primary argument against sets of strings afaik).

After thinking about this problem space more, I'm no longer convinced that giving general users more knobs here (the table properties) is the right choice. By using a {{suitableUses}} hint the database can internally optimize:
* Flushes: "get this data off my heap as fast as possible". We don't care about ratio (since the products will be re-compacted shortly) or decompression speed; we only care about compression speed.
* Commitlog: "some compression is nice but get this data off my heap fast". We mostly care about compression speed, and only slightly about ratio.
* Compaction: "The older the data the more compressed it should be". We care a lot about decompression speed and ratio, but don't want to pick expensive compressors at the high churn points (L0 in LCS, small tables in STCS, before the time window bucket in TWCS).

The interface still gives advanced users a backdoor (they extend the compressor they want to change the behavior of and change what capabilities it offers).
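The capability-set idea argued for in comment-16966938 can be sketched roughly as follows. This is an illustration under stated assumptions, not the committed patch: `HintedCompressor`, `FastCompressor`, `HighRatioCompressor` and `FlushSelectionSketch` are hypothetical names, only `Uses`, `FAST_COMPRESSION` and `suitableUses` come from the comment, the `GENERAL` constant is an assumed catch-all, and the "suitable for everything unless stated otherwise" default is a guess.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical, simplified model of the capability-hint idea.
// This is NOT Cassandra's ICompressor interface.
interface HintedCompressor {
    // FAST_COMPRESSION is named in the comment; GENERAL is an assumed catch-all.
    enum Uses { GENERAL, FAST_COMPRESSION }

    String name();

    // Assumed default: a compressor that says nothing is treated as suitable everywhere.
    default Set<Uses> suitableUses() {
        return EnumSet.allOf(Uses.class);
    }
}

// A compressor that keeps the default hints, including FAST_COMPRESSION.
final class FastCompressor implements HintedCompressor {
    @Override public String name() { return "fast"; }
}

// A compressor that advertises good ratio but not fast compression.
final class HighRatioCompressor implements HintedCompressor {
    @Override public String name() { return "high-ratio"; }
    @Override public Set<Uses> suitableUses() { return EnumSet.of(Uses.GENERAL); }
}

final class FlushSelectionSketch {
    // The flush path, not the compressor, decides what FAST_COMPRESSION means for flushes:
    // keep the table's compressor if it claims to be fast, otherwise fall back.
    static HintedCompressor forFlush(HintedCompressor tableCompressor, HintedCompressor fastDefault) {
        return tableCompressor.suitableUses().contains(HintedCompressor.Uses.FAST_COMPRESSION)
               ? tableCompressor
               : fastDefault;
    }

    public static void main(String[] args) {
        HintedCompressor fast = new FastCompressor();
        HintedCompressor slow = new HighRatioCompressor();
        System.out.println("table=high-ratio flushes with " + forFlush(slow, fast).name());
        System.out.println("table=fast flushes with " + forFlush(fast, fast).name());
    }
}
```

Because the hints are enum constants, a misspelled capability fails at compile time, which is the third point in the list above; adding a new kind of hint means adding an enum constant rather than a new interface method.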
[jira] [Commented] (CASSANDRA-15379) Make it possible to flush with a different compression strategy than we compact with
[ https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966637#comment-16966637 ]

Benedict Elliott Smith commented on CASSANDRA-15379:

What is your rationale for an {{EnumSet}} being more maintainable than a member function? As far as I understand we explicitly intend to retire this functionality, so planning for future uses seems counterproductive to me.

If we're adding per-table config for this, why are we blanket changing the behaviour for all relevant compressors? This may well be surprising to users, and also seems to make the per-table config superfluous (or at least, only useful to restore the probably-assumed behaviour of using the same compressor for both flush and compaction).
[jira] [Commented] (CASSANDRA-15379) Make it possible to flush with a different compression strategy than we compact with
[ https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966431#comment-16966431 ]

Joey Lynch commented on CASSANDRA-15379:

Alright, I made it so that Zstd, Deflate and LZ4HC (which compresses extremely slowly) now flush with LZ4 (fast), controlled via an EnumSet. Since I'm changing the ICompressor interface I figured it is more maintainable this way than having a somewhat arbitrary boolean switch.

I also took the opportunity to add some more tests and improve the documentation. I tried to add some helpful documentation to help people pick compressors (I hear a lot of confusion about why we still have Snappy and Deflate around, so I tried to clarify that in the documentation).

I'll squash after review comments are integrated.

||trunk||
|[branch|https://github.com/apache/cassandra/compare/trunk...jolynch:CASSANDRA-15379]|
|[!https://circleci.com/gh/jolynch/cassandra/tree/CASSANDRA-15379.png?circle-token=1102a59698d04899ec971dd36e925928f7b521f5!|https://circleci.com/gh/jolynch/cassandra/tree/CASSANDRA-15379]|

The one failing unit test appears to be org.apache.cassandra.config.DatabaseDescriptorRefTest, which I thought was supposed to be fixed as part of CASSANDRA-15371; I'll double check tomorrow.
[jira] [Commented] (CASSANDRA-15379) Make it possible to flush with a different compression strategy than we compact with
[ https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964742#comment-16964742 ]

Benedict Elliott Smith commented on CASSANDRA-15379:

In principle this seems reasonable to me, though since this is a temporary measure, and anyway because {{String}}s are a bad way to communicate intent (particularly with a specific concept like "unsuitability"), perhaps just something like:
{code}
default boolean useOnMemtableFlush() { return true; }
{code}
Or alternatively
{code}
default ICompressor useOnMemtableFlush() { return this; }
{code}
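The second alternative in comment-16964742, where the compressor itself returns the compressor to use at flush time, might look like this in use. Only the method name `useOnMemtableFlush` and its `return this` default come from the comment; `FlushAwareCompressor`, `QuickCompressor`, `DenseCompressor` and `MemtableFlushSketch` are hypothetical names, and the interface is a simplified stand-in rather than Cassandra's `ICompressor`.

```java
// Hypothetical, simplified interface; NOT Cassandra's ICompressor.
interface FlushAwareCompressor {
    String name();

    // Default from the comment: flush with the same compressor used everywhere else.
    default FlushAwareCompressor useOnMemtableFlush() {
        return this;
    }
}

// A cheap compressor that is happy to be used for flushes.
final class QuickCompressor implements FlushAwareCompressor {
    static final QuickCompressor INSTANCE = new QuickCompressor();
    @Override public String name() { return "quick"; }
}

// An expensive compressor that delegates flushes to the cheap one.
final class DenseCompressor implements FlushAwareCompressor {
    @Override public String name() { return "dense"; }
    @Override public FlushAwareCompressor useOnMemtableFlush() { return QuickCompressor.INSTANCE; }
}

final class MemtableFlushSketch {
    // The caller only asks the table's compressor which compressor to flush with.
    static String describe(FlushAwareCompressor tableCompressor) {
        return "compact with " + tableCompressor.name()
             + ", flush with " + tableCompressor.useOnMemtableFlush().name();
    }

    public static void main(String[] args) {
        System.out.println(describe(new DenseCompressor()));
        System.out.println(describe(QuickCompressor.INSTANCE));
    }
}
```

The trade-off between the two suggested signatures is where the fallback choice lives: with the boolean variant the flush code path picks the replacement compressor, whereas with this variant each compressor names its own replacement.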
[jira] [Commented] (CASSANDRA-15379) Make it possible to flush with a different compression strategy than we compact with
[ https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964385#comment-16964385 ]

Dinesh Joshi commented on CASSANDRA-15379:

I am +1 on this idea. [~jolynch] would be happy to help review this.
[jira] [Commented] (CASSANDRA-15379) Make it possible to flush with a different compression strategy than we compact with
[ https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16962317#comment-16962317 ]

Joey Lynch commented on CASSANDRA-15379:

Yeah, I understand it's a bit late in the 4.0 cut, but I think we'll have users running into this rather quickly when trying out the new Zstd compressor (we ran into it on the first cluster we dropped it on).

If you're not a fan of the defaults in yaml (I agree it's not great), then to reduce scope I could keep the change entirely internal to C* by adding a default method on {{ICompressor}} such as:
{noformat}
default Set<String> unsuitableUseHints() { return Collections.emptySet(); }
{noformat}
Then the ZstdCompressor would yield a set with the string "flush" or something, and the flush code path would just use the default compressor in that case.

[~benedict] what do you think about this alternative?
[jira] [Commented] (CASSANDRA-15379) Make it possible to flush with a different compression strategy than we compact with
[ https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961907#comment-16961907 ]

Benedict Elliott Smith commented on CASSANDRA-15379:

Makes sense. We keep adding things to 4.0, but this seems to me to bundle along with other config cleanups, and is super minor.

I'm not certain about the idea of putting default parameters in the yaml, though, as this is a feature we'll have to maintain despite making very little sense. We've talked about introducing global and per-Keyspace defaults for tables, and I wonder if we should depend on that here.