[jira] [Commented] (CASSANDRA-15379) Make it possible to flush with a different compression strategy than we compact with

2020-04-24 Thread Joey Lynch (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091240#comment-17091240
 ] 

Joey Lynch commented on CASSANDRA-15379:


Final commit with some quick fixes to the docs to make them a little clearer; 
test runs are linked below.

||trunk||
|[063811c44|https://github.com/jolynch/cassandra/commit/063811c44f41996ee4903c92a95aa108e7ff7ad4]|
|[branch|https://github.com/apache/cassandra/compare/trunk...jolynch:CASSANDRA-15379-final]|
|[!https://circleci.com/gh/jolynch/cassandra/tree/CASSANDRA-15379-final.png?circle-token=1102a59698d04899ec971dd36e925928f7b521f5!|https://circleci.com/gh/jolynch/cassandra/tree/CASSANDRA-15379-final]|

All unit tests and in-jvm dtests passed. There were a few dtest flakes on java8 
and java11 that I'm pretty sure are unrelated (a nodetool dtest, a transient 
replication dtest, and an incremental repair dtest):
* test_refresh_size_estimates_clears_invalid_entries - nodetool_test.TestNodetool
* test_optimized_primary_range_repair - transient_replication_test.TestTransientReplication
* test_repaired_tracking_with_mismatching_replicas - repair_tests.incremental_repair_test.TestIncRepair

All appear to be unrelated failures.



> Make it possible to flush with a different compression strategy than we 
> compact with
> 
>
> Key: CASSANDRA-15379
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15379
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Compaction, Local/Config, Local/Memtable
>Reporter: Joey Lynch
>Assignee: Joey Lynch
>Priority: Normal
> Fix For: 4.0-alpha
>
> Attachments: 15379_backfill_drops_zstd_level10.png, 
> 15379_backfill_duration_zstd_level10.png, 
> 15379_backfill_queueing_zstd_level10.png, 15379_backfill_zstd_level10.png, 
> 15379_baseline_flush_trace.png, 15379_candidate_flush_trace.png, 
> 15379_concurrent_flushes_zstd_level10.png, 15379_coordinator_defaults.png, 
> 15379_coordinator_zstd_defaults.png, 15379_coordinator_zstd_level10.png, 
> 15379_flush_flamegraph_zstd_level10.png, 
> 15379_message_drops_zstd_level10.png, 15379_replica_defaults.png, 
> 15379_replica_zstd_defaults.png, 15379_request_queueing_zstd_level10.png, 
> 15379_system_defaults.png, 15379_system_zstd_defaults.png
>
>
> [~josnyder] and I have been testing out CASSANDRA-14482 (Zstd compression) on 
> some of our most dense clusters and have been observing close to 50% 
> reduction in footprint with Zstd on some of our workloads! Unfortunately 
> though we have been running into an issue where the flush might take so long 
> (Zstd is slower to compress than LZ4) that we can actually block the next 
> flush and cause instability.
> Internally we are working around this with a very simple patch which flushes 
> SSTables with the default compression strategy (LZ4) regardless of the table 
> params. This is a simple solution, but I think the ideal solution might be 
> for the flush compression strategy to be configurable separately from the 
> table compression strategy (while defaulting to the same thing). Instead of 
> adding yet another compression option to the yaml (like hints and commitlog) 
> I was thinking of just adding it to the table parameters and then adding a 
> {{default_table_parameters}} yaml option like:
> {noformat}
> # Default table properties to apply on freshly created tables. The currently
> # supported defaults are:
> # * compression       : How are SSTables compressed in general (flush,
> #                       compaction, etc ...)
> # * flush_compression : How are SSTables compressed as they flush
> default_table_parameters:
>   compression:
>     class_name: 'LZ4Compressor'
>     parameters:
>       chunk_length_in_kb: 16
>   flush_compression:
>     class_name: 'LZ4Compressor'
>     parameters:
>       chunk_length_in_kb: 4
> {noformat}
> This would have the nice effect as well of giving our configuration a path 
> forward to providing user specified defaults for table creation (so e.g. if a 
> particular user wanted to use a different default chunk_length_in_kb they can 
> do that).
> So the proposed (~mandatory) scope is:
> * Flush with a faster compression strategy
> I'd like to implement the following at the same time:
> * Per table flush compression configuration
> * Ability to default the table flush and compaction compression in the yaml.
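
The defaulting behaviour proposed in the description above ("configurable separately ... while defaulting to the same thing") could be sketched roughly as follows. This is a minimal illustration in Python, not the patch itself: the function name {{resolve_compression}}, the dictionary layout, and the precedence order (table flush_compression, then the yaml default flush_compression, then the table's general compression) are all assumptions made for the example.

```python
# Hypothetical stand-in for the proposed default_table_parameters yaml block.
DEFAULT_TABLE_PARAMETERS = {
    "compression": {
        "class_name": "LZ4Compressor",
        "parameters": {"chunk_length_in_kb": 16},
    },
    "flush_compression": {
        "class_name": "LZ4Compressor",
        "parameters": {"chunk_length_in_kb": 4},
    },
}


def resolve_compression(table_params: dict, defaults: dict, for_flush: bool):
    """Pick compression settings for one SSTable write (illustrative only).

    Assumed precedence for flushes:
      1. the table's own flush_compression
      2. the yaml default flush_compression
      3. whatever the table uses for compression in general
    """
    # General compression: table setting first, then the yaml default.
    compression = table_params.get("compression") or defaults.get("compression")
    if not for_flush:
        return compression
    flush = table_params.get("flush_compression") or defaults.get("flush_compression")
    # With no flush-specific setting anywhere, flush like any other write.
    return flush if flush is not None else compression
```

With these assumptions, a table that only sets Zstd for {{compression}} would still flush with the LZ4 default from the yaml, and would flush with Zstd if no flush default were configured at all.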



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15379) Make it possible to flush with a different compression strategy than we compact with

2020-04-21 Thread Joey Lynch (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17089243#comment-17089243
 ] 

Joey Lynch commented on CASSANDRA-15379:


*Zstd Write Mostly Read Rarely Benchmark*:

In this test I configured Zstd the way we do in production for our write mostly 
read rarely (e.g. trace) datasets where Zstd really shines at getting the 
footprint down significantly (up to 50% in some cases). This benchmark 
simulates our production workloads for Zstd most accurately so far.
 * Load pattern: 3.6k wps and 1.2k rps at LOCAL_ONE consistency with a random 
load pattern.
 * Data sizing: ~50 million partitions with 2 rows each of 10 columns, total 
size per partition of about 4 KiB of random data. ~300 GiB per node data size 
(replicated 6 ways)
 * Compaction settings: STCS with min=8, max=32
 * Compression: Zstd level 10 with 256 KiB block size

{noformat}
compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
              'max_threshold': '32', 'min_threshold': '8'}
compression = {'chunk_length_in_kb': '256',
               'class': 'org.apache.cassandra.io.compress.ZstdCompressor',
               'compression_level': '10'}
{noformat}
*Zstd Write Mostly Read Rarely Benchmark Results*:

The candidate branch did significantly better in all aspects. Most importantly, 
the baseline cluster fell steadily further behind, queueing and dropping 
mutations, while the candidate deferred the expensive work to compaction. 
Flamegraphs confirmed that the vast majority of our flusher thread on-cpu time 
was spent in zstd compression. Some data to support this conclusion:
 [^15379_request_queueing_zstd_level10.png]
 [^15379_message_drops_zstd_level10.png]
 [^15379_coordinator_zstd_level10.png]
 [^15379_flush_flamegraph_zstd_level10.png]
 [^15379_concurrent_flushes_zstd_level10.png]
 [^15379_backfill_duration_zstd_level10.png]
 [^15379_backfill_drops_zstd_level10.png]
 [^15379_backfill_queueing_zstd_level10.png]
 [^15379_backfill_zstd_level10.png]

This data clearly shows that the baseline, using zstd on the flush, flushed so 
slowly that it was unstable, as we observed in production at Netflix. The 
candidate version, which flushed the data with LZ4 and deferred the expensive 
compression to compaction, fared significantly better and remained 
relatively stable.
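
The instability described above (a flush taking so long that it blocks the next one) can be illustrated with a toy queueing model. This is only a sketch in Python under stated assumptions: memtables fill at a fixed interval, every flush takes the same time, and a fixed pool of flush writers works through them in order. The function name and all numbers used with it are hypothetical and are not taken from the benchmark.

```python
import heapq


def flush_backlog(fill_interval_s: float, flush_duration_s: float,
                  flush_writers: int, memtables: int) -> int:
    """Count flushes still unfinished one fill interval after the last fill.

    Memtable i becomes ready to flush at i * fill_interval_s. Each flush
    occupies one writer for flush_duration_s seconds.
    """
    # Min-heap of the times at which each flush writer becomes free.
    writers = [0.0] * flush_writers
    heapq.heapify(writers)
    finish_times = []
    for i in range(memtables):
        ready = i * fill_interval_s
        # A flush starts once its memtable is full and a writer is free.
        start = max(ready, heapq.heappop(writers))
        finish = start + flush_duration_s
        heapq.heappush(writers, finish)
        finish_times.append(finish)
    deadline = memtables * fill_interval_s
    return sum(1 for t in finish_times if t > deadline)
```

In this model the backlog stays bounded as long as flush_duration_s / flush_writers is below fill_interval_s, and grows with the number of memtables once it is not, which is the "falling behind" pattern a slow flush compressor can trigger.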


[jira] [Commented] (CASSANDRA-15379) Make it possible to flush with a different compression strategy than we compact with

2020-04-20 Thread Joey Lynch (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17088266#comment-17088266
 ] 

Joey Lynch commented on CASSANDRA-15379:


*Zstd Defaults Benchmark:*
 * Load pattern: 1.2k wps and 1.2k rps at LOCAL_ONE consistency with a random 
load pattern.
 * Data sizing: ~100 million partitions with 2 rows each of 10 columns, total 
size per partition of about 4 KiB of random data. ~120 GiB per node data size 
(replicated 6 ways)
 * Compaction settings: LCS with size=320MiB, fanout=20
 * Compression: Zstd with 16 KiB block size

I had to tweak some settings to make compaction a smaller part of the overall 
trace (it was 50% or more of the traces), since it was hiding the flush 
behavior. Specifically, I increased the size of the memtable before flush by 
raising the {{memtable_cleanup_threshold}} setting from 0.11 to 0.5, which 
allowed flushes to get up to 1.4 GiB, and I set compaction to defer as long as 
it can before doing the L0 -> L1 transition:
{noformat}
compaction = {'class': 'LeveledCompactionStrategy', 'fanout_size': '20', 
'max_threshold': '128', 'min_threshold': '32', 'sstable_size_in_mb': '320'}
compression = {'chunk_length_in_kb': '16', 'class': 
'org.apache.cassandra.io.compress.ZstdCompressor'}
{noformat}
I would prefer to raise fanout_size even more to defer compactions further, but 
with the larger memtable size, sstable size, and fanout I was able to reduce 
the compaction load to where the cluster was stable (pending compactions not 
growing without bound) on both baseline and candidate.

*Zstd Defaults Benchmark Results*:

Candidate flushes were spaced about 4 minutes apart and took about 8 seconds to 
flush 1.4 GiB. Flamegraphs show 50% of on-cpu time in flush writer and ~45% in 
compression. [^15379_candidate_flush_trace.png]

Baseline flushes were spaced about 4 minutes apart and took about 22 seconds to 
flush 1.4 GiB. Flamegraphs show 20% of on-cpu time in flush writer and ~75% in 
compression. [^15379_baseline_flush_trace.png]

No significant change in coordinator level, replica level latency or system 
metrics. Some latencies were better on candidate some worse. 
[^15379_system_zstd_defaults.png] [^15379_coordinator_zstd_defaults.png] 
[^15379_replica_zstd_defaults.png]

I think the main finding here is that already, with the cheapest zstd level, we 
are running closer to the flush interval than I'd like (if a flush takes longer 
than the time until the next flush, it's bad news bears for the cluster), and 
this is with a relatively small number of writes per second (~400 coordinator 
writes per second per node).
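
The flush timings reported above can be put on a common scale with a little arithmetic. The Python sketch below uses only the figures from this comment (1.4 GiB per flush, about 8 and 22 seconds per flush, flushes about 4 minutes apart); the helper name {{flush_stats}} is made up for the example.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def flush_stats(flush_bytes: float, flush_seconds: float, interval_seconds: float):
    """Return (flush throughput in MiB/s, fraction of the interval spent flushing)."""
    throughput_mib_s = flush_bytes / flush_seconds / MIB
    busy_fraction = flush_seconds / interval_seconds
    return throughput_mib_s, busy_fraction


if __name__ == "__main__":
    # Figures reported above: 1.4 GiB per flush, flushes ~4 minutes apart.
    for name, seconds in (("candidate", 8), ("baseline", 22)):
        mib_s, busy = flush_stats(1.4 * GIB, seconds, 4 * 60)
        print(f"{name}: {mib_s:.0f} MiB/s, flushing {busy:.1%} of the interval")
```

On these numbers the candidate flushes at roughly 179 MiB/s and the baseline at roughly 65 MiB/s, so the baseline spends nearly three times as much of each flush interval flushing. At this write rate both are still well under the interval, and the margin shrinks as the write rate or the compression cost goes up.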

*Next steps:*

I've published a final squashed commit to:
||trunk||
|[657c39d4|https://github.com/jolynch/cassandra/commit/657c39d4aba0888c6db6a46d1b1febf899de9578]|
|[branch|https://github.com/apache/cassandra/compare/trunk...jolynch:CASSANDRA-15379-final]|
|[!https://circleci.com/gh/jolynch/cassandra/tree/CASSANDRA-15379-final.png?circle-token=1102a59698d04899ec971dd36e925928f7b521f5!|https://circleci.com/gh/jolynch/cassandra/tree/CASSANDRA-15379-final]|

There appear to be a lot of failures in java8 runs that I'm pretty sure are 
unrelated to my change (unit tests and in-jvm dtests passed, along with long 
unit tests). I'll look into all the failures and make sure they're unrelated 
(on a related note I'm :( that trunk is so red again).

I am now running a test with Zstd compression set to a block size of 256 KiB 
and level 10, which is how we typically run it in production for write mostly 
read rarely datasets such as trace data (for the significant reduction in disk 
space).


[jira] [Commented] (CASSANDRA-15379) Make it possible to flush with a different compression strategy than we compact with

2020-04-19 Thread Joey Lynch (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17087187#comment-17087187
 ] 

Joey Lynch commented on CASSANDRA-15379:


Alright, finally fixed our internal trunk build so we can do performance 
validations again. I ran the following performance benchmark and the results 
are essentially identical for the default configuration (which tests _just_ the 
addition of the NoopCompressor on the megamorphic call sites).

*Experimental Setup:*

A baseline and candidate cluster of EC2 machines running the following:
 * C* cluster: 3x3 (us-east-1 and eu-west-1) i3.2xlarge
 * Load cluster: 3 m5.2xlarge nodes running ndbench in us-east-1, generating a 
consistent load against the cluster
 * Baseline C* version: Latest trunk (b05fe7ab)
 * Candidate C* version: The proposed patch applied to the same version of trunk
 * Relevant system configuration: Ubuntu xenial running Linux 4.15, with kyber 
io scheduler (vs noop), 32 KiB readahead (vs 128), and tc-fq network qdisc (vs 
pfifo_fast)

In all cases load is applied and then we wait for metrics to settle, especially 
things like pending compactions, read/write latencies, p99 latencies, etc ...

*Defaults Benchmark:*
 * Load pattern: 1.2k wps and 1.2k rps at LOCAL_ONE consistency with a random 
load pattern.
 * Data sizing: 2 rows of 10 columns, total size per partition of about 10 KiB 
of random data. ~100 GiB per node data size (replicated 6 ways)
 * Compaction settings: LCS with size=256MiB, fanout=20
 * Compression: LZ4 with 16 KiB block size

*Defaults Benchmark Results:*

We do not have data to support the hypothesis that the megamorphic call sites 
have become more expensive due to the addition of the NoopCompressor.

1. No significant change at the coordinator level (least relevant metric): 
[^15379_coordinator_defaults.png]
2. No significant change at the replica level (most relevant metric): 
[^15379_replica_defaults.png]
3. No significant change at the system resource level (second most relevant 
metrics): [^15379_system_defaults.png]

Our external flamegraph exports appear to be broken, but I looked at the 
flamegraphs and they also show no noticeable difference (I'll work with our 
performance team to fix exports so I can share the data here).

*Next steps for me:*
 * Squash, rebase, and re-run unit tests and dtests with latest trunk in 
preparation for commit
 * Run a benchmark of `ZstdCompressor` with and without the patch; we expect to 
see reduced CPU usage due to flushes. I will likely have to reduce the 
read/write throughput because compactions take a crazy amount of our on-CPU 
time with this configuration.


[jira] [Commented] (CASSANDRA-15379) Make it possible to flush with a different compression strategy than we compact with

2020-03-22 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17064428#comment-17064428
 ] 

Benedict Elliott Smith commented on CASSANDRA-15379:


There are definitely situations where we might care, but compression is perhaps 
the archetypal "doesn't matter much" scenario: we are dispatching large costly 
operations, so if we're a few instructions slower in the process, it should be 
a rounding error.




[jira] [Commented] (CASSANDRA-15379) Make it possible to flush with a different compression strategy than we compact with

2020-03-22 Thread Josh McKenzie (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17064330#comment-17064330
 ] 

Josh McKenzie commented on CASSANDRA-15379:
---

{quote}This is mainly due to Java JIT's inability to optimize megamorphic call 
sites. However, I think this is just a theory and we should try and validate it 
using an actual performance test.
{quote}
Going off what 
[Shipilev|https://shipilev.net/jvm/anatomy-quarks/16-megamorphic-virtual-calls/] 
has to say 
[on the topic|https://shipilev.net/blog/2015/black-magic-method-dispatch/#_conclusion], 
this seems like something we probably shouldn't lose too much sleep over, and 
we would definitely want to benchmark it if we were concerned.




[jira] [Commented] (CASSANDRA-15379) Make it possible to flush with a different compression strategy than we compact with

2020-03-21 Thread Dinesh Joshi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17064118#comment-17064118
 ] 

Dinesh Joshi commented on CASSANDRA-15379:
--

My main concern is the addition of the Noop compressor. So Noop vs No 
Compressor would be the minimal test case.




[jira] [Commented] (CASSANDRA-15379) Make it possible to flush with a different compression strategy than we compact with

2020-03-14 Thread Joey Lynch (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17059531#comment-17059531
 ] 

Joey Lynch commented on CASSANDRA-15379:


Cool, took your changes and 
[rebased on trunk with a few fixups|https://github.com/apache/cassandra/compare/trunk...jolynch:CASSANDRA-15379]. 
Tests are running now.

I am having some trouble with our performance integration suite for trunk right 
now, but should hopefully be able to run those performance tests on Monday.

> Make it possible to flush with a different compression strategy than we 
> compact with
> 
>
> Key: CASSANDRA-15379
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15379
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Compaction, Local/Config, Local/Memtable
>Reporter: Joey Lynch
>Assignee: Joey Lynch
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> [~josnyder] and I have been testing out CASSANDRA-14482 (Zstd compression) on 
> some of our most dense clusters and have been observing close to 50% 
> reduction in footprint with Zstd on some of our workloads! Unfortunately 
> though we have been running into an issue where the flush might take so long 
> (Zstd is slower to compress than LZ4) that we can actually block the next 
> flush and cause instability.
> Internally we are working around this with a very simple patch which flushes 
> SSTables as the default compression strategy (LZ4) regardless of the table 
> params. This is a simple solution but I think the ideal solution though might 
> be for the flush compression strategy to be configurable separately from the 
> table compression strategy (while defaulting to the same thing). Instead of 
> adding yet another compression option to the yaml (like hints and commitlog) 
> I was thinking of just adding it to the table parameters and then adding a 
> {{default_table_parameters}} yaml option like:
> {noformat}
> # Default table properties to apply on freshly created tables. The currently 
> supported defaults are:
> # * compression   : How are SSTables compressed in general (flush, 
> compaction, etc ...)
> # * flush_compression : How are SSTables compressed as they flush
> # supported
> default_table_parameters:
>   compression:
> class_name: 'LZ4Compressor'
> parameters:
>   chunk_length_in_kb: 16
>   flush_compression:
> class_name: 'LZ4Compressor'
> parameters:
>   chunk_length_in_kb: 4
> {noformat}
> This would also have the nice effect of giving our configuration a path 
> forward to providing user-specified defaults for table creation (so, e.g., if a 
> particular user wanted to use a different default chunk_length_in_kb, they could 
> do that).
> So the proposed (~mandatory) scope is:
> * Flush with a faster compression strategy
> I'd like to implement the following at the same time:
> * Per table flush compression configuration
> * Ability to default the table flush and compaction compression in the yaml.
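
The "configurable separately while defaulting to the same thing" behaviour described above can be sketched roughly as follows. This is only an illustration under stated assumptions: {{CompressionSpec}} and {{TableCompressionParams}} are hypothetical names invented here, not classes from the Cassandra code base or from the eventual patch.

```java
/** Hypothetical, simplified stand-in for one compression setting (class name plus chunk size). */
final class CompressionSpec {
    final String className;
    final int chunkLengthKb;

    CompressionSpec(String className, int chunkLengthKb) {
        this.className = className;
        this.chunkLengthKb = chunkLengthKb;
    }
}

/** Hypothetical table parameters in which flush_compression is optional. */
final class TableCompressionParams {
    private final CompressionSpec compression;
    private final CompressionSpec flushCompression; // null means "not configured"

    TableCompressionParams(CompressionSpec compression, CompressionSpec flushCompressionOrNull) {
        this.compression = compression;
        this.flushCompression = flushCompressionOrNull;
    }

    /** Setting used by compaction and every other non-flush SSTable write. */
    CompressionSpec forCompaction() {
        return compression;
    }

    /** Setting used by flush; falls back to the table-wide setting when unset. */
    CompressionSpec forFlush() {
        return flushCompression != null ? flushCompression : compression;
    }
}
```

With this shape a table that only sets {{compression}} behaves exactly as before, and only tables that opt in get a different flush compressor.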



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15379) Make it possible to flush with a different compression strategy than we compact with

2019-11-09 Thread Joey Lynch (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16971010#comment-16971010
 ] 

Joey Lynch commented on CASSANDRA-15379:


[~djoshi] per your feedback in Slack, I've added the ability for the user to 
control flush compression via a yaml option while doing the right thing by default.

In order to implement the "don't compress during the flush" [option you 
suggested|https://the-asf.slack.com/archives/CK23JSY2K/p1572905922120300?thread_ts=1572905763.117000=CK23JSY2K]
 I figured that the easiest way was to just implement the simple 
[NoopCompressor|https://github.com/apache/cassandra/commit/9030d8abcf593c06e85f549947ad41621d4776d1]
 everyone has been mentioning for years. I was having a hard time turning off 
compression at the level of abstraction BigTableWriter operates at, since it 
doesn't control whether e.g. the compression offset file gets written. This way, 
even if you select "none", your flush is still protected by block-level 
checksums. Separately, I think it gives us a good path forward for mitigating 
CASSANDRA-12682 and CASSANDRA-9264 if we want one.
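
As a rough illustration of why a no-op compressor keeps the checksum protection: if the writer always checksums whatever bytes the compressor emits, then a compressor that merely copies its input still yields checksummed chunks. The sketch below assumes a simplified, hypothetical {{ChunkCompressor}} contract and an invented "payload followed by a 4-byte CRC32" chunk layout; it is not the real {{ICompressor}} interface or the linked NoopCompressor commit.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

/** Hypothetical, simplified compressor contract; not Cassandra's real ICompressor. */
interface ChunkCompressor {
    void compress(ByteBuffer input, ByteBuffer output);
    void uncompress(ByteBuffer input, ByteBuffer output);
}

/** "Compresses" by copying the bytes through unchanged. */
final class NoopChunkCompressor implements ChunkCompressor {
    @Override public void compress(ByteBuffer input, ByteBuffer output) { output.put(input); }
    @Override public void uncompress(ByteBuffer input, ByteBuffer output) { output.put(input); }
}

/** Assumed layout for this sketch: compressed payload followed by a 4-byte CRC32 of that payload. */
final class ChecksummedChunks {
    static byte[] writeChunk(ChunkCompressor compressor, byte[] raw) {
        // raw.length bytes is enough for the no-op case; a real compressor needs its own bound.
        ByteBuffer compressed = ByteBuffer.allocate(raw.length);
        compressor.compress(ByteBuffer.wrap(raw), compressed);
        compressed.flip();

        CRC32 crc = new CRC32();
        crc.update(compressed.duplicate()); // duplicate so 'compressed' keeps its position

        ByteBuffer stored = ByteBuffer.allocate(compressed.remaining() + Integer.BYTES);
        stored.put(compressed).putInt((int) crc.getValue());
        return stored.array();
    }

    static byte[] readChunk(ChunkCompressor compressor, byte[] stored, int rawLength) {
        int payloadLength = stored.length - Integer.BYTES;
        CRC32 crc = new CRC32();
        crc.update(stored, 0, payloadLength);
        int expected = ByteBuffer.wrap(stored).getInt(payloadLength);
        if ((int) crc.getValue() != expected)
            throw new IllegalStateException("chunk checksum mismatch");

        ByteBuffer out = ByteBuffer.allocate(rawLength);
        compressor.uncompress(ByteBuffer.wrap(stored, 0, payloadLength), out);
        return out.array();
    }
}
```

The point of the design is that the checksum lives in the writer rather than the compressor, so swapping in a no-op compressor changes the payload but not the integrity check.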




[jira] [Commented] (CASSANDRA-15379) Make it possible to flush with a different compression strategy than we compact with

2019-11-04 Thread Joey Lynch (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966938#comment-16966938
 ] 

Joey Lynch commented on CASSANDRA-15379:


My rationale for the {{EnumSet}} over a boolean member function is:
 # Versus the boolean function idea, it doesn't break the {{ICompressor}} 
abstraction by making compressors aware that flushes exist. That is, it is very 
easy for an {{ICompressor}} author to claim to be good at {{FAST_COMPRESSION}}, 
but they probably can't make the call on whether it should be used in flushes or 
other situations. I could have an {{isFastCompressor}} boolean function, but 
given that {{ICompressor}} is a public API interface, I think sets of 
capabilities will be more maintainable than a collection of boolean functions 
going forward, especially if we start adding more capabilities (see #2).
 # If we go down the path of _not_ making more knobs and just try to have the 
database figure out the best way to compress data for users, this is easier to 
maintain long term since compressors can offer multiple types of hints to the 
database. For example, the database might refuse to use slow compressors in 
flushes, commitlogs, etc., or have compaction strategies opt into higher ratio 
compression strategies in higher "levels". If we do go down this path there are 
fewer interface changes (instead of adding and removing functions we just add 
{{ICompressor.Uses}} hints).
 # Versus the set of strings idea, it has compile time checks that are useful 
(which is the primary argument against sets of strings afaik).

After thinking about this problem space more, I'm no longer convinced that 
giving general users more knobs here (the table properties) is the right 
choice. By using a {{suitableUses}} hint the database can internally optimize:
 * Flushes: "get this data off my heap as fast as possible". We don't care 
about ratio (since the products will be re-compacted shortly) or decompression 
speed; we only care about compression speed.
 * Commitlog: "some compression is nice but get this data off my heap fast". We 
mostly care about compression speed, and only slightly about ratio.
 * Compaction: "The older the data the more compressed it should be". We care a 
lot about decompression speed and ratio, but don't want to pick expensive 
compressors at the high churn points (L0 in LCS, small tables in STCS, before 
the time window bucket in TWCS).

The interface still gives advanced users a backdoor (they extend the compressor 
they want to change the behavior of and change what capabilities it offers).
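
A minimal sketch of the hint idea and the "backdoor", under stated assumptions: {{FAST_COMPRESSION}} and {{suitableUses}} are the names used in this comment, while {{HIGH_RATIO}}, {{HintedCompressor}}, {{ZstdLike}}, {{FastTunedZstd}} and {{WritePathPolicy}} are hypothetical names invented for illustration and are not taken from the patch.

```java
import java.util.EnumSet;
import java.util.Set;

/** FAST_COMPRESSION comes from the discussion; HIGH_RATIO is a hypothetical second hint. */
enum Uses { FAST_COMPRESSION, HIGH_RATIO }

/** Hypothetical, simplified stand-in for a compressor that advertises capabilities. */
interface HintedCompressor {
    String name();

    /** By default a compressor claims nothing, so the database makes no special use of it. */
    default Set<Uses> suitableUses() {
        return EnumSet.noneOf(Uses.class);
    }
}

/** A slow, high-ratio compressor: it does not claim to be fast. */
class ZstdLike implements HintedCompressor {
    @Override public String name() { return "Zstd"; }
    @Override public Set<Uses> suitableUses() { return EnumSet.of(Uses.HIGH_RATIO); }
}

/** The "backdoor": an advanced user extends the compressor and re-advertises its capabilities. */
final class FastTunedZstd extends ZstdLike {
    @Override public Set<Uses> suitableUses() {
        return EnumSet.of(Uses.FAST_COMPRESSION, Uses.HIGH_RATIO);
    }
}

/** The database, not the compressor, decides what each hint means for each write path. */
final class WritePathPolicy {
    static boolean okForFlush(HintedCompressor c) {
        return c.suitableUses().contains(Uses.FAST_COMPRESSION);
    }

    static boolean preferredForOldData(HintedCompressor c) {
        return c.suitableUses().contains(Uses.HIGH_RATIO);
    }
}
```

Note that the compressor only describes itself; the mapping from hints to flush, commitlog or compaction behaviour stays in one place on the database side, and a misspelled hint fails at compile time rather than silently at runtime.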


[jira] [Commented] (CASSANDRA-15379) Make it possible to flush with a different compression strategy than we compact with

2019-11-04 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966637#comment-16966637
 ] 

Benedict Elliott Smith commented on CASSANDRA-15379:


What is your rationale for an {{EnumSet}} being more maintainable than a member 
function?  As far as I understand we explicitly intend to retire this 
functionality, so planning for future uses seems counterproductive to me.

If we're adding per-table config for this, why are we blanket changing the 
behaviour for all relevant compressors?  This may well be surprising to users, 
and also seems to make the per-table config superfluous (or at least, only 
useful to restore the probably-assumed behaviour of using the same compressor 
for both flush and compaction)

 




[jira] [Commented] (CASSANDRA-15379) Make it possible to flush with a different compression strategy than we compact with

2019-11-03 Thread Joey Lynch (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966431#comment-16966431
 ] 

Joey Lynch commented on CASSANDRA-15379:


Alright, I made it so that Zstd, Deflate and LZ4HC (which compresses extremely 
slowly) now flush with LZ4 (fast), controlled via an {{EnumSet}} of 
capabilities. Since I'm changing the {{ICompressor}} interface I figured it is 
more maintainable this way than having a somewhat arbitrary boolean switch.
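
The fallback described here can be sketched as below. This is an assumption-laden illustration, not the code on the branch: {{Capability}} and {{FlushAwareCompressor}} are hypothetical names, and only the four compressors named in this comment are modelled.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical capability flag, standing in for the EnumSet members mentioned above. */
enum Capability { FAST_COMPRESSION }

/** Hypothetical model of the compressors named in the comment and what they advertise. */
enum FlushAwareCompressor {
    LZ4(EnumSet.of(Capability.FAST_COMPRESSION)),
    LZ4HC(EnumSet.noneOf(Capability.class)),
    ZSTD(EnumSet.noneOf(Capability.class)),
    DEFLATE(EnumSet.noneOf(Capability.class));

    private final Set<Capability> capabilities;

    FlushAwareCompressor(Set<Capability> capabilities) {
        this.capabilities = capabilities;
    }

    /** Compressor to use when flushing a memtable for a table configured with this one. */
    FlushAwareCompressor forFlush() {
        // Slow compressors are replaced by LZ4 on the flush path only;
        // compaction keeps using the configured compressor.
        return capabilities.contains(Capability.FAST_COMPRESSION) ? this : LZ4;
    }
}
```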

I also took the opportunity to add some more tests and improve the 
documentation. In particular, I tried to add guidance to help people pick 
compressors (I hear a lot of confusion about why we still have Snappy and 
Deflate around, so I tried to clarify that in the documentation). I'll squash 
after review comments are integrated.

||trunk||
|[branch|https://github.com/apache/cassandra/compare/trunk...jolynch:CASSANDRA-15379]|
|[!https://circleci.com/gh/jolynch/cassandra/tree/CASSANDRA-15379.png?circle-token=1102a59698d04899ec971dd36e925928f7b521f5!|https://circleci.com/gh/jolynch/cassandra/tree/CASSANDRA-15379]|

The one failing unit test appears to be 
org.apache.cassandra.config.DatabaseDescriptorRefTest, which I thought was 
supposed to be fixed as part of CASSANDRA-15371; I'll double check tomorrow.




[jira] [Commented] (CASSANDRA-15379) Make it possible to flush with a different compression strategy than we compact with

2019-11-01 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964742#comment-16964742
 ] 

Benedict Elliott Smith commented on CASSANDRA-15379:


In principle this seems reasonable to me, though since this is a temporary 
measure, and anyway because {{String}}s are a bad way to communicate intent 
(particularly with a specific concept like "unsuitability"), perhaps just 
something like:

{code}
default boolean useOnMemtableFlush() { return true; }
{code}

Or alternatively
{code}
default ICompressor useOnMemtableFlush() { return this; }
{code}
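
To make the second variant concrete, here is a hedged sketch of how a flush path might call it. {{MemtableFlushCompressor}}, {{Lz4Sketch}} and {{ZstdSketch}} are hypothetical stand-ins invented for this example; only the {{useOnMemtableFlush}} method shape comes from the snippet above.

```java
/** Hypothetical, simplified stand-in for ICompressor using the second variant above. */
interface MemtableFlushCompressor {
    String name();

    /** Compressor to use when flushing a memtable; by default, this same compressor. */
    default MemtableFlushCompressor useOnMemtableFlush() {
        return this;
    }
}

/** A fast compressor keeps the default and flushes with itself. */
final class Lz4Sketch implements MemtableFlushCompressor {
    static final Lz4Sketch INSTANCE = new Lz4Sketch();

    @Override public String name() { return "LZ4"; }
}

/** A slow compressor delegates flushes to a faster one. */
final class ZstdSketch implements MemtableFlushCompressor {
    @Override public String name() { return "Zstd"; }

    @Override public MemtableFlushCompressor useOnMemtableFlush() {
        return Lz4Sketch.INSTANCE;
    }
}
```

The trade-off relative to a capability set is that each compressor here decides for itself what a flush should use, which is simple but makes compressors aware of the flush path.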




[jira] [Commented] (CASSANDRA-15379) Make it possible to flush with a different compression strategy than we compact with

2019-10-31 Thread Dinesh Joshi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964385#comment-16964385
 ] 

Dinesh Joshi commented on CASSANDRA-15379:
--

I am +1 on this idea. [~jolynch], I would be happy to help review this.




[jira] [Commented] (CASSANDRA-15379) Make it possible to flush with a different compression strategy than we compact with

2019-10-29 Thread Joey Lynch (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16962317#comment-16962317
 ] 

Joey Lynch commented on CASSANDRA-15379:


Yeah I understand it's a bit late in the 4.0 cut, but I think we'll have users 
running into this trying out the new Zstd compressor rather quickly (we ran 
into it on the first cluster we dropped it on).

If you're not a fan of the defaults in yaml (I agree it's not great), then to 
reduce scope I could keep the change internal to C* entirely by adding a default 
method on {{ICompressor}} such as:
{noformat}
default Set<String> unsuitableUseHints() {
  return Collections.emptySet();
}{noformat}
Then the ZstdCompressor would yield a set with the string "flush" or something 
and the flush code path would just use the default compressor in that case.

[~benedict] what do you think about this alternative?

 




[jira] [Commented] (CASSANDRA-15379) Make it possible to flush with a different compression strategy than we compact with

2019-10-29 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961907#comment-16961907
 ] 

Benedict Elliott Smith commented on CASSANDRA-15379:


Makes sense.  We keep adding things to 4.0, but this seems to me to bundle 
along with other config cleanups, and is super minor.

I'm not certain about the idea of putting default parameters in the yaml, 
though, as this is a feature we'll have to maintain despite making very little 
sense.  We've talked about introducing global and per-Keyspace defaults for 
tables, and I wonder if we should depend on that here.
