[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17850195#comment-17850195 ]

Stefan Miklosovic edited comment on CASSANDRA-12937 at 5/28/24 10:12 PM:

I've hardened the path a little bit and added a few tests.

[CASSANDRA-12937-squashed|https://github.com/instaclustr/cassandra/tree/CASSANDRA-12937-squashed]

{noformat}
java17_pre-commit_tests
  ✓ j17_build                       4m 3s
  ✓ j17_cqlsh_dtests_py311          7m 13s
  ✓ j17_cqlsh_dtests_py311_vnode    7m 37s
  ✓ j17_cqlsh_dtests_py38           6m 52s
  ✓ j17_cqlsh_dtests_py38_vnode     7m 21s
  ✓ j17_cqlshlib_cython_tests       7m 56s
  ✓ j17_cqlshlib_tests              6m 46s
  ✓ j17_jvm_dtests_latest_vnode     27m 54s
  ✓ j17_unit_tests                  14m 44s
  ✓ j17_utests_latest               15m 3s
  ✕ j17_dtests                      37m 42s
      scrub_test.TestScrub test_standalone_scrub_essential_files_only
      topology_test.TestTopology test_movement
  ✕ j17_dtests_latest               35m 24s
      offline_tools_test.TestOfflineTools test_sstableverify
      scrub_test.TestScrub test_standalone_scrub_essential_files_only
  ✕ j17_dtests_vnode                36m 15s
      scrub_test.TestScrub test_standalone_scrub_essential_files_only
  ✕ j17_jvm_dtests                  29m 10s
      org.apache.cassandra.distributed.test.NativeTransportEncryptionOptionsTest testOptionalMtlsModeDoNotAllowNonSSLConnections
      org.apache.cassandra.distributed.test.NativeTransportEncryptionOptionsTest testEndpointVerificationEnabledIpNotInSAN
  ✕ j17_utests_oa                   17m 8s
      org.apache.cassandra.db.compaction.CompactionStrategyManagerTest testAutomaticUpgradeConcurrency
java17_separate_tests
java11_pre-commit_tests
java11_separate_tests
{noformat}

[java17_pre-commit_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4358/workflows/9fbc3590-1168-41f8-a7c8-a3fbb3dfc0b0]
[java17_separate_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4358/workflows/d2e65942-b99e-4927-bd65-85800e9d94e9]
[java11_pre-commit_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4358/workflows/df51197e-c92e-454f-9c75-2f5eaee43bb8]
[java11_separate_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4358/workflows/6df36e29-d2cd-4838-b3b8-69e9113b295f]
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17849218#comment-17849218 ]

Stefan Miklosovic edited comment on CASSANDRA-12937 at 5/24/24 8:36 AM:

Seems reasonably clean ...

What I have not done is the idea [~jlewandowski] had, "if some config parameter is not in cql statement just merge the values from cassandra.yaml", because it is quite tricky to get right. We would need to know which values were specified, work out which ones are missing, and then validate that the merged combination makes sense. And if it does not, should we fail an otherwise valid CQL statement just because we happened to merge in values from cassandra.yaml and that combination was not right? I do not think so.

Let's just go with the simple case of "if compression is not specified just take the defaults from cassandra.yaml" rather than trying to merge the configs ... Too much of a hassle; it might come as an improvement if somebody is really after that.

I will try to come up with more tests, and I think that sometime next week this should all be completed and ready for review again.

[CASSANDRA-12937-squashed|https://github.com/instaclustr/cassandra/tree/CASSANDRA-12937-squashed]

{noformat}
java17_pre-commit_tests
  ✓ j17_build                       4m 7s
  ✓ j17_cqlsh_dtests_py311          7m 11s
  ✓ j17_cqlsh_dtests_py311_vnode    7m 18s
  ✓ j17_cqlsh_dtests_py38           6m 58s
  ✓ j17_cqlsh_dtests_py38_vnode     7m 1s
  ✓ j17_cqlshlib_cython_tests       7m 26s
  ✓ j17_cqlshlib_tests              6m 50s
  ✓ j17_unit_tests                  17m 36s
  ✓ j17_utests_oa                   15m 39s
  ✕ j17_dtests                      37m 48s
      scrub_test.TestScrub test_standalone_scrub_essential_files_only
      topology_test.TestTopology test_movement
  ✕ j17_dtests_latest               35m 36s
      offline_tools_test.TestOfflineTools test_sstableverify
      scrub_test.TestScrub test_standalone_scrub_essential_files_only
      configuration_test.TestConfiguration test_change_durable_writes
  ✕ j17_dtests_vnode                35m 11s
      scrub_test.TestScrub test_standalone_scrub_essential_files_only
  ✕ j17_jvm_dtests                  28m 15s
  ✕ j17_jvm_dtests_latest_vnode     27m 59s
      org.apache.cassandra.fuzz.harry.integration.model.ConcurrentQuiescentCheckerIntegrationTest testConcurrentReadWriteWorkload
  ✕ j17_utests_latest               14m 34s
      org.apache.cassandra.tcm.DiscoverySimulationTest discoveryTest
java17_separate_tests
java11_pre-commit_tests
java11_separate_tests
{noformat}

[java17_pre-commit_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4350/workflows/a68e4fb0-bd7a-4758-841c-6b4b0fe22865]
[java17_separate_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4350/workflows/fa57a86d-d120-4304-bbdf-a6cf8fefc4d2]
[java11_pre-commit_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4350/workflows/91afc77c-54fe-4369-9cb9-ababa3568e16]
[java11_separate_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4350/workflows/71add260-9c68-4d87-9d5a-99863a01bb3f]
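For illustration only, the "simple case" from the comment above could be sketched roughly as follows. This is not code from the patch: {{CompressionDefaults}}, {{resolve}} and the plain string maps are hypothetical stand-ins, and the only assumption is that a statement either carries a complete set of compression options or none at all.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical illustration; none of these names are Cassandra APIs. */
final class CompressionDefaults {
    private CompressionDefaults() {}

    /**
     * Returns the statement's compression options untouched when the
     * statement specified any, otherwise the node's cassandra.yaml defaults.
     * The two maps are deliberately never merged key by key.
     */
    static Map<String, String> resolve(Optional<Map<String, String>> fromStatement,
                                       Map<String, String> yamlDefaults) {
        return Map.copyOf(fromStatement.orElse(yamlDefaults));
    }
}
```

Because nothing is merged, an explicit option set that omits, say, chunk_length_in_kb never picks that key up from the yaml side, which avoids the "is the merged combination valid?" question raised above.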
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837237#comment-17837237 ]

Alex Petrov edited comment on CASSANDRA-12937 at 4/15/24 1:08 PM:

bq. Yes, I think this is the most ideal solution. If somebody wants to experiment with a new compressor and similar, there would need to be some knob to override it, like some JMX method or similar, and all risks attached to that (divergence of the configuration caused by operator's negligence) would be on him.

Some things are actually quite useful for gradual rollout. For example, compression. You probably do not want to rewrite your sstables across the entire cluster. Similar arguments may be made for canary deployments of memtable settings and other things. I agree that it is fine if these parameters are completely transient (i.e. if you have set it to something that diverges from the clusterwide value, it will get reverted back after the node bounce). In that case, they will probably not go through TCM and will be purely node-local.

Examples of things that are now configurable via yaml but will be configurable via TCM if we go ahead with this proposal: partitioner, memtable configuration, default compaction strategy, compression. As Sam has mentioned, "which specific value makes it into schema just depends on which instance acts as the coordinator for a given DCL statement".

bq. but I remain unconvinced that just picking the defaults from whatever node happens to be coordinating is the right way to go.

I have talked with Sam briefly, just to make sure I understand it correctly before trying to describe it. Since this was first worded in a way that suggested a problem but did not directly propose a solution (possibly described elsewhere), I will attempt to do this. Sam has already described a part of the solution as:

bq. That should probably be in a parallel local datastructure though, not in the node's local log table as we don't want to ship those local defaults to peers when providing log catchup (because they should use their own defaults).

The part that was missing for me was where the values would be coming from, and what the precedence would be. When executing a {{CREATE}} statement on some node _without_ specifying, say, compression, the statement will be created and executed without the value for compression set at all. Every node will pick the value from the ephemeral parallel structure Sam described (which is also settable via JMX and the like, as Stefan mentioned). If no value is present in this structure, it will be picked from yaml (alternatively, we could just populate this structure from yaml, too, but I consider these things roughly equivalent).
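As a rough sketch of the precedence described in the comment above (explicit value in the statement first, then a transient node-local override, then yaml), something like the following would do. The class and method names are hypothetical and not part of Cassandra or TCM; the override is held only in memory, so it is lost on a node bounce, which matches the "completely transient" behaviour discussed.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical node-local default holder; not a Cassandra class. */
final class NodeLocalDefault {
    private final String yamlValue;                        // value read from cassandra.yaml at startup
    private final AtomicReference<String> override =
            new AtomicReference<>();                       // e.g. set through a JMX operation; in-memory only

    NodeLocalDefault(String yamlValue) {
        this.yamlValue = yamlValue;
    }

    void setOverride(String value) { override.set(value); }

    void clearOverride() { override.set(null); }

    /** Precedence: statement value, then transient override, then yaml. */
    String resolve(Optional<String> fromStatement) {
        return fromStatement.orElseGet(() -> {
            String local = override.get();
            return local != null ? local : yamlValue;
        });
    }
}
```

Populating the override from yaml at startup instead of falling back to it at lookup time would give the same observable result, which is the equivalence mentioned at the end of the comment.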
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837210#comment-17837210 ]

Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/15/24 12:06 PM:

{quote}It seems like if we actually want these to be cluster wide values and not configurable on a per-node basis the defaults themselves should be in TCM{quote}

Yes, I think this is the most ideal solution. If somebody wants to experiment with a new compressor and similar, there would need to be some knob to override it, like some JMX method or similar, and all risks attached to that (divergence of the configuration caused by operator's negligence) would be on him.

However, who would be changing the defaults? What I mean by that is that if the defaults are committed in TCM and we then change our mind about them, by what means would we commit the changed defaults into TCM again?

was (Author: smiklosovic):
{quote}It seems like if we actually want these to be cluster wide values and not configurable on a per-node basis the defaults themselves should be in TCM{quote}

Yes, I think this is the most ideal solution. If somebody wants to experiment with a new compressor and similar, there would need to be some knob to override it, like some JMX method or similar, and all risks attached to that (divergence of the configuration caused by operator's negligence) would be on him.

> Default setting (yaml) for SSTable compression
> ----------------------------------------------
>
>                 Key: CASSANDRA-12937
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12937
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Config
>            Reporter: Michael Semb Wever
>            Assignee: Stefan Miklosovic
>            Priority: Low
>              Labels: AdventCalendar2021
>             Fix For: 5.x
>
>          Time Spent: 8h
>  Remaining Estimate: 0h
>
> In many situations the choice of compression for sstables is more relevant to
> the disks attached than to the schema and data.
> This issue is to add to cassandra.yaml a default value for sstable
> compression that new tables will inherit (instead of the defaults found in
> {{CompressionParams.DEFAULT}}).
> Examples where this can be relevant are filesystems that do on-the-fly
> compression (btrfs, zfs) or specific disk configurations or even specific C*
> versions (see CASSANDRA-10995).
> +Additional information for newcomers+
> Some new fields need to be added to {{cassandra.yaml}} to allow specifying
> the fields required for defining the default compression parameters. In
> {{DatabaseDescriptor}} a new {{CompressionParams}} field should be added for
> the default compression. This field should be initialized in
> {{DatabaseDescriptor.applySimpleConfig()}}. At the different places where
> {{CompressionParams.DEFAULT}} was used the code should call
> {{DatabaseDescriptor#getDefaultCompressionParams}}, which should return a
> copy of the configured {{CompressionParams}}.
> A unit test using {{OverrideConfigurationLoader}} should be used to test
> that the table schema uses the new default when a new table is created (see
> CreateTest for some example).

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
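The "return some copy" requirement in the issue description above can be illustrated with a small stand-in. The two classes below are simplified, hypothetical substitutes for {{CompressionParams}} and {{DatabaseDescriptor}} (the real classes have different fields and constructors); the sketch only shows why the getter hands out a copy rather than the shared instance.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Simplified, hypothetical stand-in for CompressionParams. */
final class SimpleCompressionParams {
    private final Map<String, String> options;

    SimpleCompressionParams(Map<String, String> options) {
        this.options = new LinkedHashMap<>(options);   // private mutable copy
    }

    SimpleCompressionParams copy() { return new SimpleCompressionParams(options); }

    void set(String key, String value) { options.put(key, value); }

    String get(String key) { return options.get(key); }
}

/** Simplified, hypothetical stand-in for DatabaseDescriptor. */
final class SimpleDescriptor {
    // In the real code this would be initialised from cassandra.yaml.
    private static volatile SimpleCompressionParams defaultCompression =
            new SimpleCompressionParams(Map.of("class", "LZ4Compressor"));

    private SimpleDescriptor() {}

    /** Callers get their own copy, so mutating it cannot change the node default. */
    static SimpleCompressionParams getDefaultCompressionParams() {
        return defaultCompression.copy();
    }
}
```

A caller that tweaks the returned object for one table therefore cannot change what the next new table inherits.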
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837179#comment-17837179 ]

Sam Tunnicliffe edited comment on CASSANDRA-12937 at 4/15/24 11:11 AM:

The problem with that is that the defaults may be different on every instance, so what exactly should be stored in the TCM log? Ideally we should store the value that is actually resolved during initial execution on each node so that it can be re-used if/when the transformation is reapplied. That should probably be in a parallel local datastructure though, not in the node's local log table as we don't want to ship those local defaults to peers when providing log catchup (because they should use their own defaults).

was (Author: beobal):
The problem with that is that the defaults may be different on every instance, so what exactly should be stored in the TCM log? Ideally we should store the value that is actually resolved during initial execution on each node to be persisted locally so that it can be re-used if/when the transformation is reapplied. That should probably be in a parallel local datastructure though, not in the node's local log table as we don't want to ship those local defaults to peers when providing log catchup (because they should use their own defaults).
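One possible shape for the "parallel local datastructure" mentioned in the comment above is sketched below, purely as an illustration. Keying by a numeric epoch, the class name and the method name are all assumptions; the sketch keeps values in memory only, whereas the earlier wording of the comment suggests the real thing would be persisted locally.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/**
 * Hypothetical node-local record of the default that was resolved when a
 * transformation was first applied. It lives next to the replicated log,
 * not inside it, so it is never shipped to peers during log catchup.
 */
final class LocalResolvedDefaults {
    private final Map<Long, String> resolvedByEpoch = new ConcurrentHashMap<>();

    /**
     * First application at an epoch resolves and remembers the node's
     * current default; re-applying the same epoch reuses that value even if
     * the node's default has changed in the meantime.
     */
    String resolveOnce(long epoch, Supplier<String> currentNodeDefault) {
        return resolvedByEpoch.computeIfAbsent(epoch, e -> currentNodeDefault.get());
    }
}
```

Each node fills its own instance from its own defaults, so two nodes may legitimately record different values for the same epoch.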
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837176#comment-17837176 ]

Jacek Lewandowski edited comment on CASSANDRA-12937 at 4/15/24 11:01 AM:

The problem with the failing test is probably that the default configuration for compression parameters (and other defaults for table / keyspace creation/alteration) should be part of the schema transformation data and stored in the TCM log.

This is not an issue specific to this ticket, because it applies to various settings. For example, even without this PR, a similar test would fail when manipulating the value of the "cassandra.sstable_compression_default" property. We would then have the same problem with the default compaction and memtable options, which are also taken from the configuration.

was (Author: jlewandowski):
The problem with the failing test is probably that the default configuration for compression parameters (and other defaults for table / keyspace creation/alteration) should be part of the schema transformation data and stored in TCM log.
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17831674#comment-17831674 ]

Michael Semb Wever edited comment on CASSANDRA-12937 at 4/5/24 11:52 AM:

-+1 to [https://github.com/apache/cassandra/pull/3168]-

EDIT: legit concerns raised below.

was (Author: michaelsembwever):
-+1 to https://github.com/apache/cassandra/pull/3168 -

EDIT: legit concerns raised below.
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17831674#comment-17831674 ]

Michael Semb Wever edited comment on CASSANDRA-12937 at 4/5/24 11:52 AM:

-+1 to https://github.com/apache/cassandra/pull/3168 -

EDIT: legit concerns raised below.

was (Author: michaelsembwever):
+1 to https://github.com/apache/cassandra/pull/3168
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17834004#comment-17834004 ]

Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/4/24 4:15 PM:

I want to write this down explicitly ...

It is already not possible to alter the system schema at the CQL level. So a user already cannot change the compressor of system tables, even replicated ones, to something other than the default. What we additionally want to prevent is being able to configure it to something else via cassandra.yaml. These are two different problems.

I write this because, as I was going through the patch and polishing it, it broke a test which changes the compressor of each distributed table, and system_auth is distributed but it is also a system keyspace. Changing the compressor there failed because we prevent that at the code level, while constructing TableMetadata, not just when we execute the respective CQL command. This might be a little surprising at first, but I think it makes sense. We just prevent it one "level" deeper: it is not possible via CQL, and it is not possible to configure it programmatically either.

However, just thinking out loud, what if we started to have system tables with different compressors? For now it is the same for every table. Right now, for system tables, we check whether the compressor being set is equal to the default one. If we had "more defaults", we would need to take care that the default compressor(s) are not overwritten. However, that means that we could probably change the compressor for system tables as long as it is found among the default ones. We would probably need to keep track of what the default compressor is for each specific system table and error out if a different one is set. I do not think this is on the table for now; I am just thinking out loud about all the possible consequences in the future.
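The check described in the comment above, applied while table metadata is being built rather than only at the CQL layer, might look roughly like this. It is a hypothetical sketch: the class, the keyspace list and the string-based compressor names are illustrative assumptions and do not reflect how {{TableMetadata}} is actually implemented.

```java
import java.util.Set;

/** Hypothetical guard; illustrative only, not the actual TableMetadata logic. */
final class SystemTableCompressionGuard {
    // Assumed single default shared by all system tables, as the comment describes.
    static final String SYSTEM_DEFAULT_COMPRESSOR = "LZ4Compressor";

    // Illustrative list; system_auth is the distributed system keyspace named in the comment.
    private static final Set<String> SYSTEM_KEYSPACES =
            Set.of("system", "system_schema", "system_auth");

    private SystemTableCompressionGuard() {}

    /**
     * Returns the compressor if it is allowed for the keyspace; throws if a
     * system keyspace would end up with anything other than the default,
     * regardless of whether the value came from CQL, yaml or code.
     */
    static String validate(String keyspace, String compressor) {
        if (SYSTEM_KEYSPACES.contains(keyspace) && !SYSTEM_DEFAULT_COMPRESSOR.equals(compressor)) {
            throw new IllegalArgumentException(
                    "Compressor " + compressor + " is not allowed for system keyspace " + keyspace);
        }
        return compressor;
    }
}
```

Supporting "more defaults" later would mean replacing the single constant with a per-table lookup, which is the bookkeeping the last paragraph of the comment anticipates.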
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17834004#comment-17834004 ] Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/4/24 4:14 PM: --- I want to write this down explicitly ... It is already impossible to alter the system schema at the CQL level, so a user cannot change the compressor of a system table, even a replicated one, to something other than the default. What we also want to prevent is configuring it to something else via cassandra.yaml. These are two different problems. I am writing this down because, as I was going through the patch and polishing it, it broke a test that changes the compressor of each distributed table, and system_auth is distributed but it is still a system keyspace. Changing the compressor there failed because we prevent it at the code level, while constructing TableMetadata, not just when the respective CQL command is executed. This might be a little surprising at first, but I think it makes sense: we prevent it one "level" deeper, so it is not possible via CQL and not possible programmatically either. However, just thinking out loud, what if we started to have system tables with different compressors? For now it is the same for every table. Right now, for system tables, we check whether the configured compressor is equal to the default one. If we had "more defaults", we would need to take care that the default compressors are not treated as a plain set, because that would mean the compressor of a system table could be changed to anything found among the defaults. We would probably need to keep track of the default compressor for each specific system table and error out if it is different. I do not think this is on the table for now; I am just thinking out loud about the possible consequences in the future. was (Author: smiklosovic): I want to write this down explicitly ... 
It is not possible to alter system schema already, on CQL level. So preventing a user to change a compressor for system tables, even replicated ones, to some which is not the default is not possible already. What we want to prevent is to not be even able configure it via cassandra.yaml to something else. These are two different problems. I write this because as I go through the patch and polishing it, it broke a test which is changing compressors for each distributed table - and system_auth is distributed, but it is system one. So changing the compressor on that failed because we are preventing that on code level while constructing TableMetadata, not just when we execute a respective CQL command. This might be a little bit surprising to see at first but I think it makes sense. We just prevent that from happening one "level" deeper, not only CQL but it is not possible to configure it programmatically too. However, just thinking out loud, what if we started to have a system tables with different compressors? For now it is just same for every table. Right now, for system tables, we check if set compressor is equal to the default one. If we have "more defaults", then we would need to take care of this so default compressor(s) would not be a set. However, that means that we might probably change compressor for system tables as long as it is found among the default ones. We would need to probably keep track of what is the default compressor for a specific system table and error out if it is different. I do not think this is on the table for now, just thinking out loud about all the possible consequences in the future. 
> Default setting (yaml) for SSTable compression > -- > > Key: CASSANDRA-12937 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12937 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: Michael Semb Wever >Assignee: Stefan Miklosovic >Priority: Low > Labels: AdventCalendar2021 > Fix For: 5.x > > Time Spent: 8h > Remaining Estimate: 0h > > In many situations the choice of compression for sstables is more relevant to > the disks attached than to the schema and data. > This issue is to add to cassandra.yaml a default value for sstable > compression that new tables will inherit (instead of the defaults found in > {{CompressionParams.DEFAULT}}. > Examples where this can be relevant are filesystems that do on-the-fly > compression (btrfs, zfs) or specific disk configurations or even specific C* > versions (see CASSANDRA-10995 ). > +Additional information for newcomers+ > Some new fields need to be added to {{cassandra.yaml}} to allow specifying > the field required for defining the default compression par
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17834004#comment-17834004 ] Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/4/24 4:14 PM: --- I want to write this down explicitly ... It is not possible to alter system schema already, on CQL level. So preventing a user to change a compressor for system tables, even replicated ones, to some which is not the default is not possible already. What we want to prevent is to not be even able configure it via cassandra.yaml to something else. These are two different problems. I write this because as I go through the patch and polishing it, it broke a test which is changing compressors for each distributed table - and system_auth is distributed, but it is system one. So changing the compressor on that failed because we are preventing that on code level while constructing TableMetadata, not just when we execute a respective CQL command. This might be a little bit surprising to see at first but I think it makes sense. We just prevent that from happening one "level" deeper, not only CQL but it is not possible to configure it programmatically too. However, just thinking out loud, what if we started to have a system tables with different compressors? For now it is just same for every table. Right now, for system tables, we check if set compressor is equal to the default one. If we have "more defaults", then we would need to take care of this so default compressor(s) would not be a set. However, that means that we might probably change compressor for system tables as long as it is found among the default ones. We would need to probably keep track of what is the default compressor for a specific system table and error out if it is different. I do not think this is on the table for now, just thinking out loud about all the possible consequences in the future. was (Author: smiklosovic): I want to write this down explicitly ... 
It is not possible to alter system schema already, on CQL level. So preventing a user to change a compressor for system tables, even replicated ones, to some which is not the default is not possible already. What we want to prevent is to not be even able configure it via cassandra.yaml to something else. These are two different problems. I write this because as I go through the patch and polishing it, it broke a test which is changing compressors for each distributed table - and system_auth is distributed, but it is system one. So changing the compressor on that failed because we are preventing that on code level while constructing TableMetadata, not just when we execute a respective CQL command. This might be a little bit surprising to see at first but I think it makes sense. We just prevent that from happening one "level" deeper, not only CQL but it is not possible to configure it programmatically too. However, just thinking out loud, what if we would start to have a system tables with different compressors? For now it is just same for every table. Right now, for system tables, we check if set compressor is equal to the default one. If we have "more defaults", then we would need to take care of this so default compressor(s) would not be a set. However, that means that we might probably change compressor for system tables as long as it is found among the default ones. We would need to probably keep track of what is the default compressor for a specific system table and error out if it is different. I do not think this is on the table for now, just thinking out aloud about all the possible consequences in the future. 
> Default setting (yaml) for SSTable compression > -- > > Key: CASSANDRA-12937 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12937 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: Michael Semb Wever >Assignee: Stefan Miklosovic >Priority: Low > Labels: AdventCalendar2021 > Fix For: 5.x > > Time Spent: 8h > Remaining Estimate: 0h > > In many situations the choice of compression for sstables is more relevant to > the disks attached than to the schema and data. > This issue is to add to cassandra.yaml a default value for sstable > compression that new tables will inherit (instead of the defaults found in > {{CompressionParams.DEFAULT}}. > Examples where this can be relevant are filesystems that do on-the-fly > compression (btrfs, zfs) or specific disk configurations or even specific C* > versions (see CASSANDRA-10995 ). > +Additional information for newcomers+ > Some new fields need to be added to {{cassandra.yaml}} to allow specifying > the field required for defining the default compress
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17834004#comment-17834004 ] Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/4/24 4:13 PM: --- I want to write this down explicitly ... It is not possible to alter system schema already, on CQL level. So preventing a user to change a compressor for system tables, even replicated ones, to some which is not the default is not possible already. What we want to prevent is to not be even able configure it via cassandra.yaml to something else. These are two different problems. I write this because as I go through the patch and polishing it, it broke a test which is changing compressors for each distributed table - and system_auth is distributed, but it is system one. So changing the compressor on that failed because we are preventing that on code level while constructing TableMetadata, not just when we execute a respective CQL command. This might be a little bit surprising to see at first but I think it makes sense. We just prevent that from happening one "level" deeper, not only CQL but it is not possible to configure it programmatically too. However, just thinking out loud, what if we would start to have a system tables with different compressors? For now it is just same for every table. Right now, for system tables, we check if set compressor is equal to the default one. If we have "more defaults", then we would need to take care of this so default compressor(s) would not be a set. However, that means that we might probably change compressor for system tables as long as it is found among the default ones. We would need to probably keep track of what is the default compressor for a specific system table and error out if it is different. I do not think this is on the table for now, just thinking out aloud about all the possible consequences in the future. was (Author: smiklosovic): I want to write this down explicitly ... 
It is not possible to alter system schema already, on CQL level. So preventing a user to change a compressor for system tables, even replicated ones, to some which is not the default is not possible already. What we want to prevent is to not be even able configure it via cassandra.yaml to something else. These are two different problems. I write this because as I go through the patch and polishing it, it broke a test which is changing compressors for each distributed table - and system_auth is distributed, but it is system one. So changing the compressor on that failed because we are preventing that on code level while constructing TableMetadata, not just when we execute a respective CQL command. This might be a little bit surprising to see at first but I think it makes sense. We just prevent that from happening one "level" deeper, not only CQL but it is not possible to configure it programmatically too. > Default setting (yaml) for SSTable compression > -- > > Key: CASSANDRA-12937 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12937 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: Michael Semb Wever >Assignee: Stefan Miklosovic >Priority: Low > Labels: AdventCalendar2021 > Fix For: 5.x > > Time Spent: 8h > Remaining Estimate: 0h > > In many situations the choice of compression for sstables is more relevant to > the disks attached than to the schema and data. > This issue is to add to cassandra.yaml a default value for sstable > compression that new tables will inherit (instead of the defaults found in > {{CompressionParams.DEFAULT}}. > Examples where this can be relevant are filesystems that do on-the-fly > compression (btrfs, zfs) or specific disk configurations or even specific C* > versions (see CASSANDRA-10995 ). > +Additional information for newcomers+ > Some new fields need to be added to {{cassandra.yaml}} to allow specifying > the field required for defining the default compression parameters. 
In > {{DatabaseDescriptor}} a new {{CompressionParams}} field should be added for > the default compression. This field should be initialized in > {{DatabaseDescriptor.applySimpleConfig()}}. At the different places where > {{CompressionParams.DEFAULT}} was used the code should call > {{DatabaseDescriptor#getDefaultCompressionParams}} that should return some > copy of configured {{CompressionParams}}. > Some unit test using {{OverrideConfigurationLoader}} should be used to test > that the table schema use the new default when a new table is created (see > CreateTest for some example). -- This message was sent by Atlassian Jira (v8.20.10#820010) -
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17834004#comment-17834004 ] Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/4/24 4:09 PM: --- I want to write this down explicitly ... It is not possible to alter system schema already, on CQL level. So preventing a user to change a compressor for system tables, even replicated ones, to some which is not the default is not possible already. What we want to prevent is to not be even able configure it via cassandra.yaml to something else. These are two different problems. I write this because as I go through the patch and polishing it, it broke a test which is changing compressors for each distributed table - and system_auth is distributed, but it is system one. So changing the compressor on that failed because we are preventing that on code level while constructing TableMetadata, not just when we execute a respective CQL command. This might be a little bit surprising to see at first but I think it makes sense. We just prevent that from happening one "level" deeper, not only CQL but it is not possible to configure it programmatically too. was (Author: smiklosovic): I want to write this down explicitly ... It is not possible to alter system schema already, on CQL level. So preventing a user to change a compressor for system tables, even replicated ones, to some which is not the default is not possible already. What we want to prevent is to not be even able configure it via cassandra.yaml to something else. These are two different problems. I write this because as I go through the patch and polishing it, it broke a test which is changing compressors for each distributed tables - and system_auth is distributed, but it is system one. So changing the compressor on that failed because we are preventing that on code level while constructing TableMetadata, not just when we execute a respective CQL command. 
This might be a little bit surprising to see at first but I think it makes sense. We just prevent that from happening one "level" deeper, not only CQL but it is not possible to configure it programmatically too.

> Default setting (yaml) for SSTable compression
> ----------------------------------------------
>
> Key: CASSANDRA-12937
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12937
> Project: Cassandra
> Issue Type: Improvement
> Components: Local/Config
> Reporter: Michael Semb Wever
> Assignee: Stefan Miklosovic
> Priority: Low
> Labels: AdventCalendar2021
> Fix For: 5.x
>
> Time Spent: 8h
> Remaining Estimate: 0h
>
> In many situations the choice of compression for sstables is more relevant to
> the disks attached than to the schema and data.
> This issue is to add to cassandra.yaml a default value for sstable
> compression that new tables will inherit (instead of the defaults found in
> {{CompressionParams.DEFAULT}}).
> Examples where this can be relevant are filesystems that do on-the-fly
> compression (btrfs, zfs) or specific disk configurations or even specific C*
> versions (see CASSANDRA-10995 ).
> +Additional information for newcomers+
> Some new fields need to be added to {{cassandra.yaml}} to allow specifying
> the field required for defining the default compression parameters. In
> {{DatabaseDescriptor}} a new {{CompressionParams}} field should be added for
> the default compression. This field should be initialized in
> {{DatabaseDescriptor.applySimpleConfig()}}. At the different places where
> {{CompressionParams.DEFAULT}} was used the code should call
> {{DatabaseDescriptor#getDefaultCompressionParams}} that should return some
> copy of configured {{CompressionParams}}.
> Some unit test using {{OverrideConfigurationLoader}} should be used to test
> that the table schema use the new default when a new table is created (see
> CreateTest for some example).
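The newcomer notes in the description above could be sketched roughly as follows. This is only an illustration under stated assumptions: the holder class stands in for {{DatabaseDescriptor}}, compression parameters are modelled as a plain string map rather than {{CompressionParams}}, and the option key {{class_name}} is used purely as a placeholder.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the DatabaseDescriptor changes; compression
// parameters are modelled as a string map instead of CompressionParams.
final class DefaultCompressionHolder {
    // Built-in default, used when the yaml does not configure anything.
    private static volatile Map<String, String> defaultCompression =
        Map.of("class_name", "LZ4Compressor");

    // Would be called once while the configuration is applied at startup.
    static void applyConfig(Map<String, String> fromYaml) {
        if (fromYaml != null && !fromYaml.isEmpty())
            defaultCompression = Map.copyOf(fromYaml);
    }

    // Returns a fresh copy, so callers cannot mutate the shared default.
    static Map<String, String> getDefaultCompressionParams() {
        return new HashMap<>(defaultCompression);
    }
}
```

Returning a copy matters because table creation code may adjust the parameters it receives, and such changes should not leak into the default that later tables inherit.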
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747123#comment-17747123 ] Stefan Miklosovic edited comment on CASSANDRA-12937 at 7/26/23 6:24 AM: I think we need to park this ticket for a while and return to it after 5.0 is out. There is clearly more work to be done here. I have been thinking about how to make it more robust; it is all still in my head and hard to communicate, but we should spend a little more time on this (how to exclude system keyspaces, and all the related code and logic around that). Claude also came up with the idea of making it possible to specify compression _per keyspace_, which would shuffle quite a lot of things configuration-wise. Also, if we say what should have the default compression ... why _just compression?_ Why could we not support e.g. compaction as well? One might want to compact all tables in one keyspace with UnifiedCompactionStrategy and tables in another keyspace with TWCS, for example. So this could be done a bit more robustly: not just wired for compression, but with code "plastic" enough to make room for similar improvements in the future. was (Author: smiklosovic): I think we need to park this ticket for a while and return to it after 5.0 is out. There is clearly more work on this to be done. I was thinking about how to make it more robust, all in my head yet, hard to communicate, but we should spend little bit more time on this (how to exclude system keyspaces and all the related code and logic around that). Claude also came with an idea to make it able to specify compression _per keyspace_ and that would shuffle quite a lot of stuff - configuration wise. Also, if we say what should be the default compression ... why _just compression?_ Why we could not support e.g. compaction as well? 
One might say that he wants to compact all tables in this keyspace by UnifiedCompactionStrategy and tables in another keyspace by TWCS, for example. So we see that this might be done a little bit more robustly to not just wire it for compression but to allow the code to be "plastic" enough to make a room for similar improvements in the future. > Default setting (yaml) for SSTable compression > -- > > Key: CASSANDRA-12937 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12937 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: Michael Semb Wever >Assignee: Stefan Miklosovic >Priority: Low > Labels: AdventCalendar2021 > Fix For: 5.x > > Time Spent: 7h 20m > Remaining Estimate: 0h > > In many situations the choice of compression for sstables is more relevant to > the disks attached than to the schema and data. > This issue is to add to cassandra.yaml a default value for sstable > compression that new tables will inherit (instead of the defaults found in > {{CompressionParams.DEFAULT}}. > Examples where this can be relevant are filesystems that do on-the-fly > compression (btrfs, zfs) or specific disk configurations or even specific C* > versions (see CASSANDRA-10995 ). > +Additional information for newcomers+ > Some new fields need to be added to {{cassandra.yaml}} to allow specifying > the field required for defining the default compression parameters. In > {{DatabaseDescriptor}} a new {{CompressionParams}} field should be added for > the default compression. This field should be initialized in > {{DatabaseDescriptor.applySimpleConfig()}}. At the different places where > {{CompressionParams.DEFAULT}} was used the code should call > {{DatabaseDescriptor#getDefaultCompressionParams}} that should return some > copy of configured {{CompressionParams}}. 
> Some unit test using {{OverrideConfigurationLoader}} should be used to test > that the table schema use the new default when a new table is created (see > CreateTest for some example).
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17744151#comment-17744151 ] Stefan Miklosovic edited comment on CASSANDRA-12937 at 7/18/23 10:30 AM: - I want to highlight one not-so-obvious consequence of introducing this patch. There is a flush_compression property in cassandra.yaml, documented like this:
{code}
# Compression to apply to SSTables as they flush for compressed tables.
# Note that tables without compression enabled do not respect this flag.
#
# As high ratio compressors like LZ4HC, Zstd, and Deflate can potentially
# block flushes for too long, the default is to flush with a known fast
# compressor in those cases. Options are:
#
# none : Flush without compressing blocks but while still doing checksums.
# fast : Flush with a fast compressor. If the table is already using a
#        fast compressor that compressor is used.
# table: Always flush with the same compressor that the table uses. This
#        was the pre 4.0 behavior.
#
# flush_compression: fast
{code}
By default, flush_compression is "fast" and the "fast" compressor is LZ4, so flushing uses LZ4. Hence, if one runs "nodetool snapshot", it takes snapshots of all tables, user-defined as well as system tables, and the data is compressed with LZ4. If one specifies a custom compressor in the newly introduced "sstable_compression" field, two situations might happen: 1) the compressor is not "fast" (check ICompressor.Uses and ICompressor.recommendedUses()), which means we need to fall back to a fast compressor, and it will default to LZ4; 2) the compressor is fast, so we are going to flush *system* tables with a *custom, user-specified ICompressor*. I want to be sure that we are on the same page. While it makes sense to do it like that (the user specified a custom compressor to use), it also means that system tables will be compressed with that compressor as well. 
I want to be sure everybody is OK with this. If we are not comfortable with that, we can add a check in DataComponent, where compressionParams are resolved, to always flush system tables with LZ4, no matter what sstable_compression is set to. I think that is safer: LZ4 is battle-tested. I am not completely sure a user knows that, if he provides his own fast compressor, it is going to be used for flushing the system tables as well. That is a rather delicate matter ... was (Author: smiklosovic): I want to highlight one not-so-obvious consequence of introducing this patch. There is flush_compression property in cassandra.yaml doing this:
{code}
# Compression to apply to SSTables as they flush for compressed tables.
# Note that tables without compression enabled do not respect this flag.
#
# As high ratio compressors like LZ4HC, Zstd, and Deflate can potentially
# block flushes for too long, the default is to flush with a known fast
# compressor in those cases. Options are:
#
# none : Flush without compressing blocks but while still doing checksums.
# fast : Flush with a fast compressor. If the table is already using a
#        fast compressor that compressor is used.
# table: Always flush with the same compressor that the table uses. This
#        was the pre 4.0 behavior.
#
# flush_compression: fast
{code}
By default, "fast" compressor is LZ4, it will use that one. So, by default, "fast" compressor is used for flushing (LZ4). Hence, if one does "nodetool snapshot", it will do snapshots of all tables, user defined as well as system tables and it is compressed with LZ4. 
If one specifies a custom compressor to be used in newly introduced "sstable_compression" field, two situations might happen: 1) the compressor is not "fast" (check ICompressor.Uses and ICompressor.recommendedUses()), that means that we need to fallback to a fast compressor - it will default to LZ4 2) the compressor is fast - so we are going to flush *system* tables with a *custom, user-specified, ICompressor*. I want to be sure that we are on the same page. While it makes sense to do it like that - a user specified a custom compressor to use - that also means that system tables will be compressed with that compressor as well. I want to be sure everybody is OK with this. > Default setting (yaml) for SSTable compression > -- > > Key: CASSANDRA-12937 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12937 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: Michael Semb Wever >Priority: Low > Labels: AdventCalendar2021 > Fix For: 5.x > > Time Spent: 4h 50m > Remaining Estimate: 0h > > In many situations the choice of compression for sstables is more relevant to > the disks a
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17744151#comment-17744151 ] Stefan Miklosovic edited comment on CASSANDRA-12937 at 7/18/23 10:17 AM: - I want to highlight one not-so-obvious consequence of introducing this patch. There is flush_compression property in cassandra.yaml doing this: {code} # Compression to apply to SSTables as they flush for compressed tables. # Note that tables without compression enabled do not respect this flag. # # As high ratio compressors like LZ4HC, Zstd, and Deflate can potentially # block flushes for too long, the default is to flush with a known fast # compressor in those cases. Options are: # # none : Flush without compressing blocks but while still doing checksums. # fast : Flush with a fast compressor. If the table is already using a #fast compressor that compressor is used. # table: Always flush with the same compressor that the table uses. This #was the pre 4.0 behavior. # # flush_compression: fast {code} By default, "fast" compressor is LZ4, it will use that one. So, by default, "fast" compressor is used for flushing (LZ4). Hence, if one does "nodetool snapshot", it will do snapshots of all tables, user defined as well as system tables and it is compressed with LZ4. If one specifies a custom compressor to be used in newly introduced "sstable_compression" field, two situations might happen: 1) the compressor is not "fast" (check ICompressor.Uses and ICompressor.recommendedUses()), that means that we need to fallback to a fast compressor - it will default to LZ4 2) the compressor is fast - so we are going to flush *system* tables with a *custom, user-specified, ICompressor*. I want to be sure that we are on the same page. While it makes sense to do it like that - a user specified a custom compressor to use - that also means that system tables will be compressed with that compressor as well. 
I want to be sure everybody is OK with this. was (Author: smiklosovic): I want to highlight one not-so-obvious consequence of introducing this patch. There is flush_compression property in cassandra.yaml doing this: {code} # Compression to apply to SSTables as they flush for compressed tables. # Note that tables without compression enabled do not respect this flag. # # As high ratio compressors like LZ4HC, Zstd, and Deflate can potentially # block flushes for too long, the default is to flush with a known fast # compressor in those cases. Options are: # # none : Flush without compressing blocks but while still doing checksums. # fast : Flush with a fast compressor. If the table is already using a #fast compressor that compressor is used. # table: Always flush with the same compressor that the table uses. This #was the pre 4.0 behavior. # # flush_compression: fast {code} By default, "fast" compressor is LZ4, it will use that one. So, by default, "fast" compressor is used for flushing (LZ4). Hence, if one does "nodetool snapshot", it will do snapshots of all tables, user defined as well as system tables and it is compressed with LZ4. If one specifies a custom compressor to be used in newly introduced "sstable_compression" field, two situations might happen: 1) the default compressor is not "fast" (check ICompressor.Uses and ICompressor.recommendedUses()), that means that we need to fallback to a fast compressor - it will default to LZ4 2) the default compressor is fast - so we are going to flush *system* tables with a *custom, user-specified, ICompressor*. I want to be sure that we are on the same page. While it makes sense to do it like that - a user specified a custom compressor to use - that also means that system tables will be compressed with that compressor as well. I want to be sure everybody is OK with this. 
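The two situations above can be sketched as a small decision function. This is a simplified model, not Cassandra's code: the enum mirrors the three documented flush_compression options, compressors are plain strings, and the boolean parameter is a stand-in for whatever the ICompressor.recommendedUses() check reports. All names are hypothetical.

```java
import java.util.Optional;

// Simplified, hypothetical model of choosing the compressor used at flush time.
final class FlushCompressorChoice {
    // Mirrors the documented flush_compression options: none, fast, table.
    enum FlushCompression { NONE, FAST, TABLE }

    // LZ4 is the known fast compressor that flushes fall back to.
    static final String FAST_FALLBACK = "LZ4Compressor";

    // Returns the compressor a flush would use, or empty for "no compression".
    // tableCompressorIsFast stands in for the recommendedUses() check.
    static Optional<String> resolve(FlushCompression mode,
                                    String tableCompressor,
                                    boolean tableCompressorIsFast) {
        switch (mode) {
            case NONE:
                return Optional.empty();
            case TABLE:
                return Optional.of(tableCompressor);
            default: // FAST
                // Situation 2: a fast compressor is used as-is.
                // Situation 1: anything else falls back to LZ4.
                return Optional.of(tableCompressorIsFast ? tableCompressor : FAST_FALLBACK);
        }
    }
}
```

With a custom compressor that reports itself as fast, the FAST branch returns the custom compressor, which is exactly the case where system tables would be flushed with user-supplied code.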
> Default setting (yaml) for SSTable compression > -- > > Key: CASSANDRA-12937 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12937 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: Michael Semb Wever >Priority: Low > Labels: AdventCalendar2021 > Fix For: 5.x > > Time Spent: 4h 50m > Remaining Estimate: 0h > > In many situations the choice of compression for sstables is more relevant to > the disks attached than to the schema and data. > This issue is to add to cassandra.yaml a default value for sstable > compression that new tables will inherit (instead of the defaults found in > {{CompressionParams.DEFAULT}}. > Examples where this can be relevant are filesystems that do on-the-fly > compression (btrfs, zfs) or specific disk configurations or even specific C* > versions (see CASSANDRA-10995 ). > +Additional information
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17744151#comment-17744151 ] Stefan Miklosovic edited comment on CASSANDRA-12937 at 7/18/23 10:16 AM:

I want to highlight one not-so-obvious consequence of introducing this patch. There is a flush_compression property in cassandra.yaml that does this:

{code}
# Compression to apply to SSTables as they flush for compressed tables.
# Note that tables without compression enabled do not respect this flag.
#
# As high ratio compressors like LZ4HC, Zstd, and Deflate can potentially
# block flushes for too long, the default is to flush with a known fast
# compressor in those cases. Options are:
#
# none : Flush without compressing blocks but while still doing checksums.
# fast : Flush with a fast compressor. If the table is already using a
#        fast compressor that compressor is used.
# table: Always flush with the same compressor that the table uses. This
#        was the pre 4.0 behavior.
#
# flush_compression: fast
{code}

By default, flushing uses the "fast" compressor, which is LZ4. Hence, if one runs "nodetool snapshot", it takes snapshots of all tables, user-defined as well as system tables, and they are compressed with LZ4.

If one specifies a custom compressor in the newly introduced "sstable_compression" field, one of two situations can occur:

1) the default compressor is not "fast" (check ICompressor.Uses and ICompressor.recommendedUses()), which means that we need to fall back to a fast compressor - it will default to LZ4
2) the default compressor is fast - so we are going to flush *system* tables with a *custom, user-specified, ICompressor*.

I want to be sure that we are on the same page. While it makes sense to do it like that - a user specified a custom compressor to use - it also means that system tables will be compressed with that compressor as well.
I want to be sure everybody is OK with this.

was (Author: smiklosovic):

I want to highlight one not-so-obvious consequence of introducing this patch. There is flush_compression property in cassandra.yaml doing this:

{code}
# Compression to apply to SSTables as they flush for compressed tables.
# Note that tables without compression enabled do not respect this flag.
#
# As high ratio compressors like LZ4HC, Zstd, and Deflate can potentially
# block flushes for too long, the default is to flush with a known fast
# compressor in those cases. Options are:
#
# none : Flush without compressing blocks but while still doing checksums.
# fast : Flush with a fast compressor. If the table is already using a
#        fast compressor that compressor is used.
# table: Always flush with the same compressor that the table uses. This
#        was the pre 4.0 behavior.
#
# flush_compression: fast
{code}

By default, "fast" compressor is LZ4, it will use that one. So, by default, "fast" compressor is used for flushing (LZ4). Hence, if one does "nodetool snapshot", it will do snapshots of all tables, user defined as well as system tables and it is compressed with LZ4.

If one specifies a default compressor to be used in newly introduced "sstable_compression" field, two situations might happen:

1) the default compressor is not "fast" (check ICompressor.Uses and ICompressor.recommendedUses()), that means that we need to fallback to a fast compressor - it will default to LZ4
2) the default compressor is fast - so we are going to flush *system* tables with a *custom, user-specified, ICompressor*.

I want to be sure that we are on the same page. While it makes sense to do it like that - a user specified a custom compressor to use - that also means that system tables will be compressed with that compressor as well. I want to be sure everybody is OK with this.
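The flush-time choice described in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions, not Cassandra's code: the class FlushCompressorSketch, its enum, and the boolean flag are hypothetical stand-ins for the flush_compression setting, the table's compressor, and the outcome of consulting ICompressor.recommendedUses().

```java
/** Hypothetical sketch of picking the compressor used when a table is flushed. */
class FlushCompressorSketch {
    /** Mirrors the three documented flush_compression values. */
    enum FlushCompression { NONE, FAST, TABLE }

    /** The fallback named in the comment: a known fast compressor (LZ4). */
    static final String FAST_FALLBACK = "LZ4Compressor";

    /**
     * @param mode                  the flush_compression setting
     * @param tableCompressor       compressor configured for the table (possibly the yaml default)
     * @param tableCompressorIsFast assumed result of checking the compressor's recommended uses
     */
    static String choose(FlushCompression mode, String tableCompressor, boolean tableCompressorIsFast) {
        switch (mode) {
            case NONE:
                return "none";          // blocks are not compressed at flush time
            case TABLE:
                return tableCompressor; // always the table's own compressor
            default:                    // FAST
                return tableCompressorIsFast ? tableCompressor : FAST_FALLBACK;
        }
    }

    public static void main(String[] args) {
        // Situation 1: a high-ratio default is replaced by the fast fallback.
        System.out.println(choose(FlushCompression.FAST, "ZstdCompressor", false));
        // Situation 2: a fast custom compressor is used as-is, also for system tables.
        System.out.println(choose(FlushCompression.FAST, "MyFastCompressor", true));
    }
}
```

The second call is the case the comment asks about: with a fast user-specified default, nothing falls back to LZ4.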
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17729083#comment-17729083 ] Stefan Miklosovic edited comment on CASSANDRA-12937 at 6/4/23 5:18 PM: --- [~claude] we got the second review from Mick here (1). Feel free to take my branch and base further work on top of that. I think we should consolidate all the branches and continue to work just on one and discard the rest. (1) https://github.com/apache/cassandra/pull/2282#issuecomment-1575564266 was (Author: smiklosovic): [~claude] we got a second review from Mick here (1). Feel free to take my branch and base further work on top of that. I think we should consolidate all the branches and continue to work just on one and discard the rest. (1) https://github.com/apache/cassandra/pull/2282#issuecomment-1575564266 > Default setting (yaml) for SSTable compression > -- > > Key: CASSANDRA-12937 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12937 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: Michael Semb Wever >Priority: Low > Labels: AdventCalendar2021, lhf > Fix For: 5.x > > Time Spent: 3h 50m > Remaining Estimate: 0h > > In many situations the choice of compression for sstables is more relevant to > the disks attached than to the schema and data. > This issue is to add to cassandra.yaml a default value for sstable > compression that new tables will inherit (instead of the defaults found in > {{CompressionParams.DEFAULT}}. > Examples where this can be relevant are filesystems that do on-the-fly > compression (btrfs, zfs) or specific disk configurations or even specific C* > versions (see CASSANDRA-10995 ). > +Additional information for newcomers+ > Some new fields need to be added to {{cassandra.yaml}} to allow specifying > the field required for defining the default compression parameters. 
In > {{DatabaseDescriptor}} a new {{CompressionParams}} field should be added for > the default compression. This field should be initialized in > {{DatabaseDescriptor.applySimpleConfig()}}. At the different places where > {{CompressionParams.DEFAULT}} was used the code should call > {{DatabaseDescriptor#getDefaultCompressionParams}} that should return some > copy of configured {{CompressionParams}}. > Some unit test using {{OverrideConfigurationLoader}} should be used to test > that the table schema use the new default when a new table is created (see > CreateTest for some example). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
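The "return some copy of configured CompressionParams" advice in the issue description can be illustrated with a small, self-contained sketch. Everything here is a stand-in: Params is a hypothetical simplification of CompressionParams, and DefaultCompressionSketch only mimics the role DatabaseDescriptor would play; the real classes have different shapes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: hand out copies of a configured default so callers cannot mutate it. */
class DefaultCompressionSketch {
    /** Simplified stand-in for CompressionParams. */
    static final class Params {
        final String className;
        final Map<String, String> options;

        Params(String className, Map<String, String> options) {
            this.className = className;
            this.options = new LinkedHashMap<>(options); // private copy of the options
        }

        Params copy() {
            return new Params(className, options);
        }
    }

    /** Default used when the yaml does not configure anything (illustrative value). */
    private static Params configuredDefault = new Params("LZ4Compressor", new LinkedHashMap<>());

    /** Called once while configuration is applied. */
    static void applyConfig(Params fromYaml) {
        configuredDefault = fromYaml.copy();
    }

    /** Each caller gets its own copy, as the issue description suggests. */
    static Params getDefaultCompressionParams() {
        return configuredDefault.copy();
    }

    public static void main(String[] args) {
        Params forNewTable = getDefaultCompressionParams();
        forNewTable.options.put("chunk_length", "32KiB"); // changes only this table's copy
        System.out.println(getDefaultCompressionParams().options.isEmpty());
    }
}
```

Returning a copy keeps one table's schema changes from leaking into the node-wide default.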
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728384#comment-17728384 ] Stefan Miklosovic edited comment on CASSANDRA-12937 at 6/1/23 8:47 PM:

I finally fixed it all. Phew, what a ride! I reorganized the branch so that one commit contains the implementation and another contains the tests, which makes it nicely visible and easier to reason about. I think we are finally in good shape for a second committer to have a look. [~mck] would you take a look, please? [~claude] no worries, you will still be the author of the PR; I will take care of that upon the actual commit once we get there. I just squashed / reorganized the commits a little bit for now.

PR: https://github.com/apache/cassandra/pull/2282
j8 https://app.circleci.com/pipelines/github/instaclustr/cassandra/2353/workflows/aa9ee549-ec99-455f-8e89-b325e013ff8e
j11 https://app.circleci.com/pipelines/github/instaclustr/cassandra/2354/workflows/37b9da05-22ce-41ef-8a1b-753e3301c6eb

was (Author: smiklosovic):

I finally fixed it all. Phew, what a ride! I reorganized the branch in such a way that there is one commit which implements it and another one where tests are done so it is nicely visible and easier to contemplate about. I think we are finally in a good shape to have a look of the second committer. [~mck] would you take a look, please? [~claude] no worries you will be still author of the PR, I do that upon actual commit once we get there, I just squashed / reorganized the commits little bit for now.
PR: https://github.com/apache/cassandra/pull/2282 j8 https://app.circleci.com/pipelines/github/instaclustr/cassandra/2353/workflows/aa9ee549-ec99-455f-8e89-b325e013ff8e > Default setting (yaml) for SSTable compression > -- > > Key: CASSANDRA-12937 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12937 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: Michael Semb Wever >Assignee: Stefan Miklosovic >Priority: Low > Labels: AdventCalendar2021, lhf > Fix For: 5.x > > Time Spent: 3.5h > Remaining Estimate: 0h > > In many situations the choice of compression for sstables is more relevant to > the disks attached than to the schema and data. > This issue is to add to cassandra.yaml a default value for sstable > compression that new tables will inherit (instead of the defaults found in > {{CompressionParams.DEFAULT}}. > Examples where this can be relevant are filesystems that do on-the-fly > compression (btrfs, zfs) or specific disk configurations or even specific C* > versions (see CASSANDRA-10995 ). > +Additional information for newcomers+ > Some new fields need to be added to {{cassandra.yaml}} to allow specifying > the field required for defining the default compression parameters. In > {{DatabaseDescriptor}} a new {{CompressionParams}} field should be added for > the default compression. This field should be initialized in > {{DatabaseDescriptor.applySimpleConfig()}}. At the different places where > {{CompressionParams.DEFAULT}} was used the code should call > {{DatabaseDescriptor#getDefaultCompressionParams}} that should return some > copy of configured {{CompressionParams}}. > Some unit test using {{OverrideConfigurationLoader}} should be used to test > that the table schema use the new default when a new table is created (see > CreateTest for some example). 
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17727181#comment-17727181 ] Stefan Miklosovic edited comment on CASSANDRA-12937 at 5/29/23 7:03 PM:

This is the j11 build: https://app.circleci.com/pipelines/github/instaclustr/cassandra/2329/workflows/03d396b7-6356-45da-87f6-a3540f69ecd1

I would focus on the failed unit tests first: https://app.circleci.com/pipelines/github/instaclustr/cassandra/2329/workflows/03d396b7-6356-45da-87f6-a3540f69ecd1/jobs/42564/tests

Branch: https://github.com/instaclustr/cassandra/tree/CASSANDRA-12937

was (Author: smiklosovic):

this is j11 build https://app.circleci.com/pipelines/github/instaclustr/cassandra/2329/workflows/03d396b7-6356-45da-87f6-a3540f69ecd1 I would focus on the failed units first: https://app.circleci.com/pipelines/github/instaclustr/cassandra/2329/workflows/03d396b7-6356-45da-87f6-a3540f69ecd1/jobs/42564/tests

> Default setting (yaml) for SSTable compression > -- > > Key: CASSANDRA-12937 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12937 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: Michael Semb Wever >Assignee: Claude Warren >Priority: Low > Labels: AdventCalendar2021, lhf > Fix For: 5.x > > Time Spent: 3h 20m > Remaining Estimate: 0h > > In many situations the choice of compression for sstables is more relevant to > the disks attached than to the schema and data. > This issue is to add to cassandra.yaml a default value for sstable > compression that new tables will inherit (instead of the defaults found in > {{CompressionParams.DEFAULT}}. > Examples where this can be relevant are filesystems that do on-the-fly > compression (btrfs, zfs) or specific disk configurations or even specific C* > versions (see CASSANDRA-10995 ). 
> +Additional information for newcomers+ > Some new fields need to be added to {{cassandra.yaml}} to allow specifying > the field required for defining the default compression parameters. In > {{DatabaseDescriptor}} a new {{CompressionParams}} field should be added for > the default compression. This field should be initialized in > {{DatabaseDescriptor.applySimpleConfig()}}. At the different places where > {{CompressionParams.DEFAULT}} was used the code should call > {{DatabaseDescriptor#getDefaultCompressionParams}} that should return some > copy of configured {{CompressionParams}}. > Some unit test using {{OverrideConfigurationLoader}} should be used to test > that the table schema use the new default when a new table is created (see > CreateTest for some example). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725115#comment-17725115 ] Stefan Miklosovic edited comment on CASSANDRA-12937 at 5/22/23 8:28 PM:

There are a few hundred errors in the tests: https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/2471/#showFailuresLink

They all seem to fail on "Unknown compression options: ([sstable_compression])". We stopped using this option because it was deprecated, so we removed it, but the tests are still using it.

was (Author: smiklosovic):

There are few hundreds of errors in tests https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/2471/#showFailuresLink It all seem to fail on "Unknown compression options: ([sstable_compression])" We stopped to use this as it was deprecated so we removed it but tests are still using it.

> Default setting (yaml) for SSTable compression > -- > > Key: CASSANDRA-12937 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12937 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: Michael Semb Wever >Assignee: Claude Warren >Priority: Low > Labels: AdventCalendar2021, lhf > Fix For: 5.x > > Time Spent: 3h 20m > Remaining Estimate: 0h > > In many situations the choice of compression for sstables is more relevant to > the disks attached than to the schema and data. > This issue is to add to cassandra.yaml a default value for sstable > compression that new tables will inherit (instead of the defaults found in > {{CompressionParams.DEFAULT}}. > Examples where this can be relevant are filesystems that do on-the-fly > compression (btrfs, zfs) or specific disk configurations or even specific C* > versions (see CASSANDRA-10995 ). > +Additional information for newcomers+ > Some new fields need to be added to {{cassandra.yaml}} to allow specifying > the field required for defining the default compression parameters. 
In > {{DatabaseDescriptor}} a new {{CompressionParams}} field should be added for > the default compression. This field should be initialized in > {{DatabaseDescriptor.applySimpleConfig()}}. At the different places where > {{CompressionParams.DEFAULT}} was used the code should call > {{DatabaseDescriptor#getDefaultCompressionParams}} that should return some > copy of configured {{CompressionParams}}. > Some unit test using {{OverrideConfigurationLoader}} should be used to test > that the table schema use the new default when a new table is created (see > CreateTest for some example). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17719227#comment-17719227 ] Stefan Miklosovic edited comment on CASSANDRA-12937 at 5/4/23 9:23 AM:

I did another round of review.

When I do this:

{code:java}
commitlog_compression:
  - class_name: lz4
    parameters:
      - enable: "true"
        lz4_compressor_type: "fast"
{code}

it does not work:

{code:java}
Caused by: org.apache.cassandra.exceptions.ConfigurationException: Could not create Compression for type org.apache.cassandra.io.compress.lz4
    at org.apache.cassandra.schema.CompressionParams.parseCompressorClass(CompressionParams.java:338)
    at org.apache.cassandra.schema.CompressionParams.createCompressor(CompressionParams.java:396)
    at org.apache.cassandra.db.commitlog.CommitLog$Configuration.<init>(CommitLog.java:630)
    at org.apache.cassandra.db.commitlog.CommitLog.<init>(CommitLog.java:107)
    at org.apache.cassandra.db.commitlog.CommitLog.construct(CommitLog.java:92)
    at org.apache.cassandra.db.commitlog.CommitLog.<clinit>(CommitLog.java:77)
{code}

I would expect similar errors to be found when trying aliases for other _compression configuration properties.
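The message in the trace above suggests that the package prefix was prepended to "lz4" without first translating the alias. A minimal sketch of alias resolution that avoids this is shown below; it is an assumption-laden illustration, not the patch's code. The class CompressorAliasSketch and the alias table are hypothetical (only "lz4" appears in this thread), while the compressor class names are the ones shipped in org.apache.cassandra.io.compress.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: turn a short alias or simple class name into a fully qualified name. */
class CompressorAliasSketch {
    static final String PACKAGE = "org.apache.cassandra.io.compress.";

    /** Illustrative alias table; apart from "lz4" the aliases are assumptions. */
    static final Map<String, String> ALIASES = new HashMap<>();
    static {
        ALIASES.put("lz4", "LZ4Compressor");
        ALIASES.put("zstd", "ZstdCompressor");
        ALIASES.put("snappy", "SnappyCompressor");
        ALIASES.put("deflate", "DeflateCompressor");
    }

    static String resolve(String className) {
        // 1. Known alias: translate before adding the package prefix.
        String simpleName = ALIASES.get(className.toLowerCase(Locale.ROOT));
        if (simpleName != null)
            return PACKAGE + simpleName;
        // 2. Already fully qualified: leave untouched.
        // 3. Simple class name: assume it lives in the default compressor package.
        return className.contains(".") ? className : PACKAGE + className;
    }

    public static void main(String[] args) {
        System.out.println(resolve("lz4"));
        System.out.println(resolve("LZ4Compressor"));
    }
}
```

If every _compression property went through one such resolver, an alias that works for sstable_compression would also work for commitlog_compression.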
Next, the default configuration for sstable_compression, which currently looks like this, does not work:

{code:java}
sstable_compression:
#  - class_name: lz4
#    parameters:
#      - enable: true
#        chunk_length: 16KiB
#        min_compress_ratio: 0.0
#        max_comrpessed_length: 16KiB
#        class_specific_parameter: value
{code}

When I uncomment it like this:

{code:java}
sstable_compression:
  - class_name: lz4
    parameters:
      - enable: true
        chunk_length: 16KiB
        min_compress_ratio: 0.0
        max_comrpessed_length: 16KiB
        class_specific_parameter: value
{code}

First of all it says that:

{code:java}
Caused by: java.lang.ClassCastException: java.lang.Boolean cannot be cast to java.lang.CharSequence
    at org.apache.cassandra.schema.CompressionParams.copyOptions(CompressionParams.java:406)
    at org.apache.cassandra.schema.CompressionParams.fromParameterizedClass(CompressionParams.java:160)
    at org.apache.cassandra.schema.CompressionParams.defaultParams(CompressionParams.java:150)
    at org.apache.cassandra.schema.TableParams$Builder.<init>(TableParams.java:353)
    at org.apache.cassandra.schema.TableParams.builder(TableParams.java:119)
{code}

This means that it expects values to be strings, so I do it like this:

{code:java}
sstable_compression:
  - class_name: lz4
    parameters:
      - enable: "true"
        chunk_length: "16KiB"
        min_compress_ratio: "0.0"
        max_comrpessed_length: "16KiB"
        class_specific_parameter: "value"
{code}

But it says:

{code:java}
Caused by: org.apache.cassandra.exceptions.ConfigurationException: Unknown compression options enable
    at org.apache.cassandra.schema.CompressionParams.createCompressor(CompressionParams.java:358)
    at org.apache.cassandra.schema.CompressionParams.<init>(CompressionParams.java:260)
    at org.apache.cassandra.io.compress.CompressionMetadata.open(CompressionMetadata.java:94)
    ... 15 common frames omitted
{code}

I think this has to be changed to "enabled" as it is "enabled" in trunk too.
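The ClassCastException above arises because a YAML parser hands back unquoted values such as true or 0.0 as Boolean and Double objects rather than strings. The thread works around it by quoting every value. As a hedged illustration of the alternative, the sketch below converts whatever the parser produced into strings. The class OptionCopySketch and its method are hypothetical and are not the patch's CompressionParams.copyOptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: copy parsed YAML parameters into a String-to-String option map. */
class OptionCopySketch {
    static Map<String, String> copyAsStrings(Map<String, ?> raw) {
        Map<String, String> options = new LinkedHashMap<>();
        for (Map.Entry<String, ?> entry : raw.entrySet()) {
            // String.valueOf works for Boolean, Number and String alike,
            // whereas a cast to CharSequence fails for the first two.
            options.put(entry.getKey(), String.valueOf(entry.getValue()));
        }
        return options;
    }

    public static void main(String[] args) {
        Map<String, Object> raw = new LinkedHashMap<>();
        raw.put("enabled", true);            // unquoted YAML boolean
        raw.put("chunk_length", "16KiB");    // already a string
        raw.put("min_compress_ratio", 0.0);  // unquoted YAML float
        System.out.println(copyAsStrings(raw));
    }
}
```

Whether the loader should be this lenient or should reject non-string values with a clear message is a design decision the thread leaves open.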
Next, it does not seem to me that the value of "lz4" is picked up, because if I do this:

{code:java}
sstable_compression:
  - class_name: lz4
    parameters:
      - enabled: "true"
        chunk_length: "16KiB"
        min_compress_ratio: "0.0"
        max_comrpessed_length: "16KiB"
        class_specific_parameter: "value"
{code}

it does not say that the "class_specific_parameter" is invalid. Check this: when I do

{code:java}
sstable_compression:
  - class_name: LZ4Compressor
    parameters:
      - enabled: "true"
        chunk_length: "16KiB"
        min_compress_ratio: "0.0"
        max_comrpessed_length: "16KiB"
        class_specific_parameter: "value"
{code}

it throws:

{code:java}
Caused by: org.apache.cassandra.exceptions.ConfigurationException: Unknown compression options class_specific_parameter
    at org.apache.cassandra.schema.CompressionParams.createCompressor(CompressionParams.java:358)
    at org.apache.cassandra.schema.CompressionParams.lambda$fromClassAndOptions$0(CompressionParams.java:228)
    at org.apache.cassandra.schema.CompressionParams.fromClassAndOptions(CompressionParams.java:229)
    at org.apache.cassandra.schema.CompressionParams.fromParameterizedClass(CompressionParams.java:160)
    at org.apache.cassandra.schema.CompressionParams.defaultParams(CompressionParams.java:150)
    at org.apache.cassandra.schema.TableParams$Builder.<init>(TableParams.java:353)
    at org.apache.cassandra.schema.TableParams.builder(TableParams.java:119)
    at org.apache.cassandra.cql3.statements.schema.TableAttributes.validate(TableAttributes.java:60)
    at org.apac
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714856#comment-17714856 ] Claude Warren edited comment on CASSANDRA-12937 at 4/21/23 1:57 PM:

[~smiklosovic], [~mck] Please review [Github Pull Request #2254|https://github.com/apache/cassandra/pull/2254]. There are a number of changes:

* switched to ParameterizedClass for yaml configuration
* ensured processing to support CQL compression parameters
* added extensive yaml file documentation.
* moved methods used only in testing from CompressionParams to a new TestingCompressionParamsFactory class.
* updated tests to use TestingCompressionFactory.
* added extensive testing to ensure that all combinations of Map and ParameterizedClass construct the same CompressionParams.
* added additional testing coverage for pre-existing methods
* unified error reporting so that the same error on different paths reports with the same text.

was (Author: claudenw):

[~smiklosovic], [~mck] Please review [Github Pull Request #2254|https://github.com/apache/cassandra/pull/2254] there are a number of changes.

* switched to ParameterizedClass for yaml configuration
* ensured processing to support CQL compression parameters
* added extensive yaml file documentation.
* moved methods used only in testing from CompressionParams to a new TestingCompressionParamsFactory class.
* updated tests to use TestingCompressionFactory.
* added extensive testing to ensure that all combinations of Map and ParameterixedClass construct the same CompressionParams.
* added additional testing coverage or pre-existing methods
* unified error reporting so that the same error on different paths reports with the same text.
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713597#comment-17713597 ] Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/18/23 2:41 PM:

https://github.com/apache/cassandra/pull/2282

1. flat map
2. ParameterizedClass - same stuff as everywhere
3. new format of values supported (as well as old)
4. some parameters / their names were deprecated in 3.0 so they can be removed in 5.0.
5. aliases supported

{code}
sstable_compression:
  - class_name: "org.apache.cassandra.io.compress.LZ4Compressor"
    parameters:
      - chunk_length: "32KiB"
        min_compress_ratio: "0"
{code}

{code}
sstable_compression:
  - class_name: "org.apache.cassandra.io.compress.LZ4Compressor"
    parameters:
      - chunk_length_in_kb: "32"
        min_compress_ratio: "0"
{code}

{code}
sstable_compression:
  - class_name: "lz4"
    parameters:
      - chunk_length: "32KiB"
        min_compress_ratio: "0"
{code}

{code}
sstable_compression:
  - class_name: "org.apache.cassandra.io.compress.LZ4Compressor"
    parameters:
      - chunk_length_in_kb: "64KiB"
        min_compress_ratio: "0"
{code}

{code}
sstable_compression:
  - class_name: "org.apache.cassandra.io.compress.LZ4Compressor"
    parameters:
      - chunk_length: "1MiB"
        min_compress_ratio: "0"
{code}

All this works. I am not sure I covered all parameters but I expect that this might be done in a similar fashion.

was (Author: smiklosovic):

https://github.com/apache/cassandra/pull/2282

1. flat map
2. ParameterizedClass - same stuff as everywhere
3. new format of values supported (as well as old)
4. some parameters / their names were deprecated in 3.0 so they can be removed in 5.0.
5. aliases supported

{code}
sstable_compression:
  - class_name: "org.apache.cassandra.io.compress.LZ4Compressor"
    parameters:
      - chunk_length: "32KiB"
        min_compress_ratio: "0"
{code}

{code}
sstable_compression:
  - class_name: "org.apache.cassandra.io.compress.LZ4Compressor"
    parameters:
      - chunk_length_in_kb: "32"
        min_compress_ratio: "0"
{code}

{code}
sstable_compression:
  - class_name: "lz4"
    parameters:
      - chunk_length: "32KiB"
        min_compress_ratio: "0"
{code}

{code}
sstable_compression:
  - class_name: "org.apache.cassandra.io.compress.LZ4Compressor"
    parameters:
      - chunk_length_in_kb: "64KiB"
        min_compress_ratio: "0"
{code}

All this works. I am not sure I covered all parameters but I expect that this might be done in a similar fashion.

> Default setting (yaml) for SSTable compression > -- > > Key: CASSANDRA-12937 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12937 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: Michael Semb Wever >Assignee: Claude Warren >Priority: Low > Labels: AdventCalendar2021, lhf > Fix For: 5.x > > Time Spent: 3h 10m > Remaining Estimate: 0h > > In many situations the choice of compression for sstables is more relevant to > the disks attached than to the schema and data. > This issue is to add to cassandra.yaml a default value for sstable > compression that new tables will inherit (instead of the defaults found in > {{CompressionParams.DEFAULT}}. > Examples where this can be relevant are filesystems that do on-the-fly > compression (btrfs, zfs) or specific disk configurations or even specific C* > versions (see CASSANDRA-10995 ). > +Additional information for newcomers+ > Some new fields need to be added to {{cassandra.yaml}} to allow specifying > the field required for defining the default compression parameters. In > {{DatabaseDescriptor}} a new {{CompressionParams}} field should be added for > the default compression. This field should be initialized in > {{DatabaseDescriptor.applySimpleConfig()}}. 
At the different places where > {{CompressionParams.DEFAULT}} was used the code should call > {{DatabaseDescriptor#getDefaultCompressionParams}} that should return some > copy of configured {{CompressionParams}}. > Some unit test using {{OverrideConfigurationLoader}} should be used to test > that the table schema use the new default when a new table is created (see > CreateTest for some example). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
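The comment above reports that chunk lengths are accepted both in the new unit-suffixed form ("32KiB", "1MiB") and in the old bare-number form used with chunk_length_in_kb ("32"). A rough sketch of how such values could be normalized to bytes follows. It is an assumption made for illustration only: ChunkLengthSketch and toBytes are hypothetical names, and the patch's actual parsing is not shown in this thread.

```java
/** Hypothetical sketch: normalize chunk length strings in old and new formats to bytes. */
class ChunkLengthSketch {
    static int toBytes(String value) {
        String v = value.trim();
        if (v.endsWith("KiB"))
            return Math.multiplyExact(number(v, 3), 1024);
        if (v.endsWith("MiB"))
            return Math.multiplyExact(number(v, 3), 1024 * 1024);
        // Old format: a bare number is interpreted as kibibytes (chunk_length_in_kb).
        return Math.multiplyExact(Integer.parseInt(v), 1024);
    }

    /** Parses the numeric part in front of a unit suffix of the given length. */
    private static int number(String v, int suffixLength) {
        return Integer.parseInt(v.substring(0, v.length() - suffixLength).trim());
    }

    public static void main(String[] args) {
        System.out.println(toBytes("32KiB"));
        System.out.println(toBytes("32"));
        System.out.println(toBytes("1MiB"));
    }
}
```

Math.multiplyExact is used so that an oversized value raises an ArithmeticException instead of silently overflowing.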
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713597#comment-17713597 ] Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/18/23 2:39 PM: https://github.com/apache/cassandra/pull/2282 1. flat map 2. ParameterizedClass - same stuff as everywhere 3. new format of values supported (as well as old) 4. some parameters / their names were deprecated in 3.0 so they can be removed in 5.0. 5. aliases supported {code} sstable_compression: - class_name: "org.apache.cassandra.io.compress.LZ4Compressor" parameters: - chunk_length: "32KiB" min_compress_ratio: "0" {code} {code} sstable_compression: - class_name: "org.apache.cassandra.io.compress.LZ4Compressor" parameters: - chunk_length_in_kb: "32" min_compress_ratio: "0" {code} {code} sstable_compression: - class_name: "lz4" parameters: - chunk_length: "32KiB" min_compress_ratio: "0" {code} {code} sstable_compression: - class_name: "org.apache.cassandra.io.compress.LZ4Compressor" parameters: - chunk_length_in_kb: "64KiB" min_compress_ratio: "0" {code} All this works. I am not sure I covered all parameters but I expect that this might be done in a similar fashion. was (Author: smiklosovic): https://github.com/apache/cassandra/pull/2282 1. flat map 2. ParameterizedClass - same stuff as everywhere 3. new format of values supported (as well as old) 4. some parameters / their names were deprecated in 3.0 so they can be removed in 5.0. 5. aliases supported {code} sstable_compression: - class_name: "org.apache.cassandra.io.compress.LZ4Compressor" parameters: - chunk_length: "32KiB" min_compress_ratio: "0" {code} {code} sstable_compression: - class_name: "org.apache.cassandra.io.compress.LZ4Compressor" parameters: - chunk_length_in_kb: "32" min_compress_ratio: "0" {code} {code} sstable_compression: - class_name: "lz4" parameters: - chunk_length: "32KiB" min_compress_ratio: "0" {code} All this works. 
I am not sure I covered all parameters but I expect that this might be done in a similar fashion. > Default setting (yaml) for SSTable compression > -- > > Key: CASSANDRA-12937 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12937 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: Michael Semb Wever >Assignee: Claude Warren >Priority: Low > Labels: AdventCalendar2021, lhf > Fix For: 5.x > > Time Spent: 3h 10m > Remaining Estimate: 0h > > In many situations the choice of compression for sstables is more relevant to > the disks attached than to the schema and data. > This issue is to add to cassandra.yaml a default value for sstable > compression that new tables will inherit (instead of the defaults found in > {{CompressionParams.DEFAULT}}. > Examples where this can be relevant are filesystems that do on-the-fly > compression (btrfs, zfs) or specific disk configurations or even specific C* > versions (see CASSANDRA-10995 ). > +Additional information for newcomers+ > Some new fields need to be added to {{cassandra.yaml}} to allow specifying > the field required for defining the default compression parameters. In > {{DatabaseDescriptor}} a new {{CompressionParams}} field should be added for > the default compression. This field should be initialized in > {{DatabaseDescriptor.applySimpleConfig()}}. At the different places where > {{CompressionParams.DEFAULT}} was used the code should call > {{DatabaseDescriptor#getDefaultCompressionParams}} that should return some > copy of configured {{CompressionParams}}. > Some unit test using {{OverrideConfigurationLoader}} should be used to test > that the table schema use the new default when a new table is created (see > CreateTest for some example). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713597#comment-17713597 ] Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/18/23 2:36 PM: https://github.com/apache/cassandra/pull/2282 1. flat map 2. ParameterizedClass - same stuff as everywhere 3. new format of values supported (as well as old) 4. some parameters / their names were deprecated in 3.0 so they can be removed in 5.0. 5. aliases supported {code} sstable_compression: - class_name: "org.apache.cassandra.io.compress.LZ4Compressor" parameters: - chunk_length: "32KiB" min_compress_ratio: "0" {code} {code} sstable_compression: - class_name: "org.apache.cassandra.io.compress.LZ4Compressor" parameters: - chunk_length_in_kb: "32" min_compress_ratio: "0" {code} {code} sstable_compression: - class_name: "lz4" parameters: - chunk_length: "32KiB" min_compress_ratio: "0" {code} All this works. I am not sure I covered all parameters but I expect that this might be done in a similar fashion. was (Author: smiklosovic): https://github.com/apache/cassandra/pull/2281 1. flat map 2. ParameterizedClass - same stuff as everywhere 3. new format of values supported (as well as old) 4. some parameters / their names were deprecated in 3.0 so they can be removed in 5.0. 5. aliases supported {code} sstable_compression: - class_name: "org.apache.cassandra.io.compress.LZ4Compressor" parameters: - chunk_length: "32KiB" min_compress_ratio: "0" {code} {code} sstable_compression: - class_name: "org.apache.cassandra.io.compress.LZ4Compressor" parameters: - chunk_length_in_kb: "32" min_compress_ratio: "0" {code} {code} sstable_compression: - class_name: "lz4" parameters: - chunk_length: "32KiB" min_compress_ratio: "0" {code} All this works. I am not sure I covered all parameters but I expect that this might be done in a similar fashion. 
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713597#comment-17713597 ] Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/18/23 2:22 PM: https://github.com/apache/cassandra/pull/2281 1. flat map 2. ParameterizedClass - same stuff as everywhere 3. new format of values supported (as well as old) 4. some parameters / their names were deprecated in 3.0 so they can be removed in 5.0. 5. aliases supported {code} sstable_compression: - class_name: "org.apache.cassandra.io.compress.LZ4Compressor" parameters: - chunk_length: "32KiB" min_compress_ratio: "0" {code} {code} sstable_compression: - class_name: "org.apache.cassandra.io.compress.LZ4Compressor" parameters: - chunk_length_in_kb: "32" min_compress_ratio: "0" {code} {code} sstable_compression: - class_name: "lz4" parameters: - chunk_length: "32KiB" min_compress_ratio: "0" {code} All this works. I am not sure I covered all parameters but I expect that this might be done in a similar fashion. was (Author: smiklosovic): https://github.com/apache/cassandra/pull/2281 1. flat map 2. ParameterizedClass - same stuff as everywhere 3. new format of values supported (as well as old) 4. some parameters / their names were deprecated in 3.0 so they can be removed in 5.0. 5. aliases supported {code} #sstable_compression: # - class_name: "org.apache.cassandra.io.compress.LZ4Compressor" #parameters: #- chunk_length: "32KiB" # min_compress_ratio: "0" {code} {code} #sstable_compression: # - class_name: "org.apache.cassandra.io.compress.LZ4Compressor" #parameters: #- chunk_length_in_kb: "32" # min_compress_ratio: "0" {code} All this works. I am not sure I covered all parameters but I expect that this might be done in a similar fashion. 
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713597#comment-17713597 ] Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/18/23 2:20 PM: https://github.com/apache/cassandra/pull/2281 1. flat map 2. ParameterizedClass - same stuff as everywhere 3. new format of values supported (as well as old) 4. some parameters / their names were deprecated in 3.0 so they can be removed in 5.0. 5. aliases supported {code} #sstable_compression: # - class_name: "org.apache.cassandra.io.compress.LZ4Compressor" #parameters: #- chunk_length: "32KiB" # min_compress_ratio: "0" {code} {code} #sstable_compression: # - class_name: "org.apache.cassandra.io.compress.LZ4Compressor" #parameters: #- chunk_length_in_kb: "32" # min_compress_ratio: "0" {code} All this works. I am not sure I covered all parameters but I expect that this might be done in a similar fashion. was (Author: smiklosovic): https://github.com/apache/cassandra/pull/2281 1. flat map 2. ParameterizedClass - same stuff as everywhere 3. new format of values supported (as well as old) 4. some parameters / their names were deprecated in 3.0 so they can be removed in 5.0. {code} #sstable_compression: # - class_name: "org.apache.cassandra.io.compress.LZ4Compressor" #parameters: #- chunk_length: "32KiB" # min_compress_ratio: "0" {code} {code} #sstable_compression: # - class_name: "org.apache.cassandra.io.compress.LZ4Compressor" #parameters: #- chunk_length_in_kb: "32" # min_compress_ratio: "0" {code} All this works. I am not sure I covered all parameters but I expect that this might be done in a similar fashion. 
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713597#comment-17713597 ] Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/18/23 2:19 PM: https://github.com/apache/cassandra/pull/2281 1. flat map 2. ParameterizedClass - same stuff as everywhere 3. new format of values supported (as well as old) 4. some parameters / their names were deprecated in 3.0 so they can be removed in 5.0. {code} #sstable_compression: # - class_name: "org.apache.cassandra.io.compress.LZ4Compressor" #parameters: #- chunk_length: "32KiB" # min_compress_ratio: "0" {code} {code} #sstable_compression: # - class_name: "org.apache.cassandra.io.compress.LZ4Compressor" #parameters: #- chunk_length_in_kb: "32" # min_compress_ratio: "0" {code} All this works. I am not sure I covered all parameters but I expect that this might be done in a similar fashion. was (Author: smiklosovic): https://github.com/apache/cassandra/pull/2281 1. flat map 2. ParameterizedClass - same stuff as everywhere 3. new format of values supported (as well as old) 4. some parameters / their names were deprecated in 3.0 so they can be removed in 5.0. {code} #sstable_compression: # - class_name: "org.apache.cassandra.io.compress.LZ4Compressor" #parameters: #- chunk_length: "32KiB" # min_compress_ratio: "0" {code} {code} #sstable_compression: # - class_name: "org.apache.cassandra.io.compress.LZ4Compressor" #parameters: #- chunk_length_in_kb: "32" # min_compress_ratio: "0" {code} All this works. 
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713597#comment-17713597 ] Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/18/23 2:17 PM: https://github.com/apache/cassandra/pull/2281 1. flat map 2. ParameterizedClass - same stuff as everywhere 3. new format of values supported (as well as old) 4. some parameters / their names were deprecated in 3.0 so they can be removed in 5.0. {code} #sstable_compression: # - class_name: "org.apache.cassandra.io.compress.LZ4Compressor" #parameters: #- chunk_length: "32KiB" # min_compress_ratio: "0" {code} {code} #sstable_compression: # - class_name: "org.apache.cassandra.io.compress.LZ4Compressor" #parameters: #- chunk_length_in_kb: "32" # min_compress_ratio: "0" {code} All this works. was (Author: smiklosovic): https://github.com/apache/cassandra/pull/2281 1. flat map 2. ParameterizedClass - same stuff as everywhere 3. new format of values supported 4. some parameters / their names were deprecated in 3.0 so they can be removed in 5.0. {code} #sstable_compression: # - class_name: "org.apache.cassandra.io.compress.LZ4Compressor" #parameters: #- chunk_length: "32KiB" # min_compress_ratio: "0" {code} {code} #sstable_compression: # - class_name: "org.apache.cassandra.io.compress.LZ4Compressor" #parameters: #- chunk_length_in_kb: "32" # min_compress_ratio: "0" {code} All this works. > Default setting (yaml) for SSTable compression > -- > > Key: CASSANDRA-12937 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12937 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: Michael Semb Wever >Assignee: Claude Warren >Priority: Low > Labels: AdventCalendar2021, lhf > Fix For: 5.x > > Time Spent: 3h > Remaining Estimate: 0h > > In many situations the choice of compression for sstables is more relevant to > the disks attached than to the schema and data. 
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
    [ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713571#comment-17713571 ]

Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/18/23 12:25 PM:
-------------------------------------------------------------------------

I still do not have a clear answer as to why we are "extracting" something and then have an extra parameterized class with further options.

_Currently CompressionParams takes the class name and the parameters. It extracts chunk_length_kb (or chunk_length_in_kb) and min_compress_ratio from the parameters and uses them to build the CompressionParams instance._

Are you not just copying the same approach, as in "CompressionParams extracts these parameters so we need to apply the same extraction in cassandra.yaml"? Why is the extraction, as suggested, important? Why can we not flatten the configuration? It is questionable why we have nested sections like that when, from a UX perspective, it is truly just a map. The fact that we do something internally in some fashion does not mean that it has to manifest in the configuration in cassandra.yaml.

If we want to support the same values as in other places in cassandra.yaml but have a nice flat map (preferably), would it not be possible to translate these values internally into the old values?

The ideal option would be to support in CQL the same format of parameter values as in cassandra.yaml.

I do not think this ticket is close to merging without further discussion of how this should be modeled and without reaching broader consensus.

was (Author: smiklosovic):
    I still do not have a clear answer as to why we are "extracting" something and then have an extra parameterized class with further options.

_Currently CompressionParams takes the class name and the parameters. It extracts chunk_length_kb (or chunk_length_in_kb) and min_compress_ratio from the parameters and uses them to build the CompressionParams instance._

Are you not just copying the same approach, as in "CompressionParams extracts these parameters so we need to apply the same extraction in cassandra.yaml"? Why is the extraction, as suggested, important? Why can we not flatten the configuration? It is questionable why we have nested sections like that when, from a UX perspective, it is truly just a map. The fact that we do something internally in some fashion does not mean that it has to manifest in the configuration in cassandra.yaml.

If we want to support the same values as in other places in cassandra.yaml but have a nice flat map (preferably), would it not be possible to translate these values internally into the old values? Or even better, would it be possible to have new-style parameter values in cassandra.yaml that are transformed internally into the old values?

The ideal option would be to support in CQL the same format of parameter values as in cassandra.yaml.

I do not think this ticket is close to merging without further discussion of how this should be modeled and without reaching broader consensus.

> Default setting (yaml) for SSTable compression
> ----------------------------------------------
>
>                 Key: CASSANDRA-12937
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12937
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Config
>            Reporter: Michael Semb Wever
>            Assignee: Claude Warren
>            Priority: Low
>              Labels: AdventCalendar2021, lhf
>             Fix For: 5.x
>
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> In many situations the choice of compression for sstables is more relevant to
> the disks attached than to the schema and data.
> This issue is to add to cassandra.yaml a default value for sstable
> compression that new tables will inherit (instead of the defaults found in
> {{CompressionParams.DEFAULT}}.
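The internal translation asked about in the comment above (accept a new-style value such as chunk_length: "32KiB" in a flat map, but still end up with the old chunk_length_in_kb value) might look something like the sketch below. The class and method names are hypothetical and not taken from the patch, only the KiB suffix is handled, and the actual pull request may well do this differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Cassandra: rewrites a flat map of
// sstable_compression parameters into the legacy key/value form.
final class CompressionOptionTranslator {
    static Map<String, String> toLegacy(Map<String, String> params) {
        Map<String, String> out = new LinkedHashMap<>(params);
        String length = out.remove("chunk_length");
        if (length != null) {
            // Refuse ambiguous input instead of silently picking one value.
            if (out.containsKey("chunk_length_in_kb"))
                throw new IllegalArgumentException("chunk_length and chunk_length_in_kb are both set");
            out.put("chunk_length_in_kb", kibibytes(length));
        }
        return out;
    }

    // Only the "KiB" unit is understood in this sketch, e.g. "32KiB" -> "32".
    private static String kibibytes(String value) {
        String v = value.trim();
        if (!v.endsWith("KiB"))
            throw new IllegalArgumentException("unsupported unit in: " + value);
        String digits = v.substring(0, v.length() - 3).trim();
        return String.valueOf(Integer.parseInt(digits));
    }
}
```

With something along these lines the yaml stays a flat map in the new style, while whatever consumes the parameters keeps seeing the names and units it already knows.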
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
    [ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713551#comment-17713551 ]

Michael Semb Wever edited comment on CASSANDRA-12937 at 4/18/23 11:23 AM:
--------------------------------------------------------------------------

I would rather see consistency in the cassandra.yaml.

Please use parameter names and units in the new style of the yaml. They do not need to match the CQL style. If an extra class is needed to support this, that is neither a headache nor an obsession.

I don't have any opinion about whether the extra top-level options (chunk_length, maxCompressedLength, minCompressRatio) should be above the parameter map or in it. I do see the issue that having to transform names and values inside the map is clumsy and potentially error prone.

Neither hints_compression nor commitlog_compression supports customising the chunk_length AFAIK.

was (Author: michaelsembwever):
    I would rather see consistency in the cassandra.yaml.

Please use parameter names and units in the new style of the yaml. They do not need to match the CQL style. If an extra class is needed to support this, that is neither a headache nor an obsession.

I also don't have any opinion about whether the extra top-level options (chunk_length, maxCompressedLength, minCompressRatio) should be above the parameter map or in it. I do see the issue that having to transform names and values inside the map is clumsy and potentially error prone.

Neither hints_compression nor commitlog_compression supports customising the chunk_length AFAIK.
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
    [ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713529#comment-17713529 ]

Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/18/23 10:42 AM:
-------------------------------------------------------------------------

All I would like to see is a simple map of parameters in ParameterizedClass with exactly the same names as their CQL counterparts. They would be used there literally. There do not seem to be any collisions with that. I do not get the "obsession" with making the parameters for these compressors follow the same names as CompressionParams (or the same units).

_The CompressionParams has 3 parameters that it extracts or creates from the parameters in the ParameterizedClass._

Why do they have to be extracted in the first place?

For hints_compression in yaml we have:

{code}
# Compression to apply to the hint files. If omitted, hints files
# will be written uncompressed. LZ4, Snappy, and Deflate compressors
# are supported.
#hints_compression:
#   - class_name: LZ4Compressor
#     parameters:
#         -
{code}

For commitlog_compression we have:

{code}
# Compression to apply to the commit log. If omitted, the commit log
# will be written uncompressed. LZ4, Snappy, and Deflate compressors
# are supported.
# commitlog_compression:
#   - class_name: LZ4Compressor
#     parameters:
#         -
{code}

For sstable_compression, I would prefer to see exactly the same way of configuring it. Why are we trying to introduce a completely custom way of configuration, which exists nowhere else, with some parameters extracted outside? Why can we not use the same thing?

I do not think that we should blindly follow "the parameters names and their units". I think we already discussed this, and I already explained all the advantages of following what we have there already. If we make it explicitly clear that these parameters are exactly the same as if they were put into compression params upon table creation, it would save us the headache of having something completely custom, and people would only need to put in the parameters and names they are used to. Why do we want to change all of this and further confuse the user?

EDIT: to further support my case for having the same parameters and units in cassandra.yaml as are specified in CQL upon table creation: what happens in practice is that people who want to take advantage of this configuration would just copy-paste the CQL snippet for compression params, turn it into map entries by hitting "enter" on the keyboard, and be done. I highly doubt that they would like to specify "other units" just for the sake of consistency with the rest of cassandra.yaml. I do not think they care at all. They just want to copy it over from CQL and call it a day.

was (Author: smiklosovic):
    All I would like to see is a simple map of parameters in ParameterizedClass with exactly the same names as their CQL counterparts. They would be used there literally. There do not seem to be any collisions with that. I do not get the "obsession" with making the parameters for these compressors follow the same names as CompressionParams (or the same units).

_The CompressionParams has 3 parameters that it extracts or creates from the parameters in the ParameterizedClass._

Why do they have to be extracted in the first place?

For hints_compression in yaml we have:

{code}
# Compression to apply to the hint files. If omitted, hints files
# will be written uncompressed. LZ4, Snappy, and Deflate compressors
# are supported.
#hints_compression:
#   - class_name: LZ4Compressor
#     parameters:
#         -
{code}

For commitlog_compression we have:

{code}
# Compression to apply to the commit log. If omitted, the commit log
# will be written uncompressed. LZ4, Snappy, and Deflate compressors
# are supported.
# commitlog_compression:
#   - class_name: LZ4Compressor
#     parameters:
#         -
{code}

For sstable_compression, I would prefer to see exactly the same way of configuring it. Why are we trying to introduce a completely custom way of configuration, which exists nowhere else, with some parameters extracted outside? Why can we not use the same thing?

I do not think that we should blindly follow "the parameters names and their units". I think we already discussed this, and I already explained all the advantages of following what we have there already. If we make it explicitly clear that these parameters are exactly the same as if they were put into compression params upon table creation, it would save us the headache of having something completely custom, and people would only need to put in the parameters and names they are used to. Why do we want to change all of this and further confuse the user?

> Default setting (yaml) for SSTable compression
>
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713529#comment-17713529 ]

Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/18/23 10:22 AM:
------------------------------------------------------------------------

All I would like to see is a simple map of parameters in ParameterizedClass, with exactly the same names as their CQL counterparts. They would literally just be used there. There do not seem to be any collisions with that.

I do not get the "obsession" with having the parameters for these compressors follow the names (or the units) used in CompressionParams.

_The CompressionParams has 3 parameters that it extracts or creates from the parameters in the ParameterizedClass._

Why do they have to be extracted in the first place?

For hints_compression in yaml we have:

{code}
# Compression to apply to the hint files. If omitted, hints files
# will be written uncompressed. LZ4, Snappy, and Deflate compressors
# are supported.
#hints_compression:
#   - class_name: LZ4Compressor
#     parameters:
#         -
{code}

For commitlog_compression we have:

{code}
# Compression to apply to the commit log. If omitted, the commit log
# will be written uncompressed. LZ4, Snappy, and Deflate compressors
# are supported.
# commitlog_compression:
#   - class_name: LZ4Compressor
#     parameters:
#         -
{code}

For sstable_compression, I would prefer to see exactly the same style of configuration. Why are we trying to introduce a completely custom way of configuration, one that exists nowhere else, with some parameters extracted outside of the map? Why can we not use the same approach?

I do not think that we should blindly follow "the parameters names and their units". I think we already discussed this. I have already explained all the advantages of following what we already have there.

If we make it explicitly clear that these parameters are exactly the same as the ones put into the compression params upon table creation, that would save us the headache of maintaining something completely custom, and people would only need to put there the parameters and names they are used to. Why do we want to change all of this and further confuse the user?
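The idea of reusing the CQL parameter names unchanged can be sketched as follows. This is only an illustration under stated assumptions: `YamlCompressionEntry` and `toCqlStyleOptions` are hypothetical names, not Cassandra's `ParameterizedClass` API, and the sketch assumes the CQL compression map identifies the compressor under the key `class`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for one yaml entry of the form
//   sstable_compression:
//     - class_name: LZ4Compressor
//       parameters:
//         - chunk_length_in_kb: 16
// It is NOT Cassandra's ParameterizedClass, only an illustration.
final class YamlCompressionEntry {
    final String className;
    final Map<String, String> parameters;

    YamlCompressionEntry(String className, Map<String, String> parameters) {
        this.className = className;
        this.parameters = parameters;
    }

    // Build the same kind of option map a user would write in
    // CREATE TABLE ... WITH compression = {'class': ..., ...}.
    // Parameter names and values are passed through untouched, so the
    // yaml and CQL spellings cannot drift apart.
    Map<String, String> toCqlStyleOptions() {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("class", className);
        options.putAll(parameters);
        return options;
    }
}
```

With such a pass-through, whatever code already turns a CQL option map into compression parameters could, in principle, be reused for the yaml default without a second parser.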
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713524#comment-17713524 ]

Claude Warren edited comment on CASSANDRA-12937 at 4/18/23 10:09 AM:
--------------------------------------------------------------------

hints_compression and commitlog_compression use the standard ParameterizedClass.

The CompressionParams has 3 parameters that it extracts or creates from the parameters in the ParameterizedClass. The parameters in CompressionParams are

{code:java}
private final int chunkLength;
private final int maxCompressedLength;  // In content we store max length to avoid rounding errors causing compress/decompress mismatch.
private final double minCompressRatio;  // In configuration we store min ratio, the input parameter.
{code}

The ParameterizedClass constructor that accepts the Map of options expects a key of "chunk_length_in_kb" or "chunk_length_kb" as well as a "min_compress_ratio".

The change I made does not alter the hints_compression or commitlog_compression options.

The yaml file has an additional set of requirements:
* The chunkLength (yaml: chunk_length) should be specified with the DataStorageSpec suffix (e.g. KiB).
* The maxCompressedLength should be accepted as a parameter.
* The maxCompressedLength (yaml: max_compressed_length) should be specified with the DataStorageSpec suffix (e.g. KiB).
* maxCompressedLength and minCompressRatio are related to each other via chunk_length, so only one of them can be specified.

I could work chunkLength and maxCompressedLength into the class_name parameters; however, I believe this will result in adding 2 more reserved words, both of which will need to be removed from the parameter list. This change will affect all CompressionParams constructions that use the Map format.

I will make the change with the following rules for resolving collisions:
* If both max_compressed_length and min_compress_ratio are specified, a ConfigurationException will be thrown.
* If both chunk_length and either chunk_length_in_kb or chunk_length_kb are specified and they are not equal, a ConfigurationException will be thrown.
* If chunk_length or max_compressed_length is specified without the DataStorageSpec suffix, a ConfigurationException will be thrown.

I will also ensure that the short names lz4, none, noop, snappy, deflate, and zstd work as class names and use the defaults specified by the CompressionParams methods of the same names.
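The three collision rules listed in the comment could be checked roughly as below. This is a minimal sketch, not the patch itself: `CompressionOptionChecks`, `ConfigException` (standing in for Cassandra's `ConfigurationException`) and the hand-written suffix parser (standing in for `DataStorageSpec`) are all assumptions made for illustration.

```java
import java.util.Map;

// Hypothetical validator for the collision rules described in the comment.
// ConfigException stands in for Cassandra's ConfigurationException.
final class CompressionOptionChecks {
    static final class ConfigException extends RuntimeException {
        ConfigException(String message) { super(message); }
    }

    // Longer suffixes first, because "KiB" also ends with "B".
    private static final String[] SUFFIXES = {"KiB", "MiB", "GiB", "B"};
    private static final long[] MULTIPLIERS = {1L << 10, 1L << 20, 1L << 30, 1L};

    // Parse values such as "16KiB" into bytes; a bare number is rejected.
    static long parseSizeInBytes(String key, String value) {
        String v = value.trim();
        for (int i = 0; i < SUFFIXES.length; i++) {
            if (v.endsWith(SUFFIXES[i])) {
                String digits = v.substring(0, v.length() - SUFFIXES[i].length()).trim();
                try {
                    return Long.parseLong(digits) * MULTIPLIERS[i];
                } catch (NumberFormatException e) {
                    throw new ConfigException(key + " has an invalid size: " + value);
                }
            }
        }
        throw new ConfigException(key + " needs a unit suffix such as KiB: " + value);
    }

    static void validate(Map<String, String> options) {
        // Rule 1: the two settings derive from each other, so only one may be given.
        if (options.containsKey("max_compressed_length") && options.containsKey("min_compress_ratio"))
            throw new ConfigException("specify only one of max_compressed_length and min_compress_ratio");

        // Rule 3: both size-valued keys must carry a unit suffix.
        Long chunkBytes = null;
        if (options.containsKey("chunk_length"))
            chunkBytes = parseSizeInBytes("chunk_length", options.get("chunk_length"));
        if (options.containsKey("max_compressed_length"))
            parseSizeInBytes("max_compressed_length", options.get("max_compressed_length"));

        // Rule 2: chunk_length may coexist with a legacy KiB key only if they agree.
        for (String legacyKey : new String[] {"chunk_length_in_kb", "chunk_length_kb"}) {
            if (chunkBytes == null || !options.containsKey(legacyKey))
                continue;
            long legacyBytes;
            try {
                legacyBytes = Long.parseLong(options.get(legacyKey).trim()) * 1024L;
            } catch (NumberFormatException e) {
                throw new ConfigException(legacyKey + " must be a whole number of KiB");
            }
            if (legacyBytes != chunkBytes)
                throw new ConfigException("chunk_length and " + legacyKey + " disagree");
        }
    }
}
```

For example, `chunk_length: 16KiB` together with `chunk_length_in_kb: 16` passes, while `chunk_length: 16` (no suffix) or `chunk_length: 32KiB` together with `chunk_length_in_kb: 16` is rejected.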
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713056#comment-17713056 ]

Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/17/23 12:24 PM:
------------------------------------------------------------------------

Why do you insist on this:

{code}
sstable_compressor:
  chunk_length: 16KiB
  min_compress_ratio: 0.0
  class_name: org.apache.cassandra.io.compress.LZ4Compressor
  parameters:
    - param1 : value
{code}

Instead of doing it like this:

{code}
sstable_compression:
  - class_name: org.apache.cassandra.io.compress.LZ4Compressor
    parameters:
      - param1: "value1"
        whateverParams .
{code}

> Default setting (yaml) for SSTable compression
> ----------------------------------------------
>
>                 Key: CASSANDRA-12937
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12937
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Config
>            Reporter: Michael Semb Wever
>            Assignee: Claude Warren
>            Priority: Low
>              Labels: AdventCalendar2021, lhf
>             Fix For: 5.x
>
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> In many situations the choice of compression for sstables is more relevant to
> the disks attached than to the schema and data.
> This issue is to add to cassandra.yaml a default value for sstable
> compression that new tables will inherit (instead of the defaults found in
> {{CompressionParams.DEFAULT}}).
> Examples where this can be relevant are filesystems that do on-the-fly
> compression (btrfs, zfs) or specific disk configurations or even specific C*
> versions (see CASSANDRA-10995 ).
> +Additional information for newcomers+
> Some new fields need to be added to {{cassandra.yaml}} to allow specifying
> the field required for defining the default compression parameters. In
> {{DatabaseDescriptor}} a new {{CompressionParams}} field should be added for
> the default compression. This field should be initialized in
> {{DatabaseDescriptor.applySimpleConfig()}}. At the different places where
> {{CompressionParams.DEFAULT}} was used the code should call
> {{DatabaseDescriptor#getDefaultCompressionParams}} that should return some
> copy of configured {{CompressionParams}}.
> Some unit test using {{OverrideConfigurationLoader}} should be used to test
> that the table schema use the new default when a new table is created (see
> CreateTest for some example).

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
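The issue description asks for an accessor that hands out a copy of the configured default instead of a shared constant. A minimal sketch of that pattern follows; `DefaultCompressionHolder` and its plain string maps are hypothetical simplifications standing in for `DatabaseDescriptor` and `CompressionParams`, and the LZ4 fallback value is only a placeholder for whatever `CompressionParams.DEFAULT` contains.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, heavily simplified stand-ins; not Cassandra's real classes.
final class DefaultCompressionHolder {
    // Plays the role of CompressionParams.DEFAULT: used when yaml sets nothing.
    // The concrete value here is a placeholder chosen for the example.
    static final Map<String, String> BUILT_IN_DEFAULT =
        Collections.unmodifiableMap(new LinkedHashMap<>(Map.of("class", "LZ4Compressor")));

    private static Map<String, String> configured = null;

    // Called once while the configuration is applied; null means "not set in yaml".
    static void applyConfig(Map<String, String> fromYaml) {
        configured = fromYaml == null ? null : new LinkedHashMap<>(fromYaml);
    }

    // Return a fresh copy so that callers (for example table creation)
    // cannot mutate the shared default by accident.
    static Map<String, String> getDefaultCompressionParams() {
        return new LinkedHashMap<>(configured != null ? configured : BUILT_IN_DEFAULT);
    }
}
```

Returning a copy on every call costs an allocation, but it means a table that later tweaks its own compression options never changes what the next new table inherits.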
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713054#comment-17713054 ]

Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/17/23 12:15 PM:
------------------------------------------------------------------------

I do not understand. There might be no collisions whatsoever if we make sstable_compression in Config of type ParameterizedClass, as mentioned above.
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713027#comment-17713027 ]

Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/17/23 11:09 AM:
------------------------------------------------------------------------

Are we talking about this? https://github.com/apache/cassandra/pull/2254/files

There is still SSTableCompressionOptions in Config. Am I reviewing the correct branch?

This one seems to have SSTableCompressionOptions in Config too: https://github.com/apache/cassandra/pull/2199
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17710831#comment-17710831 ]

Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/11/23 9:01 AM:
-----------------------------------------------------------------------

Great [~claude], I'll try to finish that soonish and it would be great if you participated in the review. Please tell me if you would rather do it the other way around.
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17709261#comment-17709261 ]

Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/6/23 7:45 AM:
----------------------------------------------------------------------

I think being consistent with CQL, as well as being prepared for the future so that we can specify any compressor without changing anything, is more important.

To summarise:

1) consistency with CQL
2) zero learning curve, a user just puts there what he is used to
3) any future in-built compressor is supported out of the box, so we do not need to think about it and we do not need to change other enums, switches etc. to support it
4) custom compressors are supported as well; we make no distinction between "in-built and custom" in yaml. It is transparent.
5) any parameters can be added
6) it uses the same code path for getting CompressionParams; no new classes or boilerplate code are necessary
7) there is one way of specifying an in-built compressor as well as a custom one; we do not need to differentiate between them
8) we can configure every single parameter of a compressor, not only those that the helper creation functions offer us

I really think that all these points in total beat the argument that we need to have parameters in "so and so format". If we are transparent about the fact that what is used in CQL is accepted in the sstable_compressor map, it is really a no-brainer.

[~mck] what do you think?
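Points 3, 4 and 7 rest on treating in-built and custom compressors the same way, by class name. One way this could work is sketched below, assuming that an unqualified name is looked up in the `org.apache.cassandra.io.compress` package (the package seen in the yaml examples in this thread) and that anything containing a dot is taken as a fully qualified custom class. `CompressorNames` and `qualify` are hypothetical names.

```java
// Hypothetical helper: one naming rule for in-built and custom compressors,
// so that no enum or switch has to change when a compressor is added.
final class CompressorNames {
    static final String DEFAULT_PACKAGE = "org.apache.cassandra.io.compress.";

    // "LZ4Compressor"             -> DEFAULT_PACKAGE + "LZ4Compressor"
    // "com.example.MyCompressor"  -> unchanged (treated as a custom class)
    static String qualify(String className) {
        if (className == null || className.trim().isEmpty())
            throw new IllegalArgumentException("class_name must not be empty");
        String name = className.trim();
        return name.contains(".") ? name : DEFAULT_PACKAGE + name;
    }
}
```

The resulting name would then be handed to whatever reflective loading the existing code path already performs; that step is left out here because it depends on Cassandra classes.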
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17709261#comment-17709261 ]

Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/6/23 7:44 AM:

I think that being consistent with CQL, and being prepared for the future so that we can specify any compressor without changing anything, is more important. To summarise:

1) Consistency with CQL.
2) Zero learning curve: a user just puts there what they are already used to.
3) Any future built-in compressor is supported out of the box, so we do not need to think about it or change other enums, switches, etc. to support it.
4) Custom compressors are supported as well; we do not differentiate between built-in and custom compressors in yaml. It is transparent.
5) Any parameters can be added.
6) It uses the same code path for getting CompressionParams; no new classes or boilerplate code are necessary.
7) There is one way of specifying a built-in compressor as well as a custom one; we do not need to differentiate between them.
8) We can configure every single parameter of a compressor, not only those that the helper creation functions offer us.

I really think that all these points together beat the argument that we need to have parameters in "so and so format". If we are transparent about the fact that what is used in CQL is accepted in the sstable_compressor map, it is really a no-brainer.

[~mck] what do you think?
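To illustrate the "one code path for built-in and custom compressors" argument made in the comment above, here is a minimal sketch of how a plain string map might be handled uniformly. Everything in it is hypothetical: the class `CompressorOptionsSketch` and its `fromMap` method are not part of Cassandra, and resolving a name without a package against `org.apache.cassandra.io.compress` is an assumption made only for this example.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not Cassandra code: one entry point for any compressor map.
final class CompressorOptionsSketch {
    // Assumption: compressors given by simple name live in this package.
    private static final String ASSUMED_PACKAGE = "org.apache.cassandra.io.compress.";

    final String className;
    final Map<String, String> parameters;

    private CompressorOptionsSketch(String className, Map<String, String> parameters) {
        this.className = className;
        this.parameters = parameters;
    }

    // Splits the map into the compressor class and its pass-through parameters.
    static CompressorOptionsSketch fromMap(Map<String, String> options) {
        Map<String, String> copy = new LinkedHashMap<>(options);
        String name = copy.remove("class");
        if (name == null || name.trim().isEmpty())
            throw new IllegalArgumentException("sstable_compressor needs a 'class' entry");
        name = name.trim();
        // Built-in and custom classes take the same path; only the name differs.
        String resolved = name.contains(".") ? name : ASSUMED_PACKAGE + name;
        // No parameter is interpreted here, so unknown keys survive untouched.
        return new CompressorOptionsSketch(resolved, Collections.unmodifiableMap(copy));
    }

    public static void main(String[] args) {
        Map<String, String> yaml = new LinkedHashMap<>();
        yaml.put("class", "LZ4Compressor");
        yaml.put("chunk_length_in_kb", "16");
        CompressorOptionsSketch parsed = fromMap(yaml);
        System.out.println(parsed.className + " " + parsed.parameters);
    }
}
```

Because the sketch never inspects individual parameters, a future compressor with new options would need no change to this code, which is the property points 3) and 5) describe.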
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17709226#comment-17709226 ]

Claude Warren edited comment on CASSANDRA-12937 at 4/6/23 6:53 AM:

Looking at CompressionParams, there are a number of default configurations, e.g. snappy(), lz4(), and noCompression(), that I thought would be in common use. What I wanted to do was provide an easy way to call those methods as well as the ability to load any compressor via the map. Also, early on, the idea of using chunk_length_in_kb was rejected, and the "16KiB" form was requested for the input instead. If it is agreed to remove the shortcuts and use the simple map form with the parameters, I'll make those changes.

I did come across a note saying that the configuration file and CQL use different parameters for compression, so I only implemented min_compress_ratio and used it to calculate max_compression_length. So I got to where the code is by trying to support the defaults in CompressionParams and by following min_compress_ratio, not max_compression_length, in the config files.

You can configure zstd with 12KiB chunks by setting:

{code:java}
sstable_compressor:
  chunk_length: 12KiB
  type: zstd
{code}
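The comment above mentions deriving max_compression_length from min_compress_ratio. A minimal sketch of that arithmetic follows, assuming the relationship is simply chunk length divided by ratio, with a ratio of 0 meaning "no limit". That relationship, the restriction of the ratio to 0 or at least 1, and the names `MaxCompressedLengthSketch` and `maxCompressedLength` are assumptions for illustration, not something the thread confirms.

```java
// Hypothetical sketch, not Cassandra code: derive a length limit from a ratio.
final class MaxCompressedLengthSketch {
    // Returns the largest compressed size (in bytes) worth keeping for a chunk.
    static int maxCompressedLength(int chunkLengthBytes, double minCompressRatio) {
        if (chunkLengthBytes <= 0)
            throw new IllegalArgumentException("chunk length must be positive");
        // Assumption: the ratio is either 0 (no limit) or a finite value >= 1.
        if (Double.isNaN(minCompressRatio) || Double.isInfinite(minCompressRatio)
            || minCompressRatio < 0 || (minCompressRatio > 0 && minCompressRatio < 1))
            throw new IllegalArgumentException("min_compress_ratio must be 0 or >= 1");
        // Dividing by 0.0 yields +Infinity, which min() clamps to Integer.MAX_VALUE.
        return (int) Math.ceil(Math.min(chunkLengthBytes / minCompressRatio, Integer.MAX_VALUE));
    }

    public static void main(String[] args) {
        int chunk = 16 * 1024;
        System.out.println(maxCompressedLength(chunk, 0));
        System.out.println(maxCompressedLength(chunk, 2.0));
    }
}
```

Under these assumptions, a 16 KiB chunk with a ratio of 2.0 is only worth storing compressed if it shrinks to 8192 bytes or less, while a ratio of 0 accepts any compressed size.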
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17709083#comment-17709083 ]

Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/5/23 7:39 PM:

I did it here: https://github.com/apache/cassandra/pull//files

The configuration is as simple as this:

{code}
sstable_compressor:
  class: "org.apache.cassandra.io.compress.LZ4Compressor"
  chunk_length_in_kb: "16"
  min_compress_ratio: "0"
{code}

Since sstable_compressor is a map, it may contain any parameters. The compressor is created and validated when the node starts up. This solution is ready for any compressor with any parameters, and it accepts the same parameters as specified in CQL, so there is nothing new to learn.
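The validation at node startup mentioned in the comment above could be sketched roughly as follows. This is not the code from the pull request: `StartupValidationSketch` and `validatedChunkLengthBytes` are hypothetical names, and both the default of 16 and the power-of-two requirement on chunk_length_in_kb are assumptions chosen for the example.

```java
import java.util.Map;

// Hypothetical sketch, not Cassandra code: fail fast on a bad compressor map.
final class StartupValidationSketch {
    // Assumption: used when chunk_length_in_kb is absent from the map.
    private static final String ASSUMED_DEFAULT_CHUNK_KB = "16";

    // Returns the chunk length in bytes, or throws if the map is unusable.
    static int validatedChunkLengthBytes(Map<String, String> options) {
        String cls = options.get("class");
        if (cls == null || cls.trim().isEmpty())
            throw new IllegalArgumentException("sstable_compressor needs a 'class' entry");

        String raw = options.get("chunk_length_in_kb");
        if (raw == null)
            raw = ASSUMED_DEFAULT_CHUNK_KB;
        raw = raw.trim();

        int kb;
        try {
            kb = Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("chunk_length_in_kb is not an integer: " + raw, e);
        }
        // Assumption: the value must be a power of two and fit in an int once in bytes.
        if (kb <= 0 || kb > Integer.MAX_VALUE / 1024 || Integer.bitCount(kb) != 1)
            throw new IllegalArgumentException("chunk_length_in_kb must be a positive power of two: " + raw);
        return kb * 1024;
    }
}
```

Throwing during startup, rather than when the first table is created, means a typo in cassandra.yaml surfaces immediately instead of at the first CREATE TABLE.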
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17709025#comment-17709025 ]

Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/5/23 5:25 PM:

j11 is also clean: [https://app.circleci.com/pipelines/github/instaclustr/cassandra/2055/workflows/3ec6e5cf-36bf-45aa-a794-27a88a1ee0de]

[~mck] would you take a look at this branch, please? [https://github.com/apache/cassandra/pull/]

was (Author: smiklosovic):

j11 is also clean: [https://app.circleci.com/pipelines/github/instaclustr/cassandra/2055/workflows/3ec6e5cf-36bf-45aa-a794-27a88a1ee0de]

[~mck] would you take a look at this branch, please? [https://github.com/instaclustr/cassandra/commits/CASSANDRA-12937]
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17708537#comment-17708537 ]

Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/4/23 6:44 PM:

I added fixes here (1), specifically in this commit (2). The code was not compilable; it was failing on ant rat. Also, it seems to me that you used Java 9 / 11, as String.isBlank() is not available in Java 8, so it failed to compile. There are also various formatting improvements etc. I am building it as we speak.

(1) https://github.com/instaclustr/cassandra/commits/CASSANDRA-12937
(2) https://github.com/instaclustr/cassandra/commit/95422f915fb30c27e4691fbc5711b3361d0331a3

You are welcome to git cherry-pick this commit on top of your branch, or we will just ship my branch (squashed, with you as author).
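On the String.isBlank() point in the comment above: that method was added in Java 11, so code that must also compile on Java 8 needs a substitute. The following is a minimal sketch of one such substitute. The class `BlankCheckSketch` is hypothetical and is not taken from the commit referenced in the comment.

```java
// Hypothetical helper, not taken from the patch under discussion.
final class BlankCheckSketch {
    // Java 8 compatible check: true if the string is empty or only whitespace.
    static boolean isBlank(String value) {
        for (int i = 0; i < value.length(); i++) {
            if (!Character.isWhitespace(value.charAt(i)))
                return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isBlank("  \t"));
        System.out.println(isBlank(" lz4 "));
    }
}
```

A common shorter alternative on Java 8 is `value.trim().isEmpty()`, although trim() only strips characters up to U+0020, so the two can disagree on other Unicode whitespace.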
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17701098#comment-17701098 ]

Stefan Miklosovic edited comment on CASSANDRA-12937 at 3/16/23 11:26 AM:

[~claude] would you mind reworking that PR against current trunk? I am getting a lot of conflicts.

was (Author: smiklosovic):

[~claude] would you mind reworking that PR against trunk? I am getting a lot of conflicts. This is a new feature and should be delivered in 5.0 first.