[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2024-05-28 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17850195#comment-17850195
 ] 

Stefan Miklosovic edited comment on CASSANDRA-12937 at 5/28/24 10:12 PM:
-

I've hardened the code path a bit and added a few tests.

[CASSANDRA-12937-squashed|https://github.com/instaclustr/cassandra/tree/CASSANDRA-12937-squashed]
{noformat}
java17_pre-commit_tests 
  ✓ j17_build 4m 3s
  ✓ j17_cqlsh_dtests_py311   7m 13s
  ✓ j17_cqlsh_dtests_py311_vnode 7m 37s
  ✓ j17_cqlsh_dtests_py38 6m 52s
  ✓ j17_cqlsh_dtests_py38_vnode 7m 21s
  ✓ j17_cqlshlib_cython_tests 7m 56s
  ✓ j17_cqlshlib_tests 6m 46s
  ✓ j17_jvm_dtests_latest_vnode 27m 54s
  ✓ j17_unit_tests 14m 44s
  ✓ j17_utests_latest 15m 3s
  ✕ j17_dtests 37m 42s
  scrub_test.TestScrub test_standalone_scrub_essential_files_only
  topology_test.TestTopology test_movement
  ✕ j17_dtests_latest 35m 24s
  offline_tools_test.TestOfflineTools test_sstableverify
  scrub_test.TestScrub test_standalone_scrub_essential_files_only
  ✕ j17_dtests_vnode 36m 15s
  scrub_test.TestScrub test_standalone_scrub_essential_files_only
  ✕ j17_jvm_dtests 29m 10s
  org.apache.cassandra.distributed.test.NativeTransportEncryptionOptionsTest testOptionalMtlsModeDoNotAllowNonSSLConnections
  org.apache.cassandra.distributed.test.NativeTransportEncryptionOptionsTest testEndpointVerificationEnabledIpNotInSAN
  ✕ j17_utests_oa 17m 8s
  org.apache.cassandra.db.compaction.CompactionStrategyManagerTest testAutomaticUpgradeConcurrency
java17_separate_tests
java11_pre-commit_tests 
java11_separate_tests
{noformat}

[java17_pre-commit_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4358/workflows/9fbc3590-1168-41f8-a7c8-a3fbb3dfc0b0]
[java17_separate_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4358/workflows/d2e65942-b99e-4927-bd65-85800e9d94e9]
[java11_pre-commit_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4358/workflows/df51197e-c92e-454f-9c75-2f5eaee43bb8]
[java11_separate_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4358/workflows/6df36e29-d2cd-4838-b3b8-69e9113b295f]




[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2024-05-24 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17849218#comment-17849218
 ] 

Stefan Miklosovic edited comment on CASSANDRA-12937 at 5/24/24 8:36 AM:


Seems reasonably clean ... 

What I have not implemented is the idea [~jlewandowski] had: "if some config 
parameter is not in the CQL statement, just merge in the values from 
cassandra.yaml", because it is quite tricky to get right. We would need to know 
which values were specified, diff them against what is missing, and then 
validate that the resulting combination makes sense (and if it does not, should 
we fail an otherwise valid CQL statement just because the values we happened to 
merge from cassandra.yaml produced an invalid combination? I do not think so).

Let's just go with the simple case, "if compression is not specified, just take 
the defaults from cassandra.yaml", rather than trying to merge the configs ... 
It is too much of a hassle; it might come later as an improvement if somebody 
really needs it.

I will try to come up with more tests, and I think that sometime next week this 
should be complete and ready for review again.
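A minimal sketch of the "all or nothing" rule described above, assuming compression options are modelled as a plain string map. The class and method names are hypothetical and not Cassandra's API; the point is that no per-key merging happens, so a statement that names only a compressor class does not silently inherit, say, the yaml chunk length.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: names are illustrative, not Cassandra's actual API.
final class CompressionDefaults {
    private CompressionDefaults() {}

    /**
     * Returns the compression options for a new table.
     * If the CQL statement specified compression at all, its options are used
     * as-is; otherwise the whole default map from cassandra.yaml is taken.
     * No per-key merging is attempted.
     */
    static Map<String, String> resolve(Optional<Map<String, String>> fromStatement,
                                       Map<String, String> yamlDefault) {
        return Map.copyOf(fromStatement.orElse(yamlDefault));
    }

    public static void main(String[] args) {
        Map<String, String> yaml = Map.of("class", "ZstdCompressor", "chunk_length_in_kb", "16");
        Map<String, String> stmt = Map.of("class", "LZ4Compressor");
        // The statement wins entirely: chunk_length_in_kb is NOT filled in from yaml.
        System.out.println(resolve(Optional.of(stmt), yaml));
        // No compression in the statement: the yaml default is used as a whole.
        System.out.println(resolve(Optional.empty(), yaml).get("class"));
    }
}
```

Avoiding the merge also avoids the validation question raised above: the result is always either a complete user-supplied configuration or the complete configured default.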

[CASSANDRA-12937-squashed|https://github.com/instaclustr/cassandra/tree/CASSANDRA-12937-squashed]
{noformat}
java17_pre-commit_tests 
  ✓ j17_build 4m 7s
  ✓ j17_cqlsh_dtests_py311   7m 11s
  ✓ j17_cqlsh_dtests_py311_vnode 7m 18s
  ✓ j17_cqlsh_dtests_py38 6m 58s
  ✓ j17_cqlsh_dtests_py38_vnode 7m 1s
  ✓ j17_cqlshlib_cython_tests 7m 26s
  ✓ j17_cqlshlib_tests 6m 50s
  ✓ j17_unit_tests 17m 36s
  ✓ j17_utests_oa 15m 39s
  ✕ j17_dtests 37m 48s
  scrub_test.TestScrub test_standalone_scrub_essential_files_only
  topology_test.TestTopology test_movement
  ✕ j17_dtests_latest 35m 36s
  offline_tools_test.TestOfflineTools test_sstableverify
  scrub_test.TestScrub test_standalone_scrub_essential_files_only
  configuration_test.TestConfiguration test_change_durable_writes
  ✕ j17_dtests_vnode 35m 11s
  scrub_test.TestScrub test_standalone_scrub_essential_files_only
  ✕ j17_jvm_dtests 28m 15s
  ✕ j17_jvm_dtests_latest_vnode 27m 59s
  org.apache.cassandra.fuzz.harry.integration.model.ConcurrentQuiescentCheckerIntegrationTest testConcurrentReadWriteWorkload
  ✕ j17_utests_latest   14m 34s
  org.apache.cassandra.tcm.DiscoverySimulationTest discoveryTest
java17_separate_tests
java11_pre-commit_tests 
java11_separate_tests
{noformat}

[java17_pre-commit_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4350/workflows/a68e4fb0-bd7a-4758-841c-6b4b0fe22865]
[java17_separate_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4350/workflows/fa57a86d-d120-4304-bbdf-a6cf8fefc4d2]
[java11_pre-commit_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4350/workflows/91afc77c-54fe-4369-9cb9-ababa3568e16]
[java11_separate_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4350/workflows/71add260-9c68-4d87-9d5a-99863a01bb3f]




[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2024-04-15 Thread Alex Petrov (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837237#comment-17837237
 ] 

Alex Petrov edited comment on CASSANDRA-12937 at 4/15/24 1:08 PM:
--

bq. Yes, I think this is the most ideal solution. If somebody wants to 
experiment with a new compressor and similar, there would need to be some knob 
to override it, like some JMX method or similar, and all risks attached to that 
(divergence of the configuration caused by operator's negligence) would be on 
him.

Some things are actually quite useful for gradual rollout. For example, 
compression. You probably do not want to rewrite your sstables across the 
entire cluster. Similar arguments may be made for canary deployments of 
memtable settings and other things. 

I agree that it is fine if these parameters are completely transient (i.e. if 
you set one to a value that diverges from the cluster-wide value, it gets 
reverted after the node is bounced). In that case, they probably will not go 
through TCM and will be purely node-local.

Examples of things that are currently configurable via yaml but would become 
configurable via TCM if we go ahead with this proposal: partitioner, memtable 
configuration, default compaction strategy, compression. As Sam has mentioned, 
"which specific value makes it into schema just depends on which instance acts 
as the coordinator for a given DCL statement".

bq. but I remain unconvinced that just picking the defaults from whatever node 
happens to be coordinating is the right way to go.

I talked briefly with Sam to make sure I understand it correctly before trying 
to describe it. Since this was first worded in a way that suggested a problem 
but did not directly propose a solution (possibly one described elsewhere), I 
will attempt to do so here. Sam has already described part of the solution as:

bq. That should probably be in a parallel local datastructure though, not in 
the node's local log table as we don't want to ship those local defaults to 
peers when providing log catchup (because they should use their own defaults).

The part that was missing for me was where would the values be coming from, and 
what would be the precedence. When executing a {{CREATE}} statement on some 
node _without_ specifying, say, compression, the statement will be created and 
executed without the value for compression set at all. Every node will pick the 
value from its ephemeral parallel structure Sam described (which is also 
settable via JMX and alike like Stefan mentioned). If no value is present in 
this table, it will be picked from yaml (alternatively, we could just populate 
this structure from yaml, too, but I consider these things roughly equivalent).



[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2024-04-15 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837210#comment-17837210
 ] 

Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/15/24 12:06 PM:
-

{quote}It seems like if we actually want these to be cluster wide values and 
not configurable on a per-node basis the defaults themselves should be in 
TCM{quote}

Yes, I think this is the most ideal solution. If somebody wants to experiment 
with a new compressor and similar, there would need to be some knob to override 
it, like some JMX method or similar, and all risks attached to that (divergence 
of the configuration caused by operator's negligence) would be on him. 

However, who would be changing the defaults? In other words, if the defaults are 
committed in TCM and we later change our minds about them, by what means would 
we commit the changed defaults into TCM again?



> Default setting (yaml) for SSTable compression
> --
>
> Key: CASSANDRA-12937
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12937
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Config
>Reporter: Michael Semb Wever
>Assignee: Stefan Miklosovic
>Priority: Low
>  Labels: AdventCalendar2021
> Fix For: 5.x
>
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> In many situations the choice of compression for sstables is more relevant to 
> the disks attached than to the schema and data.
> This issue is to add to cassandra.yaml a default value for sstable 
> compression that new tables will inherit (instead of the defaults found in 
> {{CompressionParams.DEFAULT}}).
> Examples where this can be relevant are filesystems that do on-the-fly 
> compression (btrfs, zfs) or specific disk configurations or even specific C* 
> versions (see CASSANDRA-10995).
> +Additional information for newcomers+
> Some new fields need to be added to {{cassandra.yaml}} to allow specifying 
> the fields required for defining the default compression parameters. In 
> {{DatabaseDescriptor}} a new {{CompressionParams}} field should be added for 
> the default compression. This field should be initialized in 
> {{DatabaseDescriptor.applySimpleConfig()}}. At the different places where 
> {{CompressionParams.DEFAULT}} was used, the code should call 
> {{DatabaseDescriptor#getDefaultCompressionParams}}, which should return a 
> copy of the configured {{CompressionParams}}.
> A unit test using {{OverrideConfigurationLoader}} should verify that the 
> table schema uses the new default when a new table is created (see 
> CreateTest for an example).
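The newcomer instructions in the description can be sketched as follows. This is a simplified stand-in, not {{DatabaseDescriptor}} itself: the class name is hypothetical, compression options are a plain map instead of {{CompressionParams}}, and the built-in fallback (LZ4Compressor, 16 KiB chunks) is an assumption for illustration. It shows the pattern of a default initialised once from config and a getter that hands out a copy.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the described pattern; not Cassandra's DatabaseDescriptor.
final class DescriptorSketch {
    // Assumed built-in fallback, used when cassandra.yaml sets nothing.
    private static Map<String, String> defaultCompression =
        Map.of("class", "LZ4Compressor", "chunk_length_in_kb", "16");

    private DescriptorSketch() {}

    /** Stand-in for applySimpleConfig(): null or empty means "not set in cassandra.yaml". */
    static void applyConfig(Map<String, String> fromYaml) {
        if (fromYaml != null && !fromYaml.isEmpty())
            defaultCompression = Map.copyOf(fromYaml);
    }

    /** Returns a fresh, mutable copy so callers cannot alter the shared default. */
    static Map<String, String> getDefaultCompressionParams() {
        return new HashMap<>(defaultCompression);
    }
}
```

Returning a copy is what the description asks for: a caller that tweaks the returned options for one table cannot change the default seen by the next {{CREATE TABLE}}.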



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2024-04-15 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837179#comment-17837179
 ] 

Sam Tunnicliffe edited comment on CASSANDRA-12937 at 4/15/24 11:11 AM:
---

The problem with that is that the defaults may be different on every instance, 
so what exactly should be stored in the TCM log? Ideally we should store the 
value that is actually resolved during initial execution on each node so that 
it can be re-used if/when the transformation is reapplied. That should probably 
be in a parallel local datastructure though, not in the node's local log table 
as we don't want to ship those local defaults to peers when providing log 
catchup (because they should use their own defaults).  
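A minimal sketch of the parallel local structure described above, assuming transformations can be identified by their epoch. The class and method names are hypothetical, and the map is kept in memory only for brevity; a real version would have to persist it locally (but not in the replicated log) to survive restarts. On first application the node's own default is resolved and recorded; re-applying the same transformation reuses the recorded value.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch: node-local record of which default was resolved when a
// schema transformation was first applied, keyed by the transformation's epoch.
// Never shipped to peers during log catchup; each peer resolves its own.
final class ResolvedDefaultsStore {
    private final Map<Long, Map<String, String>> resolvedByEpoch = new ConcurrentHashMap<>();

    /** First application resolves the local default and records it; replays reuse it. */
    Map<String, String> resolve(long epoch, Supplier<Map<String, String>> localDefault) {
        return resolvedByEpoch.computeIfAbsent(epoch, e -> Map.copyOf(localDefault.get()));
    }
}
```

The effect is that changing the local yaml between the first application and a later replay does not change what the replayed transformation produces on this node.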






[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2024-04-15 Thread Jacek Lewandowski (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837176#comment-17837176
 ] 

Jacek Lewandowski edited comment on CASSANDRA-12937 at 4/15/24 11:01 AM:
-

The problem with the failing test is probably that the default configuration 
for compression parameters (and other defaults for table / keyspace 
creation/alteration) should be part of the schema transformation data and 
stored in the TCM log.

This is not an issue specific to this ticket, because it applies to various 
settings. For example, even without this PR, a similar test would fail when 
manipulating the value of the "cassandra.sstable_compression_default" property. 
We would also have the same problem with the default compaction and memtable 
options, which are also taken from the configuration.
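The idea of making the defaults part of the schema transformation data can be sketched like this, assuming a simplified transformation type. The record and its factory method are hypothetical, not Cassandra classes. The coordinator fills in any missing option before committing, so the log entry is self-describing and replay does not depend on the replaying node's configuration.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: a schema transformation that carries fully resolved
// compression options, so replaying it from the log is deterministic.
record CreateTableTransformation(String table, Map<String, String> compression) {
    CreateTableTransformation {
        compression = Map.copyOf(compression); // immutable once committed
    }

    /** Run on the coordinator: resolve the default before committing to the log. */
    static CreateTableTransformation prepare(String table,
                                             Optional<Map<String, String>> fromStatement,
                                             Map<String, String> coordinatorDefault) {
        return new CreateTableTransformation(table, fromStatement.orElse(coordinatorDefault));
    }
}
```

The trade-off is the one raised elsewhere in this thread: the value committed is whatever the coordinating node had configured, which may differ from the defaults on other instances.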






[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2024-04-05 Thread Michael Semb Wever (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17831674#comment-17831674
 ] 

Michael Semb Wever edited comment on CASSANDRA-12937 at 4/5/24 11:52 AM:
-

-+1 to [https://github.com/apache/cassandra/pull/3168]-

EDIT: legit concerns raised below.






[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2024-04-05 Thread Michael Semb Wever (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17831674#comment-17831674
 ] 

Michael Semb Wever edited comment on CASSANDRA-12937 at 4/5/24 11:52 AM:
-

-+1 to https://github.com/apache/cassandra/pull/3168 -


EDIT: legit concerns raised below.


was (Author: michaelsembwever):
+1 to https://github.com/apache/cassandra/pull/3168 




[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2024-04-04 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17834004#comment-17834004
 ] 

Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/4/24 4:15 PM:
---

I want to write this down explicitly ... It is already not possible to alter 
the system schema at the CQL level. So preventing a user from changing the 
compressor of system tables, even replicated ones, to something other than the 
default is already taken care of. What we want to prevent is being able to 
configure it to something else via cassandra.yaml at all. These are two 
different problems.

I am writing this because, as I was going through the patch and polishing it, 
it broke a test which changes the compressor of every distributed table, and 
system_auth is distributed, but it is a system keyspace. Changing the 
compressor there failed because we prevent that at the code level, while 
constructing TableMetadata, not just when executing the respective CQL command.

This might be a little surprising at first, but I think it makes sense. We 
simply prevent it one "level" deeper: not only via CQL, but programmatically 
as well.
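A minimal sketch of enforcing the rule while the metadata object is built, rather than only in the CQL ALTER path. This is not Cassandra's TableMetadata: the class is hypothetical, the keyspace set is an illustrative assumption, and the compressor is reduced to its class name. Because the check lives in the constructor, any programmatic construction, such as the test that changes the compressor of every distributed table, hits it too.

```java
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch (not Cassandra's TableMetadata): a non-default compressor
// for a system keyspace is rejected at construction time.
final class TableMetadataSketch {
    // Illustrative set of system keyspaces; an assumption for this sketch.
    static final Set<String> SYSTEM_KEYSPACES =
        Set.of("system", "system_schema", "system_auth", "system_distributed", "system_traces");
    static final String DEFAULT_COMPRESSOR = "LZ4Compressor";

    final String keyspace;
    final String table;
    final String compressor;

    TableMetadataSketch(String keyspace, String table, String compressor) {
        this.keyspace = Objects.requireNonNull(keyspace);
        this.table = Objects.requireNonNull(table);
        this.compressor = Objects.requireNonNull(compressor);
        if (SYSTEM_KEYSPACES.contains(keyspace) && !DEFAULT_COMPRESSOR.equals(compressor))
            throw new IllegalArgumentException(
                "System table " + keyspace + '.' + table + " must use " + DEFAULT_COMPRESSOR);
    }
}
```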

However, just thinking out loud, what if we started to have system tables with 
different compressors? For now it is the same for every table. Right now, for 
system tables, we check whether the configured compressor is equal to the 
default one. If we had "more defaults", we would need to make sure the default 
compressor(s) could not be overwritten. That would, however, mean we could 
change the compressor of a system table as long as it is one of the defaults. 
We would probably need to keep track of the default compressor for each 
specific system table and error out if it is different.

I do not think this is on the table for now; I am just thinking out loud about 
all the possible consequences in the future.
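The per-table tracking mentioned above could look like this, assuming a map from fully qualified system table name to its expected compressor, with a single fallback for tables not listed. The class is hypothetical, and the idea of system.paxos using a different compressor is purely illustrative. Unlike a check against a set of "allowed defaults", a compressor that is the default for one table is still rejected for another.

```java
import java.util.Map;

// Hypothetical sketch: track the expected compressor per system table and
// fail on any mismatch, instead of checking against one global default.
final class SystemTableCompressors {
    private static final String FALLBACK = "LZ4Compressor";
    private final Map<String, String> expectedByTable;

    SystemTableCompressors(Map<String, String> expectedByTable) {
        this.expectedByTable = Map.copyOf(expectedByTable);
    }

    void validate(String qualifiedTable, String compressor) {
        String expected = expectedByTable.getOrDefault(qualifiedTable, FALLBACK);
        if (!expected.equals(compressor))
            throw new IllegalArgumentException(
                qualifiedTable + " expects " + expected + " but got " + compressor);
    }
}
```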



> Default setting (yaml) for SSTable compression
> --
>
> Key: CASSANDRA-12937
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12937
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Config
>Reporter: Michael Semb Wever
>Assignee: Stefan Miklosovic
>Priority: Low
>  Labels: AdventCalendar2021
> Fix For: 5.x
>
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> In many situations the choice of compression for sstables is more relevant to 
> the disks attached than to the schema and data.
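> This issue is to add to cassandra.yaml a default value for sstable
> compression that new tables will inherit (instead of the defaults found in
> {{CompressionParams.DEFAULT}}).
> Examples where this can be relevant are filesystems that do on-the-fly
> compression (btrfs, zfs) or specific disk configurations or even specific C*
> versions (see CASSANDRA-10995).
> +Additional information for newcomers+
> Some new fields need to be added to {{cassandra.yaml}} to allow specifying
> the fields required for defining the default compression parameters. In
> {{DatabaseDescriptor}} a new {{CompressionParams}} field should be added for
> the default compression. This field should be initialized in
> {{DatabaseDescriptor.applySimpleConfig()}}. At the different places where
> {{CompressionParams.DEFAULT}} was used, the code should call
> {{DatabaseDescriptor#getDefaultCompressionParams}}, which should return a
> copy of the configured {{CompressionParams}}.
> A unit test using {{OverrideConfigurationLoader}} should verify that the
> table schema uses the new default when a new table is created (see
> CreateTest for an example).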

[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2024-04-04 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17834004#comment-17834004
 ] 

Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/4/24 4:14 PM:
---

I want to write this down explicitly: it is already impossible to alter the
system schema at the CQL level. So a user already cannot change the compressor
of a system table, even a replicated one, to something other than the default.
What we want to prevent here is being able to configure it to something else
via cassandra.yaml at all. These are two different problems.

I am writing this because, while going through the patch and polishing it, I
found that it broke a test which changes the compressor of every distributed
table. The system_auth keyspace is distributed, but it is also a system
keyspace, so changing its compressor failed: we now prevent that at the code
level, while constructing TableMetadata, not just when the respective CQL
command is executed.

This might be a little surprising at first, but I think it makes sense. We
simply prevent it one "level" deeper: it is not possible via CQL, and it is not
possible to configure it programmatically either.
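Here is a minimal sketch of what enforcing this one level deeper could look like. It assumes the check runs whenever table metadata is built, not only when a CQL statement is executed. SystemCompressionGuard, validate() and the keyspace list are hypothetical illustrations, not Cassandra's actual TableMetadata code. Compressors are reduced to class-name strings.

```java
import java.util.Set;

// Hypothetical guard, intended to be called while building table metadata
// (not Cassandra's real API), so CQL and programmatic changes both pass through it.
final class SystemCompressionGuard {
    // A subset of Cassandra's system keyspace names; system_auth is replicated
    // but is still a system keyspace.
    static final Set<String> SYSTEM_KEYSPACES = Set.of(
            "system", "system_schema", "system_auth",
            "system_distributed", "system_traces");

    private SystemCompressionGuard() {}

    static boolean isSystemKeyspace(String keyspace) {
        return SYSTEM_KEYSPACES.contains(keyspace);
    }

    // Throws if a system table is given a compressor other than the default;
    // user tables are left alone.
    static void validate(String keyspace, String table, String compressor, String defaultCompressor) {
        if (isSystemKeyspace(keyspace) && !compressor.equals(defaultCompressor)) {
            throw new IllegalArgumentException(String.format(
                    "Cannot set compressor %s on system table %s.%s; only %s is allowed",
                    compressor, keyspace, table, defaultCompressor));
        }
    }
}
```

With the check placed where metadata is constructed, a test that changes the compressor of every distributed keyspace also trips it on system_auth. That matches the failure described above.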

However, just thinking out loud: what if we started to have system tables with
different compressors? For now, it is the same for every table. Right now, for
system tables, we check whether the configured compressor is equal to the
default one. If we had "more defaults", we would need to handle that, so the
default compressor would become a set of compressors. However, that means we
might then allow a system table's compressor to be changed to any compressor
found among the defaults. We would probably need to keep track of the default
compressor for each specific system table and error out if the configured one
differs.
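To make the concern concrete, here is a hypothetical sketch that contrasts a set-membership check with a per-table check. The class, the "keyspace.table" keys and the differing per-table defaults are all illustrative; today every system table shares one default.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: two ways of validating a system table's compressor
// once system tables may have different default compressors.
final class SystemTableDefaults {
    // Hypothetical per-table defaults keyed by "keyspace.table".
    private final Map<String, String> defaultByTable;

    SystemTableDefaults(Map<String, String> defaultByTable) {
        this.defaultByTable = Map.copyOf(defaultByTable);
    }

    // Set-membership check: accepts any compressor that is the default of *some* system table.
    boolean allowedBySetMembership(String compressor) {
        Set<String> allDefaults = Set.copyOf(defaultByTable.values());
        return allDefaults.contains(compressor);
    }

    // Per-table check: the compressor must equal that specific table's default.
    void validate(String qualifiedTable, String compressor) {
        String expected = defaultByTable.get(qualifiedTable);
        if (expected == null)
            throw new IllegalArgumentException("Unknown system table " + qualifiedTable);
        if (!expected.equals(compressor))
            throw new IllegalArgumentException(String.format(
                    "System table %s must use %s, not %s", qualifiedTable, expected, compressor));
    }
}
```

Suppose system.local defaults to LZ4Compressor and system_auth.roles defaults to ZstdCompressor. Then allowedBySetMembership("ZstdCompressor") returns true, so system.local could be switched to Zstd. By contrast, validate("system.local", "ZstdCompressor") throws, which is the per-table tracking suggested above.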

I do not think this is on the table for now; I am just thinking out loud about
all the possible future consequences.


[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2024-04-04 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17834004#comment-17834004
 ] 

Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/4/24 4:14 PM:
---

I want to write this down explicitly: it is already impossible to alter the
system schema at the CQL level. So a user already cannot change the compressor
of a system table, even a replicated one, to something other than the default.
What we want to prevent here is being able to configure it to something else
via cassandra.yaml at all. These are two different problems.

I am writing this because, while going through the patch and polishing it, I
found that it broke a test which changes the compressor of every distributed
table. The system_auth keyspace is distributed, but it is also a system
keyspace, so changing its compressor failed: we now prevent that at the code
level, while constructing TableMetadata, not just when the respective CQL
command is executed.

This might be a little surprising at first, but I think it makes sense. We
simply prevent it one "level" deeper: it is not possible via CQL, and it is not
possible to configure it programmatically either.

However, just thinking out loud: what if we started to have system tables with
different compressors? For now, it is the same for every table. Right now, for
system tables, we check whether the configured compressor is equal to the
default one. If we had "more defaults", we would need to handle that, so the
default compressor would become a set of compressors. However, that means we
might then allow a system table's compressor to be changed to any compressor
found among the defaults. We would probably need to keep track of the default
compressor for each specific system table and error out if the configured one
differs.

I do not think this is on the table for now; I am just thinking out loud about
all the possible future consequences.


[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2024-04-04 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17834004#comment-17834004
 ] 

Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/4/24 4:13 PM:
---

I want to write this down explicitly: it is already impossible to alter the
system schema at the CQL level. So a user already cannot change the compressor
of a system table, even a replicated one, to something other than the default.
What we want to prevent here is being able to configure it to something else
via cassandra.yaml at all. These are two different problems.

I am writing this because, while going through the patch and polishing it, I
found that it broke a test which changes the compressor of every distributed
table. The system_auth keyspace is distributed, but it is also a system
keyspace, so changing its compressor failed: we now prevent that at the code
level, while constructing TableMetadata, not just when the respective CQL
command is executed.

This might be a little surprising at first, but I think it makes sense. We
simply prevent it one "level" deeper: it is not possible via CQL, and it is not
possible to configure it programmatically either.

However, just thinking out loud: what if we started to have system tables with
different compressors? For now, it is the same for every table. Right now, for
system tables, we check whether the configured compressor is equal to the
default one. If we had "more defaults", we would need to handle that, so the
default compressor would become a set of compressors. However, that means we
might then allow a system table's compressor to be changed to any compressor
found among the defaults. We would probably need to keep track of the default
compressor for each specific system table and error out if the configured one
differs.

I do not think this is on the table for now; I am just thinking out loud about
all the possible future consequences.


--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-

[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2024-04-04 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17834004#comment-17834004
 ] 

Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/4/24 4:09 PM:
---

I want to write this down explicitly: it is already impossible to alter the
system schema at the CQL level. So a user already cannot change the compressor
of a system table, even a replicated one, to something other than the default.
What we want to prevent here is being able to configure it to something else
via cassandra.yaml at all. These are two different problems.

I am writing this because, while going through the patch and polishing it, I
found that it broke a test which changes the compressor of every distributed
table. The system_auth keyspace is distributed, but it is also a system
keyspace, so changing its compressor failed: we now prevent that at the code
level, while constructing TableMetadata, not just when the respective CQL
command is executed.

This might be a little surprising at first, but I think it makes sense. We
simply prevent it one "level" deeper: it is not possible via CQL, and it is not
possible to configure it programmatically either.


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2023-07-25 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747123#comment-17747123
 ] 

Stefan Miklosovic edited comment on CASSANDRA-12937 at 7/26/23 6:24 AM:


I think we need to park this ticket for a while and return to it after 5.0 is
out. There is clearly more work to be done here.

I have been thinking about how to make it more robust. It is all still in my
head and hard to communicate, but we should spend a little more time on this
(how to exclude system keyspaces, and all the related code and logic around
that).

Claude also came up with the idea of making it possible to specify compression
_per keyspace_, which would shuffle quite a lot of things around,
configuration-wise.

Also, if we are specifying what the default compression should be, why _just
compression?_ Why could we not support, e.g., compaction as well? One might
want to compact all tables in one keyspace with UnifiedCompactionStrategy and
tables in another keyspace with TWCS, for example.

So this could be done a little more robustly: rather than wiring it just for
compression, the code should be "plastic" enough to make room for similar
improvements in the future.
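One way to keep the code "plastic" is a generic layered lookup. The sketch below is hypothetical: LayeredDefaults and its precedence order are assumptions, not existing Cassandra code. An explicit table setting wins, then a per-keyspace default, then a node-wide default such as a cassandra.yaml value, then a built-in fallback. The same class could serve compression and compaction alike.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical layered resolution of a table option:
// explicit table setting > per-keyspace default > node-wide default > built-in fallback.
// Generic over T so compression, compaction, etc. could share the mechanism.
final class LayeredDefaults<T> {
    private final T builtIn;                          // hard-coded fallback
    private T nodeWide;                               // e.g. a cassandra.yaml value; null if unset
    private final Map<String, T> perKeyspace = new HashMap<>();

    LayeredDefaults(T builtIn) {
        this.builtIn = builtIn;
    }

    LayeredDefaults<T> withNodeWide(T value) {
        this.nodeWide = value;
        return this;
    }

    LayeredDefaults<T> withKeyspace(String keyspace, T value) {
        perKeyspace.put(keyspace, value);
        return this;
    }

    // Returns the value a new table in the given keyspace would get.
    T resolve(String keyspace, Optional<T> explicit) {
        if (explicit.isPresent())
            return explicit.get();
        T keyspaceDefault = perKeyspace.get(keyspace);
        if (keyspaceDefault != null)
            return keyspaceDefault;
        return nodeWide != null ? nodeWide : builtIn;
    }
}
```

System keyspaces would still need to be excluded before this lookup, as noted above. The sketch leaves that out.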

 



> Default setting (yaml) for SSTable compression
> --
>
> Key: CASSANDRA-12937
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12937
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Config
>Reporter: Michael Semb Wever
>Assignee: Stefan Miklosovic
>Priority: Low
>  Labels: AdventCalendar2021
> Fix For: 5.x
>
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> In many situations the choice of compression for sstables is more relevant to 
> the disks attached than to the schema and data.
> This issue is to add to cassandra.yaml a default value for sstable
> compression that new tables will inherit (instead of the defaults found in
> {{CompressionParams.DEFAULT}}).
> Examples where this can be relevant are filesystems that do on-the-fly
> compression (btrfs, zfs) or specific disk configurations or even specific C*
> versions (see CASSANDRA-10995).
> +Additional information for newcomers+
> Some new fields need to be added to {{cassandra.yaml}} to allow specifying
> the fields required for defining the default compression parameters. In
> {{DatabaseDescriptor}} a new {{CompressionParams}} field should be added for
> the default compression. This field should be initialized in
> {{DatabaseDescriptor.applySimpleConfig()}}. At the different places where
> {{CompressionParams.DEFAULT}} was used, the code should call
> {{DatabaseDescriptor#getDefaultCompressionParams}}, which should return a
> copy of the configured {{CompressionParams}}.
> A unit test using {{OverrideConfigurationLoader}} should verify that the
> table schema uses the new default when a new table is created (see
> CreateTest for an example).
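The newcomer notes above can be sketched as follows. CompressionSettings and DefaultCompressionConfig are hypothetical stand-ins for CompressionParams and DatabaseDescriptor. apply() plays the role of applySimpleConfig(), and the LZ4/16 KiB fallback is illustrative only. The key point is that the getter hands out a copy, so no caller can mutate the node-wide default.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a compression-parameters object (not Cassandra's CompressionParams).
final class CompressionSettings {
    final String compressorClass;
    final int chunkLengthKiB;
    final Map<String, String> options;

    CompressionSettings(String compressorClass, int chunkLengthKiB, Map<String, String> options) {
        this.compressorClass = compressorClass;
        this.chunkLengthKiB = chunkLengthKiB;
        this.options = new HashMap<>(options); // keep a private copy of the caller's map
    }

    CompressionSettings copy() {
        return new CompressionSettings(compressorClass, chunkLengthKiB, options);
    }
}

// Hypothetical sketch of the suggested DatabaseDescriptor wiring.
final class DefaultCompressionConfig {
    private static CompressionSettings defaultCompression;

    private DefaultCompressionConfig() {}

    // Analogous to initializing the field in applySimpleConfig();
    // uses an illustrative built-in fallback when nothing is configured.
    static void apply(CompressionSettings fromYaml) {
        defaultCompression = fromYaml != null
                ? fromYaml.copy()
                : new CompressionSettings("LZ4Compressor", 16, Collections.emptyMap());
    }

    // Analogous to getDefaultCompressionParams(): returns a copy so callers cannot mutate the default.
    static CompressionSettings getDefaultCompression() {
        return defaultCompression.copy();
    }
}
```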




[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2023-07-18 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17744151#comment-17744151
 ] 

Stefan Miklosovic edited comment on CASSANDRA-12937 at 7/18/23 10:30 AM:
-

I want to highlight one not-so-obvious consequence of introducing this patch.

There is a flush_compression property in cassandra.yaml, documented as follows:

{code}
# Compression to apply to SSTables as they flush for compressed tables.
# Note that tables without compression enabled do not respect this flag.
#
# As high ratio compressors like LZ4HC, Zstd, and Deflate can potentially
# block flushes for too long, the default is to flush with a known fast
# compressor in those cases. Options are:
#
# none : Flush without compressing blocks but while still doing checksums.
# fast : Flush with a fast compressor. If the table is already using a
#        fast compressor that compressor is used.
# table: Always flush with the same compressor that the table uses. This
#        was the pre 4.0 behavior.
#
# flush_compression: fast
{code}

By default, the "fast" compressor is LZ4, so flushes are done with LZ4. Hence,
when one runs "nodetool snapshot", which flushes memtables before taking the
snapshot, all tables, user-defined as well as system ones, are flushed and
their SSTables are compressed with LZ4.

If one specifies a custom compressor in the newly introduced
"sstable_compression" field, two situations might arise:

1) the compressor is not "fast" (see ICompressor.Uses and
ICompressor.recommendedUses()), which means we need to fall back to a fast
compressor, and it will default to LZ4;
2) the compressor is fast, so we are going to flush *system* tables with a
*custom, user-specified ICompressor*.

I want to make sure we are on the same page. While it makes sense to do it like
that (a user specified a custom compressor to use), it also means that system
tables will be compressed with that compressor. I want to be sure everybody is
OK with this.

If we are not comfortable with that, we can add a check in DataComponent, where
compressionParams are resolved, to always flush system tables with LZ4, no
matter what sstable_compression is set to. I think that is safer; LZ4 is
battle-tested. I am not completely sure that users realize that, if they
provide their own fast compressor, it will also be used for flushing the system
tables. That is a rather delicate matter.


> Default setting (yaml) for SSTable compression
> --
>
> Key: CASSANDRA-12937
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12937
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Config
>Reporter: Michael Semb Wever
>Priority: Low
>  Labels: AdventCalendar2021
> Fix For: 5.x
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> In many situations the choice of compression for sstables is more relevant to 
> the disks a

[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2023-07-18 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17744151#comment-17744151
 ] 

Stefan Miklosovic edited comment on CASSANDRA-12937 at 7/18/23 10:17 AM:
-

I want to highlight one not-so-obvious consequence of introducing this patch.

There is a flush_compression property in cassandra.yaml, documented as follows:

{code}
# Compression to apply to SSTables as they flush for compressed tables.
# Note that tables without compression enabled do not respect this flag.
#
# As high ratio compressors like LZ4HC, Zstd, and Deflate can potentially
# block flushes for too long, the default is to flush with a known fast
# compressor in those cases. Options are:
#
# none : Flush without compressing blocks but while still doing checksums.
# fast : Flush with a fast compressor. If the table is already using a
#        fast compressor that compressor is used.
# table: Always flush with the same compressor that the table uses. This
#        was the pre 4.0 behavior.
#
# flush_compression: fast
{code}

By default, the "fast" compressor is LZ4, so flushes are done with LZ4. Hence,
when one runs "nodetool snapshot", which flushes memtables before taking the
snapshot, all tables, user-defined as well as system ones, are flushed and
their SSTables are compressed with LZ4.

If one specifies a custom compressor in the newly introduced
"sstable_compression" field, two situations might arise:

1) the compressor is not "fast" (see ICompressor.Uses and
ICompressor.recommendedUses()), which means we need to fall back to a fast
compressor, and it will default to LZ4;
2) the compressor is fast, so we are going to flush *system* tables with a
*custom, user-specified ICompressor*.

I want to make sure we are on the same page. While it makes sense to do it like
that (a user specified a custom compressor to use), it also means that system
tables will be compressed with that compressor. I want to be sure everybody is
OK with this.


was (Author: smiklosovic):
I want to highlight one not-so-obvious consequence of introducing this patch.

There is flush_compression property in cassandra.yaml doing this:

{code}
# Compression to apply to SSTables as they flush for compressed tables.
# Note that tables without compression enabled do not respect this flag.
#
# As high ratio compressors like LZ4HC, Zstd, and Deflate can potentially
# block flushes for too long, the default is to flush with a known fast
# compressor in those cases. Options are:
#
# none : Flush without compressing blocks but while still doing checksums.
# fast : Flush with a fast compressor. If the table is already using a
#        fast compressor that compressor is used.
# table: Always flush with the same compressor that the table uses. This
#        was the pre 4.0 behavior.
#
# flush_compression: fast
{code}

By default, "fast" compressor is LZ4, it will use that one.

So, by default, "fast" compressor is used for flushing (LZ4). Hence, if one 
does "nodetool snapshot", it will do snapshots of all tables, user defined as 
well as system tables and it is compressed with LZ4.

If one specifies a custom compressor to be used in newly introduced 
"sstable_compression" field, two situations might happen:

1) the default compressor is not "fast" (check ICompressor.Uses and 
ICompressor.recommendedUses()), that means that we need to fallback to a fast 
compressor - it will default to LZ4
2) the default compressor is fast - so we are going to flush *system* tables 
with a *custom, user-specified, ICompressor*.

I want to be sure that we are on the same page. While it makes sense to do it 
like that - a user specified a custom compressor to use - that also means that 
system tables will be compressed with that compressor as well. I want to be 
sure everybody is OK with this.

> Default setting (yaml) for SSTable compression
> --
>
> Key: CASSANDRA-12937
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12937
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Config
>Reporter: Michael Semb Wever
>Priority: Low
>  Labels: AdventCalendar2021
> Fix For: 5.x
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> In many situations the choice of compression for sstables is more relevant to 
> the disks attached than to the schema and data.
> This issue is to add to cassandra.yaml a default value for sstable 
> compression that new tables will inherit (instead of the defaults found in 
> {{CompressionParams.DEFAULT}}).
> Examples where this can be relevant are filesystems that do on-the-fly 
> compression (btrfs, zfs) or specific disk configurations or even specific C* 
> versions (see CASSANDRA-10995 ).
> +Additional information

[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2023-07-18 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17744151#comment-17744151
 ] 

Stefan Miklosovic edited comment on CASSANDRA-12937 at 7/18/23 10:16 AM:
-

I want to highlight one not-so-obvious consequence of introducing this patch.

There is flush_compression property in cassandra.yaml doing this:

{code}
# Compression to apply to SSTables as they flush for compressed tables.
# Note that tables without compression enabled do not respect this flag.
#
# As high ratio compressors like LZ4HC, Zstd, and Deflate can potentially
# block flushes for too long, the default is to flush with a known fast
# compressor in those cases. Options are:
#
# none : Flush without compressing blocks but while still doing checksums.
# fast : Flush with a fast compressor. If the table is already using a
#        fast compressor that compressor is used.
# table: Always flush with the same compressor that the table uses. This
#        was the pre 4.0 behavior.
#
# flush_compression: fast
{code}

By default, flushing uses the "fast" compressor, which is LZ4. Hence, if one 
runs "nodetool snapshot", it snapshots all tables, user-defined as well as 
system tables, and the SSTables flushed for the snapshot are compressed with LZ4.

If one specifies a custom compressor in the newly introduced 
"sstable_compression" field, two situations might happen:

1) the default compressor is not "fast" (check ICompressor.Uses and 
ICompressor.recommendedUses()), which means that we need to fall back to a 
fast compressor, which defaults to LZ4;
2) the default compressor is fast, so we are going to flush *system* tables 
with a *custom, user-specified ICompressor*.

I want to be sure that we are on the same page. While it makes sense to do it 
like that (a user specified a custom compressor to use), it also means that 
system tables will be compressed with that compressor as well. I want to be 
sure everybody is OK with this.


was (Author: smiklosovic):
I want to highlight one not-so-obvious consequence of introducing this patch.

There is flush_compression property in cassandra.yaml doing this:

{code}
# Compression to apply to SSTables as they flush for compressed tables.
# Note that tables without compression enabled do not respect this flag.
#
# As high ratio compressors like LZ4HC, Zstd, and Deflate can potentially
# block flushes for too long, the default is to flush with a known fast
# compressor in those cases. Options are:
#
# none : Flush without compressing blocks but while still doing checksums.
# fast : Flush with a fast compressor. If the table is already using a
#        fast compressor that compressor is used.
# table: Always flush with the same compressor that the table uses. This
#        was the pre 4.0 behavior.
#
# flush_compression: fast
{code}

By default, "fast" compressor is LZ4, it will use that one.

So, by default, "fast" compressor is used for flushing (LZ4). Hence, if one 
does "nodetool snapshot", it will do snapshots of all tables, user defined as 
well as system tables and it is compressed with LZ4.

If one specifies a default compressor to be used in newly introduced 
"sstable_compression" field, two situations might happen:

1) the default compressor is not "fast" (check ICompressor.Uses and 
ICompressor.recommendedUses()), that means that we need to fallback to a fast 
compressor - it will default to LZ4
2) the default compressor is fast - so we are going to flush *system* tables 
with a *custom, user-specified, ICompressor*.

I want to be sure that we are on the same page. While it makes sense to do it 
like that - a user specified a custom compressor to use - that also means that 
system tables will be compressed with that compressor as well. I want to be 
sure everybody is OK with this.

> Default setting (yaml) for SSTable compression
> --
>
> Key: CASSANDRA-12937
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12937
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Config
>Reporter: Michael Semb Wever
>Priority: Low
>  Labels: AdventCalendar2021
> Fix For: 5.x
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> In many situations the choice of compression for sstables is more relevant to 
> the disks attached than to the schema and data.
> This issue is to add to cassandra.yaml a default value for sstable 
> compression that new tables will inherit (instead of the defaults found in 
> {{CompressionParams.DEFAULT}}).
> Examples where this can be relevant are filesystems that do on-the-fly 
> compression (btrfs, zfs) or specific disk configurations or even specific C* 
> versions (see CASSANDRA-10995 ).
> +Addit

[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2023-06-04 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17729083#comment-17729083
 ] 

Stefan Miklosovic edited comment on CASSANDRA-12937 at 6/4/23 5:18 PM:
---

[~claude] we got the second review from Mick here (1).

Feel free to take my branch and base further work on top of it. I think we 
should consolidate all the branches, continue working on just one of them and 
discard the rest. 

(1) https://github.com/apache/cassandra/pull/2282#issuecomment-1575564266


was (Author: smiklosovic):
[~claude] we got a second review from Mick here (1).

Feel free to take my branch and base further work on top of that. I think we 
should consolidate all the branches and continue to work just on one and 
discard the rest. 

(1) https://github.com/apache/cassandra/pull/2282#issuecomment-1575564266

> Default setting (yaml) for SSTable compression
> --
>
> Key: CASSANDRA-12937
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12937
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Config
>Reporter: Michael Semb Wever
>Priority: Low
>  Labels: AdventCalendar2021, lhf
> Fix For: 5.x
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> In many situations the choice of compression for sstables is more relevant to 
> the disks attached than to the schema and data.
> This issue is to add to cassandra.yaml a default value for sstable 
> compression that new tables will inherit (instead of the defaults found in 
> {{CompressionParams.DEFAULT}}).
> Examples where this can be relevant are filesystems that do on-the-fly 
> compression (btrfs, zfs) or specific disk configurations or even specific C* 
> versions (see CASSANDRA-10995 ).
> +Additional information for newcomers+
> Some new fields need to be added to {{cassandra.yaml}} to allow specifying 
> the field required for defining the default compression parameters. In 
> {{DatabaseDescriptor}} a new {{CompressionParams}} field should be added for 
> the default compression. This field should be initialized in 
> {{DatabaseDescriptor.applySimpleConfig()}}. At the different places where 
> {{CompressionParams.DEFAULT}} was used the code should call 
> {{DatabaseDescriptor#getDefaultCompressionParams}} that should return some 
> copy of configured {{CompressionParams}}.
> Some unit test using {{OverrideConfigurationLoader}} should be used to test 
> that the table schema uses the new default when a new table is created (see 
> CreateTest for some example).
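A minimal sketch of the approach outlined in the ticket, assuming heavily simplified stand-ins: ConfigSketch, DescriptorSketch and Params are hypothetical and only play the roles of Config, DatabaseDescriptor and CompressionParams. The built-in fallback (LZ4 with 16 KiB chunks) is an assumption for illustration, not a statement about the real CompressionParams.DEFAULT.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a yaml-configured default held in one place, handed out as a copy.
// ConfigSketch / DescriptorSketch / Params stand in for Config / DatabaseDescriptor /
// CompressionParams; they are not the real classes.
class DefaultCompressionSketch {
    // Stand-in for the parsed cassandra.yaml; a null class means "not configured".
    static final class ConfigSketch {
        String sstableCompressionClass;
        Map<String, String> sstableCompressionOptions = new HashMap<>();
    }

    // Stand-in for CompressionParams; options are copied into an unmodifiable map.
    record Params(String className, Map<String, String> options) {
        Params {
            options = Map.copyOf(options);
        }
    }

    // Assumed built-in fallback, playing the role of CompressionParams.DEFAULT.
    static final Params BUILT_IN_DEFAULT = new Params("LZ4Compressor", Map.of("chunk_length_in_kb", "16"));

    static final class DescriptorSketch {
        private static Params defaultCompression = BUILT_IN_DEFAULT;

        // Plays the role of initializing the new field in applySimpleConfig().
        static void applySimpleConfig(ConfigSketch conf) {
            defaultCompression = conf.sstableCompressionClass == null
                    ? BUILT_IN_DEFAULT
                    : new Params(conf.sstableCompressionClass, conf.sstableCompressionOptions);
        }

        // Plays the role of getDefaultCompressionParams(): each caller gets its own copy.
        static Params getDefaultCompressionParams() {
            return new Params(defaultCompression.className(), defaultCompression.options());
        }
    }

    public static void main(String[] args) {
        ConfigSketch conf = new ConfigSketch();
        conf.sstableCompressionClass = "ZstdCompressor";
        conf.sstableCompressionOptions.put("chunk_length_in_kb", "64");
        DescriptorSketch.applySimpleConfig(conf);

        // A new table created without an explicit compression clause inherits the yaml default.
        Params forNewTable = DescriptorSketch.getDefaultCompressionParams();
        System.out.println(forNewTable.className()); // ZstdCompressor
    }
}
{code}

Because Params is immutable here the copy is only defensive; the ticket asks for a copy, which matters if the real parameters object can be modified by callers.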



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2023-06-01 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728384#comment-17728384
 ] 

Stefan Miklosovic edited comment on CASSANDRA-12937 at 6/1/23 8:47 PM:
---

I finally fixed it all. Phew, what a ride! I reorganized the branch so that 
there is one commit with the implementation and another one with the tests, 
which makes it clearly visible and easier to reason about.

I think we are finally in good shape for a second committer to have a look. 

[~mck] would you take a look, please?

[~claude] no worries, you will still be the author of the PR; I will take care of 
that upon the actual commit once we get there. I just squashed / reorganized the 
commits a little bit for now.

PR: https://github.com/apache/cassandra/pull/2282
j8 
https://app.circleci.com/pipelines/github/instaclustr/cassandra/2353/workflows/aa9ee549-ec99-455f-8e89-b325e013ff8e
j11 
https://app.circleci.com/pipelines/github/instaclustr/cassandra/2354/workflows/37b9da05-22ce-41ef-8a1b-753e3301c6eb




was (Author: smiklosovic):
I finally fixed it all. Phew, what a ride! I reorganized the branch in such a 
way that there is one commit which implements it and another one where tests 
are done so it is nicely visible and easier to contemplate about.

I think we are finally in a good shape to have a look of the second committer. 

[~mck] would you take a look, please?

[~claude] no worries you will be still author of the PR, I do that upon actual 
commit once we get there, I just squashed / reorganized the commits little bit 
for now.

PR: https://github.com/apache/cassandra/pull/2282
j8 
https://app.circleci.com/pipelines/github/instaclustr/cassandra/2353/workflows/aa9ee549-ec99-455f-8e89-b325e013ff8e



> Default setting (yaml) for SSTable compression
> --
>
> Key: CASSANDRA-12937
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12937
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Config
>Reporter: Michael Semb Wever
>Assignee: Stefan Miklosovic
>Priority: Low
>  Labels: AdventCalendar2021, lhf
> Fix For: 5.x
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> In many situations the choice of compression for sstables is more relevant to 
> the disks attached than to the schema and data.
> This issue is to add to cassandra.yaml a default value for sstable 
> compression that new tables will inherit (instead of the defaults found in 
> {{CompressionParams.DEFAULT}}).
> Examples where this can be relevant are filesystems that do on-the-fly 
> compression (btrfs, zfs) or specific disk configurations or even specific C* 
> versions (see CASSANDRA-10995 ).
> +Additional information for newcomers+
> Some new fields need to be added to {{cassandra.yaml}} to allow specifying 
> the field required for defining the default compression parameters. In 
> {{DatabaseDescriptor}} a new {{CompressionParams}} field should be added for 
> the default compression. This field should be initialized in 
> {{DatabaseDescriptor.applySimpleConfig()}}. At the different places where 
> {{CompressionParams.DEFAULT}} was used the code should call 
> {{DatabaseDescriptor#getDefaultCompressionParams}} that should return some 
> copy of configured {{CompressionParams}}.
> Some unit test using {{OverrideConfigurationLoader}} should be used to test 
> that the table schema uses the new default when a new table is created (see 
> CreateTest for some example).






[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2023-05-29 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17727181#comment-17727181
 ] 

Stefan Miklosovic edited comment on CASSANDRA-12937 at 5/29/23 7:03 PM:


This is the j11 build: 
https://app.circleci.com/pipelines/github/instaclustr/cassandra/2329/workflows/03d396b7-6356-45da-87f6-a3540f69ecd1
I would focus on the failed unit tests first: 
https://app.circleci.com/pipelines/github/instaclustr/cassandra/2329/workflows/03d396b7-6356-45da-87f6-a3540f69ecd1/jobs/42564/tests
Branch: https://github.com/instaclustr/cassandra/tree/CASSANDRA-12937


was (Author: smiklosovic):
this is j11 build 
https://app.circleci.com/pipelines/github/instaclustr/cassandra/2329/workflows/03d396b7-6356-45da-87f6-a3540f69ecd1
I would focus on the failed units first: 
https://app.circleci.com/pipelines/github/instaclustr/cassandra/2329/workflows/03d396b7-6356-45da-87f6-a3540f69ecd1/jobs/42564/tests


> Default setting (yaml) for SSTable compression
> --
>
> Key: CASSANDRA-12937
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12937
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Config
>Reporter: Michael Semb Wever
>Assignee: Claude Warren
>Priority: Low
>  Labels: AdventCalendar2021, lhf
> Fix For: 5.x
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> In many situations the choice of compression for sstables is more relevant to 
> the disks attached than to the schema and data.
> This issue is to add to cassandra.yaml a default value for sstable 
> compression that new tables will inherit (instead of the defaults found in 
> {{CompressionParams.DEFAULT}}).
> Examples where this can be relevant are filesystems that do on-the-fly 
> compression (btrfs, zfs) or specific disk configurations or even specific C* 
> versions (see CASSANDRA-10995 ).
> +Additional information for newcomers+
> Some new fields need to be added to {{cassandra.yaml}} to allow specifying 
> the field required for defining the default compression parameters. In 
> {{DatabaseDescriptor}} a new {{CompressionParams}} field should be added for 
> the default compression. This field should be initialized in 
> {{DatabaseDescriptor.applySimpleConfig()}}. At the different places where 
> {{CompressionParams.DEFAULT}} was used the code should call 
> {{DatabaseDescriptor#getDefaultCompressionParams}} that should return some 
> copy of configured {{CompressionParams}}.
> Some unit test using {{OverrideConfigurationLoader}} should be used to test 
> that the table schema uses the new default when a new table is created (see 
> CreateTest for some example).






[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2023-05-22 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725115#comment-17725115
 ] 

Stefan Miklosovic edited comment on CASSANDRA-12937 at 5/22/23 8:28 PM:


There are a few hundred errors in the tests: 
https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/2471/#showFailuresLink

They all seem to fail on "Unknown compression options: ([sstable_compression])"

We stopped using this option because it was deprecated, so we removed it, but the 
tests are still using it.


was (Author: smiklosovic):
There are few hundreds of errors in tests 
https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/2471/#showFailuresLink

It all seem to fail on "Unknown compression options: ([sstable_compression])"

We stopped to use this as it was deprecated so we removed it but tests are 
still using it.

> Default setting (yaml) for SSTable compression
> --
>
> Key: CASSANDRA-12937
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12937
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Config
>Reporter: Michael Semb Wever
>Assignee: Claude Warren
>Priority: Low
>  Labels: AdventCalendar2021, lhf
> Fix For: 5.x
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> In many situations the choice of compression for sstables is more relevant to 
> the disks attached than to the schema and data.
> This issue is to add to cassandra.yaml a default value for sstable 
> compression that new tables will inherit (instead of the defaults found in 
> {{CompressionParams.DEFAULT}}).
> Examples where this can be relevant are filesystems that do on-the-fly 
> compression (btrfs, zfs) or specific disk configurations or even specific C* 
> versions (see CASSANDRA-10995 ).
> +Additional information for newcomers+
> Some new fields need to be added to {{cassandra.yaml}} to allow specifying 
> the field required for defining the default compression parameters. In 
> {{DatabaseDescriptor}} a new {{CompressionParams}} field should be added for 
> the default compression. This field should be initialized in 
> {{DatabaseDescriptor.applySimpleConfig()}}. At the different places where 
> {{CompressionParams.DEFAULT}} was used the code should call 
> {{DatabaseDescriptor#getDefaultCompressionParams}} that should return some 
> copy of configured {{CompressionParams}}.
> Some unit test using {{OverrideConfigurationLoader}} should be used to test 
> that the table schema uses the new default when a new table is created (see 
> CreateTest for some example).






[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2023-05-04 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17719227#comment-17719227
 ] 

Stefan Miklosovic edited comment on CASSANDRA-12937 at 5/4/23 9:23 AM:
---

I did another round of review:

When I do this
{code:java}
commitlog_compression:
  - class_name: lz4
    parameters:
      - enable: "true"
        lz4_compressor_type: "fast" {code}
This does not work: 
{code:java}
Caused by: org.apache.cassandra.exceptions.ConfigurationException: Could not create Compression for type org.apache.cassandra.io.compress.lz4
    at org.apache.cassandra.schema.CompressionParams.parseCompressorClass(CompressionParams.java:338)
    at org.apache.cassandra.schema.CompressionParams.createCompressor(CompressionParams.java:396)
    at org.apache.cassandra.db.commitlog.CommitLog$Configuration.<init>(CommitLog.java:630)
    at org.apache.cassandra.db.commitlog.CommitLog.<init>(CommitLog.java:107)
    at org.apache.cassandra.db.commitlog.CommitLog.construct(CommitLog.java:92)
    at org.apache.cassandra.db.commitlog.CommitLog.<clinit>(CommitLog.java:77) 
{code}
I would expect similar errors when trying aliases for the other 
_compression configuration properties.
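The class name in the exception above is consistent with the short name being appended to the package prefix without first being resolved as an alias. A minimal sketch of alias-aware resolution, assuming a hypothetical alias table (the entries below are illustrative and not taken from the patch):

{code:java}
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of resolving short compressor names before class loading.
// The alias table is an assumption for illustration, not the patch's list.
class CompressorAliasSketch {
    static final String PACKAGE = "org.apache.cassandra.io.compress.";

    static final Map<String, String> ALIASES = Map.of(
            "lz4", "LZ4Compressor",
            "snappy", "SnappyCompressor",
            "deflate", "DeflateCompressor",
            "zstd", "ZstdCompressor");

    // Prefix-only rule: a name without a dot is assumed to live in PACKAGE.
    // For "lz4" this yields the class name seen in the exception above.
    static String prefixOnly(String name) {
        return name.contains(".") ? name : PACKAGE + name;
    }

    // Alias-aware rule: look the name up as an alias first, then apply the prefix rule.
    static String resolve(String name) {
        String simpleName = ALIASES.get(name.toLowerCase(Locale.ROOT));
        return prefixOnly(simpleName != null ? simpleName : name);
    }

    public static void main(String[] args) {
        System.out.println(prefixOnly("lz4"));        // org.apache.cassandra.io.compress.lz4
        System.out.println(resolve("lz4"));           // org.apache.cassandra.io.compress.LZ4Compressor
        System.out.println(resolve("LZ4Compressor")); // org.apache.cassandra.io.compress.LZ4Compressor
    }
}
{code}

Doing the alias lookup in one shared helper would let commitlog_compression, sstable_compression and the other _compression properties behave the same way.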

Next, the default configuration for sstable_compression, which is currently the 
following, does not work:
{code:java}
sstable_compression:
#   - class_name: lz4
#     parameters:
#       - enable: true
#         chunk_length: 16KiB
#         min_compress_ratio: 0.0
#         max_comrpessed_length: 16KiB
#         class_specific_parameter: value {code}
When I uncomment it like this:
{code:java}
sstable_compression:
   - class_name: lz4
     parameters:
       - enable: true
         chunk_length: 16KiB
         min_compress_ratio: 0.0
         max_comrpessed_length: 16KiB
         class_specific_parameter: value {code}
First of all, it says:
{code:java}
Caused by: java.lang.ClassCastException: java.lang.Boolean cannot be cast to java.lang.CharSequence
    at org.apache.cassandra.schema.CompressionParams.copyOptions(CompressionParams.java:406)
    at org.apache.cassandra.schema.CompressionParams.fromParameterizedClass(CompressionParams.java:160)
    at org.apache.cassandra.schema.CompressionParams.defaultParams(CompressionParams.java:150)
    at org.apache.cassandra.schema.TableParams$Builder.<init>(TableParams.java:353)
    at org.apache.cassandra.schema.TableParams.builder(TableParams.java:119) 
{code}
This means it expects the values to be strings, so I quote them like this:
{code:java}
sstable_compression:
   - class_name: lz4
     parameters:
       - enable: "true"
         chunk_length: "16KiB"
         min_compress_ratio: "0.0"
         max_comrpessed_length: "16KiB"
         class_specific_parameter: "value" {code}
But it says:
{code:java}
Caused by: org.apache.cassandra.exceptions.ConfigurationException: Unknown compression options enable
    at org.apache.cassandra.schema.CompressionParams.createCompressor(CompressionParams.java:358)
    at org.apache.cassandra.schema.CompressionParams.<init>(CompressionParams.java:260)
    at org.apache.cassandra.io.compress.CompressionMetadata.open(CompressionMetadata.java:94)
    ... 15 common frames omitted {code}
I think this has to be changed to "enabled", as that is the option name in trunk too.
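Both failures above could be handled in one place: convert every yaml value to a string instead of casting it, and check option names up front so that "enable" is reported while "enabled" is accepted. A minimal sketch, assuming a hypothetical KNOWN list of generic option names (not the patch's actual list); in real code, class-specific options would still be validated by the compressor itself.

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of normalizing yaml-parsed options before building compression params.
// KNOWN is an assumed list of generic option names, not the patch's actual list.
class CompressionOptionsSketch {
    static final Set<String> KNOWN =
            Set.of("enabled", "chunk_length", "min_compress_ratio", "max_compressed_length");

    static Map<String, String> normalize(Map<String, Object> raw) {
        Map<String, String> out = new LinkedHashMap<>();
        Set<String> unknown = new TreeSet<>(); // sorted, so the message is deterministic
        for (Map.Entry<String, Object> e : raw.entrySet()) {
            if (!KNOWN.contains(e.getKey()))
                unknown.add(e.getKey());
            // An unquoted `true` or `0.0` arrives as Boolean / Double; String.valueOf
            // accepts any Object, unlike a cast to CharSequence.
            out.put(e.getKey(), String.valueOf(e.getValue()));
        }
        if (!unknown.isEmpty())
            throw new IllegalArgumentException("Unknown compression options " + unknown);
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> raw = new LinkedHashMap<>();
        raw.put("enabled", Boolean.TRUE);
        raw.put("chunk_length", "16KiB");
        raw.put("min_compress_ratio", 0.0);
        System.out.println(normalize(raw)); // {enabled=true, chunk_length=16KiB, min_compress_ratio=0.0}

        Map<String, Object> misspelled = new LinkedHashMap<>();
        misspelled.put("enable", Boolean.TRUE);
        try {
            normalize(misspelled);
        } catch (IllegalArgumentException ex) {
            System.out.println(ex.getMessage()); // Unknown compression options [enable]
        }
    }
}
{code}

With this in place, users would not need to quote booleans and numbers in cassandra.yaml.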

Next, it does not seem to me that the value "lz4" is picked up, because if I 
do this:
{code:java}
sstable_compression:
   - class_name: lz4
     parameters:
       - enabled: "true"
         chunk_length: "16KiB"
         min_compress_ratio: "0.0"
         max_comrpessed_length: "16KiB"
         class_specific_parameter: "value"
{code}
it does not report "class_specific_parameter" as invalid. Compare that with 
what happens when I do:
{code:java}
sstable_compression:
   - class_name: LZ4Compressor
     parameters:
       - enabled: "true"
         chunk_length: "16KiB"
         min_compress_ratio: "0.0"
         max_comrpessed_length: "16KiB"
         class_specific_parameter: "value" {code}
it throws:
{code:java}
Caused by: org.apache.cassandra.exceptions.ConfigurationException: Unknown compression options class_specific_parameter
    at org.apache.cassandra.schema.CompressionParams.createCompressor(CompressionParams.java:358)
    at org.apache.cassandra.schema.CompressionParams.lambda$fromClassAndOptions$0(CompressionParams.java:228)
    at org.apache.cassandra.schema.CompressionParams.fromClassAndOptions(CompressionParams.java:229)
    at org.apache.cassandra.schema.CompressionParams.fromParameterizedClass(CompressionParams.java:160)
    at org.apache.cassandra.schema.CompressionParams.defaultParams(CompressionParams.java:150)
    at org.apache.cassandra.schema.TableParams$Builder.<init>(TableParams.java:353)
    at org.apache.cassandra.schema.TableParams.builder(TableParams.java:119)
    at org.apache.cassandra.cql3.statements.schema.TableAttributes.validate(TableAttributes.java:60)
    at org.apac

[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2023-04-21 Thread Claude Warren (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714856#comment-17714856
 ] 

Claude Warren edited comment on CASSANDRA-12937 at 4/21/23 1:57 PM:


[~smiklosovic], [~mck] 

Please review [Github Pull Request 
#2254|https://github.com/apache/cassandra/pull/2254]; there are a number of 
changes:
 * switched to ParameterizedClass for yaml configuration
 * ensured that processing supports CQL compression parameters
 * added extensive yaml file documentation
 * moved methods used only in testing from CompressionParams to a new 
TestingCompressionParamsFactory class
 * updated tests to use TestingCompressionFactory
 * added extensive testing to ensure that all combinations of Map and 
ParameterizedClass construct the same CompressionParams
 * added additional test coverage for pre-existing methods
 * unified error reporting so that the same error on different paths is reported 
with the same text
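The equivalence testing mentioned in the list can be pictured as building the same parameters through both entry points and comparing the results. A minimal sketch with hypothetical stand-in types (Params, fromMap and fromClassAndParameters are not the PR's code); a real test would iterate over many option combinations rather than one.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of checking that two construction paths agree: a flat CQL-style map
// and a ParameterizedClass-style (class name + parameters) pair. Params and both factory
// methods are stand-ins, not the PR's classes.
class EquivalenceSketch {
    record Params(String className, Map<String, String> options) {}

    // Path 1: flat map where "class" sits next to the other options.
    static Params fromMap(Map<String, String> flat) {
        Map<String, String> options = new HashMap<>(flat);
        String className = options.remove("class");
        return new Params(className, options);
    }

    // Path 2: class name and parameters supplied separately.
    static Params fromClassAndParameters(String className, Map<String, String> parameters) {
        return new Params(className, new HashMap<>(parameters));
    }

    public static void main(String[] args) {
        Params viaMap = fromMap(Map.of("class", "LZ4Compressor", "chunk_length_in_kb", "16"));
        Params viaClass = fromClassAndParameters("LZ4Compressor", Map.of("chunk_length_in_kb", "16"));
        // Records compare component-wise, and Map.equals compares entries, so this checks content.
        if (!viaMap.equals(viaClass))
            throw new AssertionError(viaMap + " != " + viaClass);
        System.out.println("same params: " + viaMap);
    }
}
{code}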


was (Author: claudenw):
[~smiklosovic], [~mck] 

Please review  [Github Pull Request 
#2254|https://github.com/apache/cassandra/pull/2254] there are a number of 
changes.
 * switched to ParameterizedClass for yaml configuration
 * ensured processing to support CQL compression parameters
 * added extensive yaml file documentation.
 * moved methods used only in testing from CompressionParams to a new 
TestingCompressionParamsFactory class.
 * updated tests to use TestingCompressionFactory.
 * added extensive testing to ensure that all combinations of Map and 
ParameterixedClass construct the same CompressionParams.
 * added additional testing coverage or pre-existing methods
 * unified error reporting so that the same error on different paths reports 
with the same text.

> Default setting (yaml) for SSTable compression
> --
>
> Key: CASSANDRA-12937
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12937
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Config
>Reporter: Michael Semb Wever
>Assignee: Claude Warren
>Priority: Low
>  Labels: AdventCalendar2021, lhf
> Fix For: 5.x
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> In many situations the choice of compression for sstables is more relevant to 
> the disks attached than to the schema and data.
> This issue is to add to cassandra.yaml a default value for sstable 
> compression that new tables will inherit (instead of the defaults found in 
> {{CompressionParams.DEFAULT}}).
> Examples where this can be relevant are filesystems that do on-the-fly 
> compression (btrfs, zfs) or specific disk configurations or even specific C* 
> versions (see CASSANDRA-10995 ).
> +Additional information for newcomers+
> Some new fields need to be added to {{cassandra.yaml}} to allow specifying 
> the field required for defining the default compression parameters. In 
> {{DatabaseDescriptor}} a new {{CompressionParams}} field should be added for 
> the default compression. This field should be initialized in 
> {{DatabaseDescriptor.applySimpleConfig()}}. At the different places where 
> {{CompressionParams.DEFAULT}} was used the code should call 
> {{DatabaseDescriptor#getDefaultCompressionParams}} that should return some 
> copy of configured {{CompressionParams}}.
> Some unit test using {{OverrideConfigurationLoader}} should be used to test 
> that the table schema uses the new default when a new table is created (see 
> CreateTest for some example).






[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2023-04-18 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713597#comment-17713597
 ] 

Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/18/23 2:41 PM:


https://github.com/apache/cassandra/pull/2282

1. flat map
2. ParameterizedClass, the same as everywhere else
3. new format of values supported (as well as the old one)
4. some parameters / their names were deprecated in 3.0, so they can be removed 
in 5.0.
5. aliases supported

{code}
sstable_compression:
  - class_name: "org.apache.cassandra.io.compress.LZ4Compressor"
    parameters:
      - chunk_length: "32KiB"
        min_compress_ratio: "0"
{code}

{code}
sstable_compression:
  - class_name: "org.apache.cassandra.io.compress.LZ4Compressor"
    parameters:
      - chunk_length_in_kb: "32"
        min_compress_ratio: "0"
{code}

{code}
sstable_compression:
  - class_name: "lz4"
    parameters:
      - chunk_length: "32KiB"
        min_compress_ratio: "0"
{code}

{code}
sstable_compression:
  - class_name: "org.apache.cassandra.io.compress.LZ4Compressor"
    parameters:
      - chunk_length_in_kb: "64KiB"
        min_compress_ratio: "0"
{code}

{code}
sstable_compression:
  - class_name: "org.apache.cassandra.io.compress.LZ4Compressor"
    parameters:
      - chunk_length: "1MiB"
        min_compress_ratio: "0"
{code}

All of this works. I am not sure I covered all parameters, but I expect the rest 
can be done in a similar fashion.
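For reference, one way to accept every value format shown above is a small parser that understands unit suffixes and falls back to plain numbers only under the legacy key. A minimal sketch, not the patch's parser; the assumptions are that only KiB/MiB suffixes occur, that the new chunk_length key requires a unit, that chunk_length wins if both keys are present, and that 16 KiB applies when neither is set.

{code:java}
import java.util.Map;

// Hypothetical parser for the value formats shown above. Assumptions: only KiB/MiB suffixes,
// a bare number is accepted (as KiB) only under the legacy key, and 16 KiB when neither key is set.
class ChunkLengthSketch {
    static int parseKiB(String value, boolean bareNumberIsKiB) {
        String v = value.trim();
        int kib;
        if (v.endsWith("MiB"))
            kib = Integer.parseInt(v.substring(0, v.length() - 3).trim()) * 1024;
        else if (v.endsWith("KiB"))
            kib = Integer.parseInt(v.substring(0, v.length() - 3).trim());
        else if (bareNumberIsKiB)
            kib = Integer.parseInt(v);
        else
            throw new IllegalArgumentException("missing unit in chunk length: " + value);
        // Compression chunk lengths must be a power of two.
        if (kib <= 0 || Integer.bitCount(kib) != 1)
            throw new IllegalArgumentException("chunk length must be a power of 2: " + value);
        return kib;
    }

    // The new key wins if both are present (an assumption of this sketch).
    static int chunkLengthKiB(Map<String, String> parameters) {
        if (parameters.containsKey("chunk_length"))
            return parseKiB(parameters.get("chunk_length"), false);
        if (parameters.containsKey("chunk_length_in_kb"))
            return parseKiB(parameters.get("chunk_length_in_kb"), true);
        return 16;
    }

    public static void main(String[] args) {
        System.out.println(chunkLengthKiB(Map.of("chunk_length", "32KiB")));       // 32
        System.out.println(chunkLengthKiB(Map.of("chunk_length_in_kb", "32")));    // 32
        System.out.println(chunkLengthKiB(Map.of("chunk_length_in_kb", "64KiB"))); // 64
        System.out.println(chunkLengthKiB(Map.of("chunk_length", "1MiB")));        // 1024
    }
}
{code}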


was (Author: smiklosovic):
https://github.com/apache/cassandra/pull/2282

1. flat map
2. ParameterizedClass - same stuff as everywhere
3. new format of values supported (as well as old)
4. some parameters / their names were deprecated in 3.0 so they can be removed 
in 5.0.
5. aliases supported

{code}
sstable_compression:
  - class_name: "org.apache.cassandra.io.compress.LZ4Compressor"
    parameters:
      - chunk_length: "32KiB"
        min_compress_ratio: "0"
{code}

{code}
sstable_compression:
  - class_name: "org.apache.cassandra.io.compress.LZ4Compressor"
    parameters:
      - chunk_length_in_kb: "32"
        min_compress_ratio: "0"
{code}

{code}
sstable_compression:
  - class_name: "lz4"
    parameters:
      - chunk_length: "32KiB"
        min_compress_ratio: "0"
{code}

{code}
sstable_compression:
  - class_name: "org.apache.cassandra.io.compress.LZ4Compressor"
    parameters:
      - chunk_length_in_kb: "64KiB"
        min_compress_ratio: "0"
{code}

All this works. I am not sure I covered all parameters but I expect that this 
might be done in a similar fashion.

> Default setting (yaml) for SSTable compression
> --
>
> Key: CASSANDRA-12937
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12937
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Config
>Reporter: Michael Semb Wever
>Assignee: Claude Warren
>Priority: Low
>  Labels: AdventCalendar2021, lhf
> Fix For: 5.x
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> In many situations the choice of compression for sstables is more relevant to 
> the disks attached than to the schema and data.
> This issue is to add to cassandra.yaml a default value for sstable 
> compression that new tables will inherit (instead of the defaults found in 
> {{CompressionParams.DEFAULT}}).
> Examples where this can be relevant are filesystems that do on-the-fly 
> compression (btrfs, zfs) or specific disk configurations or even specific C* 
> versions (see CASSANDRA-10995 ).
> +Additional information for newcomers+
> Some new fields need to be added to {{cassandra.yaml}} to allow specifying 
> the fields required for defining the default compression parameters. In 
> {{DatabaseDescriptor}} a new {{CompressionParams}} field should be added for 
> the default compression. This field should be initialized in 
> {{DatabaseDescriptor.applySimpleConfig()}}. Wherever 
> {{CompressionParams.DEFAULT}} is currently used, the code should instead call 
> {{DatabaseDescriptor#getDefaultCompressionParams}}, which should return a 
> copy of the configured {{CompressionParams}}.
> A unit test using {{OverrideConfigurationLoader}} should verify that the 
> table schema uses the new default when a new table is created (see 
> CreateTest for an example).
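A minimal Java sketch of the pattern described in the ticket, assuming simplified types: {{CompressionSettings}} and {{DefaultCompressionConfig}} are hypothetical stand-ins, not Cassandra's real {{CompressionParams}} or {{DatabaseDescriptor}}, and the LZ4 / 16 KiB fallback is an assumed value. It only shows a default that is set once from configuration and handed out as a fresh copy to every caller.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for CompressionParams: a compressor class name plus
// its options. This is NOT Cassandra's real class.
final class CompressionSettings {
    final String className;
    final Map<String, String> options;

    CompressionSettings(String className, Map<String, String> options) {
        this.className = className;
        // Defensive, unmodifiable copy of the caller's map.
        this.options = Collections.unmodifiableMap(new HashMap<>(options));
    }

    // Returns a new instance with the same values, so every caller gets its own object.
    CompressionSettings copy() {
        return new CompressionSettings(className, options);
    }
}

// Hypothetical analogue of the DatabaseDescriptor field and getter described above.
final class DefaultCompressionConfig {
    // Fallback when cassandra.yaml has no sstable_compression entry.
    // The LZ4 / 16 KiB values are an assumption for illustration.
    static final CompressionSettings BUILT_IN = new CompressionSettings(
        "org.apache.cassandra.io.compress.LZ4Compressor",
        Map.of("chunk_length_in_kb", "16"));

    private static volatile CompressionSettings configured = BUILT_IN;

    // Called once at startup, in the spirit of applySimpleConfig().
    static void apply(CompressionSettings fromYaml) {
        configured = (fromYaml == null) ? BUILT_IN : fromYaml;
    }

    // New tables would call this instead of reading a static DEFAULT constant.
    static CompressionSettings getDefaultCompressionParams() {
        return configured.copy();
    }
}
```

Returning a copy rather than the shared instance means code that builds a table's schema can never leak changes back into the node-wide default.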



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2023-04-18 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713597#comment-17713597
 ] 

Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/18/23 2:39 PM:


https://github.com/apache/cassandra/pull/2282

1. flat map
2. ParameterizedClass - same stuff as everywhere
3. new format of values supported (as well as old)
4. some parameters / their names were deprecated in 3.0 so they can be removed 
in 5.0.
5. aliases supported

{code}
sstable_compression:
  - class_name: "org.apache.cassandra.io.compress.LZ4Compressor"
    parameters:
      - chunk_length: "32KiB"
        min_compress_ratio: "0"
{code}

{code}
sstable_compression:
  - class_name: "org.apache.cassandra.io.compress.LZ4Compressor"
    parameters:
      - chunk_length_in_kb: "32"
        min_compress_ratio: "0"
{code}

{code}
sstable_compression:
  - class_name: "lz4"
    parameters:
      - chunk_length: "32KiB"
        min_compress_ratio: "0"
{code}
{code}
sstable_compression:
  - class_name: "org.apache.cassandra.io.compress.LZ4Compressor"
    parameters:
      - chunk_length_in_kb: "64KiB"
        min_compress_ratio: "0"
{code}

All this works. I am not sure I covered all parameters, but I expect the rest 
can be handled in a similar fashion.
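The normalization implied by points 3 and 5 above (old and new value formats, plus aliases) could look roughly like the Java sketch below. The class name, alias table and accepted keys are assumptions for illustration, not the code in the pull request; it only shows how {{chunk_length: "32KiB"}}, {{chunk_length_in_kb: "32"}} and {{chunk_length_in_kb: "64KiB"}} can all resolve to a KiB value, and how {{lz4}} can resolve to a full class name.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical normalizer for a flat sstable_compression "parameters" map.
// Alias names and accepted keys are assumptions, not the code in the PR.
final class SSTableCompressionNormalizer {
    private static final String PKG = "org.apache.cassandra.io.compress.";

    // Assumed short aliases mapped to fully qualified compressor classes.
    private static final Map<String, String> ALIASES = Map.of(
        "lz4", PKG + "LZ4Compressor",
        "snappy", PKG + "SnappyCompressor",
        "deflate", PKG + "DeflateCompressor",
        "zstd", PKG + "ZstdCompressor");

    // Returns the fully qualified class for an alias; other names pass through.
    static String resolveClass(String name) {
        return ALIASES.getOrDefault(name.toLowerCase(Locale.ROOT), name);
    }

    // Reads the chunk length in KiB from either the new key (chunk_length: "32KiB")
    // or the old key (chunk_length_in_kb: "32", also tolerating "64KiB").
    static int chunkLengthKiB(Map<String, String> params, int defaultKiB) {
        String newStyle = params.get("chunk_length");
        String oldStyle = params.get("chunk_length_in_kb");
        if (newStyle != null && oldStyle != null)
            throw new IllegalArgumentException("Set only one of chunk_length and chunk_length_in_kb");
        if (newStyle != null)
            return parseKiB(newStyle, false); // new style must carry a unit
        if (oldStyle != null)
            return parseKiB(oldStyle, true);  // old style may be a bare number of KiB
        return defaultKiB;
    }

    // Parses "<n>KiB" or "<n>MiB"; a bare "<n>" is accepted only when allowed.
    private static int parseKiB(String raw, boolean allowBareNumber) {
        String s = raw.trim();
        if (s.endsWith("MiB"))
            return Math.multiplyExact(Integer.parseInt(s.substring(0, s.length() - 3).trim()), 1024);
        if (s.endsWith("KiB"))
            return Integer.parseInt(s.substring(0, s.length() - 3).trim());
        if (allowBareNumber)
            return Integer.parseInt(s);
        throw new IllegalArgumentException("Missing KiB/MiB unit in: " + raw);
    }
}
```

Rejecting a bare number for {{chunk_length}} while tolerating one for {{chunk_length_in_kb}} is one possible design choice: the new key stays strict about units without breaking legacy values.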


was (Author: smiklosovic):
https://github.com/apache/cassandra/pull/2282

1. flat map
2. ParameterizedClass - same stuff as everywhere
3. new format of values supported (as well as old)
4. some parameters / their names were deprecated in 3.0 so they can be removed 
in 5.0.
5. aliases supported

{code}
sstable_compression:
  - class_name: "org.apache.cassandra.io.compress.LZ4Compressor"
    parameters:
      - chunk_length: "32KiB"
        min_compress_ratio: "0"
{code}

{code}
sstable_compression:
  - class_name: "org.apache.cassandra.io.compress.LZ4Compressor"
    parameters:
      - chunk_length_in_kb: "32"
        min_compress_ratio: "0"
{code}

{code}
sstable_compression:
  - class_name: "lz4"
    parameters:
      - chunk_length: "32KiB"
        min_compress_ratio: "0"
{code}

All this works. I am not sure I covered all parameters, but I expect the rest 
can be handled in a similar fashion.







[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2023-04-18 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713597#comment-17713597
 ] 

Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/18/23 2:36 PM:


https://github.com/apache/cassandra/pull/2282

1. flat map
2. ParameterizedClass - same stuff as everywhere
3. new format of values supported (as well as old)
4. some parameters / their names were deprecated in 3.0 so they can be removed 
in 5.0.
5. aliases supported

{code}
sstable_compression:
  - class_name: "org.apache.cassandra.io.compress.LZ4Compressor"
    parameters:
      - chunk_length: "32KiB"
        min_compress_ratio: "0"
{code}

{code}
sstable_compression:
  - class_name: "org.apache.cassandra.io.compress.LZ4Compressor"
    parameters:
      - chunk_length_in_kb: "32"
        min_compress_ratio: "0"
{code}

{code}
sstable_compression:
  - class_name: "lz4"
    parameters:
      - chunk_length: "32KiB"
        min_compress_ratio: "0"
{code}

All this works. I am not sure I covered all parameters, but I expect the rest 
can be handled in a similar fashion.


was (Author: smiklosovic):
https://github.com/apache/cassandra/pull/2281

1. flat map
2. ParameterizedClass - same stuff as everywhere
3. new format of values supported (as well as old)
4. some parameters / their names were deprecated in 3.0 so they can be removed 
in 5.0.
5. aliases supported

{code}
sstable_compression:
  - class_name: "org.apache.cassandra.io.compress.LZ4Compressor"
    parameters:
      - chunk_length: "32KiB"
        min_compress_ratio: "0"
{code}

{code}
sstable_compression:
  - class_name: "org.apache.cassandra.io.compress.LZ4Compressor"
    parameters:
      - chunk_length_in_kb: "32"
        min_compress_ratio: "0"
{code}

{code}
sstable_compression:
  - class_name: "lz4"
    parameters:
      - chunk_length: "32KiB"
        min_compress_ratio: "0"
{code}

All this works. I am not sure I covered all parameters, but I expect the rest 
can be handled in a similar fashion.







[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2023-04-18 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713597#comment-17713597
 ] 

Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/18/23 2:22 PM:


https://github.com/apache/cassandra/pull/2281

1. flat map
2. ParameterizedClass - same stuff as everywhere
3. new format of values supported (as well as old)
4. some parameters / their names were deprecated in 3.0 so they can be removed 
in 5.0.
5. aliases supported

{code}
sstable_compression:
  - class_name: "org.apache.cassandra.io.compress.LZ4Compressor"
    parameters:
      - chunk_length: "32KiB"
        min_compress_ratio: "0"
{code}

{code}
sstable_compression:
  - class_name: "org.apache.cassandra.io.compress.LZ4Compressor"
    parameters:
      - chunk_length_in_kb: "32"
        min_compress_ratio: "0"
{code}

{code}
sstable_compression:
  - class_name: "lz4"
    parameters:
      - chunk_length: "32KiB"
        min_compress_ratio: "0"
{code}

All this works. I am not sure I covered all parameters, but I expect the rest 
can be handled in a similar fashion.


was (Author: smiklosovic):
https://github.com/apache/cassandra/pull/2281

1. flat map
2. ParameterizedClass - same stuff as everywhere
3. new format of values supported (as well as old)
4. some parameters / their names were deprecated in 3.0 so they can be removed 
in 5.0.
5. aliases supported

{code}
#sstable_compression:
#  - class_name: "org.apache.cassandra.io.compress.LZ4Compressor"
#    parameters:
#      - chunk_length: "32KiB"
#        min_compress_ratio: "0"
{code}

{code}
#sstable_compression:
#  - class_name: "org.apache.cassandra.io.compress.LZ4Compressor"
#    parameters:
#      - chunk_length_in_kb: "32"
#        min_compress_ratio: "0"
{code}

All this works. I am not sure I covered all parameters, but I expect the rest 
can be handled in a similar fashion.







[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2023-04-18 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713597#comment-17713597
 ] 

Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/18/23 2:20 PM:


https://github.com/apache/cassandra/pull/2281

1. flat map
2. ParameterizedClass - same stuff as everywhere
3. new format of values supported (as well as old)
4. some parameters / their names were deprecated in 3.0 so they can be removed 
in 5.0.
5. aliases supported

{code}
#sstable_compression:
#  - class_name: "org.apache.cassandra.io.compress.LZ4Compressor"
#    parameters:
#      - chunk_length: "32KiB"
#        min_compress_ratio: "0"
{code}

{code}
#sstable_compression:
#  - class_name: "org.apache.cassandra.io.compress.LZ4Compressor"
#    parameters:
#      - chunk_length_in_kb: "32"
#        min_compress_ratio: "0"
{code}

All this works. I am not sure I covered all parameters, but I expect the rest 
can be handled in a similar fashion.


was (Author: smiklosovic):
https://github.com/apache/cassandra/pull/2281

1. flat map
2. ParameterizedClass - same stuff as everywhere
3. new format of values supported (as well as old)
4. some parameters / their names were deprecated in 3.0 so they can be removed 
in 5.0.

{code}
#sstable_compression:
#  - class_name: "org.apache.cassandra.io.compress.LZ4Compressor"
#    parameters:
#      - chunk_length: "32KiB"
#        min_compress_ratio: "0"
{code}

{code}
#sstable_compression:
#  - class_name: "org.apache.cassandra.io.compress.LZ4Compressor"
#    parameters:
#      - chunk_length_in_kb: "32"
#        min_compress_ratio: "0"
{code}

All this works. I am not sure I covered all parameters, but I expect the rest 
can be handled in a similar fashion.







[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2023-04-18 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713597#comment-17713597
 ] 

Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/18/23 2:19 PM:


https://github.com/apache/cassandra/pull/2281

1. flat map
2. ParameterizedClass - same stuff as everywhere
3. new format of values supported (as well as old)
4. some parameters / their names were deprecated in 3.0 so they can be removed 
in 5.0.

{code}
#sstable_compression:
#  - class_name: "org.apache.cassandra.io.compress.LZ4Compressor"
#    parameters:
#      - chunk_length: "32KiB"
#        min_compress_ratio: "0"
{code}

{code}
#sstable_compression:
#  - class_name: "org.apache.cassandra.io.compress.LZ4Compressor"
#    parameters:
#      - chunk_length_in_kb: "32"
#        min_compress_ratio: "0"
{code}

All this works. I am not sure I covered all parameters, but I expect the rest 
can be handled in a similar fashion.


was (Author: smiklosovic):
https://github.com/apache/cassandra/pull/2281

1. flat map
2. ParameterizedClass - same stuff as everywhere
3. new format of values supported (as well as old)
4. some parameters / their names were deprecated in 3.0 so they can be removed 
in 5.0.

{code}
#sstable_compression:
#  - class_name: "org.apache.cassandra.io.compress.LZ4Compressor"
#    parameters:
#      - chunk_length: "32KiB"
#        min_compress_ratio: "0"
{code}

{code}
#sstable_compression:
#  - class_name: "org.apache.cassandra.io.compress.LZ4Compressor"
#    parameters:
#      - chunk_length_in_kb: "32"
#        min_compress_ratio: "0"
{code}

All this works. 







[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2023-04-18 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713597#comment-17713597
 ] 

Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/18/23 2:17 PM:


https://github.com/apache/cassandra/pull/2281

1. flat map
2. ParameterizedClass - same stuff as everywhere
3. new format of values supported (as well as old)
4. some parameters / their names were deprecated in 3.0 so they can be removed 
in 5.0.

{code}
#sstable_compression:
#  - class_name: "org.apache.cassandra.io.compress.LZ4Compressor"
#    parameters:
#      - chunk_length: "32KiB"
#        min_compress_ratio: "0"
{code}

{code}
#sstable_compression:
#  - class_name: "org.apache.cassandra.io.compress.LZ4Compressor"
#    parameters:
#      - chunk_length_in_kb: "32"
#        min_compress_ratio: "0"
{code}

All this works. 


was (Author: smiklosovic):
https://github.com/apache/cassandra/pull/2281

1. flat map
2. ParameterizedClass - same stuff as everywhere
3. new format of values supported
4. some parameters / their names were deprecated in 3.0 so they can be removed 
in 5.0.

{code}
#sstable_compression:
#  - class_name: "org.apache.cassandra.io.compress.LZ4Compressor"
#    parameters:
#      - chunk_length: "32KiB"
#        min_compress_ratio: "0"
{code}

{code}
#sstable_compression:
#  - class_name: "org.apache.cassandra.io.compress.LZ4Compressor"
#    parameters:
#      - chunk_length_in_kb: "32"
#        min_compress_ratio: "0"
{code}

All this works. 







[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2023-04-18 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713571#comment-17713571
 ] 

Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/18/23 12:25 PM:
-

I still do not have a clear answer as to why we are "extracting" some values 
and then have an extra parameterized class with further options. 

_Currently CompressionParams takes the class name and the parameters.  It 
extracts chunk_length_kb (or chunk_length_in_kb) and min_compress_ratio from 
the parameters and uses them to build the CompressionParams instance._

Aren't you just trying to copy the same approach, as in "CompressionParams 
extracts these parameters, so we need to apply the same extraction in 
cassandra.yaml"? Why is the suggested extraction important? Why can we not 
flatten the configuration? It is quite questionable why we have nested sections 
like that when, from a UX perspective, it is really just a map. The fact that we 
do something internally in some fashion does not mean that it has to manifest 
in the configuration in cassandra.yaml. 

If we want to support the same values as elsewhere in cassandra.yaml but have a 
nice flat map (preferably), would it not be possible to translate these values 
internally into the old values? The most idealistic option would be to start 
supporting, in CQL as well, the same value format for parameters as in 
cassandra.yaml.
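One way to read "translate these values internally into the old values" is a flat new-style map that is rewritten into the option names CQL already uses in {{WITH compression = {...}}} ({{class}}, {{chunk_length_in_kb}}, {{min_compress_ratio}}). The Java sketch below is hypothetical: the class and method names are invented, and it renames and converts only {{chunk_length}}, passing every other key through untouched.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical translation from a flat, new-style yaml map to the option names
// used by CQL's "WITH compression = {...}" (class, chunk_length_in_kb, ...).
// Only chunk_length is renamed and converted; other keys pass through unchanged.
final class YamlToCqlCompressionOptions {
    static Map<String, String> translate(String className, Map<String, String> yamlParams) {
        if (yamlParams.containsKey("chunk_length") && yamlParams.containsKey("chunk_length_in_kb"))
            throw new IllegalArgumentException("Set only one of chunk_length and chunk_length_in_kb");
        Map<String, String> cql = new LinkedHashMap<>();
        cql.put("class", className);
        for (Map.Entry<String, String> e : yamlParams.entrySet()) {
            if (e.getKey().equals("chunk_length"))
                cql.put("chunk_length_in_kb", Integer.toString(toKiB(e.getValue())));
            else
                cql.put(e.getKey(), e.getValue());
        }
        return cql;
    }

    // Converts "32KiB" or "1MiB" into a whole number of KiB.
    static int toKiB(String value) {
        String s = value.trim();
        if (s.endsWith("MiB"))
            return Math.multiplyExact(Integer.parseInt(s.substring(0, s.length() - 3).trim()), 1024);
        if (s.endsWith("KiB"))
            return Integer.parseInt(s.substring(0, s.length() - 3).trim());
        throw new IllegalArgumentException("chunk_length needs a KiB or MiB unit: " + value);
    }
}
```

With this approach the yaml keeps the new naming and units, while everything downstream still sees the option names it already understands.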

I do not think this ticket is close to being merged without further discussion 
of how this should be modeled and without reaching a broader consensus.


was (Author: smiklosovic):
I still do not have a clear answer as to why we are "extracting" some values 
and then have an extra parameterized class with further options. 

_Currently CompressionParams takes the class name and the parameters.  It 
extracts chunk_length_kb (or chunk_length_in_kb) and min_compress_ratio from 
the parameters and uses them to build the CompressionParams instance._

Aren't you just trying to copy the same approach, as in "CompressionParams 
extracts these parameters, so we need to apply the same extraction in 
cassandra.yaml"? Why is the suggested extraction important? Why can we not 
flatten the configuration? It is quite questionable why we have nested sections 
like that when, from a UX perspective, it is really just a map. The fact that we 
do something internally in some fashion does not mean that it has to manifest 
in the configuration in cassandra.yaml. 

If we want to support the same values as elsewhere in cassandra.yaml but have a 
nice flat map (preferably), would it not be possible to translate these values 
internally into the old values? Or even better, would it be possible to have 
new-style parameter values in cassandra.yaml that are transformed internally 
into the old values? The most idealistic option would be to start supporting, 
in CQL as well, the same value format for parameters as in cassandra.yaml.

I do not think this ticket is close to being merged without further discussion 
of how this should be modeled and without reaching a broader consensus.


[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2023-04-18 Thread Michael Semb Wever (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713551#comment-17713551
 ] 

Michael Semb Wever edited comment on CASSANDRA-12937 at 4/18/23 11:23 AM:
--

I would rather see consistency in cassandra.yaml.
Please use parameter names and units in the new yaml style. They do not need to 
match the CQL style.

If an extra class is needed to support this, that is not a headache or an 
obsession. I don't have any opinion about whether the extra top-level 
options (chunk_length, maxCompressedLength, minCompressRatio) should be above 
the parameter map or inside it. I do see having to transform names and values 
inside the map as clumsy and potentially error-prone.

Neither hints_compression nor commitlog_compression supports customising the 
chunk_length, AFAIK.


was (Author: michaelsembwever):
I would rather see consistency in cassandra.yaml.
Please use parameter names and units in the new yaml style. They do not need to 
match the CQL style.

If an extra class is needed to support this, that is not a headache or an 
obsession. I also don't have any opinion about whether the extra top-level 
options (chunk_length, maxCompressedLength, minCompressRatio) should be above 
the parameter map or inside it. I do see having to transform names and values 
inside the map as clumsy and potentially error-prone.

Neither hints_compression nor commitlog_compression supports customising the 
chunk_length, AFAIK.







[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2023-04-18 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713529#comment-17713529
 ] 

Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/18/23 10:42 AM:
-

All I would prefer to see is a simple map of parameters in a
ParameterizedClass, with exactly the same names as their CQL counterparts.
They would literally just be used as-is. There do not seem to be any
collisions with that. I do not get the "obsession" with having the parameters
for these compressors follow the same names as CompressionParams (or the same
units).

_The CompressionParams has 3 parameters that it extracts or creates from the 
parameters in the ParameterizedClass._ 

Why do they have to be extracted in the first place?

For hints_compression in the yaml we have:

{code}
# Compression to apply to the hint files. If omitted, hints files
# will be written uncompressed. LZ4, Snappy, and Deflate compressors
# are supported.
#hints_compression:
#   - class_name: LZ4Compressor
#     parameters:
#         -
{code}

For commitlog_compression we have:

{code}
# Compression to apply to the commit log. If omitted, the commit log
# will be written uncompressed.  LZ4, Snappy, and Deflate compressors
# are supported.
# commitlog_compression:
#   - class_name: LZ4Compressor
#     parameters:
#         -
{code}

For sstable_compression, I would prefer to see exactly the same style of
configuration. Why are we trying to introduce a completely custom way of
configuring this, one which exists nowhere else, by extracting some parameters
outside of the parameters map? Why can we not use the same approach?

I do not think that we should blindly follow "the parameters names and their
units". I think we already discussed this, and I already explained all the
advantages of following what we already have. If we make it explicitly clear
that these parameters are exactly the same as the ones put into the compression
params upon table creation, it will save us a lot of headaches compared to
something completely custom, and people can specify the parameters and their
names just as they are used to. Why do we want to change all of this and
further confuse the user?

EDIT: to further support my case for having the same parameters and units in
cassandra.yaml as are specified in CQL upon table creation: in practice, people
who want to take advantage of this configuration will just copy-paste the CQL
snippet for the compression params, turn it into map entries by hitting "enter"
on the keyboard, and be done. I highly doubt that they would want to specify
"other units" just for the sake of consistency with the rest of cassandra.yaml.
I do not think they care at all. They just want to copy it over from CQL and
call it a day.
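To make the copy-paste workflow above concrete, here is a hedged Java sketch that renders CQL-style compression options as a ParameterizedClass-shaped yaml entry (the same `class_name` / `parameters` layout as hints_compression), keeping every option name and value exactly as written in CQL. `CqlToYaml.render` is a hypothetical helper, not part of Cassandra, and it quotes values without any yaml escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class CqlToYaml {
    // Hypothetical helper: renders CQL compression options as a ParameterizedClass-shaped
    // yaml entry. "class" becomes class_name; every other option keeps its CQL name and value.
    static String render(String yamlKey, Map<String, String> cqlOptions) {
        Map<String, String> params = new LinkedHashMap<>(cqlOptions); // copy; caller's map is untouched
        String className = params.remove("class");
        if (className == null)
            throw new IllegalArgumentException("missing 'class' option");

        StringJoiner yaml = new StringJoiner("\n");
        yaml.add(yamlKey + ":");
        yaml.add("  - class_name: " + className);
        if (!params.isEmpty()) {
            yaml.add("    parameters:");
            String prefix = "      - ";                      // first entry opens the list item
            for (Map.Entry<String, String> e : params.entrySet()) {
                yaml.add(prefix + e.getKey() + ": \"" + e.getValue() + "\"");
                prefix = "        ";                         // later entries continue the same map
            }
        }
        return yaml.toString();
    }

    public static void main(String[] args) {
        Map<String, String> cql = new LinkedHashMap<>();
        cql.put("class", "LZ4Compressor");
        cql.put("chunk_length_in_kb", "16");
        cql.put("min_compress_ratio", "0.0");
        System.out.println(render("sstable_compression", cql));
    }
}
```

The point of the design is that nothing is renamed or converted: the keys an operator already knows from `CREATE TABLE ... WITH compression = {...}` appear unchanged under `parameters`.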


was (Author: smiklosovic):
All I would prefer to see is a simple map of parameters in a
ParameterizedClass, with exactly the same names as their CQL counterparts.
They would literally just be used as-is. There do not seem to be any
collisions with that. I do not get the "obsession" with having the parameters
for these compressors follow the same names as CompressionParams (or the same
units).

_The CompressionParams has 3 parameters that it extracts or creates from the 
parameters in the ParameterizedClass._ 

Why do they have to be extracted in the first place?

For hints_compression in the yaml we have:

{code}
# Compression to apply to the hint files. If omitted, hints files
# will be written uncompressed. LZ4, Snappy, and Deflate compressors
# are supported.
#hints_compression:
#   - class_name: LZ4Compressor
#     parameters:
#         -
{code}

For commitlog_compression we have:

{code}
# Compression to apply to the commit log. If omitted, the commit log
# will be written uncompressed.  LZ4, Snappy, and Deflate compressors
# are supported.
# commitlog_compression:
#   - class_name: LZ4Compressor
#     parameters:
#         -
{code}

For sstable_compression, I would prefer to see exactly the same style of
configuration. Why are we trying to introduce a completely custom way of
configuring this, one which exists nowhere else, by extracting some parameters
outside of the parameters map? Why can we not use the same approach?

I do not think that we should blindly follow "the parameters names and their
units". I think we already discussed this, and I already explained all the
advantages of following what we already have. If we make it explicitly clear
that these parameters are exactly the same as the ones put into the compression
params upon table creation, it will save us a lot of headaches compared to
something completely custom, and people can specify the parameters and their
names just as they are used to. Why do we want to change all of this and
further confuse the user?


[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2023-04-18 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713529#comment-17713529
 ] 

Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/18/23 10:22 AM:
-

All I would prefer to see is a simple map of parameters in a
ParameterizedClass, with exactly the same names as their CQL counterparts.
They would literally just be used as-is. There do not seem to be any
collisions with that. I do not get the "obsession" with having the parameters
for these compressors follow the same names as CompressionParams (or the same
units).

_The CompressionParams has 3 parameters that it extracts or creates from the 
parameters in the ParameterizedClass._ 

Why do they have to be extracted in the first place?

For hints_compression in the yaml we have:

{code}
# Compression to apply to the hint files. If omitted, hints files
# will be written uncompressed. LZ4, Snappy, and Deflate compressors
# are supported.
#hints_compression:
#   - class_name: LZ4Compressor
#     parameters:
#         -
{code}

For commitlog_compression we have:

{code}
# Compression to apply to the commit log. If omitted, the commit log
# will be written uncompressed.  LZ4, Snappy, and Deflate compressors
# are supported.
# commitlog_compression:
#   - class_name: LZ4Compressor
#     parameters:
#         -
{code}

For sstable_compression, I would prefer to see exactly the same style of
configuration. Why are we trying to introduce a completely custom way of
configuring this, one which exists nowhere else, by extracting some parameters
outside of the parameters map? Why can we not use the same approach?

I do not think that we should blindly follow "the parameters names and their
units". I think we already discussed this, and I already explained all the
advantages of following what we already have. If we make it explicitly clear
that these parameters are exactly the same as the ones put into the compression
params upon table creation, it will save us a lot of headaches compared to
something completely custom, and people can specify the parameters and their
names just as they are used to. Why do we want to change all of this and
further confuse the user?


was (Author: smiklosovic):
All I would prefer to see is a simple map of parameters in a
ParameterizedClass, with exactly the same names as their CQL counterparts.
They would literally just be used as-is. There do not seem to be any
collisions with that. I do not get the "obsession" with having the parameters
for these compressors follow the same names as CompressionParams (or the same
units).

_The CompressionParams has 3 parameters that it extracts or creates from the 
parameters in the ParameterizedClass._ 

Why do they have to be extracted in the first place?

For hints_compression in the yaml we have:

{code}
# Compression to apply to the hint files. If omitted, hints files
# will be written uncompressed. LZ4, Snappy, and Deflate compressors
# are supported.
#hints_compression:
#   - class_name: LZ4Compressor
#     parameters:
#         -
{code}

For commitlog_compression we have:

{code}
# Compression to apply to the commit log. If omitted, the commit log
# will be written uncompressed.  LZ4, Snappy, and Deflate compressors
# are supported.
# commitlog_compression:
#   - class_name: LZ4Compressor
#     parameters:
#         -
{code}

For sstable_compression, I would prefer to see exactly the same style of
configuration. Why are we trying to introduce a completely custom way of
configuring this, one which exists nowhere else, by extracting some parameters
outside of the parameters map? Why can we not use the same approach?

I do not think that we should blindly follow "the parameters names and their
units". I think we already discussed this, and I already explained all the
advantages of following what we already have. If we make it explicitly clear
that these parameters are exactly the same as the ones put into the compression
params upon table creation, it will save us a lot of headaches compared to
something completely custom, and people can specify the parameters and their
names just as they are used to. Why do we want to change all of this and
further confuse the user?


[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2023-04-18 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713529#comment-17713529
 ] 

Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/18/23 10:21 AM:
-

All I would prefer to see is a simple map of parameters in a
ParameterizedClass, with exactly the same names as their CQL counterparts.
They would literally just be used as-is. There do not seem to be any
collisions with that. I do not get the "obsession" with having the parameters
for these compressors follow the same names as CompressionParams (or the same
units).

_The CompressionParams has 3 parameters that it extracts or creates from the 
parameters in the ParameterizedClass._ 

Why do they have to be extracted in the first place?

For hints_compression in the yaml we have:

{code}
# Compression to apply to the hint files. If omitted, hints files
# will be written uncompressed. LZ4, Snappy, and Deflate compressors
# are supported.
#hints_compression:
#   - class_name: LZ4Compressor
#     parameters:
#         -
{code}

For commitlog_compression we have:

{code}
# Compression to apply to the commit log. If omitted, the commit log
# will be written uncompressed.  LZ4, Snappy, and Deflate compressors
# are supported.
# commitlog_compression:
#   - class_name: LZ4Compressor
#     parameters:
#         -
{code}

For sstable_compression, I would prefer to see exactly the same style of
configuration. Why are we trying to introduce a completely custom way of
configuring this, one which exists nowhere else, by extracting some parameters
outside of the parameters map? Why can we not use the same approach?

I do not think that we should blindly follow "the parameters names and their
units". I think we already discussed this, and I already explained all the
advantages of following what we already have. If we make it explicitly clear
that these parameters are exactly the same as the ones put into the compression
params upon table creation, it will save us a lot of headaches compared to
something completely custom, and people can specify the parameters and their
names just as they are used to. Why do we want to change all of this and
further confuse the user?


was (Author: smiklosovic):
All I would prefer to see is a simple map of parameters in a
ParameterizedClass, with exactly the same names as their CQL counterparts.
They would literally just be used as-is. There do not seem to be any
collisions with that. I do not get the "obsession" with having the parameters
for these compressors follow the same names as CompressionParams (or the same
units).

_The CompressionParams has 3 parameters that it extracts or creates from the 
parameters in the ParameterizedClass._ 

Why do they have to be extracted in the first place?

For hints_compression in the yaml we have:

{code}
# Compression to apply to the hint files. If omitted, hints files
# will be written uncompressed. LZ4, Snappy, and Deflate compressors
# are supported.
#hints_compression:
#   - class_name: LZ4Compressor
#     parameters:
#         -
{code}

For commitlog_compression we have:

{code}
# Compression to apply to the commit log. If omitted, the commit log
# will be written uncompressed.  LZ4, Snappy, and Deflate compressors
# are supported.
# commitlog_compression:
#   - class_name: LZ4Compressor
#     parameters:
#         -
{code}

For sstable_compression, I would prefer to see exactly the same style of
configuration. Why are we trying to introduce a completely custom way of
configuring this, one which exists nowhere else, by extracting some parameters
outside of the parameters map? Why can we not use the same approach?

I do not think that we should blindly follow "the parameters names and their
units". I think we already discussed this, and I already explained all the
advantages of following what we already have. If we make it explicitly clear
that these parameters are exactly the same as the ones put into the compression
params upon table creation, it will save us a lot of headaches compared to
something completely custom, and people can specify the parameters and their
names just as they are used to. Why do we want to change all of this and
further confuse the user?


[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2023-04-18 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713529#comment-17713529
 ] 

Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/18/23 10:20 AM:
-

All I would prefer to see is a simple map of parameters in a
ParameterizedClass, with exactly the same names as their CQL counterparts.
They would literally just be used as-is. There do not seem to be any
collisions with that. I do not get the "obsession" with having the parameters
for these compressors follow the same names as CompressionParams (or the same
units).

_The CompressionParams has 3 parameters that it extracts or creates from the 
parameters in the ParameterizedClass._ 

Why do they have to be extracted in the first place?

For hints_compression in the yaml we have:

{code}
# Compression to apply to the hint files. If omitted, hints files
# will be written uncompressed. LZ4, Snappy, and Deflate compressors
# are supported.
#hints_compression:
#   - class_name: LZ4Compressor
#     parameters:
#         -
{code}

For commitlog_compression we have:

{code}
# Compression to apply to the commit log. If omitted, the commit log
# will be written uncompressed.  LZ4, Snappy, and Deflate compressors
# are supported.
# commitlog_compression:
#   - class_name: LZ4Compressor
#     parameters:
#         -
{code}

For sstable_compression, I would prefer to see exactly the same style of
configuration. Why are we trying to introduce a completely custom way of
configuring this, one which exists nowhere else, by extracting some parameters
outside of the parameters map? Why can we not use the same approach?

I do not think that we should blindly follow "the parameters names and their
units". I think we already discussed this, and I already explained all the
advantages of following what we already have. If we make it explicitly clear
that these parameters are exactly the same as the ones put into the compression
params upon table creation, it will save us a lot of headaches compared to
something completely custom, and people can specify the parameters and their
names just as they are used to. Why do we want to change all of this and
further confuse the user?


was (Author: smiklosovic):
All I would prefer to see is a simple map of parameters in a
ParameterizedClass, with exactly the same names as their CQL counterparts.
They would literally just be used as-is. There do not seem to be any
collisions with that. I do not get the "obsession" with having the parameters
for these compressors follow the same names as CompressionParams.

_The CompressionParams has 3 parameters that it extracts or creates from the 
parameters in the ParameterizedClass._ 

Why do they have to be extracted in the first place?

For hints_compression in the yaml we have:

{code}
# Compression to apply to the hint files. If omitted, hints files
# will be written uncompressed. LZ4, Snappy, and Deflate compressors
# are supported.
#hints_compression:
#   - class_name: LZ4Compressor
#     parameters:
#         -
{code}

For commitlog_compression we have:

{code}
# Compression to apply to the commit log. If omitted, the commit log
# will be written uncompressed.  LZ4, Snappy, and Deflate compressors
# are supported.
# commitlog_compression:
#   - class_name: LZ4Compressor
#     parameters:
#         -
{code}

For sstable_compression, I would prefer to see exactly the same style of
configuration. Why are we trying to introduce a completely custom way of
configuring this, one which exists nowhere else, by extracting some parameters
outside of the parameters map? Why can we not use the same approach?

I do not think that we should blindly follow "the parameters names and their
units". I think we already discussed this, and I already explained all the
advantages of following what we already have. If we make it explicitly clear
that these parameters are exactly the same as the ones put into the compression
params upon table creation, it will save us a lot of headaches compared to
something completely custom, and people can specify the parameters and their
names just as they are used to. Why do we want to change all of this and
further confuse the user?


[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2023-04-18 Thread Claude Warren (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713524#comment-17713524
 ] 

Claude Warren edited comment on CASSANDRA-12937 at 4/18/23 10:09 AM:
-

hints_compression and commitlog_compression use the standard ParameterizedClass.

The CompressionParams has 3 parameters that it extracts or creates from the
parameters in the ParameterizedClass. The parameters in CompressionParams are:
{code:java}
private final int chunkLength;
private final int maxCompressedLength;  // In content we store max length to avoid rounding errors causing compress/decompress mismatch.
private final double minCompressRatio;  // In configuration we store min ratio, the input parameter.
{code}
The ParameterizedClass constructor that accepts the Map of
options expects a key of "chunk_length_in_kb" or "chunk_length_kb", as well as
a "min_compress_ratio".

The change I made does not alter the hints_compression or
commitlog_compression options.

The yaml file has an additional set of requirements:
 * The chunkLength (yaml: chunk_length) should be specified with the
DataStorageSpec suffix (e.g. KiB).
 * The maxCompressedLength should be accepted as a parameter.
 * The maxCompressedLength (yaml: max_compressed_length) should be specified
with the DataStorageSpec suffix (e.g. KiB).
 * maxCompressedLength and minCompressRatio are related to each other via
chunk_length, so only one of them can be specified.
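The last requirement can be made concrete: once the chunk length is fixed, choosing either the maximum compressed length or the minimum ratio determines the other. The Java sketch below assumes minCompressRatio = chunkLength / maxCompressedLength, with a ratio of 0 meaning "no limit" (represented by Integer.MAX_VALUE); `CompressRatioMath` and its method names are illustrative and may not match CompressionParams exactly.

```java
// Sketch of the assumed relationship between chunk length, max compressed length and min ratio.
final class CompressRatioMath {
    // A ratio of 0 (or less) is treated as "no limit", represented by Integer.MAX_VALUE.
    static int maxCompressedLength(int chunkLength, double minCompressRatio) {
        if (minCompressRatio <= 0)
            return Integer.MAX_VALUE;
        return (int) Math.ceil(Math.min(chunkLength / minCompressRatio, Integer.MAX_VALUE));
    }

    // Inverse direction: recover the ratio from a stored maximum compressed length.
    static double minCompressRatio(int chunkLength, int maxCompressedLength) {
        if (maxCompressedLength == Integer.MAX_VALUE)
            return 0.0;
        return (double) chunkLength / maxCompressedLength;
    }

    public static void main(String[] args) {
        int chunk = 16 * 1024;                      // chunk_length: 16KiB
        int max = maxCompressedLength(chunk, 1.1);  // derived from min_compress_ratio
        System.out.println(max + " bytes, ratio back: " + minCompressRatio(chunk, max));
    }
}
```

Since the two values are interchangeable given chunk_length, accepting both at once would only open the door to contradictory settings.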

I could work chunkLength and maxCompressedLength into the class_name
parameters; however, I believe this would mean adding 2 more reserved words,
both of which would need to be removed from the parameter list. This change
would affect all CompressionParams constructions that use the Map format.

I will make the change with the following rules for resolving colliding
values:
 * If both max_compressed_length and min_compress_ratio are specified, a
ConfigurationException will be thrown.
 * If both chunk_length and either chunk_length_in_kb or chunk_length_kb are
specified and they are not equal, a ConfigurationException will be thrown.
 * If chunk_length or max_compressed_length is specified without the
DataStorageSpec suffix, a ConfigurationException will be thrown.
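The three collision rules above could be checked roughly as follows. This is a hedged Java sketch, not the patch itself: `CompressionOptionRules`, `validate` and the nested `ConfigException` (standing in for Cassandra's ConfigurationException) are hypothetical, and the accepted suffixes (B, KiB, MiB, GiB) are an assumption about DataStorageSpec-style values.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CompressionOptionRules {
    // Hypothetical stand-in for Cassandra's ConfigurationException.
    static final class ConfigException extends RuntimeException {
        ConfigException(String message) { super(message); }
    }

    // Assumed DataStorageSpec-style values such as "16KiB"; the real accepted units may differ.
    private static final Pattern SIZE = Pattern.compile("(\\d+)(B|KiB|MiB|GiB)");

    static long toBytes(String key, String value) {
        Matcher m = SIZE.matcher(value.trim());
        if (!m.matches())
            throw new ConfigException(key + " must use a DataStorageSpec suffix, got: " + value);
        long amount = Long.parseLong(m.group(1));
        switch (m.group(2)) {
            case "GiB": return amount << 30;
            case "MiB": return amount << 20;
            case "KiB": return amount << 10;
            default:    return amount; // plain bytes
        }
    }

    /** Applies the three collision rules; returns the chunk length in bytes, or -1 if unset. */
    static long validate(Map<String, String> opts) {
        // Rule 1: the two ways of expressing the compression threshold are mutually exclusive.
        if (opts.containsKey("max_compressed_length") && opts.containsKey("min_compress_ratio"))
            throw new ConfigException("max_compressed_length and min_compress_ratio are mutually exclusive");

        // Rule 3: the new yaml-style sizes must carry a unit suffix.
        if (opts.containsKey("max_compressed_length"))
            toBytes("max_compressed_length", opts.get("max_compressed_length"));
        long fromYaml = opts.containsKey("chunk_length") ? toBytes("chunk_length", opts.get("chunk_length")) : -1;

        // Rule 2: a legacy KiB-valued key may coexist with chunk_length only if they agree.
        long chunk = fromYaml;
        for (String legacy : new String[] { "chunk_length_in_kb", "chunk_length_kb" }) {
            if (!opts.containsKey(legacy))
                continue;
            long legacyBytes = Long.parseLong(opts.get(legacy).trim()) << 10;
            if (fromYaml >= 0 && fromYaml != legacyBytes)
                throw new ConfigException("chunk_length does not match " + legacy);
            if (chunk < 0)
                chunk = legacyBytes;
        }
        return chunk;
    }

    public static void main(String[] args) {
        System.out.println(validate(Map.of("chunk_length", "16KiB", "chunk_length_in_kb", "16")));
    }
}
```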

I will also ensure that the short names lz4, none, noop, snappy, deflate, and
zstd work as class names and use the defaults specified by the
CompressionParams methods of the same names.
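One possible shape for the short-name handling, sketched in Java. The alias table is hypothetical: it maps each short name to the corresponding compressor class in org.apache.cassandra.io.compress and treats "none" as "compression disabled". Case-insensitive matching, and prefixing the default package to a name without a dot, are assumptions rather than confirmed behaviour of the patch.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

final class CompressorAliases {
    // Hypothetical alias table; illustrative, not the code from the patch.
    private static final String PKG = "org.apache.cassandra.io.compress.";
    private static final Map<String, String> ALIASES = Map.of(
            "lz4", PKG + "LZ4Compressor",
            "snappy", PKG + "SnappyCompressor",
            "deflate", PKG + "DeflateCompressor",
            "zstd", PKG + "ZstdCompressor",
            "noop", PKG + "NoopCompressor");

    /** Returns the compressor class for class_name, or empty when compression is disabled ("none"). */
    static Optional<String> resolve(String className) {
        String key = className.trim().toLowerCase(Locale.ROOT);
        if (key.equals("none"))
            return Optional.empty();
        String alias = ALIASES.get(key);
        if (alias != null)
            return Optional.of(alias);
        // Not a short name: a simple name is assumed to live in the default package,
        // anything containing a dot is taken as already fully qualified.
        return Optional.of(className.contains(".") ? className : PKG + className);
    }

    public static void main(String[] args) {
        for (String name : new String[] { "lz4", "ZSTD", "none", "LZ4Compressor", "com.example.MyCompressor" })
            System.out.println(name + " -> " + resolve(name).orElse("(compression disabled)"));
    }
}
```

Resolving aliases before any parameter handling keeps the rest of the configuration path unaware of whether a short or a full name was used.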


was (Author: claudenw):
hints_compression and commitlog_compression use the standard ParameterizedClass.

The CompressionParams has 3 parameters that it extracts or creates from the
parameters in the ParameterizedClass. The parameters in CompressionParams are:
{code:java}
private final int chunkLength;
private final int maxCompressedLength;  // In content we store max length to avoid rounding errors causing compress/decompress mismatch.
private final double minCompressRatio;  // In configuration we store min ratio, the input parameter.
{code}
The ParameterizedClass constructor that accepts the Map of
options expects a key of "chunk_length_in_kb" or "chunk_length_kb", as well as
a "min_compress_ratio".

The change I made does not alter the hints_compression or
commitlog_compression options.

The yaml file has an additional set of requirements:
 * The chunkLength (yaml: chunk_length) should be specified with the
DataStorageSpec suffix (e.g. KiB).
 * The maxCompressedLength should be accepted as a parameter.
 * The maxCompressedLength (yaml: max_compressed_length) should be specified
with the DataStorageSpec suffix (e.g. KiB).
 * maxCompressedLength and minCompressRatio are related to each other via
chunk_length, so only one of them can be specified.

I could work chunkLength and maxCompressedLength into the class_name
parameters; however, I believe this would mean adding 2 more reserved words,
both of which would need to be removed from the parameter list. This change
would affect all CompressionParams constructions that use the Map format.

I will make the change with the following rules for resolving colliding
values:
 * If both max_compressed_length and min_compress_ratio are specified, a
ConfigurationException will be thrown.
 * If both chunk_length and either chunk_length_in_kb or chunk_length_kb are
specified and they are not equal, a ConfigurationException will be thrown.
 * If chunk_length or max_compressed_length is specified without the
DataStorageSpec suffix, a ConfigurationException will be thrown.

I will also ensure that the short names lz4, none, noop, snappy, deflate, and
zstd work as class names and use the defaults specified by the
CompressionParams methods of the same names.


[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2023-04-17 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713056#comment-17713056
 ] 

Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/17/23 12:24 PM:
-

Why do you insist on this:

{code}
sstable_compressor:
  chunk_length: 16KiB
  min_compress_ratio: 0.0
  class_name: org.apache.cassandra.io.compress.LZ4Compressor
  parameters:
  - param1 : value
{code}

Instead of doing it like this:

{code}
sstable_compression:
  - class_name: org.apache.cassandra.io.compress.LZ4Compressor
    parameters:
      - param1: "value1"
        # whateverParams ...
{code}
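Reading the second shape back into CQL-style compression options is straightforward, which is the heart of the "no collisions" argument: class_name sits outside the parameters map, so no extra reserved words such as chunk_length are needed. A hedged Java sketch, assuming `parameters` is parsed as a list of string maps as in the hints_compression example; `ParameterizedCompression` and `toCqlOptions` are hypothetical names, not Cassandra's ParameterizedClass API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-in for a ParameterizedClass-style yaml entry; not Cassandra's class.
final class ParameterizedCompression {
    final String className;
    final List<Map<String, String>> parameters; // yaml "parameters:" is a list of maps

    ParameterizedCompression(String className, List<Map<String, String>> parameters) {
        this.className = className;
        this.parameters = parameters == null ? List.of() : List.copyOf(parameters);
    }

    // Flattens the entry into CQL-style compression options. Parameters are copied verbatim;
    // only the CQL "class" key is filled from class_name, and it is set last so it wins.
    Map<String, String> toCqlOptions() {
        Map<String, String> options = new LinkedHashMap<>();
        for (Map<String, String> entry : parameters)
            options.putAll(entry);
        options.put("class", className);
        return options;
    }

    public static void main(String[] args) {
        ParameterizedCompression pc = new ParameterizedCompression(
                "org.apache.cassandra.io.compress.LZ4Compressor",
                List.of(Map.of("chunk_length_in_kb", "16")));
        System.out.println(pc.toCqlOptions());
    }
}
```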


was (Author: smiklosovic):
Why do you insist on this:

{code}
sstable_compressor:
  chunk_length: 16KiB
  min_compress_ratio: 0.0
  class_name: org.apache.cassandra.io.compress.LZ4Compressor
  parameters:
   - param1 : value
{code}

Instead of doing it like this:

{code}
sstable_compression:
  - class_name: org.apache.cassandra.io.compress.LZ4Compressor
    parameters:
      - param1: "value1"
        # whateverParams ...
{code}




[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2023-04-17 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713054#comment-17713054
 ] 

Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/17/23 12:15 PM:
-

I do not understand. There should be no collisions whatsoever if we make
sstable_compression in Config of type ParameterizedClass, as mentioned above.


was (Author: smiklosovic):
I do not understand. There should be no collisions whatsoever if we make
sstable_compression in Config of type ParameterizedClass, as mentioned above.




[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2023-04-17 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713027#comment-17713027
 ] 

Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/17/23 11:09 AM:
-

Are we talking about this? https://github.com/apache/cassandra/pull/2254/files

There is still SSTableCompressionOptions in Config. 

Am I reviewing the correct branch?

This one seems to have SSTableCompressionOptions in Config too.

https://github.com/apache/cassandra/pull/2199


was (Author: smiklosovic):
Are we talking about this? https://github.com/apache/cassandra/pull/2254/files

There is still SSTableCompressionOptions in Config. 

Am I reviewing the correct branch?

> Default setting (yaml) for SSTable compression
> --
>
> Key: CASSANDRA-12937
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12937
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Config
>Reporter: Michael Semb Wever
>Assignee: Claude Warren
>Priority: Low
>  Labels: AdventCalendar2021, lhf
> Fix For: 5.x
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> In many situations the choice of compression for sstables is more relevant to 
> the disks attached than to the schema and data.
> This issue is to add to cassandra.yaml a default value for sstable 
> compression that new tables will inherit (instead of the defaults found in 
> {{CompressionParams.DEFAULT}}).
> Examples where this can be relevant are filesystems that do on-the-fly 
> compression (btrfs, zfs), specific disk configurations, or even specific C* 
> versions (see CASSANDRA-10995).
> +Additional information for newcomers+
> Some new fields need to be added to {{cassandra.yaml}} to allow specifying 
> the fields required to define the default compression parameters. In 
> {{DatabaseDescriptor}} a new {{CompressionParams}} field should be added for 
> the default compression. This field should be initialized in 
> {{DatabaseDescriptor.applySimpleConfig()}}. Wherever 
> {{CompressionParams.DEFAULT}} was used, the code should instead call 
> {{DatabaseDescriptor#getDefaultCompressionParams}}, which should return a 
> copy of the configured {{CompressionParams}}.
> A unit test using {{OverrideConfigurationLoader}} should verify that the 
> table schema uses the new default when a new table is created (see 
> CreateTest for an example).
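The newcomer notes above could translate into something like the following Java sketch. `CompressionDefaults` and `Params` are hypothetical stand-ins for `DatabaseDescriptor` and `CompressionParams`, whose real APIs differ. The built-in fallback values are also illustrative only. The sketch shows the configured default being set once when the config is applied, with each caller receiving its own copy.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for CompressionParams: an immutable set of CQL-style options.
final class Params {
    private final Map<String, String> options;

    Params(Map<String, String> options) {
        // Defensive copy, then freeze it.
        this.options = Collections.unmodifiableMap(new HashMap<>(options));
    }

    Map<String, String> asMap() {
        return options;
    }

    // Hands out an independent instance so callers never share state.
    Params copy() {
        return new Params(options);
    }
}

// Hypothetical stand-in for the new DatabaseDescriptor field and accessor.
final class CompressionDefaults {
    // Fallback when cassandra.yaml has no entry; values are illustrative only.
    private static final Map<String, String> BUILT_IN =
            Map.of("class", "LZ4Compressor", "chunk_length_in_kb", "16");

    private static volatile Params defaultParams = new Params(BUILT_IN);

    private CompressionDefaults() {}

    // Called once while the loaded config is applied (cf. applySimpleConfig()).
    static void apply(Map<String, String> yamlEntry) {
        boolean missing = yamlEntry == null || yamlEntry.isEmpty();
        defaultParams = new Params(missing ? BUILT_IN : yamlEntry);
    }

    // Used wherever CompressionParams.DEFAULT was used before.
    static Params getDefaultCompressionParams() {
        return defaultParams.copy();
    }
}
```

Because `Params` is immutable here, handing out a copy is purely defensive. The copy matters more if the real parameter object can be mutated.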



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2023-04-11 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17710831#comment-17710831
 ] 

Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/11/23 9:01 AM:


Great [~claude], I'll try to finish that soonish, and it would be great if you 
participated in the review. Please tell me if you would rather do it the other 
way around.


was (Author: smiklosovic):
Great [~claude], I'll try to finish that soonish, and it would be great if you 
participated in the review.




[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2023-04-06 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17709261#comment-17709261
 ] 

Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/6/23 7:45 AM:
---

I think being consistent with CQL, and being prepared for the future so that 
we can specify any compressor without changing anything, is more important.

To summarise:
1) Consistency with CQL.
2) Zero learning curve: users just put in what they are already used to.
3) Any future built-in compressor is supported out of the box, so we do not 
need to think about it, and we do not need to change other enums, switches, 
etc. to support it.
4) Custom compressors are supported as well; we make no distinction between 
built-in and custom compressors in the yaml. It is transparent.
5) Any parameter can be added.
6) It uses the same code path for getting CompressionParams, so no new classes 
or boilerplate code are necessary.
7) There is one way of specifying both a built-in compressor and a custom 
one; we do not need to distinguish between them.
8) We can configure every single parameter of a compressor, not only those 
that the helper creation functions offer.

I really think that all these points together outweigh the argument that we 
need to have parameters in "so and so" format. If we are transparent about the 
fact that whatever is accepted in CQL is accepted in the sstable_compressor 
map, it is really a no-brainer.

[~mck] what do you think?
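Points 4 and 7 rest on built-in and custom compressors sharing a single code path. Below is a minimal Java sketch of that idea. It assumes the convention that a bare name such as `LZ4Compressor` refers to the `org.apache.cassandra.io.compress` package, while a dotted name is used verbatim. `CompressorClassResolver` is a hypothetical helper, not Cassandra code.

```java
import java.util.Map;

// Hypothetical helper: turns the "class" entry of a CQL-style options map into a
// fully-qualified class name, treating built-in and custom compressors the same way.
final class CompressorClassResolver {
    static final String BUILT_IN_PACKAGE = "org.apache.cassandra.io.compress.";

    private CompressorClassResolver() {}

    static String resolve(Map<String, String> options) {
        String name = options.get("class");
        if (name == null || name.trim().isEmpty()) {
            throw new IllegalArgumentException("missing 'class' option");
        }
        name = name.trim();
        // Assumed convention: short names live in the built-in package; anything
        // containing a dot is taken as-is, so custom compressors need no special case.
        return name.contains(".") ? name : BUILT_IN_PACKAGE + name;
    }
}
```

A real implementation would then load and instantiate the resolved class. That step is omitted here because it needs Cassandra on the classpath.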


was (Author: smiklosovic):
I think being consistent with CQL, and being prepared for the future so that 
we can specify any compressor without changing anything, is more important.

To summarise:
1) Consistency with CQL.
2) Zero learning curve: users just put in what they are already used to.
3) Any future built-in compressor is supported out of the box, so we do not 
need to think about it, and we do not need to change other enums, switches, 
etc. to support it.
4) Custom compressors are supported as well; we make no distinction between 
built-in and custom compressors in the yaml. It is transparent.
5) Any parameter can be added.
6) It uses the same code path for getting CompressionParams, so no new classes 
or boilerplate code are necessary.
7) There is one way of specifying both a built-in compressor and a custom 
one; we do not need to distinguish between them.
8) We can configure every single parameter of a compressor, not only those 
that the helper creation functions offer.

I really think that all these points together outweigh the argument that we 
need to have parameters in "so and so" format. If we are transparent about the 
fact that whatever is accepted in CQL is accepted in the sstable_compressor 
map, it is really a no-brainer.

[~mck] what do you think?




[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2023-04-06 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17709261#comment-17709261
 ] 

Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/6/23 7:44 AM:
---

I think being consistent with CQL, and being prepared for the future so that 
we can specify any compressor without changing anything, is more important.

To summarise:
1) Consistency with CQL.
2) Zero learning curve: users just put in what they are already used to.
3) Any future built-in compressor is supported out of the box, so we do not 
need to think about it, and we do not need to change other enums, switches, 
etc. to support it.
4) Custom compressors are supported as well; we make no distinction between 
built-in and custom compressors in the yaml. It is transparent.
5) Any parameter can be added.
6) It uses the same code path for getting CompressionParams, so no new classes 
or boilerplate code are necessary.
7) There is one way of specifying both a built-in compressor and a custom 
one; we do not need to distinguish between them.
8) We can configure every single parameter of a compressor, not only those 
that the helper creation functions offer.

I really think that all these points together outweigh the argument that we 
need to have parameters in "so and so" format. If we are transparent about the 
fact that whatever is accepted in CQL is accepted in the sstable_compressor 
map, it is really a no-brainer.

[~mck] what do you think?


was (Author: smiklosovic):
I think being consistent with CQL, and being prepared for the future so that 
we can specify any compressor without changing anything, is more important.

To summarise:
1) Consistency with CQL.
2) Zero learning curve: users just put in what they are already used to.
3) Any future built-in compressor is supported out of the box, so we do not 
need to think about it, and we do not need to change other enums, switches, 
etc. to support it.
4) Custom compressors are supported as well.
5) Any parameter can be added.
6) It uses the same code path for getting CompressionParams, so no new classes 
or boilerplate code are necessary.
7) There is one way of specifying both a built-in compressor and a custom 
one; we do not need to distinguish between them.
8) We can configure every single parameter of a compressor, not only those 
that the helper creation functions offer.

I really think that all these points together outweigh the argument that we 
need to have parameters in "so and so" format. If we are transparent about the 
fact that whatever is accepted in CQL is accepted in the sstable_compressor 
map, it is really a no-brainer.

[~mck] what do you think?




[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2023-04-05 Thread Claude Warren (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17709226#comment-17709226
 ] 

Claude Warren edited comment on CASSANDRA-12937 at 4/6/23 6:53 AM:
---

Looking at CompressionParams, there are a number of default configurations, 
e.g. snappy(), lz4(), and noCompression(), that I thought would be in common 
use. What I wanted to do was provide an easy way to call those methods as 
well as provide the ability to load any compressor via the map.

Also, early on, the idea of using chunk_length_in_kb was rejected, and the 
"16KiB" input form was requested instead.

If it is agreed to remove the shortcuts and use the simple map form with the 
parameters, I'll make those changes.

I did come across a note saying that the configuration file and CQL use 
different parameters for compression, so I only implemented 
min_compress_ratio and used it to calculate max_compression_length.

So I got to where the code is by trying to support the defaults in 
CompressionParams and by following min_compress_ratio rather than 
max_compression_length in the config files.
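The min_compress_ratio calculation mentioned above can be sketched in Java as follows. The sketch makes three assumptions:

* the maximum compressed chunk size is `ceil(chunkLength / minCompressRatio)`;
* that maximum is capped at `Integer.MAX_VALUE`;
* a ratio of 0 means there is no limit.

`CompressRatio.maxCompressedLength` is a hypothetical name. Check `CompressionParams` for the authoritative rule.

```java
// Hypothetical sketch: the largest compressed chunk size that still meets a
// minimum compression ratio. A chunk whose compressed size exceeds this misses the target.
final class CompressRatio {
    private CompressRatio() {}

    static int maxCompressedLength(int chunkLengthBytes, double minCompressRatio) {
        if (chunkLengthBytes <= 0) {
            throw new IllegalArgumentException("chunk length must be positive");
        }
        if (Double.isNaN(minCompressRatio) || minCompressRatio < 0.0) {
            throw new IllegalArgumentException("ratio must not be negative");
        }
        if (minCompressRatio == 0.0) {
            // Assumed convention: 0 disables the check, so any compressed size is accepted.
            return Integer.MAX_VALUE;
        }
        // Round up so a chunk compressed to exactly chunk/ratio bytes still qualifies.
        return (int) Math.ceil(Math.min(chunkLengthBytes / minCompressRatio, Integer.MAX_VALUE));
    }
}
```

For example, with 16 KiB chunks (16384 bytes) and a ratio of 2.0, only chunks that compress to 8192 bytes or fewer would meet the target.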

You can configure zstd with 12KiB chunks by setting:
{code:java}
sstable_compressor:
  chunk_length: 12KiB
  type: zstd
{code}
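The `chunk_length: 12KiB` entry above uses the human-readable size form rather than a bare kilobyte count. It therefore has to be parsed before it can become a chunk length. A minimal Java sketch follows, assuming only `B`, `KiB` and `MiB` suffixes. `ChunkLengthParser` is hypothetical and is not Cassandra's own size parser. The power-of-two check is an assumption borrowed from how CQL validates `chunk_length_in_kb`.

```java
// Hypothetical parser for chunk lengths written in the "16KiB" style.
// Returns the length in bytes.
final class ChunkLengthParser {
    private ChunkLengthParser() {}

    static int parseBytes(String value) {
        String v = value.trim();
        long multiplier;
        int unitLength;
        // Check the longer suffixes first, because "KiB" and "MiB" also end in "B".
        if (v.endsWith("MiB")) {
            multiplier = 1024L * 1024L;
            unitLength = 3;
        } else if (v.endsWith("KiB")) {
            multiplier = 1024L;
            unitLength = 3;
        } else if (v.endsWith("B")) {
            multiplier = 1L;
            unitLength = 1;
        } else {
            throw new IllegalArgumentException("expected a B, KiB or MiB suffix: " + value);
        }
        // NumberFormatException (an IllegalArgumentException) signals a bad number.
        long amount = Long.parseLong(v.substring(0, v.length() - unitLength).trim());
        if (amount <= 0 || amount > Integer.MAX_VALUE / multiplier) {
            throw new IllegalArgumentException("chunk length out of range: " + value);
        }
        long bytes = amount * multiplier;
        // Assumed constraint, mirroring the power-of-two check CQL applies to chunk_length_in_kb.
        if ((bytes & (bytes - 1)) != 0) {
            throw new IllegalArgumentException("chunk length must be a power of 2: " + value);
        }
        return (int) bytes;
    }
}
```

Under the power-of-two assumption, the `12KiB` value in the example above would be rejected at startup, while `16KiB` would be accepted.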


was (Author: claudenw):
Looking at CompressionParams, there are a number of default configurations, 
e.g. snappy(), lz4(), and noCompression(), that I thought would be in common 
use. What I wanted to do was provide an easy way to call those methods as 
well as provide the ability to load any compressor via the map.

Also, early on, the idea of using chunk_length_in_kb was rejected, and the 
"16KiB" input form was requested instead.

If it is agreed to remove the shortcuts and use the simple map form with the 
parameters, I'll make those changes.

I did come across a note saying that the configuration file and CQL use 
different parameters for compression, so I only implemented 
min_compress_ratio and used it to calculate max_compression_length.

So I got to where the code is by trying to support the defaults in 
CompressionParams and by following min_compress_ratio rather than 
max_compression_length in the config files.




[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2023-04-05 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17709083#comment-17709083
 ] 

Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/5/23 7:39 PM:
---

I did it here https://github.com/apache/cassandra/pull//files

The configuration is as simple as this:

{code}
sstable_compressor:
  class: "org.apache.cassandra.io.compress.LZ4Compressor"
  chunk_length_in_kb: "16"
  min_compress_ratio: "0"
{code}

Since sstable_compressor is a map, it may contain arbitrary parameters. The 
compressor is created and validated at node startup.

This solution works with any compressor and any parameters, and it accepts 
the same parameters as CQL, so there is nothing new to learn.
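The startup behaviour described above can be sketched in Java. Values arrive as strings, just as in the CQL `compression` map. The well-known keys are converted and checked, and any remaining keys are passed through as compressor-specific options. A bad entry fails fast. `StartupCompressionConfig` is hypothetical, and its defaults (16 and 0) are illustrative. In Cassandra itself this conversion is presumably delegated to the same code that handles the CQL map.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical fail-fast check for a CQL-style sstable_compressor map read from yaml.
final class StartupCompressionConfig {
    final String compressorClass;
    final int chunkLengthInKb;
    final double minCompressRatio;
    final Map<String, String> otherOptions; // compressor-specific, passed through untouched

    private StartupCompressionConfig(String cls, int chunk, double ratio, Map<String, String> other) {
        this.compressorClass = cls;
        this.chunkLengthInKb = chunk;
        this.minCompressRatio = ratio;
        this.otherOptions = Collections.unmodifiableMap(other);
    }

    static StartupCompressionConfig fromMap(Map<String, String> raw) {
        Map<String, String> rest = new HashMap<>(raw);
        String cls = rest.remove("class");
        if (cls == null || cls.trim().isEmpty()) {
            throw new IllegalStateException("sstable_compressor: 'class' is required");
        }
        // Illustrative defaults, used only when a key is absent.
        int chunk = parseInt(rest.remove("chunk_length_in_kb"), 16, "chunk_length_in_kb");
        double ratio = parseDouble(rest.remove("min_compress_ratio"), 0.0, "min_compress_ratio");
        return new StartupCompressionConfig(cls.trim(), chunk, ratio, rest);
    }

    private static int parseInt(String value, int fallback, String key) {
        if (value == null) {
            return fallback;
        }
        try {
            return Integer.parseInt(value.trim());
        } catch (NumberFormatException e) {
            throw invalid(key, value, e);
        }
    }

    private static double parseDouble(String value, double fallback, String key) {
        if (value == null) {
            return fallback;
        }
        try {
            return Double.parseDouble(value.trim());
        } catch (NumberFormatException e) {
            throw invalid(key, value, e);
        }
    }

    private static IllegalStateException invalid(String key, String value, Exception cause) {
        return new IllegalStateException("sstable_compressor: invalid " + key + ": " + value, cause);
    }
}
```

Keeping unknown keys in `otherOptions` rather than rejecting them is what lets a future or custom compressor accept its own parameters without any change to this code.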




was (Author: smiklosovic):
I did it here https://github.com/apache/cassandra/pull//files

The configuration is as simple as this:

{code}
sstable_compressor:
  class: "org.apache.cassandra.io.compress.LZ4Compressor"
  chunk_length_in_kb: "16"
  min_compress_ratio: "0"
{code}

Since sstable_compressor is a map, it may contain arbitrary parameters. The 
compressor is created and validated at node startup.

This solution works with any compressor and any parameters, and it accepts 
the same parameters as CQL, so there is nothing new to learn.






[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2023-04-05 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17709025#comment-17709025
 ] 

Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/5/23 5:25 PM:
---

The j11 run is also clean: 
[https://app.circleci.com/pipelines/github/instaclustr/cassandra/2055/workflows/3ec6e5cf-36bf-45aa-a794-27a88a1ee0de]

[~mck], would you take a look at this branch, please? 
[https://github.com/apache/cassandra/pull/]


was (Author: smiklosovic):
The j11 run is also clean: 
[https://app.circleci.com/pipelines/github/instaclustr/cassandra/2055/workflows/3ec6e5cf-36bf-45aa-a794-27a88a1ee0de]

[~mck], would you take a look at this branch, please? 
[https://github.com/instaclustr/cassandra/commits/CASSANDRA-12937]




[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2023-04-04 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17708537#comment-17708537
 ] 

Stefan Miklosovic edited comment on CASSANDRA-12937 at 4/4/23 6:44 PM:
---

I added fixes here (1), specifically in this commit (2).

The code did not build: it was failing on ant rat. Also, it seems to me that 
you used Java 11, as String.isBlank() is not available in Java 8, so it 
failed to compile.
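`String.isBlank()` only exists from Java 11 onward, so code that must also compile on Java 8 needs an equivalent. Below is a minimal sketch with a hypothetical `Java8Strings.isBlank` helper. It tests each code point with `Character.isWhitespace`, which is how `isBlank()` defines white space. By contrast, `trim().isEmpty()` only strips characters up to U+0020.

```java
// Hypothetical Java 8-compatible stand-in for String.isBlank() (Java 11+).
final class Java8Strings {
    private Java8Strings() {}

    static boolean isBlank(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isWhitespace(cp)) {
                return false;
            }
            // Advance by one or two chars depending on whether cp is a surrogate pair.
            i += Character.charCount(cp);
        }
        return true; // empty or whitespace-only
    }
}
```

Like `String.isBlank()`, this throws a `NullPointerException` for a null argument rather than treating null as blank.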

There are also various formatting improvements.

I am building it as we speak.

(1) https://github.com/instaclustr/cassandra/commits/CASSANDRA-12937
(2) 
https://github.com/instaclustr/cassandra/commit/95422f915fb30c27e4691fbc5711b3361d0331a3

You are welcome to git cherry-pick this commit on top of your branch, or we 
will just ship my branch (squashed, with you as the author).


was (Author: smiklosovic):
I added fixes here (1), specifically in this commit (2).

The code did not build: it was failing on ant rat. Also, it seems to me that 
you used Java 11, as String.isBlank() is not available in Java 8, so it 
failed to compile.

There are also various formatting improvements.

I am building it as we speak.

(1) https://github.com/instaclustr/cassandra/commits/CASSANDRA-12937
(2) 
https://github.com/instaclustr/cassandra/commit/95422f915fb30c27e4691fbc5711b3361d0331a3

You are welcome to git cherry-pick this commit on top of your branch, or we 
will just ship my branch (squashed, with you as the author).




[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2023-03-16 Thread Stefan Miklosovic (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17701098#comment-17701098
 ] 

Stefan Miklosovic edited comment on CASSANDRA-12937 at 3/16/23 11:26 AM:
-

[~claude], would you mind reworking that PR against the current trunk? I am 
getting a lot of conflicts.


was (Author: smiklosovic):
[~claude], would you mind reworking that PR against trunk? I am getting a lot 
of conflicts. This is a new feature and should be delivered in 5.0 first.
