Re: [PR] CASSANDRA-21078 move training params to CQL [cassandra]

via GitHub Mon, 15 Dec 2025 16:56:03 -0800


yifan-c commented on code in PR #4523:
URL: https://github.com/apache/cassandra/pull/4523#discussion_r2621320070



##########
doc/modules/cassandra/pages/managing/operating/compression.adoc:
##########
@@ -298,12 +298,14 @@ next access.
 
 === Training Configuration
 
-* `compression_dictionary_training_max_dictionary_size` (default: `65536`):
+* `training_max_dictionary_size` (default: `65536`):
 Maximum size of trained dictionaries in bytes. Larger dictionaries can
-capture more patterns but increase memory overhead.
-* `compression_dictionary_training_max_total_sample_size` (default:
+capture more patterns but increase memory overhead. This is a parameter
+of `ZstdDictionaryCompressor` of a table, in `compression` section.
+* `training_max_total_sample_size` (default:
 `10485760`): Maximum total size of sample data to collect for training,
-approximately 10MB.
+approximately 10MB. This is a parameter of `ZstdDictionaryCompressor`
+of a table, in `compression` section.

Review Comment:
   These entries document the configs in cassandra.yaml. Since those configs 
are removed, maybe remove the entries from here as well, or clarify that they 
now refer to the parameters in the compression attribute of the table schema. 
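
   If the entries stay, a short schema example could make the new location 
obvious. A minimal sketch, assuming the parameter names from the hunk above 
and that the values are passed as quoted strings like other compression 
options; the keyspace and table names are hypothetical, and the exact accepted 
value format should be checked against the PR:

```cql
-- Hypothetical table; parameter names are taken from the doc hunk above.
ALTER TABLE my_keyspace.my_table
WITH compression = {
    'class': 'ZstdDictionaryCompressor',
    'training_max_dictionary_size': '65536',
    'training_max_total_sample_size': '10485760'
};
```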



##########
src/java/org/apache/cassandra/tools/nodetool/CompressionDictionaryCommandGroup.java:
##########
@@ -68,18 +69,31 @@ public static class TrainDictionary extends AbstractCommand
         @Option(names = { "-f", "--force" }, description = "Force the dictionary training even if there are not enough samples")
         private boolean force = false;
 
+        @Option(names = {"--max-dict-size"}, description = "Maximum size of a trained compression dictionary. " +
+                                                           "Larger dictionaries may provide better compression but use more memory. When not set, " +
+                                                           "the value from compression configuration from CQL for a given table is used.")
+        private String trainingMaxDictionarySize;
+
+        @Option(names = "--max-total-sample-size", description = "Maximum total size of sample data to collect for dictionary training. " +
+                                                                 "More sample data generally produces better dictionaries but takes longer to train. " +
+                                                                 "The recommended sample size is 100x the dictionary size. When not set, " +
+                                                                 "the value from compression configuration from CQL for a give table is used.")
+        private String trainingMaxTotalSampleSize;
+

Review Comment:
   Maybe document the default values of each corresponding option. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

