yifan-c commented on code in PR #4523:
URL: https://github.com/apache/cassandra/pull/4523#discussion_r2621320070
##########
doc/modules/cassandra/pages/managing/operating/compression.adoc:
##########
@@ -298,12 +298,14 @@ next access.
=== Training Configuration
-* `compression_dictionary_training_max_dictionary_size` (default: `65536`):
+* `training_max_dictionary_size` (default: `65536`):
Maximum size of trained dictionaries in bytes. Larger dictionaries can
-capture more patterns but increase memory overhead.
-* `compression_dictionary_training_max_total_sample_size` (default:
+capture more patterns but increase memory overhead. This is a parameter
+of `ZstdDictionaryCompressor` of a table, in `compression` section.
+* `training_max_total_sample_size` (default:
`10485760`): Maximum total size of sample data to collect for training,
-approximately 10MB.
+approximately 10MB. This is a parameter of `ZstdDictionaryCompressor`
+of a table, in `compression` section.
Review Comment:
These entries documented the configs in cassandra.yaml. Since those configs are
removed, maybe remove the entries from here as well, or clarify that they refer
to the parameters in the `compression` attribute of the table schema.
##########
src/java/org/apache/cassandra/tools/nodetool/CompressionDictionaryCommandGroup.java:
##########
@@ -68,18 +69,31 @@ public static class TrainDictionary extends AbstractCommand
@Option(names = { "-f", "--force" }, description = "Force the dictionary training even if there are not enough samples")
private boolean force = false;
+ @Option(names = {"--max-dict-size"}, description = "Maximum size of a trained compression dictionary. " +
+ "Larger dictionaries may provide better compression but use more memory. When not set, " +
+ "the value from compression configuration from CQL for a given table is used.")
+ private String trainingMaxDictionarySize;
+
+ @Option(names = "--max-total-sample-size", description = "Maximum total size of sample data to collect for dictionary training. " +
+ "More sample data generally produces better dictionaries but takes longer to train. " +
+ "The recommended sample size is 100x the dictionary size. When not set, " +
+ "the value from compression configuration from CQL for a give table is used.")
+ private String trainingMaxTotalSampleSize;
+
Review Comment:
Maybe document the default value of each corresponding option.
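
To make the suggestion concrete, here is a rough sketch of the fallback the option descriptions mention, with the defaults spelled out in the help text. It is not code from the PR: `TrainingOptionDefaults`, `resolve`, and `describe` are hypothetical names, the values are assumed to be plain byte counts, and the defaults are the ones quoted in compression.adoc (`65536` and `10485760`).

```java
import java.util.Optional;

// Hypothetical helper, not part of Cassandra: shows the "when not set,
// use the table-level value" fallback and help text that names the default.
final class TrainingOptionDefaults
{
    // Defaults as quoted in compression.adoc for the table-level parameters.
    static final long DEFAULT_MAX_DICTIONARY_SIZE_BYTES = 65536L;
    static final long DEFAULT_MAX_TOTAL_SAMPLE_SIZE_BYTES = 10485760L;

    // Returns the command-line value when present, otherwise the table-level value.
    static long resolve(Optional<Long> cliValueBytes, long tableValueBytes)
    {
        return cliValueBytes.orElse(tableValueBytes);
    }

    // Appends the fallback rule and its default to a base description.
    static String describe(String base, String tableParameter, long defaultBytes)
    {
        return base + " When not set, the table's " + tableParameter
               + " compression parameter is used (default: " + defaultBytes + " bytes).";
    }

    public static void main(String[] args)
    {
        // No command-line override: the table-level default applies.
        long dictSize = resolve(Optional.empty(), DEFAULT_MAX_DICTIONARY_SIZE_BYTES);
        // Command-line override present: it takes precedence.
        long sampleSize = resolve(Optional.of(20971520L), DEFAULT_MAX_TOTAL_SAMPLE_SIZE_BYTES);

        System.out.println(dictSize);
        System.out.println(sampleSize);
        System.out.println(describe("Maximum size of a trained compression dictionary.",
                                    "training_max_dictionary_size",
                                    DEFAULT_MAX_DICTIONARY_SIZE_BYTES));
    }
}
```

The real options are declared as `String`, so the actual command presumably parses a size specification first; that parsing is left out here because its format is not shown in the diff.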