yifan-c commented on code in PR #4523:
URL: https://github.com/apache/cassandra/pull/4523#discussion_r2638816697
##########
src/java/org/apache/cassandra/tools/nodetool/CompressionDictionaryCommandGroup.java:
##########
@@ -68,18 +75,40 @@ public static class TrainDictionary extends AbstractCommand
@Option(names = { "-f", "--force" }, description = "Force the
dictionary training even if there are not enough samples")
private boolean force = false;
+ @Option(names = {"--max-dict-size"}, description = "Maximum size of a
trained compression dictionary. " +
+ "Larger
dictionaries may provide better compression but use more memory. When not set,
" +
+ "the value from
compression configuration from CQL for a given table is used. " +
+ "The default value
is " + DEFAULT_TRAINING_MAX_DICTIONARY_SIZE_PARAMETER_VALUE + '.')
+ private String trainingMaxDictionarySize;
+
+ @Option(names = "--max-total-sample-size", description = "Maximum
total size of sample data to collect for dictionary training. " +
+ "More sample
data generally produces better dictionaries but takes longer to train. " +
+ "The
recommended sample size is 100x the dictionary size. When not set, " +
+ "the value
from compression configuration from CQL for a give table is used. " +
+ "The default
value is " + DEFAULT_TRAINING_MAX_TOTAL_SAMPLE_SIZE_PARAMETER_VALUE + '.')
+ private String trainingMaxTotalSampleSize;
+
@Override
public void execute(NodeProbe probe)
{
PrintStream out = probe.output().out;
PrintStream err = probe.output().err;
+ validateParameters(err, trainingMaxDictionarySize,
trainingMaxTotalSampleSize);
+
try
{
out.printf("Starting compression dictionary training for
%s.%s...%n", keyspace, table);
out.printf("Training from existing SSTables (flushing first if
needed)%n");
- probe.trainCompressionDictionary(keyspace, table, force);
+ Map<String, String> parameters = new HashMap<>();
+ if (trainingMaxTotalSampleSize != null)
+
parameters.put(TRAINING_MAX_DICTIONARY_SIZE_PARAMETER_NAME,
trainingMaxDictionarySize);
Review Comment:
The condition is seemingly wrong. It should be
```
if (trainingMaxDictionarySize != null)
```
##########
doc/modules/cassandra/pages/managing/operating/compression.adoc:
##########
@@ -323,11 +317,34 @@ compression_dictionary_cache_expire: 3600
# Automatic training
compression_dictionary_training_auto_train_enabled: false
-compression_dictionary_training_sampling_rate: 100
-compression_dictionary_training_max_dictionary_size: 65536
-compression_dictionary_training_max_total_sample_size: 10485760
+compression_dictionary_training_sampling_rate: 0.01
----
+=== CQL training parameters:
+
+These parameters are meant to be configured via CQL for each respective table
if defaults are not appropriate.
+
+* `training_max_total_sample_size` (default: `10MiB`): Maximum total size of
sample data to collect for training, approximately 10MB. This is a parameter of
`ZstdDictionaryCompressor`
+of a table, in `compression` section.
Review Comment:
nit:
```suggestion
* `training_max_total_sample_size` (default: `10MiB`): Maximum total size of
sample data to collect for training, approximately 10MB. This parameter is
configured in the
table's compression options for `ZstdDictionaryCompressor`.
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]