jyothsnakonisa commented on code in PR #4523:
URL: https://github.com/apache/cassandra/pull/4523#discussion_r2636117041


##########
doc/modules/cassandra/pages/managing/operating/compression.adoc:
##########
@@ -298,12 +298,6 @@ next access.
 
 === Training Configuration
 
-* `compression_dictionary_training_max_dictionary_size` (default: `65536`):
-Maximum size of trained dictionaries in bytes. Larger dictionaries can
-capture more patterns but increase memory overhead.
-* `compression_dictionary_training_max_total_sample_size` (default:
-`10485760`): Maximum total size of sample data to collect for training,
-approximately 10MB.
 * `compression_dictionary_training_auto_train_enabled` (default: `false`):
 Enable automatic background dictionary training. When enabled, Cassandra
 samples writes and trains dictionaries automatically.

Review Comment:
   Isn't the sampling rate supposed to be a decimal? Please update the 
documentation accordingly everywhere it is mentioned.



##########
.circleci/config.yml:
##########
@@ -24,7 +24,7 @@ jobs:
     resource_class: medium
     working_directory: ~/
     shell: /bin/bash -eo pipefail -l
-    parallelism: 4
+    parallelism: 25

Review Comment:
   Did you check in these changes to this file by accident?



##########
src/java/org/apache/cassandra/db/compression/ZstdDictionaryTrainer.java:
##########
@@ -290,15 +297,16 @@ public TrainingState getTrainingState()
     }
 
     @Override
-    public boolean start(boolean manualTraining)
+    public boolean start(boolean manualTraining, CompressionDictionaryTrainingConfig trainingConfig)
     {
         if (closed || !(manualTraining || shouldAutoStartTraining()))
             return false;
 
         try
         {
             // reset on starting; a new zstdTrainer instance is created during reset
-            reset();
+            reset(trainingConfig);
+            config = trainingConfig;

Review Comment:
   Should `config` be volatile? `reset(...)` sets `config` to null. Another 
thread could be calling `isReady()`, which checks that `config` is not null, 
so there could be a race condition.
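
   A minimal sketch of the concern, not the PR's actual code: `TrainingConfig` 
is a hypothetical stand-in for `CompressionDictionaryTrainingConfig`, and the 
body of `isReady()` is an assumption. Marking the field `volatile` makes a 
write from `start(...)` or `reset()` visible to other threads, and reading it 
once into a local keeps the null check and the later use on the same value.

```java
public class VolatileConfigSketch
{
    // Hypothetical stand-in for CompressionDictionaryTrainingConfig.
    static final class TrainingConfig
    {
        final int maxDictionarySize;

        TrainingConfig(int maxDictionarySize)
        {
            this.maxDictionarySize = maxDictionarySize;
        }
    }

    // volatile: a write on one thread is visible to later reads on other threads.
    private volatile TrainingConfig config;

    void reset()
    {
        config = null;
    }

    void start(TrainingConfig trainingConfig)
    {
        reset();
        config = trainingConfig;
    }

    boolean isReady()
    {
        // Read the field once; a concurrent reset() cannot null it between
        // the check and the use of the local copy.
        TrainingConfig local = config;
        return local != null && local.maxDictionarySize > 0;
    }

    public static void main(String[] args)
    {
        VolatileConfigSketch trainer = new VolatileConfigSketch();
        System.out.println(trainer.isReady()); // false
        trainer.start(new TrainingConfig(65536));
        System.out.println(trainer.isReady()); // true
        trainer.reset();
        System.out.println(trainer.isReady()); // false
    }
}
```

   Note that `volatile` only addresses visibility. A reader can still observe 
the brief null window between `reset()` and the assignment in `start(...)`, 
so callers would need to tolerate a transient "not ready" result.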



##########
src/java/org/apache/cassandra/db/compression/ZstdDictionaryTrainer.java:
##########
@@ -68,17 +68,23 @@ public class ZstdDictionaryTrainer implements ICompressionDictionaryTrainer
     private volatile TrainingStatus currentTrainingStatus;
     private volatile String failureMessage;
 
-    public ZstdDictionaryTrainer(String keyspaceName, String tableName,
-                                 CompressionDictionaryTrainingConfig config,
-                                 int compressionLevel)
+    public ZstdDictionaryTrainer(String keyspaceName, String tableName, int compressionLevel)
+    {
+        this(keyspaceName,
+             tableName,
+             compressionLevel,
+             Math.round(1 / DatabaseDescriptor.getCompressionDictionaryTrainingSamplingRate()));
+    }
+
+    @VisibleForTesting
+    public ZstdDictionaryTrainer(String keyspaceName, String tableName, int compressionLevel, int samplingRate)

Review Comment:
   You are passing `1/samplingRate` here. Should the parameter be named 
percentage, or some other more appropriate name?
   ```suggestion
       public ZstdDictionaryTrainer(String keyspaceName, String tableName, int compressionLevel, int samplingPercentage)
   ```
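
   To illustrate what the constructor receives, here is a small sketch under 
stated assumptions: the configured rate is a `float` fraction in (0, 1] 
(which is what lets `Math.round` return an `int`), and `toSamplingInterval` 
is a hypothetical helper name, not part of the PR. The reciprocal of the 
rate reads as "sample roughly one in every N writes".

```java
public class SamplingIntervalSketch
{
    // Hypothetical helper: converts a fractional sampling rate into an
    // integer N meaning "sample about one in every N writes".
    static int toSamplingInterval(float samplingRate)
    {
        // The negated form also rejects NaN, which fails every comparison.
        if (!(samplingRate > 0f && samplingRate <= 1f))
            throw new IllegalArgumentException("sampling rate must be in (0, 1]: " + samplingRate);

        // Math.round(float) returns int, matching the int constructor parameter.
        return Math.round(1 / samplingRate);
    }

    public static void main(String[] args)
    {
        System.out.println(toSamplingInterval(0.01f)); // 100
        System.out.println(toSamplingInterval(0.25f)); // 4
        System.out.println(toSamplingInterval(1.0f));  // 1
    }
}
```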



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

