yifan-c commented on a change in pull request #17:
URL: https://github.com/apache/cassandra-diff/pull/17#discussion_r715868939



##########
File path: spark-job/src/main/java/org/apache/cassandra/diff/Differ.java
##########
@@ -225,12 +229,28 @@ public RangeStats diffTable(final DiffContext context,
                                                               mismatchReporter,
                                                               journal,
                                                               COMPARISON_EXECUTOR);
-
-        final RangeStats tableStats = rangeComparator.compare(sourceKeys, targetKeys, partitionTaskProvider);
+        final Predicate<PartitionKey> partitionSamplingFunction = shouldIncludePartition(jobId, partitionSamplingProbability);
+        final RangeStats tableStats = rangeComparator.compare(sourceKeys, targetKeys, partitionTaskProvider, partitionSamplingFunction);
         logger.debug("Table [{}] stats - ({})", context.table.getTable(), tableStats);
         return tableStats;
     }
 
+    // Returns a function which decides if we should include a partition for diffing
+    // Uses probability for sampling.
+    @VisibleForTesting
+    static Predicate<PartitionKey> shouldIncludePartition(final UUID jobId, final double partitionSamplingProbability) {
+        if (partitionSamplingProbability > 1 || partitionSamplingProbability <= 0) {
+            logger.error("Invalid partition sampling property {}, it should be between 0 and 1", partitionSamplingProbability);

Review comment:
       ```suggestion
               logger.error("Invalid partition sampling property value: {}. It should be between 0 and 1", partitionSamplingProbability);
   ```
   
   Actually, let's build the error message string once, up front, and use it for both `logger.error` and the exception.
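   A minimal sketch of what the suggestion could look like, assuming the method throws `IllegalArgumentException` for out-of-range values (the hunk above is cut off before any exception is shown). Only `shouldIncludePartition`, `jobId` and `partitionSamplingProbability` come from the diff; the `PartitionSampling` class, the generic `T` standing in for `PartitionKey`, the use of `java.util.logging` in place of Differ's logger, and the seeded-`Random` sampling body are all hypothetical, added so the sketch compiles on its own.

```java
import java.util.Random;
import java.util.UUID;
import java.util.function.Predicate;
import java.util.logging.Logger;

// Hypothetical holder class; in the PR this logic lives in Differ.
class PartitionSampling {
    // java.util.logging stands in for whatever logger Differ actually uses.
    private static final Logger logger = Logger.getLogger(PartitionSampling.class.getName());

    // Returns a predicate that keeps each partition with the given probability.
    // T stands in for PartitionKey so the sketch has no Cassandra dependency.
    static <T> Predicate<T> shouldIncludePartition(final UUID jobId, final double partitionSamplingProbability) {
        // Negated range check, so NaN is rejected as well as values outside (0, 1].
        if (!(partitionSamplingProbability > 0 && partitionSamplingProbability <= 1)) {
            // Build the message once and reuse it for the log entry and the exception.
            final String message = String.format(
                    "Invalid partition sampling property value: %s. It should be between 0 and 1",
                    partitionSamplingProbability);
            logger.severe(message);
            throw new IllegalArgumentException(message);
        }
        if (partitionSamplingProbability == 1) {
            // No sampling: every partition is diffed.
            return key -> true;
        }
        // Assumption: seeding from the job id makes the decision sequence repeatable for the same job.
        final Random random = new Random(jobId.getMostSignificantBits() ^ jobId.getLeastSignificantBits());
        // nextDouble() is uniform in [0, 1), so this keeps a partition with probability p.
        return key -> random.nextDouble() < partitionSamplingProbability;
    }
}
```

   Writing the check as `!(p > 0 && p <= 1)` also rejects `NaN`, which the `p > 1 || p <= 0` condition in the diff would let through. Because this sketch draws from a single `Random`, its decisions repeat only if partitions are visited in the same order; deriving the decision from a hash of the partition key combined with the job id would remove that dependency.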




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.