blambov commented on code in PR #2836:
URL: https://github.com/apache/cassandra/pull/2836#discussion_r1372785642
##########
src/java/org/apache/cassandra/db/compaction/UnifiedCompactionStrategy.md:
##########
@@ -214,6 +214,66 @@ deletions, the resulting sstables will be of size 75 MiB, token share 1/16 and d
This sharding mechanism is independent of the compaction specification.
+## Full sharding scheme
+
+This sharding scheme easily admits extensions. In particular, when the size of the data set is expected to grow very
+large, to avoid having to pre-specify a high enough target size to avoid problems with per-sstable overhead, we can
+apply an "sstable growth" parameter, which determines what part of the density growth should be assigned to increased
+SSTable size, reducing the growth of the number of shards (and hence non-overlapping sstables).
+
+Additionally, to allow for a mode of operation with a fixed number of shards, and splitting conditional on reaching
+a minimum size, we provide for a "minimum sstable size" that reduces the base shard count whenever that would result
+in sstables smaller than the provided minimum.
+
+Generally, the user can specify four sharding parameters:
+
+- base shard count $b$
+- target sstable size $t$
+- minimum sstable size $m$
+- sstable growth component $\lambda$
+
+The number of shards $S$ for a given density $d$ is then calculated as
+
+$$
+S =
+\begin{cases}
+1
+ & \text{if } d < m \\
+2^{\left\lfloor \log_2 \frac d m \right\rfloor}
Review Comment:
This is actually the minimum of this number and the highest power-of-2 divisor of `b`.
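In symbols (using $\nu_2(b)$ for the exponent of the largest power of two dividing $b$ — a notation not used in the document), the correction suggested here would replace the second case with

$$
\min\left(2^{\left\lfloor \log_2 \frac d m \right\rfloor},\; 2^{\nu_2(b)}\right)
$$

so that shard counts below the base count remain divisors of $b$ and the resulting boundaries stay aligned with higher levels.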
##########
src/java/org/apache/cassandra/db/compaction/UnifiedCompactionStrategy.md:
##########
@@ -214,6 +214,66 @@ deletions, the resulting sstables will be of size 75 MiB, token share 1/16 and d
This sharding mechanism is independent of the compaction specification.
+## Full sharding scheme
+
+This sharding scheme easily admits extensions. In particular, when the size of the data set is expected to grow very
+large, to avoid having to pre-specify a high enough target size to avoid problems with per-sstable overhead, we can
+apply an "sstable growth" parameter, which determines what part of the density growth should be assigned to increased
+SSTable size, reducing the growth of the number of shards (and hence non-overlapping sstables).
+
+Additionally, to allow for a mode of operation with a fixed number of shards, and splitting conditional on reaching
+a minimum size, we provide for a "minimum sstable size" that reduces the base shard count whenever that would result
+in sstables smaller than the provided minimum.
+
+Generally, the user can specify four sharding parameters:
+
+- base shard count $b$
+- target sstable size $t$
+- minimum sstable size $m$
+- sstable growth component $\lambda$
+
+The number of shards $S$ for a given density $d$ is then calculated as
+
+$$
+S =
+\begin{cases}
+1
+ & \text{if } d < m \\
+2^{\left\lfloor \log_2 \frac d m \right\rfloor}
+ & \text{if } d < mb \\
+b
+ & \text{if } d < tb \\
+2^{\left\lfloor (1-\lambda) \cdot \log_2 \left( {\frac d t \cdot \frac 1 b}\right)\right\rceil} \cdot b
+ & \text{otherwise}
+\end{cases}
+$$
+
+Some useful combinations of these parameters:
+
+- The basic scheme above uses an sstable growth $\lambda=0$, and a minimum sstable size $m=0$. The graph below
+  illustrates it for base shard count $b=4$ and target sstable size $t=1\mathrm{GB}$:
+
+
Review Comment:
The graphs appear to be missing.
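For readers following along, the four-case formula quoted in the hunk above can be sketched in Java as follows. The class and parameter names are illustrative, not the actual `Controller` code, and this transcribes the formula exactly as written — i.e. before the power-of-two-divisor correction to the second case suggested in the earlier review comment:

```java
// Illustrative transcription of the document's four-case shard-count formula
// (not the actual Controller implementation). Parameters follow the document:
// d = density, b = base shard count, t = target sstable size,
// m = minimum sstable size, lambda = sstable growth component.
public class ShardFormula
{
    public static long shards(double d, long b, double t, double m, double lambda)
    {
        double log2 = Math.log(2);
        if (d < m)
            return 1;                  // below the minimum size: a single sstable
        if (d < m * b)
            // rounded down so the resulting sstables stay above the minimum size
            return 1L << (long) Math.floor(Math.log(d / m) / log2);
        if (d < t * b)
            return b;                  // fixed base shard count
        // exponent rounded to nearest, so sstable sizes fall between
        // sqrt(0.5) and sqrt(2) times the effective target
        long pow = Math.round((1 - lambda) * (Math.log(d / (t * b)) / log2));
        return (1L << pow) * b;
    }
}
```

With $m=0$ and $\lambda=0$ the first two cases never fire and this reduces to the basic scheme.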
##########
src/java/org/apache/cassandra/db/compaction/unified/Controller.java:
##########
@@ -93,10 +92,12 @@ public class Controller
     * applied as an exponent in the number of split points. In other words, the given value applies as a negative
     * exponent in the calculation of the number of split points.
     * <p>
-    * Using 0 (the default) applies no correction to the number of split points, resulting in SSTables close to the
+    * Using 0 applies no correction to the number of split points, resulting in SSTables close to the
     * target size. Setting this number to 1 will make UCS never split beyond the base shard count. Using 0.5 will
     * make the number of split points a square root of the required number for the target SSTable size, making
-    * the number of split points and the size of SSTables grow in lockstep as the density grows.
+    * the number of split points and the size of SSTables grow in lockstep as the density grows. Using
+    * 0.333 (the default) makes the sstable growth the cube root of the density growth, i.e. the sstable size
+    * grows with the square root of the growth of the shard count.
     * <p>
     * For example, given a data size of 1TiB on the top density level and 1GiB target size with base shard count of 1,
     * growth 0 would result in 1024 SSTables of ~1GiB each, 0.5 would yield 32 SSTables of ~32GiB each, and 1 would
Review Comment:
Let's also add 1/3 to these examples: 128 SSTables of ~8 GiB each, which happens to be the case both here (for b=1) and in the next paragraph (for b = 4).
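As a sanity check on the 1/3 figure, a worked instance of the growth formula with $d/t = 2^{10}$ (1 TiB of data, 1 GiB target) and $b=1$:

$$
S = 2^{\left\lfloor \left(1-\tfrac13\right) \cdot \log_2 2^{10} \right\rceil} = 2^{\left\lfloor 6.\overline{6} \right\rceil} = 2^7 = 128,
\qquad \frac{1\,\mathrm{TiB}}{128} = 8\,\mathrm{GiB}.
$$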
##########
src/java/org/apache/cassandra/db/compaction/unified/Controller.java:
##########
@@ -424,8 +425,7 @@ public static Controller fromOptions(ColumnFamilyStore cfs, Map<String, String>
        }
        else
        {
-            if (SchemaConstants.isSystemKeyspace(cfs.getKeyspaceName())
-                || (cfs.getDiskBoundaries().positions != null && cfs.getDiskBoundaries().positions.size() > 1))
+            if (cfs.getDiskBoundaries().positions != null && cfs.getDiskBoundaries().positions.size() > 1)
Review Comment:
I would remove this special case too, the minimum size should handle it.
##########
src/java/org/apache/cassandra/db/compaction/unified/Controller.java:
##########
@@ -202,36 +247,110 @@ public int getThreshold(int index) {
/**
     * Calculate the number of shards to split the local token space in for the given sstable density.
     * This is calculated as a power-of-two multiple of baseShardCount, so that the expected size of resulting sstables
-    * is between sqrt(0.5) * targetSSTableSize and sqrt(2) * targetSSTableSize, with a minimum of baseShardCount shards
-    * for smaller sstables.
-    *
+    * is between sqrt(0.5) and sqrt(2) times the target size, which is calculated from targetSSTableSize to grow
+    * at the given sstableGrowthModifier of the exponential growth of the density.
+    * <p>
+    * Additionally, if a minimum sstable size is set, we can go below the baseShardCount when that would result in
+    * sstables smaller than that minimum. Note that in the case of a non-power-of-two base count, we will only
+    * split to divisors of baseShardCount.
+    * <p>
     * Note that to get the sstables resulting from this splitting within the bounds, the density argument must be
     * normalized to the span that is being split. In other words, if no disks are defined, the density should be
     * scaled by the token coverage of the locally-owned ranges. If multiple data directories are defined, the density
-    * should be scaled by the token coverage of the respective data directory. That is localDensity = size / span,
+    * should be scaled by the token coverage of the respective data directory. That is, localDensity = size / span,
     * where the span is normalized so that span = 1 when the data covers the range that is being split.
     */
public int getNumShards(double localDensity)
{
-        // How many we would have to aim for the target size. Divided by the base shard count, so that we can ensure
-        // the result is a multiple of it by multiplying back below.
-        double count = localDensity / (targetSSTableSize * INVERSE_SQRT_2 * baseShardCount);
-        if (count > MAX_SHARD_SPLIT)
-            count = MAX_SHARD_SPLIT;
-        assert !(count < 0); // Must be positive, 0 or NaN, which should translate to baseShardCount
-
-        // Make it a power of two multiple of the base count so that split points for lower levels remain split points
-        // for higher.
-        // The conversion to int and highestOneBit round down, for which we compensate by using the sqrt(0.5) multiplier
-        // applied above.
-        // Setting the bottom bit to 1 ensures the result is at least baseShardCount.
-        int shards = baseShardCount * Integer.highestOneBit((int) count | 1);
-        logger.debug("Shard count {} for density {}, {} times target {}",
-                     shards,
-                     FBUtilities.prettyPrintBinary(localDensity, "B", " "),
-                     localDensity / targetSSTableSize,
-                     FBUtilities.prettyPrintBinary(targetSSTableSize, "B", " "));
-        return shards;
+        int shards;
+        // Check the minimum size first.
+        if (minSSTableSize > 0)
+        {
+            double count = localDensity / minSSTableSize;
+            // Minimum size only applies if it is smaller than the base count.
+            // Note: the minimum size cannot be larger than the target size's minimum.
+            if (count < baseShardCount)
+            {
+                // Make it a power of two, rounding down so that sstables are greater in size than the min.
+                // Setting the bottom bit to 1 ensures the result is at least 1.
+                // If baseShardCount is not a power of 2, split only to powers of two that are divisors of baseShardCount so boundaries match higher levels
+                shards = Math.min(Integer.highestOneBit((int) count | 1), 1 << Integer.numberOfTrailingZeros(baseShardCount));
Review Comment:
Nit: I found an even easier way to find the highest power-of-2 divisor / lowest one bit of a number: `n & -n`.
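The `n & -n` nit is the classic two's-complement lowest-set-bit trick. A small sketch (hypothetical helper names, for illustration only) showing it agrees with the `1 << Integer.numberOfTrailingZeros(...)` form used in the hunk:

```java
// For positive n, the lowest set bit of n is also the largest power of two
// dividing n. Both helpers below compute it.
public class LowestBit
{
    // form used in the quoted Controller code
    public static int viaTrailingZeros(int n)
    {
        return 1 << Integer.numberOfTrailingZeros(n);
    }

    // reviewer's suggestion: negating n flips every bit above the lowest set
    // bit, so the AND keeps exactly that bit
    public static int viaTwosComplement(int n)
    {
        return n & -n;
    }
}
```

Incidentally, the JDK already exposes this computation as `Integer.lowestOneBit(int)`.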
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]