blambov commented on code in PR #2879:
URL: https://github.com/apache/cassandra/pull/2879#discussion_r1386607843
##########
conf/cassandra.yaml:
##########
@@ -1038,11 +1045,11 @@ snapshot_links_per_second: 0
# large rows, or a very large number of rows per partition are
# present, it is recommended to increase the index granularity
# or switch to the BTI SSTable format.
-#
+# The original name of this configuration is colum_index_size.
# Leave undefined to use a default suitable for the SSTable format
# in use (64 KiB for BIG, 16KiB for BTI).
# Min unit: KiB
-# column_index_size: 4KiB
+# row_index_granularity: 4KiB
Review Comment:
The legacy option's name cannot change, but we should deprecate it (as well
as any other options that are being moved by this ticket).
Note that we also need to move the description of the option.
##########
src/java/org/apache/cassandra/io/sstable/format/AbstractSSTableFormat.java:
##########
@@ -32,6 +32,33 @@ protected AbstractSSTableFormat(String name, Map<String,
String> options)
this.options = options;
}
+ public enum Option
Review Comment:
These should be defined by the individual format, because we don't know what
options a format may require.
BTI, for example, only supports two of these.
##########
src/java/org/apache/cassandra/io/sstable/format/SSTableFormat.java:
##########
@@ -45,10 +47,13 @@
public interface SSTableFormat<R extends SSTableReader, W extends
SSTableWriter>
{
String name();
+ Map<String, String> options();
Version getLatestVersion();
Version getVersion(String version);
+ Object getSSTableFormatValue(Option option);
Review Comment:
This is overly restrictive. A new format (plugged via an external jar, for
example) will use options which are not in the enumeration.
The general format machinery should deal only with strings. It is up only to
the individual format to interpret and validate them. See e.g. how compaction
and memtables are configured.
##########
src/java/org/apache/cassandra/io/sstable/format/big/BigFormatPartitionWriter.java:
##########
@@ -70,7 +70,12 @@ public class BigFormatPartitionWriter extends
SortedTablePartitionWriter
Version version,
ISerializer<IndexInfo> indexInfoSerializer)
{
- this(header, writer, version, indexInfoSerializer,
DatabaseDescriptor.getColumnIndexCacheSize(),
DatabaseDescriptor.getColumnIndexSize(DEFAULT_GRANULARITY));
+ this(header,
+ writer,
+ version,
+ indexInfoSerializer,
+
(int)version.getSSTableFormatValue(Option.COLUMN_INDEX_CACHE_SIZE),
Review Comment:
The generic `Version` or `SSTableFormat` interfaces should not provide
access to any options.
Instead, `BigFormat` should provide methods to access individual settings
that the format needs, so that, for example, this code becomes
```
this(header,
writer,
version,
indexInfoSerializer,
((BigFormat) version.format).getRowIndexCacheSize(),
((BigFormat) version.format).getRowIndexGranularity());
```
where the format must parse and store the converted value (after any lookup
to `DatabaseDescriptor`). This will likely require changes to
`DatabaseDescriptor.setXXX` methods to make sure they update the format
instances rather than the database descriptor ones.
##########
src/java/org/apache/cassandra/io/sstable/format/bti/BtiFormat.java:
##########
@@ -26,6 +26,9 @@
import com.google.common.collect.ImmutableSet;
import com.google.common.collect.Lists;
import com.google.common.collect.Sets;
+import org.apache.cassandra.config.DataStorageSpec;
Review Comment:
Nit: these imports are in the wrong place.
##########
src/java/org/apache/cassandra/db/compaction/CompactionManager.java:
##########
@@ -1778,10 +1778,11 @@ public void close() {}
}
CompactionStrategyManager strategy =
cfs.getCompactionStrategyManager();
+ SSTableFormat<?, ?> format =
DatabaseDescriptor.getSelectedSSTableFormat();
Review Comment:
Why not `cfs.getSSTableFormat()` to make it easier to switch to per-table
ones?
##########
src/java/org/apache/cassandra/io/sstable/format/bti/BtiFormat.java:
##########
@@ -138,6 +144,49 @@ public Version getVersion(String version)
return new BtiVersion(this, version);
}
+ @Override
+ public Object getSSTableFormatValue(Option option)
+ {
+ String value = options.get(option.getName());
+ switch (option)
+ {
+ case ROW_INDEX_GRANULARITY:
+ if (value == null)
+ {
+ return
DatabaseDescriptor.getRowIndexGranularity(DEFAULT_GRANULARITY);
+ }
+ DataStorageSpec.IntKibibytesBound resultKb = new
DataStorageSpec.IntKibibytesBound(value);
+ checkValidForByteConversion(resultKb, "row_index_granularity");
+ return resultKb.toKibibytes();
+ case INDEX_SUMMARY_CAPACITY:
Review Comment:
This is not something the BTI format should or can return; it does not have
such an option.
##########
src/java/org/apache/cassandra/index/sai/plan/Expression.java:
##########
@@ -168,7 +168,7 @@ public Expression add(Operator op, ByteBuffer value)
break;
}
- assert operator != null;
+ assert operator != null || context.isNotIndexed() : "Cannot use '" +
op + "' operator with indexed " + context.getColumnName() + " column";;
Review Comment:
Is this change a part of the patch?
##########
src/java/org/apache/cassandra/db/compaction/CompactionManager.java:
##########
@@ -1438,7 +1438,7 @@ private void doCleanupOne(final ColumnFamilyStore cfs,
List<SSTableReader> finished;
long nowInSec = FBUtilities.nowInSeconds();
- try (SSTableRewriter writer = SSTableRewriter.construct(cfs, txn,
false, sstable.maxDataAge);
+ try (SSTableRewriter writer = SSTableRewriter.construct(cfs,
sstable.descriptor.getFormat(), txn, false, sstable.maxDataAge);
Review Comment:
This should use the currently selected format (we upgrade version/format on
all compactions unless we are specifically doing a downgrade operation).
##########
src/java/org/apache/cassandra/io/sstable/SSTableRewriter.java:
##########
@@ -87,24 +88,24 @@ public SSTableRewriter(ILifecycleTransaction transaction,
long maxAge, long pree
this.eagerWriterMetaRelease = eagerWriterMetaRelease;
}
- public static SSTableRewriter
constructKeepingOriginals(ILifecycleTransaction transaction, boolean
keepOriginals, long maxAge)
+ public static SSTableRewriter
constructKeepingOriginals(ILifecycleTransaction transaction, SSTableFormat<?,
?> format, boolean keepOriginals, long maxAge)
{
- return new SSTableRewriter(transaction, maxAge,
calculateOpenInterval(true), keepOriginals, true);
+ return new SSTableRewriter(transaction, maxAge,
calculateOpenInterval(format, true), keepOriginals, true);
}
- public static SSTableRewriter
constructWithoutEarlyOpening(ILifecycleTransaction transaction, boolean
keepOriginals, long maxAge)
+ public static SSTableRewriter
constructWithoutEarlyOpening(ILifecycleTransaction transaction,
SSTableFormat<?, ?> format, boolean keepOriginals, long maxAge)
{
- return new SSTableRewriter(transaction, maxAge,
calculateOpenInterval(false), keepOriginals, true);
+ return new SSTableRewriter(transaction, maxAge,
calculateOpenInterval(format, false), keepOriginals, true);
}
- public static SSTableRewriter construct(ColumnFamilyStore cfs,
ILifecycleTransaction transaction, boolean keepOriginals, long maxAge)
+ public static SSTableRewriter construct(ColumnFamilyStore cfs,
SSTableFormat<?, ?> format, ILifecycleTransaction transaction, boolean
keepOriginals, long maxAge)
{
- return new SSTableRewriter(transaction, maxAge,
calculateOpenInterval(cfs.supportsEarlyOpen()), keepOriginals, true);
+ return new SSTableRewriter(transaction, maxAge,
calculateOpenInterval(format, cfs.supportsEarlyOpen()), keepOriginals, true);
}
- private static long calculateOpenInterval(boolean shouldOpenEarly)
+ private static long calculateOpenInterval(SSTableFormat<?, ?> format,
boolean shouldOpenEarly)
{
- long interval =
DatabaseDescriptor.getSSTablePreemptiveOpenIntervalInMiB() * (1L << 20);
+ long interval = (long)
format.getSSTableFormatValue(Option.SSTABLE_PREEMPTIVE_OPEN_INTERVAL) * (1L <<
20);
Review Comment:
The format will not necessarily support this. To make it accessibly in a
general way, define e.g. a `PreemptiveOpenConfiguration` interface that
provides this value and make the two format implement it. If the given format
implements that interface, get the value from it. If not, disable early open
(i.e. use -1).
##########
src/java/org/apache/cassandra/io/sstable/indexsummary/IndexSummaryManager.java:
##########
@@ -99,24 +106,26 @@ private IndexSummaryManager(Supplier<List<T>>
indexSummariesProvider)
this.indexSummariesProvider = indexSummariesProvider;
executor = executorFactory().scheduled(false, "IndexSummaryManager",
Thread.MIN_PRIORITY);
-
- long indexSummarySizeInMB =
DatabaseDescriptor.getIndexSummaryCapacityInMiB();
- int interval =
DatabaseDescriptor.getIndexSummaryResizeIntervalInMinutes();
+ // TODO make format get from default format
+ format = DatabaseDescriptor.getSelectedSSTableFormat();
Review Comment:
Because the format shouldn't always offer these options, we need to offer
some way of recognizing an exposing them somewhat generally.
One way is to check here if the format implements an interface e.g. called
`IndexSummaryConfiguration` which exposes these. BIG would implement it, and
BTI would not. A further complication is that for this initialization we want
to find the format that does use an index summary and get the options from
there. If there's no such format, the manager should not be initialized at all
(which is a good way of turning this and keycache off if not needed); if there
are multiple, we should probably check that the options they provide match.
##########
src/java/org/apache/cassandra/db/compaction/Upgrader.java:
##########
@@ -92,7 +92,7 @@ public void upgrade(boolean keepOriginals)
{
outputHandler.output("Upgrading " + sstable);
long nowInSec = FBUtilities.nowInSeconds();
- try (SSTableRewriter writer = SSTableRewriter.construct(cfs,
transaction, keepOriginals,
CompactionTask.getMaxDataAge(transaction.originals()));
+ try (SSTableRewriter writer = SSTableRewriter.construct(cfs,
sstable.descriptor.getFormat(), transaction, keepOriginals,
CompactionTask.getMaxDataAge(transaction.originals()));
Review Comment:
An upgrade _must_ use the currently selected format; otherwise it is not
doing what is expected of it (one of the main use cases for upgrade will be to
switch to BTI from BIG).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]