yifan-c commented on code in PR #2824:
URL: https://github.com/apache/cassandra/pull/2824#discussion_r1366116203
##########
src/java/org/apache/cassandra/io/sstable/SSTableSimpleWriter.java:
##########
@@ -36,64 +42,132 @@
* means that rows should be added by increasing md5 of the row key. This is
* rarely possible and SSTableSimpleUnsortedWriter should most of the time be
* prefered.
+ * <p>
+ * Optionally, the writer can be configured with a max SSTable size for
SSTables.
+ * The output will be a series of SSTables that do not exceed a specified size.
+ * By default, all sorted data are written into a single SSTable.
*/
class SSTableSimpleWriter extends AbstractSSTableSimpleWriter
{
+ private final long maxSSTableSize;
+
+ // Used to compute the row serialized size
+ private final SerializationHeader header;
+ private final SerializationHelper helper;
+
protected DecoratedKey currentKey;
protected PartitionUpdate.Builder update;
+ private long currentSize;
private SSTableTxnWriter writer;
- protected SSTableSimpleWriter(File directory, TableMetadataRef metadata,
RegularAndStaticColumns columns)
+ /**
+ * Create a SSTable writer for sorted input data.
+ * When a positive {@param maxSSTableSizeInMiB} is defined, the writer
outputs a sequence of SSTables,
+ * whose sizes do not exceed the specified value.
+ *
+ * @param directory directory to store the sstable files
+ * @param metadata table metadata
+ * @param columns columns to update
+ * @param maxSSTableSizeInMiB defines the max SSTable size if the value is
positive.
+ * Any non-positive value indicates the sstable
size is unlimited.
+ */
+ protected SSTableSimpleWriter(File directory, TableMetadataRef metadata,
RegularAndStaticColumns columns, long maxSSTableSizeInMiB)
{
super(directory, metadata, columns);
+ this.maxSSTableSize = maxSSTableSizeInMiB * 1024L * 1024L;
+ this.header = new SerializationHeader(true, metadata.get(), columns,
EncodingStats.NO_STATS);
+ this.helper = new SerializationHelper(this.header);
}
- private SSTableTxnWriter getOrCreateWriter() throws IOException
- {
- if (writer == null)
- writer = createWriter(null);
-
- return writer;
- }
-
+ @Override
PartitionUpdate.Builder getUpdateFor(DecoratedKey key) throws IOException
{
- assert key != null;
+ Preconditions.checkArgument(key != null, "Partition update cannot have
null key");
Review Comment:
Let's qualify the number of instructions.
Using `assert`
```
org.apache.cassandra.db.partitions.PartitionUpdate$Builder
getUpdateFor(org.apache.cassandra.db.DecoratedKey) throws java.io.IOException;
descriptor:
(Lorg/apache/cassandra/db/DecoratedKey;)Lorg/apache/cassandra/db/partitions/PartitionUpdate$Builder;
flags: (0x0000)
Code:
stack=7, locals=2, args_size=2
0: getstatic #4 // Field $assertionsDisabled:Z
3: ifne 18
6: aload_1
7: ifnonnull 18
10: new #5 // class
java/lang/AssertionError
13: dup
14: invokespecial #6 // Method
java/lang/AssertionError."<init>":()V
17: athrow
18: aload_1
19: aload_0
```
Using nullness check with `Precondition`
```
org.apache.cassandra.db.partitions.PartitionUpdate$Builder
getUpdateFor(org.apache.cassandra.db.DecoratedKey) throws java.io.IOException;
descriptor:
(Lorg/apache/cassandra/db/DecoratedKey;)Lorg/apache/cassandra/db/partitions/PartitionUpdate$Builder;
flags: (0x0000)
Code:
stack=7, locals=2, args_size=2
0: aload_1
1: ifnull 8
4: iconst_1
5: goto 9
8: iconst_0
9: ldc #5 // String Partition update
cannot have null key
11: invokestatic #6 // Method
com/google/common/base/Preconditions.checkArgument:(ZLjava/lang/Object;)V
14: aload_0
```
Assume the input is not null.
In the case of `assert`, when with `-ea`, the method is going to check
whether assertion is disabled and check whether the input is null. There are 4
instructions. When w/o `-ea`, there are 2 instructions to check whether
assertion is disabled.
In the case of `Preconditions`, there are 6 instructions, i.e. `0, 1, 4, 5,
9, 11` compare null, store comparison result, and invoke checkArgument method.
It has 2 or 4 extra instructions comparing with `assert`, depending on whether
it is enabled or not.
The method `getUpdateFor` is called per each row. Adding a row easily runs
thousands of instructions. A couple more instruction is a negligible tradeoff,
but provides a good improvement with the nicer descriptive error message and
guarantees the check is always performed for correctness.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]