Re: [PR] CEP-19: Add trie memtable docs-storage engine, create and alter table [cassandra]

via GitHub Thu, 08 Feb 2024 01:12:42 -0800


blambov commented on code in PR #3091:
URL: https://github.com/apache/cassandra/pull/3091#discussion_r1482631381



##########
doc/modules/cassandra/partials/table-properties.adoc:
##########
@@ -105,6 +105,22 @@ If your application uses batch operations, consider the possibility that decreas
 The configuration/cass_yaml_file.html#batchlog_replay_throttle[batchlog_replay_throttle] property in the cassandra.yaml file give some control of the batch replay process.
 The most important factors, however, are the size and scope of the batches you use.
 
+*memtable*::
+Configures the memtable for the table.
+Four choices are:

Review Comment:
   The choices here are the ones defined in `cassandra.yaml`, in 
`memtable.configurations`. With CASSANDRA-18753 (which should be merged soon), 
`trie` and `skiplist` will be defined in the supplied `cassandra.yaml` and 
`cassandra-latest.yaml`.



##########
doc/modules/cassandra/pages/architecture/storage-engine.adoc:
##########
@@ -102,6 +102,61 @@ If a node stops working, replaying the commit log restores writes to the memtabl
 
 Data in the commit log is purged after its corresponding data in the memtable is flushed to an SSTable on disk.
 
+=== Trie memtables
+
+An alternative memtable implementation based on tries, also called prefix trees, is provided alongside the legacy skip list solution.
+The implementation is activated using the memtable API (https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-11%3A+Pluggable+memtable+implementations[CEP-11] / https://issues.apache.org/jira/browse/CASSANDRA-17034[CASSANDRA-17034]).
+Trie memtables improve on the legacy solution in modification and lookup performance, as well as the size of the structure for a given amount of data.
+
+Trie memtables use a data structure called a trie to organize data.
+This structure makes them very efficient at modifying and querying data, as well as more compact in memory.
+These features result in higher write throughput, lower latency for accessing recently-written data, while fitting more of it in memory.
+
+Trie memtables have internal memory management mechanisms, which drastically reduce the amount of work needed for garbage collection, reducing GC-inflicted pauses and higher-percentile latencies.
+This improvement is crucial to {cassandra} as it reduces the impact of GC on the system and allows for more predictable performance.
+
+Trie memtables reduce write amplification, a common problem in database systems, by buffering and organizing writes until they fill up their allocated memory.
+By accepting up to 30% more data for the same memory allocation, trie memtables reduce write amplification further.
+
+In the trie memtable implementation, the concurrent skip-list partitions map is replaced with a sharded single-writer trie.
+To maintain partition order, all keys are mapped to their byte-comparable representations.
+To minimize the size of the structure, the keys are only stored in the trie paths, and converted back to the standard format on retrieval.
+
+// In later iterations this will be expanded to include the partition-to-row maps, forming a direct map to rows and doing away with most of the complexity and overhead of maintaining separate partition maps.
+
+The trie memtable implementation is a pluggable memtable implementation, and can be enabled by setting the `memtable` configuration in `cassandra.yaml` to `trie`.

Review Comment:
   Either by setting a table's `memtable` option to `trie`, or by changing the 
default implementation in `cassandra.yaml` to inherit `trie` as below.
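
   A minimal sketch of the two options, not taken from the PR itself. It assumes the configuration names `trie` and `skiplist` mentioned in this thread and the class names `TrieMemtable` and `SkipListMemtable`; check them against the `cassandra.yaml` shipped with your version. For a single table, the option would be set in CQL with something like `ALTER TABLE cycling.cyclist_id WITH memtable = 'trie';` (the `cycling` keyspace is hypothetical). To change the default for all tables instead, the `memtable` section of `cassandra.yaml` might look like this:

   ```yaml
   # Assumed layout of the memtable section; names are illustrative.
   memtable:
     configurations:
       skiplist:
         class_name: SkipListMemtable
       trie:
         class_name: TrieMemtable
       # Tables without an explicit `memtable` option use `default`.
       default:
         inherits: trie
   ```

   With this in place, a table only needs an explicit `memtable` option when it should differ from the default.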



##########
doc/modules/cassandra/pages/reference/cql-commands/create-table-examples.adoc:
##########
@@ -68,6 +68,28 @@ After the disk space limit is reached, writes to CDC-enabled tables are rejected
 See https://docs.datastax.com/en/dse/6.8/dse-admin/datastax_enterprise/config/configCassandra_yaml.html#configCassandra_yaml__cdcSpaceSection[Change-data-capture (CDC) space settings] for information about available CDC settings.
 ====
 
+== Create a table with a trie memtable
+
+To create a table with a trie memtable, a memtable configuration must be enabled in `cassandra.yaml`. See xref:architecture/storage-engine/memtable.adoc[Memtable] for more information.
+
+To create a table with a trie memtable:
+
+[source,language-cql]
+----
+include::cassandra:example$CQL/cyclist_id-table.cql[tag=trie-memtable]
+----
+
+== Create a table with a sharded skiplist memtable
+
+To create a table with a sharded skiplist memtable, a memtable configuration must be enabled in `cassandra.yaml`. See xref:architecture/storage-engine/memtable.adoc[Memtable] for more information.
+
+To create a table with a trie memtable:

Review Comment:
   Shouldn't this be "sharded skiplist memtable"?



##########
doc/modules/cassandra/pages/architecture/storage-engine.adoc:
##########
@@ -102,6 +102,61 @@ If a node stops working, replaying the commit log restores writes to the memtabl
 
 Data in the commit log is purged after its corresponding data in the memtable is flushed to an SSTable on disk.
 
+=== Trie memtables
+
+An alternative memtable implementation based on tries, also called prefix trees, is provided alongside the legacy skip list solution.
+The implementation is activated using the memtable API (https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-11%3A+Pluggable+memtable+implementations[CEP-11] / https://issues.apache.org/jira/browse/CASSANDRA-17034[CASSANDRA-17034]).
+Trie memtables improve on the legacy solution in modification and lookup performance, as well as the size of the structure for a given amount of data.
+
+Trie memtables use a data structure called a trie to organize data.
+This structure makes them very efficient at modifying and querying data, as well as more compact in memory.
+These features result in higher write throughput, lower latency for accessing recently-written data, while fitting more of it in memory.
+
+Trie memtables have internal memory management mechanisms, which drastically reduce the amount of work needed for garbage collection, reducing GC-inflicted pauses and higher-percentile latencies.
+This improvement is crucial to {cassandra} as it reduces the impact of GC on the system and allows for more predictable performance.
+
+Trie memtables reduce write amplification, a common problem in database systems, by buffering and organizing writes until they fill up their allocated memory.

Review Comment:
   The first sentence applies to memtables generally; trie ones reduce it 
further by fitting more data.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]