blambov commented on code in PR #3091:
URL: https://github.com/apache/cassandra/pull/3091#discussion_r1482631381

##########
doc/modules/cassandra/partials/table-properties.adoc:
##########
@@ -105,6 +105,22 @@ If your application uses batch operations, consider the possibility that decreas
 The configuration/cass_yaml_file.html#batchlog_replay_throttle[batchlog_replay_throttle] property in the cassandra.yaml file give some control of the batch replay process.
 The most important factors, however, are the size and scope of the batches you use.
+*memtable*::
+Configures the memtable for the table.
+Four choices are:

Review Comment:
   The choices here are the ones defined in `cassandra.yaml`, in `memtable.configurations`. With CASSANDRA-18753 (which should be merged soon), `trie` and `skiplist` will be defined in the supplied `cassandra.yaml` and `cassandra-latest.yaml`.

##########
doc/modules/cassandra/pages/architecture/storage-engine.adoc:
##########
@@ -102,6 +102,61 @@ If a node stops working, replaying the commit log restores writes to the memtabl
 Data in the commit log is purged after its corresponding data in the memtable is flushed to an SSTable on disk.
+=== Trie memtables
+
+An alternative memtable implementation based on tries, also called prefix trees, is provided alongside the legacy skip list solution.
+The implementation is activated using the memtable API (https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-11%3A+Pluggable+memtable+implementations[CEP-11] / https://issues.apache.org/jira/browse/CASSANDRA-17034[CASSANDRA-17034]).
+Trie memtables improve on the legacy solution in modification and lookup performance, as well as the size of the structure for a given amount of data.
+
+Trie memtables use a data structure called a trie to organize data.
+This structure makes them very efficient at modifying and querying data, as well as more compact in memory.
+These features result in higher write throughput, lower latency for accessing recently-written data, while fitting more of it in memory.
+
+Trie memtables have internal memory management mechanisms, which drastically reduce the amount of work needed for garbage collection, reducing GC-inflicted pauses and higher-percentile latencies.
+This improvement is crucial to {cassandra} as it reduces the impact of GC on the system and allows for more predictable performance.
+
+Trie memtables reduce write amplification, a common problem in database systems, by buffering and organizing writes until they fill up their allocated memory.
+By accepting up to 30% more data for the same memory allocation, trie memtables reduce write amplification further.
+
+In the trie memtable implementation, the concurrent skip-list partitions map is replaced with a sharded single-writer trie.
+To maintain partition order, all keys are mapped to their byte-comparable representations.
+To minimize the size of the structure, the keys are only stored in the trie paths, and converted back to the standard format on retrieval.
+
+// In later iterations this will be expanded to include the partition-to-row maps, forming a direct map to rows and doing away with most of the complexity and overhead of maintaining separate partition maps.
+
+The trie memtable implementation is a pluggable memtable implementation, and can be enabled by setting the `memtable` configuration in `cassandra.yaml` to `trie`.

Review Comment:
   Either by setting a table's `memtable` option to `trie`, or by changing the default implementation in `cassandra.yaml` to inherit `trie` as below.

##########
doc/modules/cassandra/pages/reference/cql-commands/create-table-examples.adoc:
##########
@@ -68,6 +68,28 @@ After the disk space limit is reached, writes to CDC-enabled tables are rejected
 See https://docs.datastax.com/en/dse/6.8/dse-admin/datastax_enterprise/config/configCassandra_yaml.html#configCassandra_yaml__cdcSpaceSection[Change-data-capture (CDC) space settings] for information about available CDC settings.
 ====
+== Create a table with a trie memtable
+
+To create a table with a trie memtable, a memtable configuration must be enabled in `cassandra.yaml`. See xref:architecture/storage-engine/memtable.adoc[Memtable] for more information.
+
+To create a table with a trie memtable:
+
+[source,language-cql]
+----
+include::cassandra:example$CQL/cyclist_id-table.cql[tag=trie-memtable]
+----
+
+== Create a table with a sharded skiplist memtable
+
+To create a table with a sharded skiplist memtable, a memtable configuration must be enabled in `cassandra.yaml`. See xref:architecture/storage-engine/memtable.adoc[Memtable] for more information.
+
+To create a table with a trie memtable:

Review Comment:
   Shouldn't this be "sharded skiplist memtable"?

##########
doc/modules/cassandra/pages/architecture/storage-engine.adoc:
##########
@@ -102,6 +102,61 @@ If a node stops working, replaying the commit log restores writes to the memtabl
 Data in the commit log is purged after its corresponding data in the memtable is flushed to an SSTable on disk.
+=== Trie memtables
+
+An alternative memtable implementation based on tries, also called prefix trees, is provided alongside the legacy skip list solution.
+The implementation is activated using the memtable API (https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-11%3A+Pluggable+memtable+implementations[CEP-11] / https://issues.apache.org/jira/browse/CASSANDRA-17034[CASSANDRA-17034]).
+Trie memtables improve on the legacy solution in modification and lookup performance, as well as the size of the structure for a given amount of data.
+
+Trie memtables use a data structure called a trie to organize data.
+This structure makes them very efficient at modifying and querying data, as well as more compact in memory.
+These features result in higher write throughput, lower latency for accessing recently-written data, while fitting more of it in memory.
+
+Trie memtables have internal memory management mechanisms, which drastically reduce the amount of work needed for garbage collection, reducing GC-inflicted pauses and higher-percentile latencies.
+This improvement is crucial to {cassandra} as it reduces the impact of GC on the system and allows for more predictable performance.
+
+Trie memtables reduce write amplification, a common problem in database systems, by buffering and organizing writes until they fill up their allocated memory.

Review Comment:
   The first sentence applies to memtables generally; trie ones reduce it further by fitting more data.
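
For reference, a minimal sketch of the second option mentioned in the comment on `storage-engine.adoc` (changing the default implementation in `cassandra.yaml` so that it inherits `trie`). The layout follows the `memtable.configurations` section named in the first comment; the `class_name` values are assumed here and should be checked against the supplied `cassandra.yaml` once CASSANDRA-18753 is merged:

```yaml
# Sketch only: assumed contents of the memtable section in cassandra.yaml
memtable:
  configurations:
    skiplist:
      class_name: SkipListMemtable   # assumed class name for the legacy implementation
    trie:
      class_name: TrieMemtable       # assumed class name for the trie implementation
    default:
      inherits: trie                 # tables that do not set `memtable` would use trie
```

The per-table alternative is the table's `memtable` option, for example `WITH memtable = 'trie'` on `CREATE TABLE` or `ALTER TABLE`, which refers to one of the configuration names defined above.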

