blambov commented on code in PR #3091: URL: https://github.com/apache/cassandra/pull/3091#discussion_r1492158922
########## doc/modules/cassandra/pages/architecture/storage-engine.adoc: ########## @@ -102,6 +102,61 @@ If a node stops working, replaying the commit log restores writes to the memtabl Data in the commit log is purged after its corresponding data in the memtable is flushed to an SSTable on disk. +=== Trie memtables + +An alternative memtable implementation based on tries, also called prefix trees, is provided alongside the legacy skip list solution. +The implementation is activated using the memtable API (https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-11%3A+Pluggable+memtable+implementations[CEP-11] / https://issues.apache.org/jira/browse/CASSANDRA-17034[CASSANDRA-17034]). +Trie memtables improve on the legacy solution in modification and lookup performance, as well as the size of the structure for a given amount of data. + +Trie memtables use a data structure called a trie to organize data. +This structure makes them very efficient at modifying and querying data, as well as more compact in memory. +These features result in higher write throughput, lower latency for accessing recently-written data, while fitting more of it in memory. + +Trie memtables have internal memory management mechanisms, which drastically reduce the amount of work needed for garbage collection, reducing GC-inflicted pauses and higher-percentile latencies. +This improvement is crucial to {cassandra} as it reduces the impact of GC on the system and allows for more predictable performance. + +Trie memtables reduce write amplification, a common problem in database systems, by buffering and organizing writes until they fill up their allocated memory. Review Comment: You can simply drop "Trie" at the beginning of the sentence, or replace it with "Generally, ". ########## doc/modules/cassandra/pages/architecture/storage-engine.adoc: ########## @@ -102,6 +102,61 @@ If a node stops working, replaying the commit log restores writes to the memtabl Data in the commit log is purged after its corresponding data in the memtable is flushed to an SSTable on disk. +=== Trie memtables + +An alternative memtable implementation based on tries, also called prefix trees, is provided alongside the legacy skip list solution. +The implementation is activated using the memtable API (https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-11%3A+Pluggable+memtable+implementations[CEP-11] / https://issues.apache.org/jira/browse/CASSANDRA-17034[CASSANDRA-17034]). +Trie memtables improve on the legacy solution in modification and lookup performance, as well as the size of the structure for a given amount of data. + +Trie memtables use a data structure called a trie to organize data. +This structure makes them very efficient at modifying and querying data, as well as more compact in memory. +These features result in higher write throughput, lower latency for accessing recently-written data, while fitting more of it in memory. + +Trie memtables have internal memory management mechanisms, which drastically reduce the amount of work needed for garbage collection, reducing GC-inflicted pauses and higher-percentile latencies. +This improvement is crucial to {cassandra} as it reduces the impact of GC on the system and allows for more predictable performance. + +Trie memtables reduce write amplification, a common problem in database systems, by buffering and organizing writes until they fill up their allocated memory. +By accepting up to 30% more data for the same memory allocation, trie memtables reduce write amplification further. + +In the trie memtable implementation, the concurrent skip-list partitions map is replaced with a sharded single-writer trie. +To maintain partition order, all keys are mapped to their byte-comparable representations. +To minimize the size of the structure, the keys are only stored in the trie paths, and converted back to the standard format on retrieval. + +// In later iterations this will be expanded to include the partition-to-row maps, forming a direct map to rows and doing away with most of the complexity and overhead of maintaining separate partition maps. + +The trie memtable implementation is a pluggable memtable implementation, and can be enabled by setting the `memtable` configuration in `cassandra.yaml` to `trie`. Review Comment: This sentence is still misleading. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]

