mike-tr-adamson opened a new pull request, #2498:
URL: https://github.com/apache/cassandra/pull/2498
This PR focuses on reducing the on-disk footprint of the SAI per-SSTable components. The main change is removing the token from the primary key components.

Including the token in the sorted terms and the trie costs us the benefit of prefix compression. Keys are encoded as `<token>/<partition key>/<clustering key>`. Because tokens are unique and come first in that byte order, we get no prefix compression of the partition key or the clustering key.

Removing the token has two consequences. First, the trie has to be segmented, because rows are written in token order, so the `<partition key>/<clustering key>` values are not delivered in lexicographic order. Second, with no token in the trie, finding the row IDs for token-only primary keys requires a reverse lookup in the token long array. Token-only primary keys are used to skip data ranges at the start of a query.

This token array lookup also means we do not need to build the primary key trie at all for indexes on tables without clustering columns, because row IDs can be looked up by token alone.
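The reverse lookup in the token array can be sketched as a lower-bound binary search. This is a minimal sketch under stated assumptions, not the PR's implementation: it assumes the array is indexed by row ID and, because rows are written in token order, sorted ascending. The class and method names (`TokenRowIdLookup`, `ceilingRowId`) are hypothetical, and an in-memory `long[]` stands in for whatever on-disk reader the real component uses.

```java
// Hypothetical sketch: token -> row ID lookup over a token array indexed by row ID.
final class TokenRowIdLookup {
    // Assumption: tokens[rowId] is the token of that row, and the array is sorted
    // ascending because rows are written in token order. Rows from the same
    // partition share a token, so duplicates are allowed.
    private final long[] tokens;

    TokenRowIdLookup(long[] tokens) {
        this.tokens = tokens;
    }

    // Returns the first row ID whose token is >= target, or tokens.length if
    // every token is smaller. This is the "reverse" direction of the array:
    // token -> row ID rather than row ID -> token.
    int ceilingRowId(long target) {
        int lo = 0;
        int hi = tokens.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1; // unsigned shift avoids int overflow
            if (tokens[mid] < target) {
                lo = mid + 1; // answer lies strictly to the right of mid
            } else {
                hi = mid;     // mid may be the answer; keep it in range
            }
        }
        return lo;
    }
}
```

For example, with tokens `{-9, -9, -4, 0, 7, 7, 12}`, `ceilingRowId(-5)` returns 2 and `ceilingRowId(7)` returns 4. Returning the first row whose token is at least the target suits range skipping: a query that starts at a given token can jump straight to that row ID without skipping any earlier row of the same partition.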

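A small illustration of the prefix-compression argument, offered as a hedged sketch. It front-codes a sorted list of keys (storing only the suffix that is not shared with the previous key), once with a leading token and once without. Several simplifications are assumptions, not Cassandra behavior: keys are plain ASCII strings rather than Cassandra's byte-comparable encoding, the token is a zero-padded hex string produced by the splitmix64 mixer as a stand-in for Murmur3, and front coding stands in for the trie's prefix sharing. The names `PrefixCompressionDemo`, `fakeToken`, `frontCodedSize`, and `keys` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo: how a leading token defeats prefix sharing between keys.
final class PrefixCompressionDemo {
    // Stand-in for Murmur3: the splitmix64 finalizer, a bijective 64-bit mixer,
    // so distinct partition key hash codes always give distinct tokens.
    static long fakeToken(String partitionKey) {
        long z = partitionKey.hashCode();
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    // Length of the common prefix of two strings.
    static int sharedPrefix(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }

    // Characters a front-coded sorted list stores: for each key, only the
    // suffix not shared with the previous key.
    static int frontCodedSize(List<String> sortedKeys) {
        int total = 0;
        String prev = "";
        for (String key : sortedKeys) {
            total += key.length() - sharedPrefix(prev, key);
            prev = key;
        }
        return total;
    }

    // 5 partitions x 3 rows. With a token, keys are <token>/<pk>/<ck>; sorting
    // the strings orders them by the fixed-width hex token first.
    static List<String> keys(boolean withToken) {
        List<String> keys = new ArrayList<>();
        for (int p = 0; p < 5; p++) {
            String pk = "user:000" + p;
            for (int c = 1; c <= 3; c++) {
                String key = pk + "/" + "2023-01-0" + c;
                keys.add(withToken ? String.format("%016x", fakeToken(pk)) + "/" + key : key);
            }
        }
        Collections.sort(keys);
        return keys;
    }

    public static void main(String[] args) {
        System.out.println("with token:    " + frontCodedSize(keys(true)));
        System.out.println("without token: " + frontCodedSize(keys(false)));
    }
}
```

In this toy setup the token-free keys front-code to 78 characters. The token-prefixed keys need considerably more, since each partition boundary starts with a new token, so the shared `user:000` prefix of the partition keys is lost and nearly the whole key must be stored again. Part of the gap also comes from the token bytes themselves, which is the other half of the footprint the PR removes.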
