[GitHub] [cassandra] mike-tr-adamson opened a new pull request, #2498: Reduce disk footprint for SAI on-disk per-SSTable components

via GitHub Wed, 19 Jul 2023 07:44:12 -0700


mike-tr-adamson opened a new pull request, #2498:
URL: https://github.com/apache/cassandra/pull/2498


   This PR is focused on reducing the on-disk footprint of SAI per-SSTable 
components.
   
   The main work involves removing the token from the primary key components. 
Including the token in the sorted terms and trie loses the advantage of prefix 
compression: the tokens are unique and come first in the byte order 
(`<token>/<partition key>/<clustering key>`), so we do not get any prefix 
compression of the partition key and clustering key that follow.
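
   As a rough illustration of the effect described above (not code from this 
PR), the sketch below counts how many leading characters each key shares with 
its predecessor, with and without a leading token. The class name, the 
two-character hex "tokens", and the `user-000N` keys are all hypothetical; real 
tokens and the on-disk encoding differ.

```java
import java.util.List;

// Illustrative only: measures the prefix sharing between adjacent keys,
// which is what prefix compression in sorted terms or a trie exploits.
final class PrefixSharingSketch {

    // Number of leading characters that a and b have in common.
    static int commonPrefix(String a, String b) {
        int limit = Math.min(a.length(), b.length());
        int i = 0;
        while (i < limit && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }

    // Sum of the prefix lengths each key shares with the key before it.
    static int sharedWithPrevious(List<String> keys) {
        int total = 0;
        for (int i = 1; i < keys.size(); i++) {
            total += commonPrefix(keys.get(i - 1), keys.get(i));
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical keys in token order; the leading token differs for every key.
        List<String> withToken = List.of("5f/user-0002", "a3/user-0001", "c9/user-0003");
        // The same keys without the token, in lexicographic order.
        List<String> withoutToken = List.of("user-0001", "user-0002", "user-0003");

        System.out.println("shared with token:    " + sharedWithPrevious(withToken));
        System.out.println("shared without token: " + sharedWithPrevious(withoutToken));
    }
}
```

   In this toy input the token-prefixed keys share nothing with their 
neighbours, while the token-free keys share the `user-000` prefix.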
   
   The downside of removing the token is that the trie needs to be segmented, 
because the `<partition key>/<clustering key>` values are not delivered in 
lexicographic order. Also, because there is no token in the trie, we need to 
use a reverse lookup in the token long array to look up rowIds for token-only 
primary keys. These are used for data range skipping at the beginning of the 
query.
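
   A minimal sketch of what such a reverse lookup could look like, assuming 
the token array is indexed by rowId and is in ascending token order (rows in 
an SSTable are written in token order). The class and method names are 
hypothetical and are not the ones used in this PR; the actual on-disk array 
and its reader are not shown here.

```java
// Illustrative only: finds the first rowId whose token is >= a target token
// by binary search over a rowId-indexed, ascending array of tokens.
final class TokenLookupSketch {

    // Returns the smallest rowId with tokens[rowId] >= target, or -1 if none.
    // Several consecutive rowIds may share a token (rows of one partition),
    // so this is a lower-bound search rather than an exact-match search.
    static long firstRowIdAtOrAfter(long[] tokens, long target) {
        int lo = 0;
        int hi = tokens.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (tokens[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo < tokens.length ? lo : -1;
    }

    public static void main(String[] args) {
        // Hypothetical token values; rowId is the array index.
        long[] tokens = {-50L, -50L, 10L, 10L, 10L, 75L};
        System.out.println(firstRowIdAtOrAfter(tokens, 10L));
        System.out.println(firstRowIdAtOrAfter(tokens, 11L));
    }
}
```

   A lower-bound search is used so that a token-only key resolves to the first 
row of its partition, which is the natural starting point when skipping to the 
beginning of a data range.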
   
   This last element (token array lookup) means we do not have to create the 
primary key trie at all for indexes on tables without clustering columns, 
because we can look up rowIds by token.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


