maedhroz commented on code in PR #2267:
URL: https://github.com/apache/cassandra/pull/2267#discussion_r1169272356


##########
src/java/org/apache/cassandra/io/sstable/format/bti/BtiFormat.md:
##########
@@ -0,0 +1,991 @@
+Big Trie-Indexed (BTI) SSTable format
+-------------------------------------
+
+This document describes the BTI SSTable format, which is introduced to
+Cassandra with 
[CEP-25](https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-25%3A+Trie-indexed+SSTable+format).
+
+The format is called BTI, which stands for "Big Trie-Indexed", because it 
shares
+the data format of the existing BIG format and only changes the primary indexes
+inside SSTables. The format relies on byte order and tries, and uses a 
combination
+of features to make the indexing structures compact and efficient. The 
paragraphs
+below describe the format's features and mechanisms together with the 
motivation 
+behind them, and conclude with detailed description of the on-disk format.
+
+# Prerequisites
+
+## Byte-comparable types
+
+The property of being byte-comparable (also called byte-ordered) for a
+key denotes that there is a serialisation of that key to a sequence of
+bytes where the lexicographic comparison of the unsigned bytes produces
+the same result as performing a typed comparison of the key.
+
+For Cassandra, such a representation is given by
+[CASSANDRA-6936](https://issues.apache.org/jira/browse/CASSANDRA-6936).
+Detailed description of the mechanics of the translation are provided in
+the [included 
documentation](../../../../utils/bytecomparable/ByteComparable.md).
+
+## Tries
+
+A trie is a data structure that describes a mapping between sequences
+and associated values. It is very similar to a deterministic finite
+state automaton, with the main difference that an automaton is allowed
+to have cycles, while a trie is not.
+
+Because the theory and main usage of the structure is for encoding words
+of a language, the trie terminology talks about "characters", "words"
+and "alphabet", which in our case map to bytes of the byte-ordered
+representation, the sequence that encodes it, and the possible values of
+a byte[^1].
+
+[^1]: For simplicity this description assumes we directly map a byte to
+a character. Other options are also possible (e.g. using hex digits
+as the alphabet and two transitions per byte).
+
+A trie can be defined as a tree graph in which vertices are states, some
+of which can be final and contain associated information, and where
+edges are labelled with characters. A valid word in the trie is encoded
+by a path starting from the root of the trie where each edge is labelled
+with the next character of the word, and ending in a final state which
+contains the 'payload' associated with the word.
+
+```mermaid
+graph TD
+  Node_(( ))
+  style Node_ fill:darkgrey
+  Node_ --"a"--> Node_a(((a)))
+    Node_a --"l"--> Node_al(( ))
+      Node_al --"l"--> Node_all(( ))
+        Node_all --"o"--> Node_allo(( ))
+          Node_allo --"w"--> Node_allow(((allow)))
+    Node_a --"n"--> Node_an(((an)))
+      Node_an --"d"--> Node_and(((and)))
+      Node_an --"y"--> Node_any(((any)))
+    Node_a --"r"--> Node_ar(( ))
+      Node_ar --"e"--> Node_are(((are)))
+    Node_a --"s"--> Node_as(((as)))
+  Node_ --"n"--> Node_n(( ))
+    Node_n --"o"--> Node_no(( ))
+      Node_no --"d"--> Node_nod(( ))
+        Node_nod --"e"--> Node_node(((node)))
+  Node_ --"o"--> Node_o(( ))
+    Node_o --"f"--> Node_of(((of)))
+    Node_o --"n"--> Node_on(((on)))
+  Node_ --"t"--> Node_t(( ))
+    Node_t --"h"--> Node_th(( ))
+      Node_th --"e"--> Node_the(((the)))
+      Node_th --"i"--> Node_thi(( ))
+        Node_thi --"s"--> Node_this(((this)))
+    Node_t --"o"--> Node_to(((to)))
+    Node_t --"r"--> Node_tr(( ))
+      Node_tr --"i"--> Node_tri(( ))
+        Node_tri --"e"--> Node_trie(((trie)))
+    Node_t --"y"--> Node_ty(( ))
+      Node_ty --"p"--> Node_typ(( ))
+        Node_typ --"e"--> Node_type(( ))
+          Node_type --"s"--> Node_types(((types)))
+  Node_ --"w"--> Node_w(( ))
+    Node_w --"i"--> Node_wi(( ))
+      Node_wi --"t"--> Node_wit(( ))
+        Node_wit --"h"--> Node_with(((with)))
+          Node_with --"o"--> Node_witho(( ))
+            Node_witho --"u"--> Node_withou(( ))
+              Node_withou --"t"--> Node_without(((without)))
+```
+
+This means that in a constructed trie finding the payload associated
+with a word is a matter of following the edges (also called
+"transitions") from the initial state labelled with the consecutive
+characters of the word, and retrieving the payload associated with the
+state at which we end up. If that's not a final state, or if at any
+point in this we did not find a transition in the trie matching the
+character, the trie does not have an association for the word. The
+complexity of lookup is thus _O_(len(word)) transitions, where the cost of
+taking a transition is usually constant, thus this complexity is
+theoretically optimal.
+
+From a storage space perspective, one of the main benefits of a trie as
+a data structure for storing a map is the fact that it completely avoids
+storing redundant prefixes. All words that start with the same sequence
+store a representation of that sequence only once. If prefixes are
+commonly shared, this can save a great deal of space.
+
+When the items stored in a trie are lexicographically (=byte) ordered, a
+trie is also an ordered structure. A trie can be walked in order and it
+is also possible to efficiently list the items between two given keys.
+
+In fact, one can efficiently (and lazily) apply set algebra over tries,
+and slicing can be seen as a simple application of intersection, where
+the intersecting trie is generated on the fly. The set operations
+benefit from the same prefix-sharing effect — we apply union /
+intersection / difference to a state, which has the effect of applying
+the operation to all words that share the prefix denoted by that state.
+
+```mermaid
+graph TD
+  Node_(( ))
+  style Node_ fill:darkgrey
+  Node_ --"a"--> Node_a(((a)))
+  style Node_a stroke:lightgrey,color:lightgrey
+  linkStyle 0 stroke:lightgrey,color:lightgrey
+    Node_a --"l"--> Node_al(( ))
+    style Node_al stroke:lightgrey,color:lightgrey
+    linkStyle 1 stroke:lightgrey,color:lightgrey
+      Node_al --"l"--> Node_all(( ))
+      style Node_all stroke:lightgrey,color:lightgrey
+      linkStyle 2 stroke:lightgrey,color:lightgrey
+        Node_all --"o"--> Node_allo(( ))
+        style Node_allo stroke:lightgrey,color:lightgrey
+        linkStyle 3 stroke:lightgrey,color:lightgrey
+          Node_allo --"w"--> Node_allow(((allow)))
+          style Node_allow stroke:lightgrey,color:lightgrey
+          linkStyle 4 stroke:lightgrey,color:lightgrey
+    Node_a --"n"--> Node_an(((an)))
+          style Node_an stroke:lightgrey,color:lightgrey
+          linkStyle 5 stroke:lightgrey,color:lightgrey
+      Node_an --"d"--> Node_and(((and)))
+          style Node_and stroke:lightgrey,color:lightgrey
+          linkStyle 6 stroke:lightgrey,color:lightgrey
+      Node_an --"y"--> Node_any(((any)))
+          style Node_any stroke:lightgrey,color:lightgrey
+          linkStyle 7 stroke:lightgrey,color:lightgrey
+    Node_a --"r"--> Node_ar(( ))
+          style Node_ar stroke:lightgrey,color:lightgrey
+          linkStyle 8 stroke:lightgrey,color:lightgrey
+      Node_ar --"e"--> Node_are(((are)))
+          style Node_are stroke:lightgrey,color:lightgrey
+          linkStyle 9 stroke:lightgrey,color:lightgrey
+    Node_a --"s"--> Node_as(((as)))
+          style Node_as stroke:lightgrey,color:lightgrey
+          linkStyle 10 stroke:lightgrey,color:lightgrey
+  Node_ --"n"--> Node_n(( ))
+    Node_n --"o"--> Node_no(( ))
+      Node_no --"d"--> Node_nod(( ))
+        Node_nod --"e"--> Node_node(((node)))
+  Node_ --"o"--> Node_o(( ))
+    Node_o --"f"--> Node_of(((of)))
+    Node_o --"n"--> Node_on(((on)))
+  Node_ --"t"--> Node_t(( ))
+    Node_t --"h"--> Node_th(( ))
+      Node_th --"e"--> Node_the(((the)))
+      Node_th --"i"--> Node_thi(( ))
+        Node_thi --"s"--> Node_this(((this)))
+          style Node_this stroke:lightgrey,color:lightgrey
+          linkStyle 22 stroke:lightgrey,color:lightgrey
+    Node_t --"o"--> Node_to(((to)))
+          style Node_to stroke:lightgrey,color:lightgrey
+          linkStyle 23 stroke:lightgrey,color:lightgrey
+    Node_t --"r"--> Node_tr(( ))
+          style Node_tr stroke:lightgrey,color:lightgrey
+          linkStyle 24 stroke:lightgrey,color:lightgrey
+      Node_tr --"i"--> Node_tri(( ))
+          style Node_tri stroke:lightgrey,color:lightgrey
+          linkStyle 25 stroke:lightgrey,color:lightgrey
+        Node_tri --"e"--> Node_trie(((trie)))
+          style Node_trie stroke:lightgrey,color:lightgrey
+          linkStyle 26 stroke:lightgrey,color:lightgrey
+    Node_t --"y"--> Node_ty(( ))
+          style Node_ty stroke:lightgrey,color:lightgrey
+          linkStyle 27 stroke:lightgrey,color:lightgrey
+      Node_ty --"p"--> Node_typ(( ))
+          style Node_typ stroke:lightgrey,color:lightgrey
+          linkStyle 28 stroke:lightgrey,color:lightgrey
+        Node_typ --"e"--> Node_type(( ))
+          style Node_type stroke:lightgrey,color:lightgrey
+          linkStyle 29 stroke:lightgrey,color:lightgrey
+          Node_type --"s"--> Node_types(((types)))
+          style Node_types stroke:lightgrey,color:lightgrey
+          linkStyle 30 stroke:lightgrey,color:lightgrey
+  Node_ --"w"--> Node_w(( ))
+          style Node_w stroke:lightgrey,color:lightgrey
+          linkStyle 31 stroke:lightgrey,color:lightgrey
+    Node_w --"i"--> Node_wi(( ))
+          style Node_wi stroke:lightgrey,color:lightgrey
+          linkStyle 32 stroke:lightgrey,color:lightgrey
+      Node_wi --"t"--> Node_wit(( ))
+          style Node_wit stroke:lightgrey,color:lightgrey
+          linkStyle 33 stroke:lightgrey,color:lightgrey
+        Node_wit --"h"--> Node_with(((with)))
+          style Node_with stroke:lightgrey,color:lightgrey
+          linkStyle 34 stroke:lightgrey,color:lightgrey
+          Node_with --"o"--> Node_witho(( ))
+          style Node_witho stroke:lightgrey,color:lightgrey
+          linkStyle 35 stroke:lightgrey,color:lightgrey
+            Node_witho --"u"--> Node_withou(( ))
+          style Node_withou stroke:lightgrey,color:lightgrey
+          linkStyle 36 stroke:lightgrey,color:lightgrey
+              Node_withou --"t"--> Node_without(((without)))
+          style Node_without stroke:lightgrey,color:lightgrey
+          linkStyle 37 stroke:lightgrey,color:lightgrey
+
+```
+
+(An example of slicing the trie above with the range "bit"-"thing".
+Processing only applies on boundary nodes (root, "t", "th", "thi"),
+where we throw away the transitions outside the range. Subtries like the
+ones for "n" and "o" fall completely between "b" and "t" thus are fully
+inside the range and can be processed without any further restrictions.)
+
+A trie can be used as a modifiable in-memory data structure where one
+can add and remove individual elements. It can also be constructed from
+sorted input, incrementally storing the data directly to disk and
+building an efficient read-only on-disk data structure.
+
+For more formal information about the concept and applications of tries
+and finite state automata, try [Introduction to Automata Theory,
+Languages, and Computation](https://books.google.bg/books?id=dVipBwAAQBAJ).

Review Comment:
   Although all the links to external bookstores appear to be broken :D



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to