blambov commented on code in PR #2267:
URL: https://github.com/apache/cassandra/pull/2267#discussion_r1172479251


##########
src/java/org/apache/cassandra/io/tries/TrieNode.java:
##########
@@ -0,0 +1,993 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.io.tries;
+
+import java.io.DataOutput;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+
+import org.apache.cassandra.io.util.DataOutputPlus;
+import org.apache.cassandra.io.util.SizedInts;
+
+/**
+ * Trie node types and manipulation mechanisms. The main purpose of this is to 
allow for handling tries directly as
+ * they are on disk without any serialization, and to enable the creation of 
such files.
+ * <p>
+ * The serialization methods take as argument a generic {@code 
SerializationNode} and provide a method {@code typeFor}
+ * for choosing a suitable type to represent it, which can then be used to 
calculate size and write the node.
+ * <p>
+ * To read a file containing trie nodes, one would use {@code at} to identify 
the node type and then the various
+ * read methods to retrieve the data. They all take a buffer (usually 
memory-mapped) containing the data, and a position
+ * in it that identifies the node.
+ * <p>
+ * These node types do not specify any treatment of payloads. They are only 
concerned with providing 4 bits of
+ * space for {@code payloadFlags}, and a way of calculating the position after 
the node. Users of this class by convention
+ * use non-zero payloadFlags to indicate a payload exists, write it (possibly 
in flag-dependent format) at serialization
+ * time after the node itself is written, and read it using the {@code 
payloadPosition} value.
+ * <p>
+ * To improve efficiency, multiple node types depending on the number of 
transitions are provided:
+ * -- payload only, which has no outgoing transitions
+ * -- single outgoing transition
+ * -- sparse, which provides a list of transition bytes with corresponding 
targets
+ * -- dense, where the transitions span a range of values and having the list 
(and the search in it) can be avoided
+ * <p>
+ * For each of the transition-carrying types we also have "in-page" versions 
where transition targets are the 4, 8 or 12
+ * lowest bits of the position within the same page. To save one further byte, 
the single in-page versions using 4 or 12
+ * bits cannot carry a payload.
+ * <p>
+ * This class is effectively an enumeration; abstract class permits instances 
to extend each other and reuse code.
+ */
+public abstract class TrieNode
+{
+    // Consumption (read) methods
+
+    /**
+     * Returns the type of node stored at this position. It can then be used 
to call the methods below.
+     */
+    public static TrieNode at(ByteBuffer src, int position)
+    {
+        return values[(src.get(position) >> 4) & 0xF];
+    }
+
+    /**
+     * Returns the 4 payload flag bits. Node types that cannot carry a payload 
return 0.
+     */
+    public int payloadFlags(ByteBuffer src, int position)
+    {
+        return src.get(position) & 0x0F;
+    }
+
+    /**
+     * Return the position just after the node, where the payload is usually 
stored.
+     */
+    abstract public int payloadPosition(ByteBuffer src, int position);
+
+    /**
+     * Returns search index for the given byte in the node. If exact match is 
present, this is >= 0, otherwise as in
+     * binary search.
+     */
+    abstract public int search(ByteBuffer src, int position, int 
transitionByte);       // returns as binarySearch
+
+    /**
+     * Returns the upper childIndex limit. Calling transition with values 
0...transitionRange - 1 is valid.
+     */
+    abstract public int transitionRange(ByteBuffer src, int position);
+
+    /**
+     * Returns the byte value for this child index, or Integer.MAX_VALUE if 
there are no transitions with this index or
+     * higher to permit listing the children without needing to call 
transitionRange.
+     *
+     * @param childIndex must be >= 0, though it is allowed to pass a value 
greater than {@code transitionRange - 1}
+     */
+    abstract public int transitionByte(ByteBuffer src, int position, int 
childIndex);
+
+    /**
+     * Returns the delta between the position of this node and the position of 
the target of the specified transition.
+     * This is always a negative number. Dense nodes use 0 to specify "no 
transition".
+     *
+     * @param childIndex must be >= 0 and < {@link 
#transitionRange(ByteBuffer, int)} - note that this is not validated
+     *                   and behaviour of this method is undefined for values 
outside of that range
+     */
+    abstract long transitionDelta(ByteBuffer src, int position, int 
childIndex);
+
+    /**
+     * Returns position of node to transition to for the given search index. 
Argument must be positive. May return -1
+     * if a transition with that index does not exist (DENSE nodes).
+     * Position is the offset of the node within the ByteBuffer. positionLong 
is its global placement, which is the
+     * base for any offset calculations.
+     *
+     * @param positionLong although it seems to be obvious, this argument must 
be "real", that is, each child must have
+     *                     the calculated absolute position >= 0, otherwise 
the behaviour of this method is undefined
+     * @param childIndex   must be >= 0 and < {@link 
#transitionRange(ByteBuffer, int)} - note that this is not validated
+     *                     and behaviour of this method is undefined for 
values outside of that range
+     */
+    public long transition(ByteBuffer src, int position, long positionLong, 
int childIndex)
+    {
+        // note: this is not valid for dense nodes
+        return positionLong + transitionDelta(src, position, childIndex);
+    }
+
+    /**
+     * Returns the highest transition for this node, or -1 if none exist 
(PAYLOAD_ONLY nodes).
+     */
+    public long lastTransition(ByteBuffer src, int position, long positionLong)
+    {
+        return transition(src, position, positionLong, transitionRange(src, 
position) - 1);
+    }
+
+    /**
+     * Returns a transition that is higher than the index returned by {@code 
search}. This may not exist (if the
+     * argument was higher than the last transition byte), in which case this 
returns the given {@code defaultValue}.
+     */
+    abstract public long greaterTransition(ByteBuffer src, int position, long 
positionLong, int searchIndex, long defaultValue);
+
+    /**
+     * Returns a transition that is lower than the index returned by {@code 
search}. Returns {@code defaultValue} for
+     * {@code searchIndex} equals 0 or -1 as lesser transition for those 
indexes does not exist.
+     */
+    abstract public long lesserTransition(ByteBuffer src, int position, long 
positionLong, int searchIndex, long defaultValue);
+
+    // Construction (serialization) methods
+
+    /**
+     * Returns a node type that is suitable to store the node.
+     */
+    public static TrieNode typeFor(SerializationNode<?> node, long 
nodePosition)
+    {
+        int c = node.childCount();
+        if (c == 0)
+            return PAYLOAD_ONLY;
+
+        int bitsPerPointerIndex = 0;
+        long delta = node.maxPositionDelta(nodePosition);
+        assert delta < 0;
+        while (!singles[bitsPerPointerIndex].fits(-delta))
+            ++bitsPerPointerIndex;
+
+        if (c == 1)
+        {
+            if (node.payload() != null && 
singles[bitsPerPointerIndex].bytesPerPointer == FRACTIONAL_BYTES)
+                ++bitsPerPointerIndex; // next index will permit payload
+
+            return singles[bitsPerPointerIndex];
+        }
+
+        TrieNode sparse = sparses[bitsPerPointerIndex];
+        TrieNode dense = denses[bitsPerPointerIndex];
+        return (sparse.sizeofNode(node) < dense.sizeofNode(node)) ? sparse : 
dense;
+    }
+
+    /**
+     * Returns the size needed to serialize this node.
+     */
+    abstract public int sizeofNode(SerializationNode<?> node);
+
+    /**
+     * Serializes the node. All transition target positions must already have 
been defined. {@code payloadBits} must
+     * be four bits.
+     */
+    abstract public void serialize(DataOutputPlus out, SerializationNode<?> 
node, int payloadBits, long nodePosition) throws IOException;
+
+    // Implementations
+
+    final int bytesPerPointer;
+    static final int FRACTIONAL_BYTES = 0;
+
+    TrieNode(int ordinal, int bytesPerPointer)
+    {
+        this.ordinal = ordinal;
+        this.bytesPerPointer = bytesPerPointer;
+    }
+
+    final int ordinal;
+
+    static final TrieNode PAYLOAD_ONLY = new PayloadOnly();
+
+    static private class PayloadOnly extends TrieNode
+    {
+        // byte flags
+        // var payload
+        PayloadOnly()
+        {
+            super(0, FRACTIONAL_BYTES);
+        }
+
+        @Override
+        public int payloadPosition(ByteBuffer src, int position)
+        {
+            return position + 1;
+        }
+
+        @Override
+        public int search(ByteBuffer src, int position, int transitionByte)
+        {
+            return -1;
+        }
+
+        @Override
+        public long transitionDelta(ByteBuffer src, int position, int 
childIndex)
+        {
+            return 0;
+        }
+
+        @Override
+        public long transition(ByteBuffer src, int position, long 
positionLong, int childIndex)
+        {
+            return -1;
+        }
+
+        @Override
+        public long lastTransition(ByteBuffer src, int position, long 
positionLong)
+        {
+            return -1;
+        }
+
+        @Override
+        public long greaterTransition(ByteBuffer src, int position, long 
positionLong, int searchIndex, long defaultValue)
+        {
+            return defaultValue;
+        }
+
+        @Override
+        public long lesserTransition(ByteBuffer src, int position, long 
positionLong, int searchIndex, long defaultValue)
+        {
+            return defaultValue;
+        }
+
+        @Override
+        public int transitionByte(ByteBuffer src, int position, int childIndex)
+        {
+            return Integer.MAX_VALUE;
+        }
+
+        @Override
+        public int transitionRange(ByteBuffer src, int position)
+        {
+            return 0;
+        }
+
+        public int sizeofNode(SerializationNode<?> node)
+        {
+            return 1;
+        }
+
+        @Override
+        public void serialize(DataOutputPlus dest, SerializationNode<?> node, 
int payloadBits, long nodePosition) throws IOException
+        {
+            dest.writeByte((ordinal << 4) + (payloadBits & 0x0F));
+        }
+    }
+
+    static final TrieNode SINGLE_8 = new Single(2, 1);
+    static final TrieNode SINGLE_16 = new Single(4, 2);

Review Comment:
   I gave up and moved all the static instances to an embedded `Types` class.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to