Re: [PR] CASSANDRA-20772 Accord Route Index does not filter based off txn_id range which causes segments which can't contain the data to be queried [cassandra]

via GitHub Wed, 16 Jul 2025 11:23:51 -0700


dcapwell commented on code in PR #4257:
URL: https://github.com/apache/cassandra/pull/4257#discussion_r2211246184



##########
src/java/org/apache/cassandra/index/accord/RangeMemoryIndex.java:
##########
@@ -58,13 +63,51 @@ public class RangeMemoryIndex
 {
 
     @GuardedBy("this")
-    private final Map<Group, RangeTree<byte[], Range, DecoratedKey>> map = new HashMap<>();
-    @GuardedBy("this")
-    private final Map<Group, Metadata> groupMetadata = new HashMap<>();
+    private final Map<Key, Group> map = new HashMap<>();
 
-    private static class Metadata
+    private static class Group
     {
+        private RangeTree<byte[], Range, DecoratedKey> tree = createRangeTree();
         public byte[] minTerm, maxTerm;
+        public TxnId minTimestamp = TxnId.MAX;
+        public TxnId maxTimestamp = TxnId.NONE;
+
+        void add(Range range, DecoratedKey key, TxnId txnId, byte[] start, byte[] end)
+        {
+            tree.add(range, key);
+            minTerm = minTerm == null ? start : ByteArrayUtil.compareUnsigned(minTerm, 0, start, 0, minTerm.length) > 0 ? start : minTerm;
+            maxTerm = maxTerm == null ? end : ByteArrayUtil.compareUnsigned(maxTerm, 0, end, 0, maxTerm.length) < 0 ? end : maxTerm;
+            if (minTimestamp.compareTo(txnId) > 0)
+                minTimestamp = txnId;
+            if (maxTimestamp.compareTo(txnId) < 0)
+                maxTimestamp = txnId;
+        }
+
+        void search(byte[] start, byte[] end,
+                    Timestamp minTimestamp, Timestamp maxTimestamp,
+                    Consumer<Map.Entry<RangeMemoryIndex.Range, DecoratedKey>> fn)
+        {
+            if (this.minTimestamp.compareTo(maxTimestamp) > 0 || this.maxTimestamp.compareTo(minTimestamp) < 0)
+                return;
+            tree.search(new Range(start, end), e -> {
+                TxnId id = AccordKeyspace.JournalColumns.getJournalKey(e.getValue()).id;

Review Comment:
   Not a fan of this cost, but my thinking is as follows:
   
   1) the in-memory cost is the `ByteBuffer`
   2) it is only called when the range intersects, so the calls are limited to
possible matches and we just need to do the timestamp filter.
   
   I don't know if min/max are a no-op in the common case (I doubt it, as it
should be based off durability as far as I can tell), so I don't think it really
makes sense to try to optimize the filter to avoid this cost when min/max are
the full range.
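
For readers following along, here is a minimal sketch of the two-level filter this comment describes: first skip the whole group when its min/max timestamps cannot overlap the query, then check the timestamp of each entry whose range intersects. It is an illustration under stated assumptions, not the PR's code: `TimestampPruningSketch`, `Entry`, and the `long` timestamps are hypothetical stand-ins for `Group`, the `RangeTree` entries, and `TxnId`, a plain list replaces the range tree, intervals are assumed half-open, and the per-entry check is inferred from the comment because the quoted hunk ends before it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the PR's Group; all names and types are illustrative only.
final class TimestampPruningSketch
{
    // Hypothetical entry: a half-open interval [start, end) tagged with a timestamp.
    // A long stands in for TxnId.
    static final class Entry
    {
        final int start, end;
        final long timestamp;

        Entry(int start, int end, long timestamp)
        {
            this.start = start;
            this.end = end;
            this.timestamp = timestamp;
        }
    }

    static final class Group
    {
        // A plain list stands in for the range tree.
        private final List<Entry> entries = new ArrayList<>();
        private long minTimestamp = Long.MAX_VALUE;
        private long maxTimestamp = Long.MIN_VALUE;

        void add(Entry e)
        {
            entries.add(e);
            // Track the timestamp bounds of everything stored in this group.
            minTimestamp = Math.min(minTimestamp, e.timestamp);
            maxTimestamp = Math.max(maxTimestamp, e.timestamp);
        }

        List<Entry> search(int start, int end, long minTs, long maxTs)
        {
            List<Entry> out = new ArrayList<>();
            // Level 1: the group cannot contain a match, so skip it without touching entries.
            if (minTimestamp > maxTs || maxTimestamp < minTs)
                return out;
            for (Entry e : entries)
            {
                // Only entries whose interval intersects the query reach the timestamp check.
                if (e.start >= end || e.end <= start)
                    continue;
                // Level 2: per-entry timestamp filter (inclusive bounds, assumed).
                if (e.timestamp < minTs || e.timestamp > maxTs)
                    continue;
                out.add(e);
            }
            return out;
        }
    }
}
```

In this sketch the per-entry check runs only for intersecting entries, which is the point made in item 2 above; the group-level check is what lets a segment that cannot contain the requested txn_id range be skipped entirely.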



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
