mike-tr-adamson commented on code in PR #2498:
URL: https://github.com/apache/cassandra/pull/2498#discussion_r1272519553


##########
src/java/org/apache/cassandra/index/sai/disk/v1/sortedterms/SortedTermsWriter.java:
##########
@@ -201,14 +234,89 @@ private void copyBytes(ByteComparable source, BytesRefBuilder dest)
             dest.append((byte) val);
     }
 
-    /**
-     * Swaps {@link #tempTerm} with {@link #prevTerm}.
-     * It is faster to swap the pointers instead of copying the data.
-     */
-    private void swapTempWithPrevious()
+    private class TrieSegment
     {
-        BytesRefBuilder temp = this.tempTerm;
-        this.tempTerm = this.prevTerm;
-        this.prevTerm = temp;
+        private final InMemoryTrie<Long> trie;
+        private final BytesRefBuilder prevTerm = new BytesRefBuilder();
+        private final BytesRefBuilder tempTerm = new BytesRefBuilder();
+
+        private BytesRef minTerm;
+        private long totalBytesAllocated;
+        private boolean flushed = false;
+        private boolean active = true;
+
+        TrieSegment()
+        {
+            trie = new InMemoryTrie<>(TrieMemtable.BUFFER_TYPE);
+            SegmentMemoryLimiter.registerBuilder();
+        }
+
+        void add(ByteComparable term)
+        {
+            final long initialSizeOnHeap = trie.sizeOnHeap();
+
+            try
+            {
+                trie.putRecursive(term, rowId, (existing, update) -> update);
+            }
+            catch (InMemoryTrie.SpaceExhaustedException e)
+            {
+                throw Throwables.unchecked(e);
+            }
+
+            long bytesAllocated = trie.sizeOnHeap() - initialSizeOnHeap;
+            totalBytesAllocated += bytesAllocated;
+            SegmentMemoryLimiter.increment(bytesAllocated);
+        }
+
+        SortedTermsMeta.SortedTermsSegmentMeta flush() throws IOException
+        {
+            assert !flushed : "Cannot flush a trie segment that has already been flushed";
+
+            flushed = true;
+
+            long trieFilePointer;
+
+            try (IncrementalDeepTrieWriterPageAware<Long> trieWriter = new IncrementalDeepTrieWriterPageAware<>(trieSerializer,
+                                                                                                                trieOutputWriter.asSequentialWriter()))
+            {
+                Iterator<Map.Entry<ByteComparable, Long>> iterator = trie.entryIterator();
+
+                while (iterator.hasNext())
+                {
+                    Map.Entry<ByteComparable, Long> next = iterator.next();
+                    tempTerm.clear();
+                    copyBytes(next.getKey(), tempTerm);
+
+                    BytesRef termRef = tempTerm.get();
+
+                    if (minTerm == null)
+                        minTerm = new BytesRef(Arrays.copyOf(termRef.bytes, termRef.length));
+
+                    trieWriter.add(next.getKey(), next.getValue());
+                    copyBytes(next.getKey(), prevTerm);
+                }
+
+                trieFilePointer = trieWriter.complete();

Review Comment:
   Correct, or more to the point, the incremental writer assumes they are in lexicographic order.
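
   The ordering point can be sketched outside Cassandra. This is an illustrative stand-in only, not the Cassandra API: a `TreeMap` plays the role of `InMemoryTrie` (whose `entryIterator()` yields terms in byte-comparable, i.e. lexicographic, order), and `isStrictlyAscending` is a hypothetical helper mimicking the precondition an incremental, append-only trie writer relies on.

   ```java
   import java.util.Map;
   import java.util.TreeMap;

   public class OrderedWriteSketch
   {
       // Checks that keys arrive in strictly ascending lexicographic order,
       // which is what an incremental writer that emits its pages
       // sequentially must assume about its input stream.
       static boolean isStrictlyAscending(Iterable<String> keys)
       {
           String prev = null;
           for (String key : keys)
           {
               if (prev != null && prev.compareTo(key) >= 0)
                   return false; // out of order: an incremental writer would produce a corrupt structure
               prev = key;
           }
           return true;
       }

       public static void main(String[] args)
       {
           // A sorted map iterates entries in key order, mirroring how a
           // trie's entry iterator naturally yields terms sorted, no matter
           // the insertion order.
           Map<String, Long> trie = new TreeMap<>();
           trie.put("banana", 2L);
           trie.put("apple", 1L);
           trie.put("cherry", 3L);

           System.out.println(isStrictlyAscending(trie.keySet())); // prints "true"
       }
   }
   ```

   Because the trie itself keeps terms sorted, iterating it and feeding the writer directly satisfies the writer's ordering assumption without an extra sort step.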



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

