Re: [PR] [SPARK-56895][SQL] Batch ByteBuffer slice in RLE PACKED decode to reduce allocation overhead [spark]

via GitHub Tue, 23 Jun 2026 02:22:39 -0700


iemejia commented on code in PR #55922:
URL: https://github.com/apache/spark/pull/55922#discussion_r3458525402



##########
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedRleValuesReader.java:
##########
@@ -1000,11 +1000,15 @@ private boolean readNextGroup() {
             this.currentBuffer = new int[this.currentCount];
           }
           currentBufferIdx = 0;
+          // Slice all packed bytes in one call (numGroups groups x bitWidth bytes each)
+          // instead of one slice per group, avoiding per-group ByteBuffer allocation.
+          int totalBytes = numGroups * bitWidth;
+          ByteBuffer packed = in.slice(totalBytes);

Review Comment:
   Good point. Added a test that feeds a `MultiBufferInputStream` with the 
packed run spanning a buffer boundary (encoded bytes split into 3-byte chunks), 
so the base-0 `pos` path is now covered directly.
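
For readers following the thread, here is a minimal sketch of the two cases such a test distinguishes. It is not the test added in the PR and does not use Parquet's `MultiBufferInputStream`; `ChunkedInput`, `readPerGroup`, and `readBatched` are hypothetical names, and the copy-when-spanning behaviour of `ChunkedInput.slice` is an assumption made for illustration. It only shows that one batched slice indexed from `packed.position()` should yield the same bytes as one slice per group, whether the slice is a view with a non-zero position or a fresh copy starting at position 0.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class BatchedSliceSketch {
  /** Hypothetical stand-in for a stream backed by several small buffers. */
  static final class ChunkedInput {
    private final ByteBuffer[] chunks;
    private int chunkIdx = 0;

    ChunkedInput(byte[] data, int chunkSize) {
      int n = (data.length + chunkSize - 1) / chunkSize;
      chunks = new ByteBuffer[n];
      for (int i = 0; i < n; i++) {
        int start = i * chunkSize;
        int len = Math.min(chunkSize, data.length - start);
        chunks[i] = ByteBuffer.wrap(data, start, len);
      }
    }

    /** Next `length` bytes: a view if they fit in the current chunk, else a copy. */
    ByteBuffer slice(int length) {
      // Skip chunks that have already been fully consumed.
      while (chunkIdx < chunks.length && !chunks[chunkIdx].hasRemaining()) {
        chunkIdx++;
      }
      if (chunkIdx < chunks.length && chunks[chunkIdx].remaining() >= length) {
        ByteBuffer cur = chunks[chunkIdx];
        ByteBuffer view = cur.duplicate();          // position may be non-zero
        view.limit(view.position() + length);
        cur.position(cur.position() + length);
        return view;
      }
      // Assumed behaviour: a slice spanning chunks is copied into a new buffer.
      ByteBuffer copy = ByteBuffer.allocate(length);
      while (copy.hasRemaining()) {
        if (chunkIdx >= chunks.length) {
          throw new IllegalStateException("not enough bytes");
        }
        ByteBuffer cur = chunks[chunkIdx];
        if (!cur.hasRemaining()) {
          chunkIdx++;
          continue;
        }
        int n = Math.min(cur.remaining(), copy.remaining());
        ByteBuffer part = cur.duplicate();
        part.limit(part.position() + n);
        copy.put(part);
        cur.position(cur.position() + n);
      }
      copy.flip();                                  // position is now 0
      return copy;
    }
  }

  /** One slice per group: numGroups separate ByteBuffer objects. */
  static byte[] readPerGroup(ChunkedInput in, int numGroups, int bitWidth) {
    byte[] out = new byte[numGroups * bitWidth];
    for (int g = 0; g < numGroups; g++) {
      ByteBuffer group = in.slice(bitWidth);
      group.get(out, g * bitWidth, bitWidth);
    }
    return out;
  }

  /** One slice for the whole run, then absolute reads relative to its position. */
  static byte[] readBatched(ChunkedInput in, int numGroups, int bitWidth) {
    int totalBytes = numGroups * bitWidth;
    ByteBuffer packed = in.slice(totalBytes);
    int pos = packed.position();                    // 0 for a copy, possibly > 0 for a view
    byte[] out = new byte[totalBytes];
    for (int g = 0; g < numGroups; g++) {
      for (int b = 0; b < bitWidth; b++) {
        out[g * bitWidth + b] = packed.get(pos + g * bitWidth + b);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    byte[] data = {9, 1, 2, 3, 4, 5, 6, 7, 8};      // one header byte, then 4 groups x 2 bytes
    for (int chunkSize : new int[] {3, 16}) {       // 3: run spans chunks; 16: single chunk
      ChunkedInput perGroup = new ChunkedInput(data, chunkSize);
      ChunkedInput batched = new ChunkedInput(data, chunkSize);
      perGroup.slice(1);
      batched.slice(1);
      boolean same = Arrays.equals(readPerGroup(perGroup, 4, 2), readBatched(batched, 4, 2));
      System.out.println("chunkSize=" + chunkSize + " same=" + same);
    }
  }
}
```

Consuming the header byte first is what makes the single-chunk case return a view whose position is 1 rather than 0, so both offsets are exercised by the same input.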



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


