LuciferYang commented on code in PR #55919:
URL: https://github.com/apache/spark/pull/55919#discussion_r3308713234
##########
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedReaderBase.java:
##########
@@ -27,6 +30,44 @@
*/
public class VectorizedReaderBase extends ValuesReader implements
VectorizedValuesReader {
+ /**
+ * Encodes an unsigned long as a minimal big-endian two's-complement byte array
+ * compatible with {@link java.math.BigInteger} encoding. The result is written into
+ * the backing array of {@code buf} (which must have capacity >= 9). Returns the
+ * start offset; the valid bytes are {@code buf.array()[start .. 8]} (length = 9 - start).
+ *
+ * <p>This avoids the per-value overhead of
+ * {@code new BigInteger(Long.toUnsignedString(v)).toByteArray()} which allocates a
+ * String, a BigInteger, and a byte[] on every call.
+ */
+ static int encodeUnsignedLongBigEndian(long v, ByteBuffer buf) {
+ byte[] scratch = buf.array();
+ // ByteBuffer is big-endian by default; writes 8 bytes MSB-first at offset 1.
+ // Always write before the zero-check so that the buffer is current even when reused.
+ buf.putLong(1, v);
+ if (v == 0L) {
+ scratch[0] = 0;
+ return 0;
Review Comment:
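Return `start = 8` instead of `start = 0`. With `start = 0` the caller emits
all `9 - 0 = 9` bytes, which is not the minimal encoding of zero. With
`start = 8` the caller emits only the single `0x00` byte at `scratch[8]`
(already written by `putLong`):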
```java
buf.putLong(1, v);
if (v == 0L) {
  // scratch[8] is already 0x00; caller writes 9 - 8 = 1 byte: [0x00]
  return 8;
}
```
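To check the suggestion against `BigInteger`, here is a rough standalone sketch. Only the zero branch comes from this comment; the rest of the method body is not visible in the hunk, so the non-zero handling below is my assumption of how a minimal encoding would be produced, and the class name plus the `encode` and `reference` helpers are hypothetical names used for illustration:

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.util.Arrays;

final class UnsignedLongEncodingSketch {

  // Zero branch follows the review suggestion. Everything after it is an
  // ASSUMED completion, not the code from the PR.
  static int encodeUnsignedLongBigEndian(long v, ByteBuffer buf) {
    byte[] scratch = buf.array();
    // Big-endian by default: the 8 bytes of v land in scratch[1..8].
    buf.putLong(1, v);
    if (v == 0L) {
      // scratch[8] is already 0x00, so the caller emits exactly one byte.
      return 8;
    }
    if (v < 0L) {
      // Top bit set: the unsigned value needs a leading 0x00 sign byte.
      scratch[0] = 0;
      return 0;
    }
    // Positive value: skip leading zero bytes...
    int start = 1;
    while (start < 8 && scratch[start] == 0) {
      start++;
    }
    // ...but keep one zero byte if the first kept byte has its high bit set,
    // otherwise the result would read as a negative two's-complement number.
    if (scratch[start] < 0) {
      start--;
    }
    return start;
  }

  // Hypothetical helper: copies out the valid bytes [start .. 8].
  static byte[] encode(long v) {
    ByteBuffer buf = ByteBuffer.allocate(9);
    int start = encodeUnsignedLongBigEndian(v, buf);
    return Arrays.copyOfRange(buf.array(), start, 9);
  }

  // The allocation-heavy path that the Javadoc says is being replaced.
  static byte[] reference(long v) {
    return new BigInteger(Long.toUnsignedString(v)).toByteArray();
  }

  public static void main(String[] args) {
    long[] samples = {0L, 1L, 127L, 128L, Long.MAX_VALUE, Long.MIN_VALUE, -1L};
    for (long v : samples) {
      boolean same = Arrays.equals(encode(v), reference(v));
      System.out.println(Long.toUnsignedString(v) + " matches BigInteger: " + same);
    }
  }
}
```

The point for this thread is the `v == 0L` case: `BigInteger.ZERO.toByteArray()` is a one-byte array, so returning `8` keeps the output byte-for-byte identical, while returning `0` would produce nine zero bytes.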
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]