GitHub user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21070#discussion_r186464557
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java
---
@@ -63,115 +59,157 @@ public final void readBooleans(int total,
WritableColumnVector c, int rowId) {
}
}
+ private ByteBuffer getBuffer(int length) {
+ try {
+ return in.slice(length).order(ByteOrder.LITTLE_ENDIAN);
--- End diff --
No, `slice` doesn't copy. That's why we're using `ByteBuffer` now: to avoid
copy operations.
Setting the byte order to `LITTLE_ENDIAN` is correct because the order applies
to the returned buffer, and Parquet buffers store values in little endian:
https://github.com/apache/parquet-format/blob/master/Encodings.md.
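
Both points can be checked with plain `java.nio`. The following is a minimal sketch, not the code from this PR: it uses the standard `ByteBuffer.slice()` rather than the `in.slice(length)` call in the diff, and the names `SliceDemo` and `sliceLittleEndian` are hypothetical. It shows that a slice shares storage with its source, and that a new slice starts out big endian, so the order has to be set on the returned buffer.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class SliceDemo {
    // Hypothetical helper: returns a little-endian view of the next
    // `length` bytes of `source` and advances `source` past them.
    public static ByteBuffer sliceLittleEndian(ByteBuffer source, int length) {
        ByteBuffer view = source.slice();  // shares the backing storage, no copy
        view.limit(length);
        source.position(source.position() + length);
        // The byte order belongs to the view, so it is set here.
        return view.order(ByteOrder.LITTLE_ENDIAN);
    }

    public static void main(String[] args) {
        // Two 4-byte little-endian integers: 1 and 2.
        byte[] raw = {0x01, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00};
        ByteBuffer source = ByteBuffer.wrap(raw);

        ByteBuffer view = sliceLittleEndian(source, 4);
        System.out.println(view.getInt(0));  // prints 1

        // A write to the original array is visible through the view,
        // which confirms that slicing did not copy the bytes.
        raw[0] = 0x05;
        System.out.println(view.getInt(0));  // prints 5

        // A fresh slice of a byte buffer always starts as BIG_ENDIAN.
        System.out.println(source.slice().order());  // prints BIG_ENDIAN
    }
}
```

Without the `order(ByteOrder.LITTLE_ENDIAN)` call, `getInt` on the view would interpret the same four bytes as big endian and return a different value.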
---