[GitHub] spark pull request #21070: [SPARK-23972][BUILD][SQL] Update Parquet to 1.10....

rdblue Mon, 07 May 2018 08:49:27 -0700

Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21070#discussion_r186464557
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java ---
    @@ -63,115 +59,157 @@ public final void readBooleans(int total, WritableColumnVector c, int rowId) {
         }
       }
     
    +  private ByteBuffer getBuffer(int length) {
    +    try {
    +      return in.slice(length).order(ByteOrder.LITTLE_ENDIAN);
    --- End diff --
    
    No, `slice` doesn't copy. That's why we're using `ByteBuffer` now, to avoid copy operations.
    
    Setting the byte order to `LITTLE_ENDIAN` is correct because the order is set on the returned buffer, and Parquet buffers store values in little-endian order: https://github.com/apache/parquet-format/blob/master/Encodings.md.
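
    To illustrate both points, here is a minimal sketch using only `java.nio`. It does not use Parquet's input stream from the diff; `SliceDemo` and `sliceLittleEndian` are hypothetical names made up for this example. It shows that a sliced `ByteBuffer` is a view over the same bytes rather than a copy, and that the byte order has to be set on the slice itself, since a new slice defaults to big endian.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class, not part of Spark or Parquet.
final class SliceDemo {

  // Returns a little-endian view of the next `length` bytes of `source`
  // and advances `source` past them. No bytes are copied.
  static ByteBuffer sliceLittleEndian(ByteBuffer source, int length) {
    ByteBuffer view = source.slice();              // shares storage with source
    view.limit(length);                            // restrict the view to `length` bytes
    source.position(source.position() + length);   // consume the bytes from source
    // A fresh slice is always BIG_ENDIAN, so the order is set explicitly.
    return view.order(ByteOrder.LITTLE_ENDIAN);
  }

  public static void main(String[] args) {
    // Two 4-byte integers (1 and 2) laid out in little-endian order.
    byte[] page = {0x01, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00};
    ByteBuffer source = ByteBuffer.wrap(page);

    ByteBuffer view = sliceLittleEndian(source, 4);
    System.out.println(view.getInt(0));  // prints 1

    // Changing the underlying array is visible through the view,
    // which demonstrates that slicing did not copy the data.
    page[0] = 0x07;
    System.out.println(view.getInt(0));  // prints 7
  }
}
```

    Without the `order(ByteOrder.LITTLE_ENDIAN)` call, the same `getInt(0)` would decode the bytes as big endian and return a different value.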




