[GitHub] [spark] cloud-fan commented on a change in pull request #28406: [SPARK-31606][SQL] Reduce the perf regression of vectorized parquet reader caused by datetime rebase

GitBox Thu, 30 Apr 2020 21:42:13 -0700


cloud-fan commented on a change in pull request #28406:
URL: https://github.com/apache/spark/pull/28406#discussion_r418414012




##########
File path: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedRleValuesReader.java
##########
@@ -342,6 +380,43 @@ public void readLongs(
     }
   }
 
+  // A fork of `readLongs`, which rebases the timestamp long value (microseconds) before filling
+  // the Spark column vector.
+  public void readLongsWithRebase(
+      int total,
+      WritableColumnVector c,
+      int rowId,
+      int level,
+      VectorizedValuesReader data) throws IOException {
+    int left = total;
+    while (left > 0) {
+      if (this.currentCount == 0) this.readNextGroup();
+      int n = Math.min(left, this.currentCount);
+      switch (mode) {
+        case RLE:
+          if (currentValue == level) {
+            data.readLongsWithRebase(n, c, rowId);
+          } else {
+            c.putNulls(rowId, n);
+          }
+          break;
+        case PACKED:

Review comment:
       I didn't optimize this case because the no-rebase code path doesn't look
very fast to begin with: it has an `if-else` inside the loop.
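
For readers without the full diff, here is a minimal, self-contained sketch of the kind of per-value loop the comment refers to. The body of the `PACKED` branch is not shown in the hunk above, so this is an assumption about its shape, not the actual Spark code. `LongColumn`, `readPacked`, and the `rebase` operator are hypothetical stand-ins for `WritableColumnVector`, the real reader method, and the datetime rebase function.

```java
import java.util.function.LongUnaryOperator;

final class PackedRebaseSketch {
  // Hypothetical stand-in for a Spark column vector: values plus a null mask.
  static final class LongColumn {
    final long[] values;
    final boolean[] isNull;

    LongColumn(int capacity) {
      values = new long[capacity];
      isNull = new boolean[capacity];
    }
  }

  // Assumed shape of a PACKED-mode read: definition levels are inspected one
  // by one, so every iteration branches on "defined" vs. "null". With that
  // branch already in the loop, rebasing each value individually adds
  // comparatively little overhead.
  static void readPacked(
      int[] defLevels,
      int level,
      long[] data,
      LongUnaryOperator rebase,
      LongColumn c,
      int rowId) {
    int dataIdx = 0; // only defined rows consume a value from `data`
    for (int i = 0; i < defLevels.length; i++) {
      if (defLevels[i] == level) {
        c.values[rowId + i] = rebase.applyAsLong(data[dataIdx++]);
      } else {
        c.isNull[rowId + i] = true;
      }
    }
  }
}
```

In the `RLE` branch, by contrast, a whole run of `n` values shares one definition level, so the check happens once per run and the values can be handed to `data.readLongsWithRebase(n, c, rowId)` in bulk, as the diff above shows.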






