[GitHub] [spark] cloud-fan commented on a change in pull request #28406: [SPARK-31606][SQL] Reduce the perf regression of vectorized parquet reader caused by datetime rebase

GitBox Thu, 30 Apr 2020 21:43:13 -0700


cloud-fan commented on a change in pull request #28406:
URL: https://github.com/apache/spark/pull/28406#discussion_r418414225




##########
File path: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedRleValuesReader.java
##########
@@ -342,6 +380,43 @@ public void readLongs(
     }
   }
 
+  // A fork of `readLongs`, which rebases the timestamp long value (microseconds) before filling
+  // the Spark column vector.
+  public void readLongsWithRebase(
+      int total,
+      WritableColumnVector c,
+      int rowId,
+      int level,
+      VectorizedValuesReader data) throws IOException {
+    int left = total;
+    while (left > 0) {
+      if (this.currentCount == 0) this.readNextGroup();
+      int n = Math.min(left, this.currentCount);
+      switch (mode) {
+        case RLE:
+          if (currentValue == level) {
+            data.readLongsWithRebase(n, c, rowId);
+          } else {
+            c.putNulls(rowId, n);
+          }
+          break;
+        case PACKED:

Review comment:
The general idea is to add an extra loop that checks whether we need to rebase at all. That extra loop is only worthwhile if the no-rebase code path is much faster than the rebase code path.
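
To make the trade-off concrete, here is a minimal sketch of the "check first, then pick a path" idea on plain `long` arrays. It is not the code from this PR: the class name `RebaseCheckSketch`, the cutoff `SWITCH_TS`, and the `rebase` function passed in are all hypothetical stand-ins, and the real reader works on column vectors rather than arrays.

```java
import java.util.function.LongUnaryOperator;

// Illustrative only: names and the cutoff value are assumptions, not Spark APIs.
final class RebaseCheckSketch {
  // Hypothetical cutoff: values at or above it are assumed to need no rebase.
  static final long SWITCH_TS = 0L;

  // Copies src into dst. Returns true if the slow (rebase) path was taken.
  static boolean copyWithOptionalRebase(long[] src, long[] dst, LongUnaryOperator rebase) {
    // Extra loop: scan once to see whether any value falls below the cutoff.
    boolean needsRebase = false;
    for (long v : src) {
      if (v < SWITCH_TS) {
        needsRebase = true;
        break;
      }
    }
    if (!needsRebase) {
      // Fast path: bulk copy, no per-value work.
      System.arraycopy(src, 0, dst, 0, src.length);
    } else {
      // Slow path: touch every value and rebase the ones below the cutoff.
      for (int i = 0; i < src.length; i++) {
        dst[i] = src[i] < SWITCH_TS ? rebase.applyAsLong(src[i]) : src[i];
      }
    }
    return needsRebase;
  }
}
```

The pre-check costs one extra pass over the values, so it pays off only when the bulk copy is much cheaper than the per-value loop and most batches contain no value that needs rebasing.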




