dongjoon-hyun commented on a change in pull request #32753:
URL: https://github.com/apache/spark/pull/32753#discussion_r661187202
##########
File path: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetReadState.java
##########
@@ -17,13 +17,38 @@
package org.apache.spark.sql.execution.datasources.parquet;
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+import java.util.PrimitiveIterator;
+
/**
* Helper class to store intermediate state while reading a Parquet column chunk.
*/
final class ParquetReadState {
- /** Maximum definition level */
+ /** A special row range used when there is no row indexes (hence all rows must be included) */
+ private static final RowRange MAX_ROW_RANGE = new RowRange(Long.MIN_VALUE, Long.MAX_VALUE);
+
+ /**
+ * A special row range used when the row indexes are present AND all the row ranges have been
+ * processed. This serves as a sentinel at the end indicating that all rows come after the last
+ * row range should be skipped.
+ */
+ private static final RowRange END_ROW_RANGE = new RowRange(Long.MAX_VALUE, Long.MIN_VALUE);
+
+ /** Iterator over all row ranges, only not-null if column index is present */
+ private final Iterator<RowRange> rowRanges;
+
+ /** The current row range */
+ private RowRange currentRange;
+
+ /** Maximum definition level for the Parquet column */
final int maxDefinitionLevel;
+ /** The current index overall all rows within the column chunk. This is used to check if the
Review comment:
Maybe, `overall all` -> `over all`?
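
For context, here is a minimal, self-contained sketch of the sentinel-based row-range handling that the doc comments in the diff describe. The `RowRange`, `MAX_ROW_RANGE`, and `END_ROW_RANGE` names mirror the diff; the `RowRangeSketch` class and its helpers (`nextRange`, `shouldSkipCurrentRow`, `advanceRow`) are hypothetical illustrations under assumed behavior, not the actual Spark implementation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PrimitiveIterator;

// Sketch only: illustrates how the two sentinel RowRange constants can drive
// row filtering when a Parquet column index is (or is not) present.
final class RowRangeSketch {
  static final class RowRange {
    final long start;
    final long end;
    RowRange(long start, long end) { this.start = start; this.end = end; }
  }

  // No column index: a single range covering every possible row index, so no row is skipped.
  static final RowRange MAX_ROW_RANGE = new RowRange(Long.MIN_VALUE, Long.MAX_VALUE);

  // Sentinel placed after the last real range: start > end, so every remaining row falls
  // outside it and is skipped.
  static final RowRange END_ROW_RANGE = new RowRange(Long.MAX_VALUE, Long.MIN_VALUE);

  private final Iterator<RowRange> rowRanges;  // null when no column index is present
  private RowRange currentRange;
  private long rowId;                          // current row index within the column chunk

  RowRangeSketch(PrimitiveIterator.OfLong rowIndexes) {
    if (rowIndexes == null) {
      this.rowRanges = null;
      this.currentRange = MAX_ROW_RANGE;
    } else {
      // Collapse the (sorted, non-negative) row indexes into contiguous [start, end] ranges.
      List<RowRange> ranges = new ArrayList<>();
      long start = -1, end = -1;
      while (rowIndexes.hasNext()) {
        long idx = rowIndexes.nextLong();
        if (start < 0) {
          start = end = idx;
        } else if (idx == end + 1) {
          end = idx;
        } else {
          ranges.add(new RowRange(start, end));
          start = end = idx;
        }
      }
      if (start >= 0) ranges.add(new RowRange(start, end));
      this.rowRanges = ranges.iterator();
      nextRange();
    }
  }

  // Advance to the next range, falling back to the END sentinel once all ranges are consumed.
  private void nextRange() {
    currentRange = rowRanges.hasNext() ? rowRanges.next() : END_ROW_RANGE;
  }

  // Whether the row at the current rowId should be skipped under the current range.
  boolean shouldSkipCurrentRow() {
    while (currentRange != END_ROW_RANGE && rowId > currentRange.end) {
      nextRange();
    }
    return rowId < currentRange.start;
  }

  void advanceRow() { rowId++; }
}
```

For example, with row indexes {0, 1, 2, 5, 6} the constructor builds the ranges [0, 2] and [5, 6], so rows 3 and 4 are skipped, and once the iterator is exhausted END_ROW_RANGE causes every remaining row in the chunk to be skipped; with no column index, MAX_ROW_RANGE means every row is read.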