[GitHub] [spark] sunchao commented on a change in pull request #34308: [SPARK-37035][SQL] Improve error message when use parquet vectorize reader

GitBox Wed, 20 Oct 2021 21:03:35 -0700


sunchao commented on a change in pull request #34308:
URL: https://github.com/apache/spark/pull/34308#discussion_r733302882




##########
File path: 
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedParquetRecordReader.java
##########
@@ -350,7 +350,8 @@ private void checkEndOfRowGroup() throws IOException {
         pages.getRowIndexes().orElse(null),
         convertTz,
         datetimeRebaseMode,
-        int96RebaseMode);
+        int96RebaseMode,
+        file == null? "null": file.toString());

Review comment:
Oops, you are right: this is the lazy reading path with the Parquet dictionary.

I don't have a strong opinion on this, but I think it might be a bit better to
avoid passing "null" here. Maybe we can require `VectorizedParquetRecordReader`
to always have `file` initialized. For this, we'll need to modify the
`initialize` method and the other test case so that they pass a dummy path string.
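A minimal sketch of the suggested direction follows. It is not Spark code. `RecordReaderSketch`, `ColumnReaderSketch` and `describeFailure` are hypothetical names, and `file` is simplified to a plain `String` rather than whatever path type the real reader uses. The idea is that the reader rejects a missing `file` up front with `Objects.requireNonNull`. Every downstream path, including lazy dictionary decoding, can then use the file name directly without a `file == null ? "null" : file.toString()` guard.

```java
import java.util.Objects;

public class NonNullFileSketch {

  // Hypothetical stand-in for a column reader that reports the file name in errors.
  static final class ColumnReaderSketch {
    private final String fileName;

    ColumnReaderSketch(String fileName) {
      // Fail fast instead of carrying a "null" placeholder into error messages.
      this.fileName = Objects.requireNonNull(fileName, "fileName");
    }

    String describeFailure(String column) {
      return "Failed to decode column " + column + " in file " + fileName;
    }
  }

  // Hypothetical stand-in for the record reader: `file` is required at
  // construction, so it is always initialized for later code paths.
  static final class RecordReaderSketch {
    private final String file;

    RecordReaderSketch(String file) {
      this.file = Objects.requireNonNull(file, "file");
    }

    ColumnReaderSketch newColumnReader() {
      // No null check is needed here because the constructor guarantees `file`.
      return new ColumnReaderSketch(file);
    }
  }

  public static void main(String[] args) {
    // A test that previously left `file` unset would pass a dummy path instead.
    RecordReaderSketch reader = new RecordReaderSketch("dummy/path.parquet");
    System.out.println(reader.newColumnReader().describeFailure("a"));
  }
}
```

The trade-off is that callers (here, the test case) must supply some path, even a dummy one, in exchange for never seeing a literal "null" in an error message.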




