sunchao commented on a change in pull request #34308:
URL: https://github.com/apache/spark/pull/34308#discussion_r732992125
##########
File path:
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedParquetRecordReader.java
##########
@@ -350,7 +350,8 @@ private void checkEndOfRowGroup() throws IOException {
pages.getRowIndexes().orElse(null),
convertTz,
datetimeRebaseMode,
- int96RebaseMode);
+ int96RebaseMode,
+ file == null ? "null" : file.toString());
Review comment:
I see, sorry, I missed that.
Instead of propagating the `file` down to `ParquetDictionary`, I think we
can also catch `IOException` or `UnsupportedOperationException` in
`VectorizedParquetRecordReader.nextBatch` and wrap the exception with more
information, such as the file and column being read. This would also help
improve error messages for other exceptions thrown from the vectorized read
path. What do you think?
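
Roughly something like the sketch below (just to illustrate the idea, not the actual class: the `file`, `currentColumn` fields and the `readNextBatch()` helper are placeholders, and the real `nextBatch` would delegate to the existing decoding logic):

```java
import java.io.IOException;

// Minimal sketch of wrapping low-level read errors with the file and column
// being read, so failures from the vectorized path are easier to debug.
class VectorizedReaderSketch {
  private String file;           // path of the Parquet file being read (placeholder)
  private String currentColumn;  // column currently being decoded (placeholder)

  public boolean nextBatch() throws IOException {
    try {
      return readNextBatch();
    } catch (IOException | UnsupportedOperationException e) {
      // Re-throw with extra context instead of threading `file` into ParquetDictionary.
      throw new IOException(
        "Failed to read batch from file " + file + ", column " + currentColumn, e);
    }
  }

  private boolean readNextBatch() throws IOException {
    // Stand-in for the real vectorized decoding logic.
    return false;
  }
}
```

The exact exception type and message wording would of course depend on what we already do elsewhere in the read path.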