sunchao commented on a change in pull request #34308:
URL: https://github.com/apache/spark/pull/34308#discussion_r732992125
##########
File path:
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedParquetRecordReader.java
##########
@@ -350,7 +350,8 @@ private void checkEndOfRowGroup() throws IOException {
pages.getRowIndexes().orElse(null),
convertTz,
datetimeRebaseMode,
- int96RebaseMode);
+ int96RebaseMode,
+ file == null ? "null" : file.toString());
Review comment:
I see, sorry, I missed that.
Instead of propagating the `file` down to `ParquetDictionary`, I think we
can also catch `IOException` or `UnsupportedOperationException` in
`VectorizedParquetRecordReader.nextBatch` and wrap the exception with more
information, such as the file and column being read. This would also help
improve error messages for other exceptions thrown from the vectorized read
path. What do you think?
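
Roughly something like the sketch below (just to illustrate the idea, not the actual class: the `file`, `currentColumn` fields and the `readNextBatch()` helper are placeholders, and the real `nextBatch` would delegate to the existing decoding logic):

```java
import java.io.IOException;

// Minimal sketch of wrapping low-level read errors with the file and column
// being read, so failures from the vectorized path are easier to debug.
class VectorizedReaderSketch {
  private String file;           // path of the Parquet file being read (placeholder)
  private String currentColumn;  // column currently being decoded (placeholder)

  public boolean nextBatch() throws IOException {
    try {
      return readNextBatch();
    } catch (IOException | UnsupportedOperationException e) {
      // Re-throw with extra context instead of threading `file` into ParquetDictionary.
      throw new IOException(
        "Failed to read batch from file " + file + ", column " + currentColumn, e);
    }
  }

  private boolean readNextBatch() throws IOException {
    // Stand-in for the real vectorized decoding logic.
    return false;
  }
}
```

The exact exception type and message wording would of course depend on what we already do elsewhere in the read path.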