[GitHub] [spark] AngersZhuuuu commented on a change in pull request #34337: [SPARK-37066][SQL] Improve error message to show file path when OrcColumnarBatchReader throw ArrayIndexOutofBoundsException

GitBox Wed, 20 Oct 2021 20:29:49 -0700


AngersZhuuuu commented on a change in pull request #34337:
URL: https://github.com/apache/spark/pull/34337#discussion_r733291761




##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FilePartitionReader.scala
##########
@@ -73,6 +73,10 @@ class FilePartitionReader[T](readers: Iterator[PartitionedFileReader[T]])
           throw QueryExecutionErrors.cannotReadParquetFilesError(e)
         }
         throw e
+      case e: ArrayIndexOutOfBoundsException =>

Review comment:
       > We should consider the `ignoreCorruptFiles` configuration, too. If `ignoreCorruptFiles` is set, this should only log instead of re-throwing.
   
   I think this is already handled by
   
https://github.com/apache/spark/blob/895abefb955e9c9ee1a0f9d5a0261ef301cbd3aa/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FilePartitionReader.scala#L51-L55
   
   When I run `set spark.sql.files.ignoreCorruptFiles=true`, the query result count is 0, so we don't need to handle it here again.
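
   To illustrate the reasoning, here is a minimal standalone sketch, not Spark's actual `FilePartitionReader` code. It assumes, as the comment above suggests, that an earlier guarded handler already skips a corrupt file when the flag is set. `IgnoreCorruptSketch`, `FileReader`, and `readAll` are hypothetical names used only for this example. Because `ArrayIndexOutOfBoundsException` is a `RuntimeException`, a guarded case placed first catches it whenever the flag is on, so the later case that adds the file path only runs when the flag is off.

```scala
// Simplified, hypothetical model of the discussion; this is NOT Spark's real reader.
object IgnoreCorruptSketch {
  // Hypothetical stand-in for a per-file reader: a path plus a function yielding its rows.
  final case class FileReader(path: String, read: () => Seq[Int])

  def readAll(readers: Seq[FileReader], ignoreCorruptFiles: Boolean): Seq[Int] =
    readers.flatMap { r =>
      try r.read()
      catch {
        // Cases are tried in order. With the flag on, this guard matches any
        // RuntimeException (including ArrayIndexOutOfBoundsException), logs,
        // and contributes no rows for the file.
        case e: RuntimeException if ignoreCorruptFiles =>
          System.err.println(s"Skipped corrupted file: ${r.path} ($e)")
          Seq.empty[Int]
        // Reached only when the flag is off: re-throw with the file path attached.
        case e: ArrayIndexOutOfBoundsException =>
          throw new RuntimeException(s"Failed to read file: ${r.path}", e)
      }
    }
}
```

   In this sketch, reading a single corrupt file with the flag on yields no rows, which would be consistent with the count of 0 reported above; with the flag off, the error message carries the file path.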




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



