TonyDoen opened a new pull request #35990: URL: https://github.com/apache/spark/pull/35990
### What changes were proposed in this pull request?

This PR adds a new config, "spark.sql.hive.ignoreCorruptRecord", so that queries can succeed on dirty data (for example, mixed schemas in one table).

### Why are the changes needed?

The existing flags "spark.sql.files.ignoreCorruptFiles" and "spark.sql.files.ignoreMissingFiles" quietly ignore attempted reads from files that are corrupted or missing, but a query can still fail on sequence files. Being able to ignore corrupt records is useful when users want to query dirty data (mixed schemas in one table) successfully. This PR adds "spark.sql.hive.ignoreCorruptRecord" to fill that gap.

### Does this PR introduce _any_ user-facing change?

Yes. It adds a new config: "spark.sql.hive.ignoreCorruptRecord".

### How was this patch tested?

Manually tested locally and with the existing unit tests.
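To make the intended behavior concrete, here is a minimal, self-contained sketch of per-record skipping. It is not the code in this PR and does not use Spark: the function `read_records`, the `EXPECTED_FIELDS` schema, and the JSON row encoding are all hypothetical, and the `ignore_corrupt_record` parameter only stands in for the proposed "spark.sql.hive.ignoreCorruptRecord" flag.

```python
import json
from typing import Iterable, Iterator

# Hypothetical table schema, used only for this illustration.
EXPECTED_FIELDS = {"id", "name"}


def read_records(raw_rows: Iterable[str],
                 ignore_corrupt_record: bool = False) -> Iterator[dict]:
    """Decode rows one at a time.

    `ignore_corrupt_record` is an assumed stand-in for the proposed
    "spark.sql.hive.ignoreCorruptRecord" flag: when True, rows that fail to
    decode or do not match the expected schema are skipped; when False, the
    first bad row fails the whole read.
    """
    for raw in raw_rows:
        try:
            row = json.loads(raw)  # raises a ValueError subclass on malformed input
            if not isinstance(row, dict) or set(row) != EXPECTED_FIELDS:
                raise ValueError(f"schema mismatch: {raw!r}")
        except ValueError:
            if ignore_corrupt_record:
                continue  # drop only the bad record and keep reading
            raise  # default behavior: the query fails
        yield row


# A "dirty" table: one malformed row and one row with a different schema.
dirty_rows = [
    '{"id": 1, "name": "a"}',
    'not json',
    '{"id": 2}',
    '{"id": 3, "name": "c"}',
]

surviving_ids = [r["id"] for r in read_records(dirty_rows, ignore_corrupt_record=True)]
print(surviving_ids)  # [1, 3]
```

With the flag left at its default of `False`, the same input raises on the second row, which mirrors the failure mode the PR description wants to make optional.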
