IvanVergiliev opened a new pull request #23766: [SPARK-26859][SQL] Fix data correctness bug in ORC deserializer
URL: https://github.com/apache/spark/pull/23766

## What changes were proposed in this pull request?

There is a bug in `OrcDeserializer.scala` that causes `null`s to be set at the wrong column position and state from previous records to remain uncleared in subsequent records. The [JIRA issue](https://jira.apache.org/jira/browse/SPARK-26859) describes in more detail exactly when the bug is triggered and what the outcome is.

The high-level summary is that this bug causes severe data correctness issues, but fortunately the set of conditions needed to expose it is complicated, which keeps the surface area somewhat small.

This change fixes the problem and adds a corresponding test.

## How was this patch tested?

The change contains a test that fails on `master` and succeeds with the current fix. The test is at the same level of abstraction as the existing `OrcSourceSuite` tests.

I considered adding unit tests that exercise the `OrcDeserializer` class directly, but none existed and it didn't seem like a frequent pattern across the parts of the codebase I've seen recently, so I decided against it. I'm open to reconsidering that decision.
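
For readers who want an intuition for the two symptoms named under "What changes were proposed" (a `null` landing at the wrong column position, and a value from a previous record surviving into the next one), the following is a toy model in Python. It is not the `OrcDeserializer` code: the `ReusableRow` class, both functions, and the `requested_col_ids` mapping are hypothetical names, and the assumption that the problem comes from indexing a reused row buffer by the source column instead of the target column is only a sketch of the general class of bug. The exact trigger conditions are the ones described in the JIRA issue.

```python
# Toy model only. All names here are hypothetical and do not mirror Spark's API.

class ReusableRow:
    """A mutable row buffer that is reused across records."""

    def __init__(self, num_columns):
        self.values = [None] * num_columns


def deserialize_buggy(row, record, requested_col_ids):
    # requested_col_ids[target] gives the source column to read for `target`.
    for target, source in enumerate(requested_col_ids):
        value = record[source]
        if value is None:
            # Assumed bug: the null is written at the source index,
            # so the slot at `target` keeps whatever the last record left there.
            row.values[source] = None
        else:
            row.values[target] = value
    return list(row.values)


def deserialize_fixed(row, record, requested_col_ids):
    for target, source in enumerate(requested_col_ids):
        # Always write at the target index, nulls included.
        row.values[target] = record[source]
    return list(row.values)


col_ids = [1, 0, 2]  # output column 0 reads source column 1, and so on

buggy_row = ReusableRow(3)
first_buggy = deserialize_buggy(buggy_row, ["a", "b", "c"], col_ids)
second_buggy = deserialize_buggy(buggy_row, [None, "x", "y"], col_ids)

fixed_row = ReusableRow(3)
first_fixed = deserialize_fixed(fixed_row, ["a", "b", "c"], col_ids)
second_fixed = deserialize_fixed(fixed_row, [None, "x", "y"], col_ids)

print(second_buggy)  # [None, 'a', 'y']: null at position 0, stale 'a' at position 1
print(second_fixed)  # ['x', None, 'y']
```

In this sketch the first record deserializes correctly either way; the problem only shows up on a later record that contains a null, which is consistent with the description that the conditions to expose the bug are narrow.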
