viirya opened a new pull request #24696: [SPARK-27832] Don't decompress and create column batch when the task is completed
URL: https://github.com/apache/spark/pull/24696

## What changes were proposed in this pull request?

A cached relation decompresses its cached data and creates a column batch whenever the cache is accessed. A thread may not stop reading the cached relation immediately after its task is completed. Because of this race condition, the cached relation might still decompress data and create a new, unnecessary batch. At that point, the returned batch is also immediately closed. On the reader side, reading a closed batch can cause a null pointer exception, and we would probably need to hide such an exception.

We don't need to create the batch if the task is completed. Skipping it saves the effort of decompressing the cached batch and also prevents the exception.

## How was this patch tested?

It is hard to write a unit test for this case, so it was tested manually.
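
The guard described under "What changes were proposed" can be illustrated with a small, language-neutral simulation. This is only a sketch and does not reproduce the patch's code: `FakeTaskContext`, `read_cached_batches`, and the use of `zlib` as the compression scheme are all hypothetical stand-ins, not Spark APIs. It shows a lazy reader that checks for task completion before decompressing each cached batch, so a reader that outlives its task does no further decompression and never hands out a batch that would be closed right away.

```python
import zlib
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class FakeTaskContext:
    """Hypothetical stand-in for a task context; it only tracks completion."""
    completed: bool = False

    def is_completed(self) -> bool:
        return self.completed


def read_cached_batches(
    cached: List[bytes], ctx: Optional[FakeTaskContext]
) -> Iterator[bytes]:
    """Lazily decompress cached batches, stopping once the task is completed."""
    for blob in cached:
        # Guard: skip the decompression work if the task has already finished.
        if ctx is not None and ctx.is_completed():
            return
        yield zlib.decompress(blob)


# Three compressed "batches" standing in for the cached relation.
cached = [zlib.compress(f"batch-{i}".encode()) for i in range(3)]
ctx = FakeTaskContext()
reader = read_cached_batches(cached, ctx)

first = next(reader)   # task still running: the batch is decompressed
ctx.completed = True   # task completes while the reader is still alive
rest = list(reader)    # guard fires: no further batch is decompressed
```

In this model the check sits directly in front of the expensive step, so the remaining two batches are never decompressed once the context reports completion. Where the equivalent check belongs in the actual cached-relation code is determined by the patch itself, not by this sketch.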
