[GitHub] [spark] viirya opened a new pull request #24696: [SPARK-27832] Don't decompress and create column batch when the task is completed

GitBox Fri, 24 May 2019 03:28:55 -0700

viirya opened a new pull request #24696: [SPARK-27832] Don't decompress and 
create column batch when the task is completed
URL: https://github.com/apache/spark/pull/24696
 
 
   ## What changes were proposed in this pull request?
   
   A cached relation decompresses its data and creates a column batch when the 
cache is accessed. A thread may not stop reading the cached relation 
immediately after its task completes. Because of this race condition, the 
cached relation might still decompress data and create a new, unnecessary 
batch. In that case, the returned batch is also closed immediately. On the 
reader side, reading a closed batch can cause a null pointer exception, and we 
would probably need to hide such an exception.
   
   We don't need to create the batch if the task is already completed. Skipping 
it saves the effort of decompressing the cached batch and also prevents the 
exception.
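
   The idea can be illustrated with a minimal, self-contained sketch. It is not the code from this patch: `TaskState`, `CachedBatch`, `decompress`, and `createBatchIfRunning` are hypothetical names invented for illustration, and `TaskState.isCompleted()` only stands in for whatever completion check the task context provides (in Spark this would presumably be `TaskContext.isCompleted()`). The guard skips decompression once the task is reported as completed, so no batch is created only to be closed right away.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class CachedBatchGuard {
    /** Hypothetical stand-in for a task context that reports completion. */
    interface TaskState {
        boolean isCompleted();
    }

    /** Hypothetical stand-in for a compressed cached batch. */
    static final class CachedBatch {
        final int[] compressed;
        CachedBatch(int[] compressed) { this.compressed = compressed; }
    }

    /** Counts how often decompression actually ran (for illustration only). */
    static final AtomicInteger decompressCount = new AtomicInteger();

    /** Pretend decompression: just copies the payload. */
    static int[] decompress(CachedBatch batch) {
        decompressCount.incrementAndGet();
        return batch.compressed.clone();
    }

    /**
     * Returns the decompressed batch, or empty if the task has already
     * completed. A null task (no task context) is treated as still running.
     */
    static Optional<int[]> createBatchIfRunning(TaskState task, CachedBatch batch) {
        if (task != null && task.isCompleted()) {
            return Optional.empty(); // skip the decompression work entirely
        }
        return Optional.of(decompress(batch));
    }

    public static void main(String[] args) {
        AtomicBoolean done = new AtomicBoolean(false);
        TaskState task = done::get;
        CachedBatch batch = new CachedBatch(new int[] {1, 2, 3});

        // Task still running: the batch is created.
        System.out.println(createBatchIfRunning(task, batch).isPresent());
        done.set(true);
        // Task completed: no batch is created.
        System.out.println(createBatchIfRunning(task, batch).isPresent());
    }
}
```

   A caller in this sketch has to handle the empty result, for example by ending its iteration. The check narrows the window for the race but does not remove it, since the task can still complete between the check and the use of the batch.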
   
   ## How was this patch tested?
   
   It is hard to write a unit test for this case, so it was tested manually.


