[GitHub] [spark] sos3k commented on pull request #37413: [SPARK-39983][CORE][SQL] Do not cache unserialized broadcast relations on the driver

GitBox Fri, 02 Sep 2022 05:07:25 -0700


sos3k commented on PR #37413:
URL: https://github.com/apache/spark/pull/37413#issuecomment-1235422141


   [Screenshot 2022-09-02 at 13 57 44: https://user-images.githubusercontent.com/2758082/188138030-e74fc0d0-4220-4ab1-a665-de907158c29a.png]
   Hi everyone, I am Radek from HuuugeGames. We use Databricks Runtime 10.4 LTS,
and I wanted to let you know that after your changes were included in the
runtime (Databricks did that on 26.08 during their maintenance), our job
started behaving inconsistently: from time to time, dynamic file pruning (DFP)
prunes all of the source files during the S3 scan. The attached screenshot
shows the loss of broadcast data during DFP (1 record read from the Reused
Exchange, where there should normally be 97), which results in no records
being read from S3. We are joining a small dimension table with an events
table, and something is definitely going wrong here. After we disabled DFP,
the plan changed and the process looks stable. We also rolled back to the
previous Databricks Runtime image, which does not contain these changes, and
there the process looks good even with DFP enabled.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


