liupc opened a new pull request #27968: [SPARK-31202][CORE]Improve 
SizeEstimator for AppendOnlyMap
URL: https://github.com/apache/spark/pull/27968
 
 
   
   ### What changes were proposed in this pull request?
   
   Currently, Spark's memory management depends on size estimation for 
execution and storage.
   In our production cluster, users often hit OOM errors caused by inaccurate 
size estimation for `AppendOnlyMap`. Spark stores keys and values in a single 
Array[AnyRef] in `AppendOnlyMap` for memory locality, and each value can be a 
CompactBuffer[_] or an Array[CompactBuffer[_]] for transformations such as 
cogroup/join/groupBy. The current `SizeEstimator` still treats this special 
array as a normal array, so in many cases we observed a large gap between the 
estimated size and the actual memory consumption.
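
To make the layout described above concrete, here is a minimal sketch in Java. It is not Spark code: the class `FlatKvLayoutSketch`, its helper methods, and the use of `List<Integer>` as a stand-in for `CompactBuffer` are all assumptions made for illustration. It only shows that keys and value buffers share one flat array, and that most of the payload can sit inside the buffers rather than in the array slots themselves.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a hypothetical stand-in for a flat key/value array,
// not Spark's AppendOnlyMap or SizeEstimator.
public class FlatKvLayoutSketch {

    // Interleave keys and values in one flat array:
    // key i sits in slot 2*i, its value buffer in slot 2*i + 1.
    public static Object[] interleave(String[] keys, List<List<Integer>> values) {
        Object[] data = new Object[2 * keys.length];
        for (int i = 0; i < keys.length; i++) {
            data[2 * i] = keys[i];
            data[2 * i + 1] = values.get(i);
        }
        return data;
    }

    // Sum the elements held by the value buffers (odd slots only).
    public static long bufferedElements(Object[] data) {
        long total = 0;
        for (int i = 1; i < data.length; i += 2) {
            total += ((List<?>) data[i]).size();
        }
        return total;
    }

    // Build a buffer holding n integers (stand-in for a CompactBuffer).
    private static List<Integer> bufferOf(int n) {
        List<Integer> buf = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            buf.add(i);
        }
        return buf;
    }

    // Two keys whose value buffers differ greatly in size (1 vs 1000 elements).
    public static Object[] demo() {
        List<List<Integer>> values = new ArrayList<>();
        values.add(bufferOf(1));
        values.add(bufferOf(1000));
        return interleave(new String[] {"a", "b"}, values);
    }

    public static void main(String[] args) {
        Object[] data = demo();
        System.out.println("slots = " + data.length);
        System.out.println("buffered elements = " + bufferedElements(data));
    }
}
```

In this toy case the array has only four slots, yet the value buffers hold 1001 elements between them, and they are very unevenly distributed. An estimate that does not account for the buffers behind the value slots can therefore differ substantially from the real footprint.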
   
   In this PR, I propose to improve the estimation for `AppendOnlyMap` when the 
value type is CompactBuffer/Array[CompactBuffer].
   
   
   ### Why are the changes needed?
   These changes improve size estimation and can avoid OOM in many cases.
   
   
   ### Does this PR introduce any user-facing change?
   
   No.
   
   
   ### How was this patch tested?
   
   Existing unit tests and newly added unit tests.
   

