[GitHub] [spark] mridulm commented on pull request #38064: [SPARK-40622][SQL][CORE]Result of a single task in collect() must fit in 2GB

GitBox Thu, 13 Oct 2022 11:37:09 -0700


mridulm commented on PR #38064:
URL: https://github.com/apache/spark/pull/38064#issuecomment-1278026577


   @liuzqt Most task results are very small.
   By moving to `ChunkedByteBufferOutputStream`, we will now over-provision 
the buffer for those results by a few orders of magnitude, while only a 
vanishingly small set of cases actually hits the large buffer case.
   This can potentially increase memory utilization at the executor, so if 
possible please look at ways to mitigate it, particularly when we have a good 
estimate of the output size.
   
   This is not to say I have serious concerns (we do use 
`ChunkedByteBufferOutputStream` with precisely that size everywhere else!), 
but it is not without tradeoff.
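
To make the tradeoff concrete, here is a minimal sketch in Python (not Spark code) of the mitigation hinted at above: capping the chunk size by an estimate of the output size when one is available. The constants `DEFAULT_CHUNK_SIZE` and `MIN_CHUNK_SIZE` and both helper functions are hypothetical and chosen only for illustration; the actual chunk size used in the PR may differ. The sketch also assumes the stream allocates whole chunks at a time.

```python
# Hypothetical values for illustration only; Spark's real constants may differ.
DEFAULT_CHUNK_SIZE = 1024 * 1024  # assumed fixed chunk size (1 MiB)
MIN_CHUNK_SIZE = 4 * 1024         # assumed lower bound (4 KiB)


def pick_chunk_size(estimated_size=None):
    """Return a chunk size, shrinking it when a size estimate is available."""
    if estimated_size is None or estimated_size < 0:
        # No usable estimate: fall back to the fixed chunk size.
        return DEFAULT_CHUNK_SIZE
    # Clamp the estimate into [MIN_CHUNK_SIZE, DEFAULT_CHUNK_SIZE].
    return max(MIN_CHUNK_SIZE, min(estimated_size, DEFAULT_CHUNK_SIZE))


def wasted_bytes(result_size, chunk_size):
    """Bytes allocated but unused, assuming whole chunks are allocated."""
    chunks = max(1, -(-result_size // chunk_size))  # ceiling division
    return chunks * chunk_size - result_size


small_result = 2000  # a typical small task result, in bytes (made-up figure)
fixed_waste = wasted_bytes(small_result, pick_chunk_size())
sized_waste = wasted_bytes(small_result, pick_chunk_size(small_result))
print(fixed_waste, sized_waste)
```

Under these assumed numbers, a 2000-byte result leaves most of a 1 MiB chunk unused, while a chunk sized from the estimate wastes only about 2 KiB. Large results are unaffected, because the estimate is capped at the default chunk size.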
   
   

