GitHub user tgravescs commented on the issue:
https://github.com/apache/spark/pull/18388
OK, sorry, I forgot you had the screenshot there. As you mention in that
post, if we are just creating too many outbound buffers before they can actually
be sent over the network, then we should try to add some flow control. Did you
check to see what the buffers were for? How many connections did you have, and
how many blocks was each fetching? A million is a lot either way, but I'm
assuming it's something like 500 connections each fetching 2000 blocks.
If that is the case, it seems like it would be good to add flow control here
rather than just disconnecting based on memory. Really, having both would be
good, with this as a fallback, but the flow control part should allow everyone to
start fetching without rejecting a bunch of them, especially if the network can't
push the data out that fast anyway.
For instance, only create a handful of those outgoing buffers and wait for
confirmation that those were sent successfully before creating more. This
might be a bit more complex.
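A minimal sketch of that windowed idea, written in Python only for brevity. This is not Spark code: the class `BoundedChunkSender`, its `max_in_flight` and `make_buffer` parameters, and the `on_send_complete` callback are all hypothetical. The callback stands in for whatever "successfully sent" notification the transport layer provides. Each connection keeps the list of requested block IDs but materializes at most `max_in_flight` outbound buffers at a time, building the next one only after an earlier send completes.

```python
from collections import deque
from typing import Callable, Deque, List


class BoundedChunkSender:
    """Hypothetical per-connection sender: at most `max_in_flight` outbound
    buffers exist at once; the next one is built only after a send completes."""

    def __init__(self, block_ids: List[str], max_in_flight: int,
                 make_buffer: Callable[[str], bytes]) -> None:
        if max_in_flight < 1:
            raise ValueError("max_in_flight must be at least 1")
        self.pending: Deque[str] = deque(block_ids)  # blocks with no buffer yet
        self.in_flight: Deque[bytes] = deque()       # buffers handed to the network
        self.max_in_flight = max_in_flight
        self.make_buffer = make_buffer
        self.peak_in_flight = 0                      # for observing memory bound
        self._fill_window()

    def _fill_window(self) -> None:
        # Build buffers lazily, only while the window has free slots.
        while self.pending and len(self.in_flight) < self.max_in_flight:
            self.in_flight.append(self.make_buffer(self.pending.popleft()))
        self.peak_in_flight = max(self.peak_in_flight, len(self.in_flight))

    def on_send_complete(self) -> None:
        # Assumed to be invoked when the oldest buffer has been written out.
        self.in_flight.popleft()
        self._fill_window()

    def done(self) -> bool:
        return not self.pending and not self.in_flight


# Simulate one connection fetching 2000 blocks with a window of 8 buffers.
sender = BoundedChunkSender([f"block_{i}" for i in range(2000)], 8,
                            lambda block_id: b"x" * 1024)
completions = 0
while not sender.done():
    sender.on_send_complete()
    completions += 1
print(completions, sender.peak_in_flight)
```

With 500 such connections, the number of live outbound buffers is bounded by 500 × 8 instead of 500 × 2000, while every client still gets to start fetching. A memory-based disconnect could remain as a fallback on top of this.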