Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/1391#issuecomment-48835566
The default constant is actually a lower bound, meant to account for other
overheads (since YARN aggressively kills containers that exceed their memory
limits). Unfortunately we have not sized it properly, and we don't have a
good recommendation on how to set it.
This is compounded by magic constants in Spark for various IO operations,
non-deterministic network behaviour (we should be able to estimate an upper
bound here of 2x the number of workers), VM memory use (shuffle output is
mmap'ed whole, which runs afoul of YARN's virtual memory limits), and so on.
Hence sizing this is, unfortunately, app specific.
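For what it's worth, a minimal sketch of what that app-specific tuning ends
up looking like, using the existing spark.yarn.executor.memoryOverhead knob
(in MB); the 8g / 1024 figures are made-up placeholders, not a
recommendation:

    // Illustrative only: overhead has to be found empirically per job.
    // spark.yarn.executor.memoryOverhead is the extra headroom YARN adds on
    // top of the executor heap when sizing the container.
    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.executor.memory", "8g")                 // executor heap
      .set("spark.yarn.executor.memoryOverhead", "1024")  // headroom, MB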
On 13-Jul-2014 2:34 pm, "Sean Owen" <[email protected]> wrote:
> That makes sense, but then it doesn't explain why a constant amount works
> for a given job when executor memory is low, and then doesn't work when it
> is high. This has also been my experience and I don't have a great grasp on
> why it would be. More threads and open files in a busy executor? It goes
> indirectly with how big you need your executor to be, but not directly.
>
> Nishkam, do you have a sense of how much extra memory you had to configure
> to get it to work when executor memory increased? Is it pretty marginal, or
> quite substantial?
>