Github user mridulm commented on the pull request:

    https://github.com/apache/spark/pull/1391#issuecomment-48835566
  
    The default constant is actually a lower bound to account for other
    overheads (since YARN will aggressively kill containers that exceed their
    memory limit). Unfortunately we have not sized it rigorously and don't have
    a good recommendation on how to set it.
    
    This is compounded by magic constants in Spark for various IO operations,
    non-deterministic network behaviour (for which we should be able to
    estimate an upper bound of roughly 2x the number of workers), and virtual
    memory use (shuffle output is mmap'ed whole, which can run afoul of YARN's
    virtual memory limits), and so on.
    
    Hence sizing this is, unfortunately, app specific.
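    
    For what it's worth, here is a minimal sketch of how one might raise that
    headroom per application, assuming the spark.yarn.executor.memoryOverhead
    setting (an integer number of MB) available for Spark on YARN; the 8g heap
    and 1024 MB overhead below are purely illustrative, not a recommendation:
    
        import org.apache.spark.{SparkConf, SparkContext}
        
        object OverheadTuningSketch {
          def main(args: Array[String]): Unit = {
            // Illustrative values only: the right overhead is app specific, as noted above.
            val conf = new SparkConf()
              .setAppName("overhead-tuning-sketch")
              .set("spark.executor.memory", "8g")                 // executor heap requested from YARN
              .set("spark.yarn.executor.memoryOverhead", "1024")  // extra container headroom, in MB
            val sc = new SparkContext(conf)
            try {
              // ... job body ...
            } finally {
              sc.stop()
            }
          }
        }
    
    The same key can also be passed per job via spark-submit --conf, which is
    usually more convenient when the value has to be tuned per app.
    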
     On 13-Jul-2014 2:34 pm, "Sean Owen" <[email protected]> wrote:
    
    > That makes sense, but then it doesn't explain why a constant amount works
    > for a given job when executor memory is low, and then doesn't work when it
    > is high. This has also been my experience and I don't have a great grasp
    > on why it would be. More threads and open files in a busy executor? It goes
    > indirectly with how big you need your executor to be, but not directly.
    >
    > Nishkam do you have a sense of how much extra memory you had to configure
    > to get it to work when executor memory increased? Is it pretty marginal,
    > or quite substantial?
    >
    > —
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/spark/pull/1391#issuecomment-48835447>.
    >

