Github user mridulm commented on the pull request:

    https://github.com/apache/spark/pull/1391#issuecomment-48836123
  
    You are missing my point, I think ... To give an unscientific, anecdotal
    example: our GBDT experiments, which run on about 22 nodes, need no tuning.
    Our collaborative filtering experiments, running on 300 nodes, require a
    much higher overhead, while QR factorization on the same 300 nodes needs a
    much lower one. The values are all over the place and very
    application-specific.
    
    In an effort to ensure that jobs always run to completion, setting the
    overhead to a high fraction of executor memory might work, but at the cost
    of a large performance loss and substandard scaling.
    
    I would like a good default estimate of the overhead ... but that is not a
    fraction of executor memory. Instead of trying to model the overhead in
    terms of executor memory, it would be better to look at the actual
    parameters which influence it (as in, look at the code and figure it out,
    followed by validation and tuning of course) and use that as the estimate.
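
    As a minimal sketch of the per-application approach (assuming Spark 1.x on
    YARN, where spark.yarn.executor.memoryOverhead is specified in MB), each
    job could carry its own empirically chosen overhead instead of relying on
    a fraction-of-executor-memory default; the app name and values below are
    illustrative only:

        import org.apache.spark.{SparkConf, SparkContext}

        val conf = new SparkConf()
          .setAppName("cf-experiment")             // hypothetical app name
          .set("spark.executor.memory", "16g")
          // Overhead chosen from observed behaviour of this workload,
          // not derived as a fixed fraction of executor memory.
          .set("spark.yarn.executor.memoryOverhead", "2048")
        val sc = new SparkContext(conf)
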
     On 13-Jul-2014 2:58 pm, "nishkamravi2" <[email protected]> wrote:
    
    > That's why the parameter is configurable. If you have jobs that cause
    > 20-25% memory_overhead, default values will not help.
    >
    > —
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/spark/pull/1391#issuecomment-48835881>.
    >

