Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/1391#issuecomment-48836879
Since this is a recurring nightmare for our users, let me try to list, in
the JIRA, the factors which influence overhead given the current Spark
codebase state when I am back at my desk ... and we can add to that and
model from there (I won't be able to lead the effort, unfortunately, so it
would be great if you or Sean can).
If it so happens that at the end of the exercise it is a linear function of
memory, I am fine with that: as long as we decide based on actual data :-)
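To make the trade-off below concrete, here is a minimal sketch (hypothetical names and illustrative values only, not the actual Spark code or defaults) of the two default strategies under discussion: a flat constant that users tune by hand, versus a default derived linearly from executor memory, with an explicit setting overriding either.

```scala
// Sketch only: names and numbers are placeholders, not Spark's real config.
object MemoryOverheadSketch {
  // Illustrative values; what these should actually be is what this PR debates.
  val FlatDefaultMB   = 384
  val MemoryFraction  = 0.07

  // Strategy 1: flat constant default, overridden by hand when it is not enough.
  def flatDefault(explicitOverheadMB: Option[Int]): Int =
    explicitOverheadMB.getOrElse(FlatDefaultMB)

  // Strategy 2: linear function of executor memory, floored at the flat constant.
  def scaledDefault(executorMemoryMB: Int, explicitOverheadMB: Option[Int]): Int =
    explicitOverheadMB.getOrElse(
      math.max((executorMemoryMB * MemoryFraction).toInt, FlatDefaultMB))
}
```

Whichever shape wins, the constant and the fraction should fall out of measured data, not guesswork.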
On 13-Jul-2014 3:26 pm, "Mridul Muralidharan" <[email protected]> wrote:
>
> On Jul 13, 2014 3:16 PM, "nishkamravi2" <[email protected]> wrote:
> >
> > Mridul, I think you are missing the point. We understand that this
> > parameter will in a lot of cases have to be specified by the developer,
> > since there is no easy way to model it (that's why we are retaining it
> > as a configurable parameter). However, the question is what a good
> > default value would be.
> >
>
> It does not help to estimate using the wrong variable.
> Any correlation which exists is incidental and app specific, as I
> elaborated before.
>
> The only actual correlation between executor memory and overhead is Java
> VM overhead in managing very large heaps (and that is very high as a
> fraction). Other factors in Spark have far higher impact than this.
>
> > "I would like a good default estimate of overhead ... But that is not
> > fraction of executor memory. "
> >
> > You are mistaken. It may not be a directly correlated variable, but it
> is most certainly indirectly correlated. And it is probably correlated to
> other app-specific parameters as well.
>
> Please see above.
>
> >
> > "Until the magic explanatory variable is found, which one is less
> problematic for end users -- a flat constant that frequently has to be
> tuned, or an imperfect model that could get it right in more cases?"
> >
> > This is the right point of view.
>
> Which has been our view even in previous discussions :-)
> It is unfortunate that we did not approximate this better from the start
> and went with the constant from the prototype impl.
>
> Note that this estimation would be very sensitive to Spark internals.
>