Github user tgravescs commented on the pull request:
https://github.com/apache/spark/pull/12571#issuecomment-215521675
I understand the argument that we want the best user experience, and I'm not
against the settings themselves; I just think the benefit isn't worth the cost
here.
These are very specific, advanced Java options, and properly maintaining and
parsing them is, to me, not necessary. For instance, when Java 9, 10, and 11
come out and the options no longer exist or have changed, we have to go change
code; if IBM Java comes out with a different config, we have to change it; if
someone thinks 80% is better than 90%, we have to change it. We already have
enough PRs.
Let the users/admins configure it for their version of Java and their specific
needs. We are adding a bunch of code to parse these and set them to a default
that someone thinks is better. Many others might disagree. For instance, with
MapReduce we run it at 50% to fail fast. Why not set Spark to that? If we
want it to fail fast, 50% is better than 90%, right? Why don't we set the
garbage collector as well? To me this all comes down to configuring what is
best for your specific application. Since Spark can do so many different
things (streaming, ML, graph processing, ETL), having one default isn't
necessarily best for all.
I think putting this in sets a bad precedent and just adds a maintenance
headache for not much benefit. @vanzin mentions he has never seen anyone set
this, so is it that big of a deal? Where is the data that says 90% is better
than 98% for the majority of Spark users? Obviously, if things just don't run,
like you mention with the max perm size, that makes it a much easier call and
it makes sense to put it in, but I don't see that here.
Many of my customers don't set it and things are fine. I see other users
set it because they explicitly want to fail very fast, and they set it to less
than 90%.
I also think setting -XX:GCHeapFreeLimit is riskier than setting
-XX:GCTimeLimit. I personally have never seen anyone actually set it. It's
defined as "The lower limit on the amount of space freed during a garbage
collection in percent of the maximum heap (default is 2)". This to me is much
more application-specific than the GC time limit.
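
To illustrate the "let the user/admins configure it" route, here is a minimal
sketch of how someone who wants a fail-fast threshold could pass these flags
through Spark's existing extraJavaOptions setting, with no parsing code in
Spark itself. The helper names (gc_overhead_opts, spark_submit_conf) and the
50/2 values are assumptions for illustration only, not a recommended default
and not anything from this PR.

```python
def gc_overhead_opts(time_limit_pct, heap_free_limit_pct=None):
    """Build HotSpot GC overhead limit flags as one option string.

    Hypothetical helper: the percentages are chosen by the user for their
    own JVM and workload; nothing here is a Spark default.
    """
    if not 0 <= time_limit_pct <= 100:
        raise ValueError("time_limit_pct must be between 0 and 100")
    opts = ["-XX:GCTimeLimit=%d" % time_limit_pct]
    if heap_free_limit_pct is not None:
        if not 0 <= heap_free_limit_pct <= 100:
            raise ValueError("heap_free_limit_pct must be between 0 and 100")
        opts.append("-XX:GCHeapFreeLimit=%d" % heap_free_limit_pct)
    return " ".join(opts)


def spark_submit_conf(java_opts):
    """Return spark-submit arguments that apply java_opts to executors."""
    return ["--conf", "spark.executor.extraJavaOptions=" + java_opts]


if __name__ == "__main__":
    # Example: a MapReduce-style 50% fail-fast threshold (illustrative value).
    print(" ".join(spark_submit_conf(gc_overhead_opts(50))))
```

The same string could equally go into spark-defaults.conf under
spark.executor.extraJavaOptions (and spark.driver.extraJavaOptions for the
driver), which keeps the choice with whoever knows the JVM vendor and version
in use.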