Github user ehnalis commented on the pull request:
https://github.com/apache/spark/pull/6082#issuecomment-103197630
I think so too; putting this in does no harm, and it can speed up the start-up
of Spark jobs on not-so-contested clusters. I think we have two questions here:
1) How much improvement do I get when there are enough resources available when
my AM wakes up?
This is very simple to answer and there's no need for evaluation. Sleeping
for 200 ms is 4800 ms better than sleeping for 5000 ms. Any evaluation would be
incorrect if it does not report a 4800 ms difference.
2) How much additional stress do we introduce to the RM by eagerly
heartbeating against pending allocations on a loaded cluster?
That depends on the characteristics of the cluster. I guess administrators would
like to tailor and refine the eager allocation interval based on the average
load of the RM. It's still okay to set it to 2.5 s, so it will jump to 5 s on
the first "allocation-miss".
Any of these scenarios can happen in practice, and that is exactly why we make
these intervals configurable.
Trivially, the fixed-interval version will not adapt to the cluster's load, so
it would need to be retuned manually whenever the load on the RM changes.
@tgravescs You think that a 3-second heartbeat would not be much load on
the RM, but on some clusters it would be! In some cases you might want to
heartbeat every 10 seconds, but eagerly heartbeat at 5-second intervals. It
depends on your cluster, but at least you have an adaptive mechanism that
speeds up the start-up of Spark jobs while also considering the load on the RM.
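To make the mechanism concrete, here is a minimal sketch (in Python, not Spark's actual Scala implementation) of the adaptive interval described above: heartbeat eagerly while container allocations are pending, double the interval on each allocation miss up to the regular interval, and fall back to the regular interval when nothing is pending. The function name and parameters are illustrative, not the actual Spark configuration keys.

```python
def next_interval_ms(current_ms, eager_ms, regular_ms, pending, allocated):
    """Return the sleep before the next AM -> RM heartbeat (illustrative sketch).

    - No pending containers: use the regular (slow) interval.
    - Pending containers and the last heartbeat received an allocation:
      reset to the eager interval.
    - Pending containers but an allocation miss: double the interval,
      capped at the regular interval (e.g. 2.5 s -> 5 s on the first miss).
    """
    if not pending:
        return regular_ms
    if allocated:
        return eager_ms
    return min(current_ms * 2, regular_ms)

# With eager = 2.5 s and regular = 5 s, the first allocation-miss
# doubles the eager interval straight to the regular one.
print(next_interval_ms(2500, 2500, 5000, pending=True, allocated=False))
```

This way the backoff self-limits: on a loaded cluster it degrades to the regular interval after a few misses, so the extra pressure on the RM is bounded by the administrator's configuration.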
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]