Github user srowen commented on the issue:
https://github.com/apache/spark/pull/19233
If you know you won't need more than 10, then set the max to 10.
If you don't necessarily know that, then I think you're complaining that
dynamic allocation doesn't 'know' in advance how many executors will be needed.
Yes, in general the load goes up and down and can't be predicted, so dynamic
allocation is always adapting: it will add executors, or eventually time out
idle ones, to match the load. That is just how it works.
I think you're suggesting a specific strategy for Spark Streaming jobs
only. While I understand that, and you do know more about the load in this
type of job, that is also a reason to set the max yourself, because you know
what it should be, or simply not to use dynamic allocation with Spark Streaming.
Dynamic allocation is often not used in streaming because the lag of adapting
to a new load of tasks increases latency and variability.
Just set your max to 10, or perhaps lower the idle timeout so that idle
executors are removed more quickly; a rough sketch of both settings is below.
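For illustration only, a minimal sketch of how those settings might be set when
building the session (the property names are the standard dynamic allocation
settings; the app name and the values shown are just placeholders):

    import org.apache.spark.sql.SparkSession

    // Cap dynamic allocation at 10 executors and time out idle executors
    // more aggressively than the 60s default. Values are illustrative.
    val spark = SparkSession.builder()
      .appName("StreamingJob")
      .config("spark.dynamicAllocation.enabled", "true")
      .config("spark.shuffle.service.enabled", "true") // needed for dynamic allocation
      .config("spark.dynamicAllocation.minExecutors", "1")
      .config("spark.dynamicAllocation.maxExecutors", "10")
      .config("spark.dynamicAllocation.executorIdleTimeout", "30s")
      .getOrCreate()

The same keys can also be passed on the command line via --conf or set in
spark-defaults.conf.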