GitHub user squito commented on the issue:
https://github.com/apache/spark/pull/21068
I totally understand your motivation for wanting the limit. But I'm trying
to balance that against behavior which might not really achieve the desired
effect, and which might be even more confusing in some cases.
It won't achieve the desired effect if your cluster has more nodes but
they're all tied up in other applications. It'll be confusing to users if they
see notifications about blacklisting in the logs and UI, but then still see
Spark trying to use those nodes anyway. I wonder if putting this in will make
it hard to change that behavior later.
All that said, I don't have a great alternative right now, other than just
removing the limit entirely for the moment and adding a notification to the
driver. We could have a more general starvation detector, which wouldn't only
look at node count but would also look at delays in acquiring containers and in
finding places to schedule tasks (related to SPARK-15815 & SPARK-22148), but I
don't want to tackle all of that here.
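To make that last idea a bit more concrete, here is a rough sketch of what such a detector might track. This is only an illustration, not anything in Spark's actual API: the `StarvationDetector` class, its method names, and its thresholds are all hypothetical.

```scala
// Hypothetical sketch of a starvation detector: it tracks when the app
// last acquired a container and last scheduled a task, and reports a
// warning when both have stalled too long while work is pending.
// None of these names exist in Spark; this is just an illustration.
class StarvationDetector(
    warnAfterMs: Long,
    clock: () => Long = () => System.currentTimeMillis()) {

  private var lastContainerAcquiredAt: Long = clock()
  private var lastTaskScheduledAt: Long = clock()

  // Call when the cluster manager grants a new container / executor.
  def onContainerAcquired(): Unit = synchronized {
    lastContainerAcquiredAt = clock()
  }

  // Call when the scheduler places a task on an executor.
  def onTaskScheduled(): Unit = synchronized {
    lastTaskScheduledAt = clock()
  }

  // Invoked periodically (e.g. from a scheduled thread) with the current
  // number of schedulable (non-blacklisted) nodes and pending tasks.
  // Returns a warning message if the app appears to be starving.
  def check(schedulableNodes: Int, pendingTasks: Int): Option[String] = synchronized {
    val now = clock()
    val containerDelay = now - lastContainerAcquiredAt
    val scheduleDelay = now - lastTaskScheduledAt
    if (pendingTasks > 0 && schedulableNodes == 0) {
      Some("all schedulable nodes are blacklisted while tasks are pending")
    } else if (pendingTasks > 0 && containerDelay > warnAfterMs &&
        scheduleDelay > warnAfterMs) {
      Some(s"no container acquired for ${containerDelay}ms and no task " +
        s"scheduled for ${scheduleDelay}ms with $pendingTasks tasks pending")
    } else {
      None
    }
  }
}
```

Something along those lines would fire on actual starvation (no containers acquired and no scheduling progress) rather than on a node count that may not mean anything when other applications hold the rest of the cluster.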