GitHub user pwendell commented on the pull request:
https://github.com/apache/spark/pull/2746#issuecomment-60040691
Hey Andrew - I looked at this and I had some small suggestions around
naming.
However, there is a big open question here relating to the semantics of
requesting more resources from things like YARN and standalone (/cc
@kayousterhout @sryza @vanzin) and we need to define that API precisely before
we can know what to do here.
The code here uses timeouts in order to prevent requesting too many
executors. But this doesn't seem correct to me. What happens if this code
"times out" a request after some time, but the YARN scheduler actually still
has that request and plans to fulfill it later, once resources are available?
In that case, you could over-allocate, because you might send another request
later and YARN will fulfill both requests.
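To make the over-allocation concern concrete, here is a minimal toy model. It is only a sketch: `ToyClusterManager` and `run_with_client_timeout` are hypothetical names, not Spark or YARN APIs, and the model assumes the cluster manager never drops a queued request on its own.

```python
from dataclasses import dataclass, field


@dataclass
class ToyClusterManager:
    """Hypothetical cluster manager that keeps every request until it is fulfilled."""
    free_slots: int = 0
    pending: list = field(default_factory=list)  # sizes of outstanding requests
    granted: int = 0

    def request(self, n):
        self.pending.append(n)
        self._allocate()

    def release_resources(self, slots):
        self.free_slots += slots
        self._allocate()

    def _allocate(self):
        # Fulfill queued requests in order while enough slots are free.
        while self.pending and self.pending[0] <= self.free_slots:
            n = self.pending.pop(0)
            self.free_slots -= n
            self.granted += n


def run_with_client_timeout(target):
    """Return how many executors are granted when the client wants `target`."""
    manager = ToyClusterManager(free_slots=0)
    manager.request(target)  # first request; nothing is free yet
    # The client-side timeout fires here: the client forgets the request,
    # but the manager still holds it in its queue.
    manager.request(target)  # the client asks again
    manager.release_resources(2 * target)  # resources free up later
    return manager.granted
```

In this model, `run_with_client_timeout(4)` grants 8 executors although only 4 were wanted, because the "timed out" request was never cancelled on the manager's side.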
There are different levels of robustness we could expect from the scheduler
APIs (@sryza and @vanzin might have some insight for YARN):
1. _Guaranteed instant fulfillment_: If I request N executors I will get
them instantly.
2. _Guaranteed eventual fulfillment_: If I request N executors I will get
them eventually, provided that enough resources become available in my YARN
queue.
3. _Best effort with acknowledgment_: If I request N executors, the request
might not be fulfilled, even if sufficient resources eventually become
available in my YARN queue. However, there is a way to know when requests are
"dropped" - i.e. when they will no longer be considered for fulfillment.
4. _Best effort_: If I request N executors, the request might not be
fulfilled, even if sufficient resources eventually become available in my YARN
queue. YARN provides no way to track whether a given resource request is going
to be fulfilled or not.
(1) is impossible to fulfill since someone could e.g. request more
resources than are available.
I'm actually not sure whether YARN offers (2), (3), or (4). It would be good
to know that. The only case where I think it makes sense to have our own
timeouts is (4), since there is really nothing else we can do. If YARN provides
(2), then we should just assume that if requests are pending, there is nothing
more we can do - that would be the simplest. If YARN supports (3), it's a bit
trickier.
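One possible reading of how each level would change the bookkeeping, as a rough sketch only. The names `Semantics` and `executors_to_request` and the `dropped` / `timed_out` counters are hypothetical, not existing Spark or YARN APIs, and the handling of (3) is just one option among several.

```python
from enum import Enum


class Semantics(Enum):
    INSTANT = 1            # (1) guaranteed instant fulfillment
    EVENTUAL = 2           # (2) guaranteed eventual fulfillment
    BEST_EFFORT_ACKED = 3  # (3) best effort with acknowledgment
    BEST_EFFORT = 4        # (4) best effort


def executors_to_request(semantics, target, running, pending,
                         dropped=0, timed_out=0):
    """Return how many additional executors to ask for.

    pending   -- executors requested earlier and not yet granted
    dropped   -- pending executors the cluster manager reported as dropped (3)
    timed_out -- pending executors we gave up on via our own timeout (4)
    """
    if semantics is Semantics.INSTANT:
        # (1) cannot be offered in general, so it is not modelled here.
        raise ValueError("instant fulfillment is not a realistic contract")
    if semantics is Semantics.EVENTUAL:
        still_outstanding = pending              # every request stays valid
    elif semantics is Semantics.BEST_EFFORT_ACKED:
        still_outstanding = pending - dropped    # trust the acknowledgments
    else:
        still_outstanding = pending - timed_out  # only our own guess is available
    return max(0, target - running - still_outstanding)
```

For example, with a target of 10, 4 running, and 6 pending, nothing more is requested under (2), while under (4) a timeout on 3 of the pending executors leads to 3 being requested again.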