Hi,

This may be of interest to anyone working on the RM or Resource Types.
I recently filed a jira: https://issues.apache.org/jira/browse/YARN-9421
(Implement SafeMode for ResourceManager by defining a resource threshold).

The issue in short: if an app is submitted while the RM has not yet
received all registration requests from the NMs, and the app's resource
demand contains any custom resource (e.g. GPU), the app can be rejected
right away with an InvalidResourceRequestException.
The same app could be accepted if it is submitted again once the NMs have
registered (most likely a couple of seconds later). In this sense, the
behavior of the RM is not consistent.
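
To make the timing problem concrete, here is a toy Python model. It is not
YARN code: the class, method, and resource names are made up for
illustration, and it assumes that the limit for a custom resource is only
known from the capabilities of the nodes that have registered so far, while
standard resources have a configured limit from the start.

```python
class InvalidResourceRequest(Exception):
    """Stand-in for YARN's InvalidResourceRequestException (toy model only)."""


class ToyResourceManager:
    """Hypothetical, heavily simplified RM used only to show the race."""

    def __init__(self, configured_max):
        # Assumption: standard resources (memory, vcores) have a configured
        # maximum that is available before any node registers.
        self.max_allocation = dict(configured_max)

    def register_node(self, capability):
        # Assumption: a custom resource's limit is learned from the nodes
        # that have registered so far.
        for name, amount in capability.items():
            current = self.max_allocation.get(name, 0)
            self.max_allocation[name] = max(current, amount)

    def submit_app(self, request):
        # Reject any request that exceeds the currently known limit.
        for name, amount in request.items():
            limit = self.max_allocation.get(name, 0)
            if amount > limit:
                raise InvalidResourceRequest(
                    f"{name}: requested {amount}, max allowed {limit}"
                )
        return "ACCEPTED"
```

In this model, submitting {"memory-mb": 1024, "gpu": 1} before any node has
registered raises the exception, because the known limit for "gpu" is still
0. Submitting the same request after a node with a GPU has registered
returns "ACCEPTED".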

Please read through the jira; I think the issue is well described there.

Thanks a lot,
Szilard
