Github user squito commented on the issue:
https://github.com/apache/spark/pull/21068
I mean when `YarnAllocatorBlacklistTracker` decides to blacklist a node because of
allocation failures, it doesn't send any message back to the driver -- so the
driver has no message in its logs, nothing in the event log, and no UI update.
So in client mode, the user would need to get the AM logs to know what was going on.
Attila wanted to do it this way because of
`mostRelevantSubsetOfBlacklistedNodes` -- it seemed weird to send an update to
the driver when the blacklisting wasn't necessarily even in effect. Though now
that I'm thinking about this, maybe it should just send the update anyway, even
though that blacklist may effectively be ignored.
Re: starvation -- I agree, though in practice "eventually" getting resources
can take so long that to users it looks the same as starvation.
Anyway, though you say you're OK with removing the limit, it seems like you
feel more strongly about this than I do. So I think we can keep it; I don't
think it prevents us from doing something else down the road.
I do think we should add the notification to the driver, including a
listener event, which just ignores `mostRelevantSubsetOfBlacklistedNodes`,
unless anyone has a reason for not doing it. I suggest @attilapiros does that
in a followup.
If that plan sounds OK, then this is probably nearly ready to merge. But
it's been a little while since I've looked closely, so I'll do another pass
(probably tomorrow).