[
https://issues.apache.org/jira/browse/YARN-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15971807#comment-15971807
]
Carlo Curino commented on YARN-6451:
------------------------------------
Good call. I implemented the *fast* path that checks all invariants at once,
and re-check them individually to give better logs. It seems to work well:
<0.1ms runtime for the normal case, and pretty logs of the form:
{code}
Invariant "AvailableVCores >= 0" is NOT holding, with bindings:
{AvailableVCores=-1}
{code}
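A minimal sketch of the two-pass idea, for illustration only. It is not the code from the attached patches: the class name InvariantSketch, its methods, and the use of plain Java predicates over a map of metric values are all assumptions made for this example. It also assumes the individual re-check runs only after the combined check has failed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch; names and structure are NOT taken from the YARN-6451 patch.
class InvariantSketch {

  // Invariant text -> predicate over the current metric bindings.
  private final Map<String, Predicate<Map<String, Long>>> invariants =
      new LinkedHashMap<>();

  void add(String text, Predicate<Map<String, Long>> check) {
    invariants.put(text, check);
  }

  // Fast path: one pass over all invariants, stopping at the first violation.
  boolean allHold(Map<String, Long> bindings) {
    for (Predicate<Map<String, Long>> check : invariants.values()) {
      if (!check.test(bindings)) {
        return false;
      }
    }
    return true;
  }

  // Slow path: taken only when the fast path fails (an assumption of this
  // sketch). Re-checks each invariant so the log can name the broken one.
  List<String> violations(Map<String, Long> bindings) {
    List<String> messages = new ArrayList<>();
    if (allHold(bindings)) {
      return messages;
    }
    for (Map.Entry<String, Predicate<Map<String, Long>>> e
        : invariants.entrySet()) {
      if (!e.getValue().test(bindings)) {
        messages.add("Invariant \"" + e.getKey()
            + "\" is NOT holding, with bindings: " + bindings);
      }
    }
    return messages;
  }

  public static void main(String[] args) {
    InvariantSketch checker = new InvariantSketch();
    checker.add("AvailableVCores >= 0", b -> b.get("AvailableVCores") >= 0);

    Map<String, Long> bindings = new HashMap<>();
    bindings.put("AvailableVCores", -1L);

    // Print one line per violated invariant.
    checker.violations(bindings).forEach(System.out::println);
  }
}
```

The point of the split is that the normal case pays only for the single combined pass, while the per-invariant messages are built just when something is already wrong.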
# Agreed on the general next steps for generalizing this. Per our offline
discussion, it is probably worth refactoring and extending this later, if/when
we start using it more heavily.
# I have increased (slightly) the test coverage, though the bulk of test usage
will come in follow-up patches, where we combine SLS and this mechanism to get
what are basically integration tests for the overall RM.
# I added the warning printout during init.
> Create a monitor to check whether we maintain RM (scheduling) invariants
> ------------------------------------------------------------------------
>
> Key: YARN-6451
> URL: https://issues.apache.org/jira/browse/YARN-6451
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Carlo Curino
> Assignee: Carlo Curino
> Attachments: YARN-6451.v0.patch, YARN-6451.v1.patch,
> YARN-6451.v2.patch, YARN-6451.v3.patch, YARN-6451.v4.patch
>
>
> For SLS runs, as well as for live test clusters (and maybe prod), it would be
> useful to have a mechanism to continuously check whether core invariants of
> the RM/Scheduler are respected (e.g., no priority inversions, fairness mostly
> respected, certain latencies within expected range, etc.)