[https://issues.apache.org/jira/browse/YARN-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15961022#comment-15961022]
Jason Lowe commented on YARN-6451:
----------------------------------
Interesting idea. For some of these invariants, would it make more sense to
put an assert-like hook in the metric code itself? Why hope that a periodic
check happens to run while the metric is negative, when the metric itself can
protest the moment someone tries to set it below zero? As a bonus, we'd get the
stack trace of the update that triggered it.
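A minimal sketch of such an update-time hook, assuming a hypothetical
GuardedGauge class rather than the real Hadoop metrics2 gauge types. The
guard runs inside the update itself, so the thrown exception's stack trace
identifies the caller that broke the invariant. A production RM would more
likely log and count the violation than throw.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical gauge (not part of the Hadoop metrics2 API) that enforces a
 * non-negativity invariant at update time instead of relying on a periodic check.
 */
final class GuardedGauge {
    private final String name;
    private final AtomicLong value = new AtomicLong();

    GuardedGauge(String name) {
        this.name = name;
    }

    long get() {
        return value.get();
    }

    /** Adds delta atomically, rejecting any update that would go below zero. */
    void add(long delta) {
        long prev;
        long next;
        do {
            prev = value.get();
            next = prev + delta;
            if (next < 0) {
                // The stack trace of this exception points at the offending caller.
                throw new IllegalStateException(
                    "Invariant violated: " + name + " would become " + next);
            }
        } while (!value.compareAndSet(prev, next));
    }

    /** Sets the gauge directly, with the same non-negativity check. */
    void set(long newValue) {
        if (newValue < 0) {
            throw new IllegalStateException(
                "Invariant violated: " + name + " set to " + newValue);
        }
        value.set(newValue);
    }
}
```

Because the rejected update never takes effect, the gauge keeps its last
valid value, and the report arrives at the exact call site instead of one
polling interval later.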
I could see this periodic approach being really useful for more complicated
expressions, like validating stats across users or across queues, which are
tricky or expensive to evaluate on every single metric update.
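A minimal sketch of the periodic side, assuming a hypothetical
InvariantMonitor that holds named checks as BooleanSupplier instances and
evaluates them on a fixed schedule. This is not the code in the attached
patches. The example invariant ("per-queue allocated memory sums to the
cluster total") and its names are illustrative only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical monitor that periodically evaluates cross-cutting invariants. */
final class InvariantMonitor {
    private final Map<String, BooleanSupplier> invariants = new LinkedHashMap<>();
    // Daemon thread so the monitor never keeps the JVM alive on its own.
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "invariant-monitor");
            t.setDaemon(true);
            return t;
        });

    synchronized void register(String name, BooleanSupplier check) {
        invariants.put(name, check);
    }

    /** Evaluates every registered invariant once; returns the names that fail. */
    synchronized List<String> checkOnce() {
        List<String> violated = new ArrayList<>();
        for (Map.Entry<String, BooleanSupplier> e : invariants.entrySet()) {
            if (!e.getValue().getAsBoolean()) {
                violated.add(e.getKey());
            }
        }
        return violated;
    }

    /** Runs checkOnce() every periodMs milliseconds and reports violations. */
    void start(long periodMs) {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                for (String name : checkOnce()) {
                    System.err.println("Invariant violated: " + name);
                }
            } catch (RuntimeException ex) {
                // An uncaught exception would silently cancel future runs.
                System.err.println("Invariant check failed: " + ex);
            }
        }, periodMs, periodMs, TimeUnit.MILLISECONDS);
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```

Since each check sees a snapshot taken at one moment, such cross-queue sums
can legitimately disagree while an allocation is in flight. A real monitor
would probably need to read under the scheduler lock or tolerate transient
mismatches.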
> Create a monitor to check whether we maintain RM (scheduling) invariants
> ------------------------------------------------------------------------
>
> Key: YARN-6451
> URL: https://issues.apache.org/jira/browse/YARN-6451
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Carlo Curino
> Assignee: Carlo Curino
> Attachments: YARN-6451.v0.patch, YARN-6451.v1.patch
>
>
> For SLS runs, as well as for live test clusters (and maybe prod), it would be
> useful to have a mechanism that continuously checks whether core invariants
> of the RM/Scheduler are respected (e.g., no priority inversions, fairness
> mostly respected, certain latencies within the expected range, etc.).