[ 
https://issues.apache.org/jira/browse/YARN-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15961022#comment-15961022
 ] 

Jason Lowe commented on YARN-6451:
----------------------------------

Interesting idea.  For some of these invariants, would it make more sense to 
put an assert-like hook in the metric code itself?  Why hope that a periodic 
check happens to catch the metric while it is negative, when the metric itself 
can protest the moment someone tries to set it below zero?  As a bonus, 
we'd have access to the stack trace of the update that triggered it.
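
To make the assert-like hook concrete, here is a minimal sketch of a gauge that 
rejects an update which would take it below zero.  NonNegativeGauge and its 
methods are hypothetical names used only for illustration; this is not the 
existing Hadoop metrics API, and a real patch might prefer to log instead of 
throw.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; not part of Hadoop's metrics library.
final class NonNegativeGauge {
    private final String name;
    private final AtomicLong value = new AtomicLong();

    NonNegativeGauge(String name) {
        this.name = name;
    }

    long get() {
        return value.get();
    }

    /** Adds delta (may be negative) and refuses any update that would go below zero. */
    long add(long delta) {
        while (true) {
            long current = value.get();
            long next = current + delta;
            if (next < 0) {
                // The exception's stack trace identifies the caller that broke the invariant.
                throw new IllegalStateException(
                    "Metric " + name + " would become negative: " + current + " + (" + delta + ")");
            }
            // Retry if another thread changed the value in the meantime.
            if (value.compareAndSet(current, next)) {
                return next;
            }
        }
    }
}
```

Because the check runs inside the update, the value is left unchanged when the 
invariant would be violated, and the failure surfaces at the offending call 
site rather than at the next polling interval.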

I could see this periodic approach being really useful for more complicated 
expressions, like validating stats across users, across queues, etc., where 
it's tricky or expensive to evaluate them on every single metric update.
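
As a rough sketch of the kind of cross-queue check that suits a periodic 
monitor, the class below validates one snapshot of per-queue usage against the 
cluster capacity.  QueueInvariantChecker, the snapshot map, and the capacity 
parameter are assumptions made for illustration; they do not come from the 
attached patches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; names and inputs are assumptions.
final class QueueInvariantChecker {
    private QueueInvariantChecker() {
    }

    /** Returns one message per violated invariant; an empty list means the snapshot is consistent. */
    static List<String> check(Map<String, Long> usedMbByQueue, long clusterCapacityMb) {
        List<String> violations = new ArrayList<>();
        long totalUsedMb = 0;
        for (Map.Entry<String, Long> entry : usedMbByQueue.entrySet()) {
            long usedMb = entry.getValue();
            if (usedMb < 0) {
                violations.add("Queue " + entry.getKey() + " reports negative usage: " + usedMb);
            }
            totalUsedMb += usedMb;
        }
        // Aggregate invariant: it needs every queue's value, so it cannot be
        // checked cheaply from inside a single metric update.
        if (totalUsedMb > clusterCapacityMb) {
            violations.add("Total usage " + totalUsedMb + " MB exceeds capacity "
                + clusterCapacityMb + " MB");
        }
        return violations;
    }
}
```

A monitor could call check() at a fixed rate, for example from a 
java.util.concurrent.ScheduledExecutorService, and log or fail on any 
non-empty result.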

> Create a monitor to check whether we maintain RM (scheduling) invariants
> ------------------------------------------------------------------------
>
>                 Key: YARN-6451
>                 URL: https://issues.apache.org/jira/browse/YARN-6451
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Carlo Curino
>            Assignee: Carlo Curino
>         Attachments: YARN-6451.v0.patch, YARN-6451.v1.patch
>
>
> For SLS runs, as well as for live test clusters (and maybe prod), it would be 
> useful to have a mechanism to continuously check whether core invariants of 
> the RM/Scheduler are respected (e.g., no priority inversions, fairness mostly 
> respected, certain latencies within expected range, etc.).


