[ https://issues.apache.org/jira/browse/YARN-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15961022#comment-15961022 ]
Jason Lowe commented on YARN-6451:
----------------------------------

Interesting idea. For some of these invariants, would it make more sense to put an assert-like hook in the metric code itself? I'm thinking why hope that a periodic interval happens to catch the metric being negative when we can have the metric itself protest when someone tries to set it below zero? As a bonus, we'd have access to the stacktrace that triggered it.

I could see this periodic approach being really useful for more complicated expressions like validating stats across users, across queues, etc. where it's tricky/expensive to evaluate it on a single metric update.

> Create a monitor to check whether we maintain RM (scheduling) invariants
> ------------------------------------------------------------------------
>
>                 Key: YARN-6451
>                 URL: https://issues.apache.org/jira/browse/YARN-6451
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Carlo Curino
>            Assignee: Carlo Curino
>         Attachments: YARN-6451.v0.patch, YARN-6451.v1.patch
>
>
> For SLS runs, as well as for live test clusters (and maybe prod), it would be
> useful to have a mechanism to continuously check whether core invariants of
> the RM/Scheduler are respected (e.g., no priority inversions, fairness mostly
> respected, certain latencies within expected range, etc.)
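
To make the two approaches in the comment concrete, here is a rough sketch in plain Java. It is not taken from either attached patch and does not use the Hadoop metrics classes; `GuardedGauge` and `CrossQueueInvariant` are hypothetical names invented for illustration. The gauge rejects any update that would push it below zero, so the exception carries the stack trace of the offending caller, while the cross-queue check is the kind of aggregate expression that is probably better evaluated periodically.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical gauge with an assert-like hook: it refuses to go negative. */
final class GuardedGauge {
    private final String name;
    private final AtomicLong value = new AtomicLong();

    GuardedGauge(String name) {
        this.name = name;
    }

    long get() {
        return value.get();
    }

    /**
     * Atomically applies delta. If the result would be negative, the gauge is
     * left unchanged and an exception is thrown from the updating thread, so
     * its stack trace points at the code that attempted the bad update.
     */
    long add(long delta) {
        while (true) {
            long current = value.get();
            long next = Math.addExact(current, delta);
            if (next < 0) {
                throw new IllegalStateException("Invariant violated: " + name
                    + " would become " + next
                    + " (current=" + current + ", delta=" + delta + ")");
            }
            if (value.compareAndSet(current, next)) {
                return next;
            }
        }
    }
}

/** Hypothetical aggregate check, suited to a periodic monitor. */
final class CrossQueueInvariant {
    private CrossQueueInvariant() {
    }

    /** Returns true if the summed per-queue usage fits in the cluster capacity. */
    static boolean usedWithinCapacity(List<GuardedGauge> perQueueUsed,
                                      long clusterCapacity) {
        long total = 0;
        for (GuardedGauge gauge : perQueueUsed) {
            total = Math.addExact(total, gauge.get());
        }
        return total <= clusterCapacity;
    }
}
```

Whether a violation should throw, or only log the stack trace and carry on, is a design choice this sketch does not settle; throwing is used here only because it makes the failure easy to see in a test run.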