[jira] [Commented] (YARN-6451) Add RM monitor validating metrics invariants
[ https://issues.apache.org/jira/browse/YARN-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15994042#comment-15994042 ]

Wangda Tan commented on YARN-6451:

Thanks [~curino] for your responses. I personally think #3 is a good way to go, and I agree with the approach of picking the low-hanging fruit first via the existing metrics-based mechanisms.

> Add RM monitor validating metrics invariants
>
> Key: YARN-6451
> URL: https://issues.apache.org/jira/browse/YARN-6451
> Project: Hadoop YARN
> Issue Type: New Feature
> Reporter: Carlo Curino
> Assignee: Carlo Curino
> Fix For: 3.0.0-alpha3
> Attachments: YARN-6451.v0.patch, YARN-6451.v1.patch, YARN-6451.v2.patch, YARN-6451.v3.patch, YARN-6451.v4.patch, YARN-6451.v5.patch
>
> For SLS runs, as well as for live test clusters (and maybe prod), it would be useful to have a mechanism to continuously check whether core invariants of the RM/Scheduler are respected (e.g., no priority inversions, fairness mostly respected, certain latencies within expected range, etc.).

--
This message was sent by Atlassian JIRA (v6.3.15#6346)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6451) Add RM monitor validating metrics invariants
[ https://issues.apache.org/jira/browse/YARN-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15994014#comment-15994014 ]

Carlo Curino commented on YARN-6451:

I see two or three alternatives:
# Hard-coding the most important invariants programmatically. You can see an example of this in YARN-6473, where I poke the {{ReservationSystem}} and {{YarnScheduler}} to check whether their data structures remain in sync during execution. This is more minimalistic and efficient, but any extension requires code changes. For example, you can maintain an observer of container allocations and check that certain ordering properties are respected.
# Expanding the mechanics of YARN-6451 by adding "bindings" for many more parts of the RM internal state, which one is then allowed to mention in the {{invariants.txt}} file. Metrics were a natural starting point, as the cost of gathering them is already paid and their names are externally known. To minimize the cost, we could load the {{invariants.txt}} expressions and then limit the "state" we probe to the smallest set covering the needs of our expressions.
# Leveraging compiler APIs / aspects / dependency-injection types of tricks to dynamically modify the code that does the binding work, to cover whatever appears in the {{invariants.txt}} file. This is obviously the richest option, though it has some maintainability issues.

In YARN-6547 I propose a simple way of combining the YARN-6363 and YARN-6451 capabilities to run tests that check an SLS run for common invariants (both during and at the end of the run). That is mostly a mechanism patch, but we can work together to define very tight yet robust invariants for specific runs.
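The cost-minimizing idea in the second alternative (load the expressions first, then probe only the state they mention) can be sketched as follows. This is an illustrative Python sketch, not the code from the YARN-6451 patch: the function names, the probe names, and the use of Python expression syntax as the invariant language are all assumptions made for the example.

```python
import ast


def referenced_names(expression):
    """Return the set of identifiers an invariant expression mentions."""
    tree = ast.parse(expression, mode="eval")
    return {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}


def check_invariants(expressions, probes):
    """Evaluate each expression, probing only the state it refers to.

    `probes` (hypothetical) maps a binding name to a zero-argument
    callable that reads one piece of state. Probes that no expression
    names are never called, so their gathering cost is never paid.
    Returns the list of expressions that evaluated to a false value.
    """
    needed = set()
    for expr in expressions:
        needed |= referenced_names(expr)

    unknown = needed - set(probes)
    if unknown:
        raise KeyError(f"no binding for: {sorted(unknown)}")

    # Probe each needed binding exactly once per check.
    state = {name: probes[name]() for name in needed}

    # Empty __builtins__ limits expressions to arithmetic and comparisons
    # over the probed names.
    return [expr for expr in expressions
            if not eval(expr, {"__builtins__": {}}, state)]
```

With three registered probes and expressions that mention only two of them, the third probe is never invoked. A real binding layer would also need to decide how often to re-probe and how to snapshot state consistently, which this sketch does not address.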
[jira] [Commented] (YARN-6451) Add RM monitor validating metrics invariants
[ https://issues.apache.org/jira/browse/YARN-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15993981#comment-15993981 ]

Wangda Tan commented on YARN-6451:

Thanks [~curino]/[~chris.douglas]. Beyond metrics, I think there is a lot of information that is not captured in metrics, such as the order of container allocations needed to ensure FIFO/fairness, etc. Have you thought about how to formalize these requirements?
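One way an ordering requirement like the FIFO case raised above might be formalized is as an observer over allocation events. The Python sketch below is purely illustrative: the class, its method names, and the event model (a submit event followed by per-container allocation events) are hypothetical and do not correspond to any YARN API. It also assumes strict FIFO, ignoring locality, node constraints, and queue hierarchies, all of which would cause false positives in a real scheduler.

```python
class FifoOrderObserver:
    """Flags allocations made to an app while an older app is still waiting."""

    def __init__(self):
        self.submit_order = {}  # app id -> submission sequence number
        self.pending = {}       # app id -> outstanding container requests
        self.violations = []    # (app id, list of older apps still waiting)

    def on_submit(self, app_id, requested):
        """Record a newly submitted app and how many containers it wants."""
        self.submit_order[app_id] = len(self.submit_order)
        self.pending[app_id] = requested

    def on_allocation(self, app_id):
        """Record one container allocated to `app_id` and check FIFO order."""
        older_waiting = [
            other for other, outstanding in self.pending.items()
            if outstanding > 0
            and self.submit_order[other] < self.submit_order[app_id]
        ]
        if older_waiting:
            self.violations.append((app_id, older_waiting))
        self.pending[app_id] -= 1
```

A looser variant could tolerate a bounded number of out-of-order allocations, which may be closer to a "fairness mostly respected" style of invariant.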
[jira] [Commented] (YARN-6451) Add RM monitor validating metrics invariants
[ https://issues.apache.org/jira/browse/YARN-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15973243#comment-15973243 ]

Carlo Curino commented on YARN-6451:

Thanks [~chris.douglas], I might cherry-pick it back to branch-2 later on.
[jira] [Commented] (YARN-6451) Add RM monitor validating metrics invariants
[ https://issues.apache.org/jira/browse/YARN-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15973230#comment-15973230 ]

Hudson commented on YARN-6451:

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11601 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/11601/])
YARN-6451. Add RM monitor validating metrics invariants. Contributed by (cdouglas: rev af8e9842d2ca566528e09d905b609f1cf160d367)
* (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/invariants/TestMetricsInvariantChecker.java
* (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/invariants/MetricsInvariantChecker.java
* (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/invariants/InvariantsChecker.java
* (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/invariants/package-info.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml
* (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/invariants.txt
* (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/invariants/InvariantViolationException.java