[jira] [Commented] (YARN-6451) Add RM monitor validating metrics invariants

2017-05-02 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15994042#comment-15994042
 ] 

Wangda Tan commented on YARN-6451:
--

Thanks [~curino] for your responses.

I personally think #3 is a good way to go, and I agree with the approach of getting 
the low-hanging fruit first via the existing metrics-based mechanisms. 

> Add RM monitor validating metrics invariants
> 
>
> Key: YARN-6451
> URL: https://issues.apache.org/jira/browse/YARN-6451
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Fix For: 3.0.0-alpha3
>
> Attachments: YARN-6451.v0.patch, YARN-6451.v1.patch, 
> YARN-6451.v2.patch, YARN-6451.v3.patch, YARN-6451.v4.patch, YARN-6451.v5.patch
>
>
> For SLS runs, as well as for live test clusters (and maybe prod), it would be 
> useful to have a mechanism to continuously check whether core invariants of 
> the RM/Scheduler are respected (e.g., no priority inversions, fairness mostly 
> respected, certain latencies within the expected range, etc.)
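
As a rough illustration of the kind of continuous check described above, the following Java sketch evaluates a list of named boolean invariants against one snapshot of metric values and returns the ones that fail; a monitor would presumably call something like this on every tick. The classes `Invariant` and `InvariantSketch`, and the metric keys in the usage note (`allocatedMB`, `availableMB`, `totalMB`, `appsPending`), are hypothetical and are not the API of the `MetricsInvariantChecker` class added by this issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical types for illustration only; not the YARN-6451 API.
final class Invariant {
  final String name;                            // label reported on violation
  final Predicate<Map<String, Long>> check;     // must hold for a metrics snapshot

  Invariant(String name, Predicate<Map<String, Long>> check) {
    this.name = name;
    this.check = check;
  }
}

final class InvariantSketch {
  // Evaluates every invariant against one snapshot of metric values and
  // returns the names of the invariants that do not hold (empty if all pass).
  static List<String> violations(List<Invariant> invariants,
                                 Map<String, Long> snapshot) {
    List<String> failed = new ArrayList<>();
    for (Invariant inv : invariants) {
      if (!inv.check.test(snapshot)) {
        failed.add(inv.name);
      }
    }
    return failed;
  }
}
```

For example, with the invariants `allocatedMB + availableMB == totalMB` and `appsPending >= 0`, a snapshot with 4096 MB allocated, 8192 MB available and 8192 MB total would report only the capacity invariant.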



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6451) Add RM monitor validating metrics invariants

2017-05-02 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15994014#comment-15994014
 ] 

Carlo Curino commented on YARN-6451:


I see two or three alternatives:
 # Hard-code the most important invariants programmatically. You can see an 
example of this in YARN-6473, where I poke the {{ReservationSystem}} and 
{{YarnScheduler}} to check whether their data structures remain in sync during 
execution. This is more minimalistic and efficient, but any extension requires 
code changes. For example, you could maintain an observer of container 
allocations and check that certain ordering properties are respected.
 # Expand the mechanics of YARN-6451 by adding "bindings" for many more parts 
of the RM internal state, which one is then allowed to mention in the 
{{invariants.txt}} file. Metrics were a natural starting point, as the cost of 
gathering them is already paid and their names are externally known. To minimize 
the cost, we could load the {{invariants.txt}} expressions and then limit the 
"state" we probe to the minimal set covering the needs of our expressions.
 # Leverage compiler APIs / aspects / dependency-injection type of tricks to 
dynamically modify the code that does the binding work, so that it covers 
whatever appears in the {{invariants.txt}} file. This is obviously the richest 
option, though it has some maintainability issues. 
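
Alternative #2, probing only the least state that covers the loaded expressions, could look roughly like the following Java sketch. For simplicity it assumes that each invariant declares up front which bindings it reads, rather than having them extracted by parsing the {{invariants.txt}} expression text; `BoundInvariant`, `BindingChecker` and the `probe` function are hypothetical names, not part of YARN.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch of alternative #2; names are illustrative, not YARN APIs.
final class BoundInvariant {
  final String name;
  final Set<String> reads;                    // bindings (RM state) this invariant needs
  final Predicate<Map<String, Long>> check;   // evaluated over the probed values

  BoundInvariant(String name, Set<String> reads, Predicate<Map<String, Long>> check) {
    this.name = name;
    this.reads = reads;
    this.check = check;
  }
}

final class BindingChecker {
  // Union of the bindings read by all loaded invariants: the least state to probe.
  static Set<String> requiredBindings(List<BoundInvariant> invariants) {
    Set<String> needed = new TreeSet<>();
    for (BoundInvariant inv : invariants) {
      needed.addAll(inv.reads);
    }
    return needed;
  }

  // Probes only the required bindings once, then evaluates every invariant.
  static boolean allHold(List<BoundInvariant> invariants, Function<String, Long> probe) {
    Map<String, Long> state = new HashMap<>();
    for (String binding : requiredBindings(invariants)) {
      state.put(binding, probe.apply(binding));
    }
    for (BoundInvariant inv : invariants) {
      if (!inv.check.test(state)) {
        return false;
      }
    }
    return true;
  }
}
```

Declaring dependencies explicitly keeps the sketch small; state that no loaded invariant mentions (say, a hypothetical `reservedMB` binding) is never probed, which is where the cost saving would come from.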

In YARN-6547 I propose a simple way of combining the YARN-6363 and YARN-6451 
capabilities to run tests that check an SLS run for common invariants (both 
during and at the end of the run). That is mostly a mechanism patch, but we can 
work together to define very tight yet robust invariants for specific runs.





[jira] [Commented] (YARN-6451) Add RM monitor validating metrics invariants

2017-05-02 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15993981#comment-15993981
 ] 

Wangda Tan commented on YARN-6451:
--

Thanks [~curino]/[~chris.douglas],

Beyond metrics, I think a lot of information is not captured in metrics at all, 
such as the order of container allocations, which is needed to verify 
FIFO/fairness. Have you thought about how to formalize these requirements?
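
Ordering properties like this are hard to express over metrics alone, which is where a hard-coded observer of container allocations (alternative #1 in Carlo's reply) would fit. The Java sketch below checks a deliberately simplified, strict-FIFO property within one queue: an app must not receive a container while an app submitted earlier still has pending requests. It ignores locality, user limits and requests that do not fit, all of which a real scheduler may legitimately take into account; `FifoAllocationObserver` and its `onAllocation` callback are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical observer for alternative #1: a hard-coded, strict-FIFO ordering
// check over container allocations in a single queue. Not a YARN API.
final class FifoAllocationObserver {
  private final Map<String, Long> submitOrder;   // appId -> submission sequence number
  private final List<String> violations = new ArrayList<>();

  FifoAllocationObserver(Map<String, Long> submitOrder) {
    this.submitOrder = submitOrder;
  }

  // Called on each allocation: 'allocatedApp' just got a container while the
  // apps in 'pendingApps' still had outstanding requests in the same queue.
  void onAllocation(String allocatedApp, List<String> pendingApps) {
    long allocatedSeq = submitOrder.get(allocatedApp);
    for (String pending : pendingApps) {
      // An earlier-submitted app still waiting means FIFO order was inverted.
      if (submitOrder.get(pending) < allocatedSeq) {
        violations.add(allocatedApp + " served before earlier app " + pending);
      }
    }
  }

  List<String> violations() {
    return violations;
  }
}
```

Serving `app_1` while `app_2` and `app_3` wait is accepted, whereas serving `app_3` while `app_2` still waits is recorded as a violation.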




[jira] [Commented] (YARN-6451) Add RM monitor validating metrics invariants

2017-04-18 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15973243#comment-15973243
 ] 

Carlo Curino commented on YARN-6451:


Thanks [~chris.douglas], I might cherry-pick it back to branch-2 later on.




[jira] [Commented] (YARN-6451) Add RM monitor validating metrics invariants

2017-04-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15973230#comment-15973230
 ] 

Hudson commented on YARN-6451:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11601 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/11601/])
YARN-6451. Add RM monitor validating metrics invariants. Contributed by 
(cdouglas: rev af8e9842d2ca566528e09d905b609f1cf160d367)
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/invariants/TestMetricsInvariantChecker.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/invariants/MetricsInvariantChecker.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/invariants/InvariantsChecker.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/invariants/package-info.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/invariants.txt
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/invariants/InvariantViolationException.java

