[
https://issues.apache.org/jira/browse/YARN-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15994014#comment-15994014
]
Carlo Curino edited comment on YARN-6451 at 5/2/17 11:31 PM:
-------------------------------------------------------------
I see two or three alternatives:
# Hard-code the most important invariants programmatically. You can see an
example of this in YARN-6473, where I poke the {{ReservationSystem}} and
{{YarnScheduler}} to check whether their data structures remain in sync during
execution. This is more minimalistic and efficient, but any extension requires
code changes. For example, you can maintain an observer of container
allocations and check that certain ordering properties are respected.
# Expand the mechanics of YARN-6451 by adding "bindings" for many more parts
of the RM internal state, which one is then allowed to mention in the
{{invariants.txt}} file. Metrics were a natural starting point, as the cost of
gathering them is already paid and their names are externally known. To
minimize the cost, we could load the {{invariants.txt}} expressions and then
limit the "state" we probe to the smallest set that covers the needs of our
expressions.
# (Another option that emerged while discussing with [~chris.douglas].)
Leverage compiler APIs / aspects / dependency-injection type of tricks to
dynamically modify the code that does the binding work, so that it covers
whatever appears in the {{invariants.txt}} file. This is obviously the richest
option, though it has some maintainability issues.
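To make the second alternative more concrete, here is a minimal sketch of the "probe only what the expressions mention" idea. The class name, the one-comparison-per-line grammar, and the metric names are assumptions made purely for illustration; this is not the YARN-6451 code and not necessarily the real {{invariants.txt}} syntax.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the YARN-6451 implementation.
final class MinimalProbeSketch {

  // Assumed grammar, for illustration only: "<name> <op> <name-or-number>".
  private static final Pattern INVARIANT = Pattern.compile(
      "\\s*([A-Za-z_]\\w*)\\s*(<=|>=|==|<|>)\\s*([A-Za-z_]\\w*|-?\\d+(?:\\.\\d+)?)\\s*");

  private static Matcher parse(String expression) {
    Matcher m = INVARIANT.matcher(expression);
    if (!m.matches()) {
      throw new IllegalArgumentException("Unsupported invariant: " + expression);
    }
    return m;
  }

  private static boolean isName(String token) {
    char c = token.charAt(0);
    return Character.isLetter(c) || c == '_';
  }

  /** Names mentioned by the expressions: the smallest state we need to gather. */
  public static Set<String> referencedNames(List<String> expressions) {
    Set<String> names = new LinkedHashSet<>();
    for (String e : expressions) {
      Matcher m = parse(e);
      names.add(m.group(1));
      if (isName(m.group(3))) {
        names.add(m.group(3));
      }
    }
    return names;
  }

  /** Returns the violated expressions; the probe is called only for referenced names. */
  public static List<String> violations(List<String> expressions,
                                        Function<String, Double> probe) {
    Map<String, Double> state = new HashMap<>();
    for (String name : referencedNames(expressions)) {
      state.put(name, probe.apply(name));
    }
    List<String> failed = new ArrayList<>();
    for (String e : expressions) {
      Matcher m = parse(e);
      double lhs = state.get(m.group(1));
      String right = m.group(3);
      double rhs = isName(right) ? state.get(right) : Double.parseDouble(right);
      if (!holds(lhs, m.group(2), rhs)) {
        failed.add(e.trim());
      }
    }
    return failed;
  }

  private static boolean holds(double lhs, String op, double rhs) {
    switch (op) {
      case "<=": return lhs <= rhs;
      case ">=": return lhs >= rhs;
      case "==": return lhs == rhs;
      case "<":  return lhs < rhs;
      default:   return lhs > rhs; // only ">" remains in the assumed grammar
    }
  }

  public static void main(String[] args) {
    // Metric names below are made up for the example.
    List<String> invariants =
        Arrays.asList("PendingContainers >= 0", "AllocatedMB <= TotalMB");
    Map<String, Double> snapshot = new HashMap<>();
    snapshot.put("PendingContainers", 3.0);
    snapshot.put("AllocatedMB", 2048.0);
    snapshot.put("TotalMB", 1024.0);
    snapshot.put("AppsRunning", 7.0); // never probed: no invariant mentions it
    System.out.println("probing: " + referencedNames(invariants));
    System.out.println("violated: " + violations(invariants, snapshot::get));
  }
}
```

The point of the design is that the set of names is computed once, when the file is loaded, so each check only pays for the bindings the expressions actually use. A real version would need a richer expression language and a defined behavior for names the probe cannot resolve.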
In YARN-6547 I propose a simple way of combining YARN-6363 and YARN-6451
capabilities to run tests that check an SLS run for common invariants (both
during and at the end of the run). That is mostly a mechanism patch, but we can
work together to define very tight yet robust invariants for specific runs.
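As an illustration of an invariant checked both during and at the end of a run, here is a sketch of the kind of container-allocation observer mentioned in the first alternative. Everything in it is hypothetical: the class and method names are not existing YARN or SLS APIs, and it assumes that a smaller priority value means a more urgent request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical observer, not an existing YARN class.
final class AllocationOrderObserver {

  // Pending request count per priority.
  // Assumption: a smaller value means a more urgent request.
  private final TreeMap<Integer, Integer> pending = new TreeMap<>();
  private final List<String> violations = new ArrayList<>();

  /** Called when a container request is submitted. */
  public void onRequest(int priority) {
    pending.merge(priority, 1, Integer::sum);
  }

  /** Called on every allocation; this is the check done during the run. */
  public void onAllocate(int priority) {
    if (!pending.isEmpty() && pending.firstKey() < priority) {
      violations.add("priority inversion: allocated " + priority
          + " while " + pending.firstKey() + " was still pending");
    }
    // Consume one pending request at this priority, dropping the key at zero.
    pending.computeIfPresent(priority, (p, n) -> n > 1 ? n - 1 : null);
  }

  /** Called once after the run; adds the end-of-run check to what was seen so far. */
  public List<String> finish() {
    List<String> all = new ArrayList<>(violations);
    if (!pending.isEmpty()) {
      all.add("unsatisfied requests at end of run: " + pending);
    }
    return all;
  }

  public static void main(String[] args) {
    AllocationOrderObserver observer = new AllocationOrderObserver();
    observer.onRequest(1);
    observer.onRequest(2);
    observer.onAllocate(2); // served before the more urgent request
    observer.onAllocate(1);
    System.out.println(observer.finish());
  }
}
```

A test would feed the observer from the simulated run and fail if {{finish()}} returns anything. How tight the ordering property can be made without producing false alarms is exactly the part that needs to be tuned per run.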
> Add RM monitor validating metrics invariants
> --------------------------------------------
>
> Key: YARN-6451
> URL: https://issues.apache.org/jira/browse/YARN-6451
> Project: Hadoop YARN
> Issue Type: New Feature
> Reporter: Carlo Curino
> Assignee: Carlo Curino
> Fix For: 3.0.0-alpha3
>
> Attachments: YARN-6451.v0.patch, YARN-6451.v1.patch,
> YARN-6451.v2.patch, YARN-6451.v3.patch, YARN-6451.v4.patch, YARN-6451.v5.patch
>
>
> For SLS runs, as well as for live test clusters (and maybe prod), it would be
> useful to have a mechanism to continuously check whether core invariants of
> the RM/Scheduler are respected (e.g., no priority inversions, fairness mostly
> respected, certain latencies within expected range, etc.)