[https://issues.apache.org/jira/browse/YARN-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15961521#comment-15961521]

Carlo Curino commented on YARN-6451:
------------------------------------

[~jlowe] I think the inline asserts might be useful, particularly for simple
"range" type checks (like the silly examples I have so far).
One concern is whether running the checks on every state transition of
the metrics would become too expensive. The other issue
is that while certain invariants are universally true, others might be
deployment specific, and by having them externally loaded/configured,
as I do in this example, it is easier to customize them for a specific
workload. E.g., in some of our clusters apps are self-throttling,
and when they ask for containers they should receive them very quickly, so
we would like to establish an invariant on allocation
latency, which cannot be asserted in general.
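To make the universal vs. deployment-specific split concrete, here is a minimal sketch, assuming invariants are plain-text "metric operator bound" lines loaded from configuration, so the universal range checks and a site-specific allocation-latency bound can live in separate files. The text format, the metric names (`allocatedMB`, `pendingContainers`, `allocationLatencyMs`), the 500 ms bound and the helper functions are all hypothetical; this is not the mechanism used by the attached patch, nor YARN's actual metric names.

```python
import operator
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical config format: one invariant per line, "<metric> <op> <bound>",
# with "#" starting a comment. Not the format used by the actual patch.
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
}


@dataclass(frozen=True)
class Invariant:
    metric: str
    op: str
    bound: float

    def holds(self, metrics: Dict[str, float]) -> bool:
        # A missing metric raises KeyError rather than silently passing.
        return OPS[self.op](metrics[self.metric], self.bound)


def parse_invariants(text: str) -> List[Invariant]:
    """Parse one invariant per non-empty, non-comment line."""
    invariants = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        metric, op, bound = line.split()
        if op not in OPS:
            raise ValueError(f"unknown operator {op!r} in {raw!r}")
        invariants.append(Invariant(metric, op, float(bound)))
    return invariants


def violations(invariants: List[Invariant], metrics: Dict[str, float]) -> List[Invariant]:
    """Return the invariants that do not hold for this metrics snapshot."""
    return [inv for inv in invariants if not inv.holds(metrics)]


# Universal range checks (hypothetical metric names).
UNIVERSAL = """
allocatedMB >= 0
pendingContainers >= 0
"""

# Deployment-specific: apps self-throttle, so allocation should be fast.
SITE_SPECIFIC = """
allocationLatencyMs <= 500   # hypothetical bound for one cluster
"""

checks = parse_invariants(UNIVERSAL + SITE_SPECIFIC)
snapshot = {"allocatedMB": 4096, "pendingContainers": 3, "allocationLatencyMs": 2000}
print([inv.metric for inv in violations(checks, snapshot)])  # ['allocationLatencyMs']
```

Loading `UNIVERSAL + SITE_SPECIFIC` only on the self-throttling clusters, and `UNIVERSAL` everywhere else, keeps the latency bound out of deployments where it would not hold, without touching the checking code.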

All in all, I would like to foster more invariant checking in our codebase as
a way to complement more specific unit tests; this
little patch is a step in that direction.
In particular, given the work done in SLS, I think we can easily have
integration tests that run large portions of the codebase (e.g., the RM)
while simulating a large workload, and check that important invariants (e.g.,
complex ones like those you mentioned) are respected.
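An SLS-driven integration test could evaluate such checks against the stream of metric snapshots a simulated run produces. The following is a minimal sketch, assuming the simulator exposes snapshots as dictionaries; the snapshot format, check names and the `run_monitor` helper are hypothetical and not part of SLS or the patch. Checking only every `every`-th snapshot is one way to bound the per-transition cost mentioned above, and `fail_fast` separates a test run (abort on the first violation) from a live cluster (collect and report).

```python
from typing import Callable, Dict, Iterable, List, Tuple

Metrics = Dict[str, float]
Check = Callable[[Metrics], bool]


class InvariantViolation(AssertionError):
    """Raised when a check fails and the monitor runs in fail-fast mode."""


def run_monitor(
    snapshots: Iterable[Metrics],
    checks: Dict[str, Check],
    every: int = 1,
    fail_fast: bool = False,
) -> List[Tuple[int, str]]:
    """Evaluate every check on each `every`-th snapshot.

    Returns the (step, check name) pairs that failed. With fail_fast=True,
    the first failure raises instead (useful in an integration test).
    """
    found: List[Tuple[int, str]] = []
    for step, metrics in enumerate(snapshots):
        if step % every != 0:
            continue  # sampling keeps the per-step checking cost bounded
        for name, check in checks.items():
            if not check(metrics):
                if fail_fast:
                    raise InvariantViolation(f"step {step}: {name!r} violated")
                found.append((step, name))
    return found


# Hypothetical checks over hypothetical metric names.
CHECKS: Dict[str, Check] = {
    "pending_non_negative": lambda m: m["pending"] >= 0,
    "allocated_within_capacity": lambda m: m["allocated"] <= m["capacity"],
}

# Stand-in for snapshots a simulated run might emit; step 1 over-allocates.
SIMULATED_RUN: List[Metrics] = [
    {"pending": 5, "allocated": 10, "capacity": 100},
    {"pending": 2, "allocated": 120, "capacity": 100},
    {"pending": 0, "allocated": 90, "capacity": 100},
]

print(run_monitor(SIMULATED_RUN, CHECKS))  # [(1, 'allocated_within_capacity')]
```

The sampling trade-off is visible here: with `every=2` only steps 0 and 2 are checked, so the over-allocation at step 1 goes unnoticed. A transient violation can slip through unless the sampling interval is small relative to how long violations last.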


> Create a monitor to check whether we maintain RM (scheduling) invariants
> ------------------------------------------------------------------------
>
>                 Key: YARN-6451
>                 URL: https://issues.apache.org/jira/browse/YARN-6451
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Carlo Curino
>            Assignee: Carlo Curino
>         Attachments: YARN-6451.v0.patch, YARN-6451.v1.patch
>
>
> For SLS runs, as well as for live test clusters (and maybe prod), it would be 
> useful to have a mechanism to continuously check whether core invariants of 
> the RM/Scheduler are respected (e.g., no priority inversions, fairness mostly 
> respected, certain latencies within the expected range, etc.)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
