> On Nov. 14, 2014, 2:24 a.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/TaskVars.java, line 226
> > <https://reviews.apache.org/r/27705/diff/2/?file=763034#file763034line226>
> >
> >     To get the data we want, some extra analysis is needed.  Specifically - 
> > if we want to figure out how often a scheduling attempt is vetoed _only_ 
> > for static reasons (e.g. insufficient resources), these stats will lack 
> > signal.
> >     
> >     Instead, we probably want two counters:
> >     - scheduling_veto_static
> >     - scheduling_veto_dynamic
> >     
> >     Does that make sense?
> 
> Maxim Khutornenko wrote:
>     I don't see how more granular data would prevent us from aggregating into 
> static/dynamic groups. However, having only aggregate metrics would make it 
> impossible to do any further analysis when needed. Why not go the more 
> specific route instead? I would have a hard time figuring out what 
> "scheduling_veto_static" means without digging through the sources, whereas 
> something like "scheduling_veto_INSUFFICIENT_RESOURCES" would immediately 
> make sense by itself.
> 
> Bill Farner wrote:
>     The problem is that you can't discern when a task didn't match due to 
> _only_ static reasons.  Relevant code in `SchedulingFilterImpl`:
>     
>         return ImmutableSet.<Veto>builder()
>             .addAll(getConstraintFilter(attributeAggregate, attributes).apply(task))
>             .addAll(getResourceVetoes(offer, task))
>             .build();
>             
>     On the other end, when you increment counters:
>     
>         for (Veto veto : event.getVetoes()) {
>           counters.getUnchecked(vetoStatName(veto)).increment();
>         }
>     
>     At this point, you might get vetoes like: `insufficient CPU`, 
> `insufficient RAM`, `insufficient ports`, `limit not satisfied: host`.
>     You'll end up with these counter deltas:
>     
>     `INSUFFICIENT_RESOURCES 3`
>     `LIMIT_NOT_SATISFIED 1`
>     
>     As a result, I don't see how we could look at the stats and convince 
> ourselves which optimization has the greatest payoff, since a single 
> scheduling round affects multiple counters disproportionately.
> 
> Maxim Khutornenko wrote:
>     Isn't it the same problem with the aggregate counters? I.e. in the above 
> example we would still see static=1 (or 3?) and dynamic=1.
>     
>     To address your concern of excessive counting, how about maintaining 
> unique veto type counters instead? Something like this:
>     ```java
>         ListMultimap<VetoType, Veto> index =
>             Multimaps.index(event.getVetoes(), VETO_TO_VETO_TYPE);
>         for (VetoType vetoType : index.keySet()) {
>           counters.getUnchecked(vetoStatName(vetoType)).increment();
>         }
>     ```
>     For the above example, it would produce:
>     
>     scheduling_veto_INSUFFICIENT_RESOURCES 1
>     scheduling_veto_LIMIT_NOT_SATISFIED  1

Discussed with Bill offline. There is more logic to it. It's not just about 
grouping metrics, but rather reporting a group only when ALL of the vetoes 
issued fall into that same group. For example: 
- insufficient RAM, limit not satisfied - only "static" vetoes -> increment 
"static" counter;
- constraint mismatch, insufficient RAM - mixed "static" and "dynamic" vetoes -> 
increment "mixed" counter;
- constraint mismatch - only "dynamic" vetoes -> increment "dynamic" counter.
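
A rough sketch of that rule, to make the intent concrete. All names here 
(VetoGroupCounters, VetoGroup, classify, record) are hypothetical and not the 
actual Aurora classes, and the type-to-group mapping is only an assumption 
taken from the examples above:

```java
import java.util.Collection;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;

// Hypothetical sketch; names and grouping are assumptions, not Aurora code.
final class VetoGroupCounters {
  enum VetoGroup { STATIC, DYNAMIC, MIXED }

  // Assumed mapping, following the examples in this message.
  enum VetoType {
    INSUFFICIENT_RESOURCES(VetoGroup.STATIC),
    LIMIT_NOT_SATISFIED(VetoGroup.STATIC),
    CONSTRAINT_MISMATCH(VetoGroup.DYNAMIC);

    final VetoGroup group;

    VetoType(VetoGroup group) {
      this.group = group;
    }
  }

  private final Map<VetoGroup, Long> counters = new EnumMap<>(VetoGroup.class);

  // Returns the group shared by all vetoes, MIXED if they span more than one
  // group, or null when there are no vetoes at all.
  static VetoGroup classify(Collection<VetoType> vetoes) {
    EnumSet<VetoGroup> groups = EnumSet.noneOf(VetoGroup.class);
    for (VetoType veto : vetoes) {
      groups.add(veto.group);
    }
    if (groups.isEmpty()) {
      return null;
    }
    return groups.size() == 1 ? groups.iterator().next() : VetoGroup.MIXED;
  }

  // Increments exactly one counter per scheduling attempt, regardless of how
  // many individual vetoes were issued.
  void record(Collection<VetoType> vetoes) {
    VetoGroup group = classify(vetoes);
    if (group != null) {
      counters.merge(group, 1L, Long::sum);
    }
  }

  long get(VetoGroup group) {
    return counters.getOrDefault(group, 0L);
  }
}
```

Since each scheduling attempt bumps a single counter, a round with three 
resource vetoes weighs the same as a round with one, which should address the 
disproportionate counting concern.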


- Maxim


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27705/#review61385
-----------------------------------------------------------


On Nov. 14, 2014, 12:30 a.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27705/
> -----------------------------------------------------------
> 
> (Updated Nov. 14, 2014, 12:30 a.m.)
> 
> 
> Review request for Aurora, Bill Farner and Zameer Manji.
> 
> 
> Bugs: AURORA-914
>     https://issues.apache.org/jira/browse/AURORA-914
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Adding @Timed to trace scheduling latencies and Veto counters per type.
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/TaskVars.java 
> cf8f7584afee758c527798914181049051aef0d8 
>   src/main/java/org/apache/aurora/scheduler/async/OfferQueue.java 
> d2682cd910d248c897e691bcb4c8a3a6f1aec2d2 
>   src/main/java/org/apache/aurora/scheduler/async/TaskScheduler.java 
> e2ba8b8fe978a58d1edcd01963ea020e54529353 
>   src/main/java/org/apache/aurora/scheduler/filter/ConstraintFilter.java 
> 3839083f27ca5d4b93406152559b58b04e912a10 
>   src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilter.java 
> c1c5f26723f1eac3000e09e061b4582f922fded6 
>   src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilterImpl.java 
> cc6b53b3265253f76c1e954c0108aa5936f5cc36 
>   src/main/java/org/apache/aurora/scheduler/metadata/NearestFit.java 
> 87203690f09456ac1ca5e9da2b82826d60cbd723 
>   src/main/java/org/apache/aurora/scheduler/stats/CachedCounters.java 
> aaedb3b5ec2cb27550449435efa8f335c6a9baad 
>   src/test/java/org/apache/aurora/scheduler/TaskVarsTest.java 
> 12ea4c67350c2992f59bacd21a99d1413b60b757 
>   
> src/test/java/org/apache/aurora/scheduler/events/NotifyingSchedulingFilterTest.java
>  94f0a179b786649775899f855f7c1a0caab7290f 
>   
> src/test/java/org/apache/aurora/scheduler/filter/SchedulingFilterImplTest.java
>  e113eba1f304279b5ee3d70db1d1ea558efd63ac 
>   src/test/java/org/apache/aurora/scheduler/metadata/NearestFitTest.java 
> b60b004adbd6753ec6fef125fd70286be5071c56 
>   
> src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java
>  5c9ea6cf4eb4d99d94f5d61e784dd7c9c480798c 
> 
> Diff: https://reviews.apache.org/r/27705/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew -Pq build
> Verified new stats in vagrant.
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>
