> On Nov. 14, 2014, 2:24 a.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/TaskVars.java, line 226
> > <https://reviews.apache.org/r/27705/diff/2/?file=763034#file763034line226>
> >
> > To get the data we want, some extra analysis is needed. Specifically -
> > if we want to figure out how often a scheduling attempt is vetoed _only_
> > for static reasons (e.g. insufficient resources), these stats will lack
> > signal.
> >
> > Instead, we probably want two counters:
> > - scheduling_veto_static
> > - scheduling_veto_dynamic
> >
> > Does that make sense?
>
> Maxim Khutornenko wrote:
> I don't see how more granular data would prevent us from aggregating into
> static/dynamic groups. However, having aggregate metrics instead will make it
> impossible to do any further analysis when needed. Why not go the more
> specific route instead? I would have a hard time figuring out what
> "scheduling_veto_static" means without digging through the sources, whereas
> something like "scheduling_veto_INSUFFICIENT_RESOURCES" would immediately
> make sense by itself.
>
> Bill Farner wrote:
> The problem is that you can't discern when a task didn't match due to
> _only_ static reasons. Relevant code in `SchedulingFilterImpl`:
>
> return ImmutableSet.<Veto>builder()
> .addAll(getConstraintFilter(attributeAggregate,
> attributes).apply(task))
> .addAll(getResourceVetoes(offer, task))
> .build();
>
> On the other end, when you increment counters:
>
> for (Veto veto : event.getVetoes()) {
> counters.getUnchecked(vetoStatName(veto)).increment();
> }
>
> At this point, you might get vetoes like: `insufficient CPU`,
> `insufficient RAM`, `insufficient ports`, `limit not satisfied: host`.
> You'll end up with these counter deltas:
>
> `INSUFFICIENT_RESOURCES 3`
> `LIMIT_NOT_SATISFIED 1`
>
> As a result, I don't see how we could look at the stats and convince
> ourselves which optimization has the greatest payoff, since a single
> scheduling round affects multiple counters disproportionately.
Isn't it the same problem with the aggregate counters? I.e., in the above
example we would still see static=1 (or 3?) and dynamic=1.

To address your concern about excessive counting, how about incrementing each
veto type's counter at most once per scheduling attempt? Something like this:
```java
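// Group this round's vetoes by type so each type is counted once.
ListMultimap<VetoType, Veto> index =
    Multimaps.index(event.getVetoes(), VETO_TO_VETO_TYPE);
// keySet() yields each distinct type once; keys() would repeat a type per veto.
for (VetoType vetoType : index.keySet()) {
  counters.getUnchecked(vetoStatName(vetoType)).increment();
}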
```
For the above example, it would produce:
scheduling_veto_INSUFFICIENT_RESOURCES 1
scheduling_veto_LIMIT_NOT_SATISFIED 1
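
To make the difference between the two counting schemes concrete, here is a minimal JDK-only sketch (no Guava, no Aurora classes). `VetoCounterSketch`, `VetoType`, `Veto`, `countPerVeto` and `countPerType` are hypothetical stand-ins made up for illustration, not the scheduler's actual types, and plain maps stand in for the real stat counters.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;

final class VetoCounterSketch {
  // Hypothetical veto categories; the real scheduler may define others.
  enum VetoType { INSUFFICIENT_RESOURCES, LIMIT_NOT_SATISFIED }

  // Hypothetical veto: a category plus a human-readable reason.
  static final class Veto {
    final VetoType type;
    final String reason;

    Veto(VetoType type, String reason) {
      this.type = type;
      this.reason = reason;
    }
  }

  // Per-veto counting: every veto in the round bumps its type's counter.
  static Map<VetoType, Long> countPerVeto(List<Veto> vetoes) {
    Map<VetoType, Long> counters = new EnumMap<>(VetoType.class);
    for (Veto veto : vetoes) {
      counters.merge(veto.type, 1L, Long::sum);
    }
    return counters;
  }

  // Per-type counting: each distinct type is bumped at most once per round.
  static Map<VetoType, Long> countPerType(List<Veto> vetoes) {
    EnumSet<VetoType> seen = EnumSet.noneOf(VetoType.class);
    for (Veto veto : vetoes) {
      seen.add(veto.type);
    }
    Map<VetoType, Long> counters = new EnumMap<>(VetoType.class);
    for (VetoType type : seen) {
      counters.merge(type, 1L, Long::sum);
    }
    return counters;
  }

  public static void main(String[] args) {
    // The four vetoes from the example scheduling round discussed above.
    List<Veto> round = Arrays.asList(
        new Veto(VetoType.INSUFFICIENT_RESOURCES, "insufficient CPU"),
        new Veto(VetoType.INSUFFICIENT_RESOURCES, "insufficient RAM"),
        new Veto(VetoType.INSUFFICIENT_RESOURCES, "insufficient ports"),
        new Veto(VetoType.LIMIT_NOT_SATISFIED, "limit not satisfied: host"));
    System.out.println("per veto: " + countPerVeto(round));
    System.out.println("per type: " + countPerType(round));
  }
}
```

For that round, per-veto counting yields INSUFFICIENT_RESOURCES=3 and LIMIT_NOT_SATISFIED=1, while per-type counting yields 1 for each, so a single scheduling attempt no longer weighs one counter more heavily than another.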
- Maxim
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27705/#review61385
-----------------------------------------------------------
On Nov. 14, 2014, 12:30 a.m., Maxim Khutornenko wrote:
>
> (Updated Nov. 14, 2014, 12:30 a.m.)
>
>
> Review request for Aurora, Bill Farner and Zameer Manji.
>
>
> Bugs: AURORA-914
> https://issues.apache.org/jira/browse/AURORA-914
>
>
> Repository: aurora
>
>
> Description
> -------
>
> Adding @Timed to trace scheduling latencies and Veto counters per type.
>
>
> Diffs
> -----
>
> src/main/java/org/apache/aurora/scheduler/TaskVars.java
> cf8f7584afee758c527798914181049051aef0d8
> src/main/java/org/apache/aurora/scheduler/async/OfferQueue.java
> d2682cd910d248c897e691bcb4c8a3a6f1aec2d2
> src/main/java/org/apache/aurora/scheduler/async/TaskScheduler.java
> e2ba8b8fe978a58d1edcd01963ea020e54529353
> src/main/java/org/apache/aurora/scheduler/filter/ConstraintFilter.java
> 3839083f27ca5d4b93406152559b58b04e912a10
> src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilter.java
> c1c5f26723f1eac3000e09e061b4582f922fded6
> src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilterImpl.java
> cc6b53b3265253f76c1e954c0108aa5936f5cc36
> src/main/java/org/apache/aurora/scheduler/metadata/NearestFit.java
> 87203690f09456ac1ca5e9da2b82826d60cbd723
> src/main/java/org/apache/aurora/scheduler/stats/CachedCounters.java
> aaedb3b5ec2cb27550449435efa8f335c6a9baad
> src/test/java/org/apache/aurora/scheduler/TaskVarsTest.java
> 12ea4c67350c2992f59bacd21a99d1413b60b757
> src/test/java/org/apache/aurora/scheduler/events/NotifyingSchedulingFilterTest.java
> 94f0a179b786649775899f855f7c1a0caab7290f
> src/test/java/org/apache/aurora/scheduler/filter/SchedulingFilterImplTest.java
> e113eba1f304279b5ee3d70db1d1ea558efd63ac
> src/test/java/org/apache/aurora/scheduler/metadata/NearestFitTest.java
> b60b004adbd6753ec6fef125fd70286be5071c56
> src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java
> 5c9ea6cf4eb4d99d94f5d61e784dd7c9c480798c
>
> Diff: https://reviews.apache.org/r/27705/diff/
>
>
> Testing
> -------
>
> ./gradlew -Pq build
> Verified new stats in vagrant.
>
>
> Thanks,
>
> Maxim Khutornenko
>
>