> On Dec. 7, 2016, 10:28 a.m., David McLaughlin wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java, lines
> > 200-202
> > <https://reviews.apache.org/r/54439/diff/3/?file=1578834#file1578834line200>
> >
> > Why do we only sample active updates? It seems like we could miss data
> > points, especially for small updates.
>
> Joshua Cohen wrote:
> My thinking was that the vast majority of updates in the store will have
> completed hours or days ago, so there's no need to consider them when
> calculating the MTTU. You're right, this does mean that we might lose some
> data points for tasks that moved to `ASSIGNED` in the same
> `SLA_REFRESH_INTERVAL` (defaults to one minute) in which the entire update
> completed.
>
> For reference, some general stats from one of our clusters: currently, at
> off-peak hours, 0.02% of all updates in the update store are active. It's
> hard to say with certainty how many updates were historically active at any
> given time, but anecdotally it's a small fraction of the total number of
> updates in the store; generously speaking, I'd say 1-2%. That being the case,
> by including only active updates in the calculation, we reduce the work to be
> done by anywhere from 98 to 99.98 percent.
>
> I feel like this is a fair trade off to make, but I'm not steadfast in
> that opinion.
>
> Santhosh Kumar Shanmugham wrote:
> We can add a storage method that returns all the `InstanceUpdateEvent`s
> from the last `SLA_REFRESH_INTERVAL` and use that to determine the
> `activeUpdates` to examine. This would give a much more accurate value.
>
> Joshua Cohen wrote:
> I think that would filter out updates that are currently active but have
> not had an instance event in the past `SLA_REFRESH_INTERVAL`. A trivial
> example would be an update that processes batches of one instance where each
> instance takes more than a minute to update.

I am talking about this part of the code:

    .filter(taskEvent -> taskEvent.getStatus() == ASSIGNED
        && timeFrame.contains(taskEvent.getTimestamp()))

I think I misspoke about the event type; it is a `TaskEvent`.
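
To make the idea concrete, here is a rough, self-contained sketch of selecting jobs by recent `ASSIGNED` task events rather than by update state. Everything in it other than the filter predicate itself is an assumption for illustration: `Task`, `TaskEvent`, `TimeFrame`, and `jobsAssignedIn` are hypothetical stand-ins, not the scheduler's actual types or storage API, and the time frame is assumed to be a closed interval in milliseconds.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

class RecentAssignmentSketch {
  enum Status { PENDING, ASSIGNED, RUNNING }

  // Hypothetical stand-in for a task event: a status and a timestamp in ms.
  static final class TaskEvent {
    private final Status status;
    private final long timestampMs;

    TaskEvent(Status status, long timestampMs) {
      this.status = status;
      this.timestampMs = timestampMs;
    }

    Status getStatus() { return status; }
    long getTimestamp() { return timestampMs; }
  }

  // Hypothetical stand-in for a task: the job it belongs to and its events.
  static final class Task {
    final String jobKey;
    final List<TaskEvent> events;

    Task(String jobKey, List<TaskEvent> events) {
      this.jobKey = jobKey;
      this.events = events;
    }
  }

  // Assumed closed interval [startMs, endMs], e.g. the last refresh interval.
  static final class TimeFrame {
    final long startMs;
    final long endMs;

    TimeFrame(long startMs, long endMs) {
      this.startMs = startMs;
      this.endMs = endMs;
    }

    boolean contains(long timestampMs) {
      return timestampMs >= startMs && timestampMs <= endMs;
    }
  }

  // Returns the keys of jobs with at least one task ASSIGNED within timeFrame.
  static Set<String> jobsAssignedIn(List<Task> tasks, TimeFrame timeFrame) {
    return tasks.stream()
        .filter(task -> task.events.stream().anyMatch(
            taskEvent -> taskEvent.getStatus() == Status.ASSIGNED
                && timeFrame.contains(taskEvent.getTimestamp())))
        .map(task -> task.jobKey)
        .collect(Collectors.toCollection(TreeSet::new));
  }
}
```

The resulting job keys could then be used to look up only the updates worth examining, whether or not those updates are still active. Whether that lookup is cheap enough depends on the store, which this sketch does not model.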
- Santhosh Kumar
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54439/#review158363
-----------------------------------------------------------
On Dec. 8, 2016, 1:40 p.m., Joshua Cohen wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54439/
> -----------------------------------------------------------
>
> (Updated Dec. 8, 2016, 1:40 p.m.)
>
>
> Review request for Aurora, David McLaughlin, Mehrdad Nurolahzade, and
> Santhosh Kumar Shanmugham.
>
>
> Repository: aurora
>
>
> Description
> -------
>
> The metric is calculated from the time of the `INSTANCE_UPDATING` event to
> the subsequent `ASSIGNED` event for the task with the same instance id that
> matches the desired task config from the update details.
>
> My original approach to this involved converting `GroupType` and
> `AlgorithmType` from enums (which cannot be generic) to static classes
> (which, of course, can). This allowed me to avoid unnecessarily passing
> update details to the `calculate` method of `SlaAlgorithm` since it's ignored
> in all but the one, new case. However, that ended up being a lot of churn,
> and since it turns out we need both the task details and the update details
> to calculate this metric, I went with the below approach. If anyone feels
> strongly, I could go back to generics and create a container class that
> gives access to both the tasks and update details.
>
>
> Diffs
> -----
>
> src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java
> 9a56cda809fbbcb07e6dd12c7a0feb272542491d
> src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java
> 5d8d5bd8f705770979f284d26d2e932aabe707e5
> src/main/java/org/apache/aurora/scheduler/sla/SlaGroup.java
> 6fbd4e962b3bb6eeb0831c810a321478fd52172c
> src/test/java/org/apache/aurora/scheduler/sla/MetricCalculatorTest.java
> 953b65f28a585375e36e305dea6f9f94f99abc93
> src/test/java/org/apache/aurora/scheduler/sla/SlaAlgorithmTest.java
> 2e719ac6b7aea86faa22deff2cc6b5f73135761c
> src/test/java/org/apache/aurora/scheduler/sla/SlaModuleTest.java
> 341e346e794c9cf9a2789b8799f38fff900ec9b3
> src/test/java/org/apache/aurora/scheduler/sla/SlaTestUtil.java
> 78f440f7546de9ed6842cb51db02b3bddc9a74ff
> src/test/java/org/apache/aurora/scheduler/storage/testing/StorageTestUtil.java
> 21d26b3930ea965487b2dec48a48a98677ba022b
> src/test/java/org/apache/aurora/scheduler/thrift/Fixtures.java
> 43e32eede27bbf26363a3fd1ca34ffe6f8c01a73
> src/test/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImplTest.java
> 6d0e9bc6a8040393875d4f0a88e8db9d6926a88b
>
> Diff: https://reviews.apache.org/r/54439/diff/
>
>
> Testing
> -------
>
> ./gradlew build -Pq
> e2e tests.
>
>
> Thanks,
>
> Joshua Cohen
>
>