> On Dec. 7, 2016, 10:28 a.m., David McLaughlin wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java, lines 
> > 200-202
> > <https://reviews.apache.org/r/54439/diff/3/?file=1578834#file1578834line200>
> >
> >     Why do we only sample active updates, seems like we could miss data 
> > points? Especially for small updates.
> 
> Joshua Cohen wrote:
>     My thinking was that the vast majority of updates in the store will be 
> completed hours or days ago, so there's no need to consider them when 
> calculating the mttu. You're right, this does mean that we might lose some 
> data points for tasks that moved to `ASSIGNED` in the same 
> `SLA_REFRESH_INTERVAL` (defaults to one minute) in which the entire update 
> completed.
>     
>     For reference, some general stats from one of our clusters: currently at 
> off-peak hours, .02% of all updates in the update store are active. It's hard 
> to say with certainty, historically how many updates were active at any given 
> time. But anecdotatlly it's a small fraction of the total number of updates 
> in the store, generously speaking I'd say 1-2%. That being the case, by 
> including only active updates in the calculation, we reduce the work to be 
> done by anywhere from 98 to 99.98 percent.
>     
>     I feel like this is a fair trade off to make, but I'm not steadfast in 
> that opinion.
> 
> Santhosh Kumar Shanmugham wrote:
>     We can add a storage method that will give all the `InstanceUpdateEvent`s 
> during the last `SLA_REFRESH_INTERVAL` and use that to determine the 
> `activeUpdates` that will be looked into, this can give a much more accurate 
> value.
> 
> Joshua Cohen wrote:
>     I think that would filter out updates that are currently active but have 
> not have an instance event in the past `SLA_REFRESH_INTERVAL`. A trivial 
> example would be an update that processes batches of one instance where each 
> instance takes more than a minute to update.

I am talking about this part of the code.

             .filter(taskEvent -> taskEvent.getStatus() == ASSIGNED
                          && timeFrame.contains(taskEvent.getTimestamp()))
                          
I think I misspoke about the event type, it is a `TaskEvent`.


- Santhosh Kumar


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54439/#review158363
-----------------------------------------------------------


On Dec. 8, 2016, 1:40 p.m., Joshua Cohen wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54439/
> -----------------------------------------------------------
> 
> (Updated Dec. 8, 2016, 1:40 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Mehrdad Nurolahzade, and 
> Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> The metric is calculated from the time of the `INSTANCE_UPDATING` event to 
> the subsequent `ASSIGNED` event for the task with the same instance id that 
> matches the desired task config from the update details.
> 
> My original approach to this involved converting `GroupType` and 
> `AlgorithmType` from enums (which cannot be generic) to static classes 
> (which, of course, can). This allowed me to avoid unnecessarily passing 
> update details to the `calculate` method of `SlaAlgorithm` since it's ignored 
> in all but the one, new case. However, that ended up being a lot of churn, 
> and since it turns out we need both the task details and the update details 
> to calculate this metric, I went with the below approach. If anyone feels 
> strongly, I could go back to generics and create an container class that's 
> gives access to both the tasks and update details.
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java 
> 9a56cda809fbbcb07e6dd12c7a0feb272542491d 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java 
> 5d8d5bd8f705770979f284d26d2e932aabe707e5 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaGroup.java 
> 6fbd4e962b3bb6eeb0831c810a321478fd52172c 
>   src/test/java/org/apache/aurora/scheduler/sla/MetricCalculatorTest.java 
> 953b65f28a585375e36e305dea6f9f94f99abc93 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaAlgorithmTest.java 
> 2e719ac6b7aea86faa22deff2cc6b5f73135761c 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaModuleTest.java 
> 341e346e794c9cf9a2789b8799f38fff900ec9b3 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaTestUtil.java 
> 78f440f7546de9ed6842cb51db02b3bddc9a74ff 
>   
> src/test/java/org/apache/aurora/scheduler/storage/testing/StorageTestUtil.java
>  21d26b3930ea965487b2dec48a48a98677ba022b 
>   src/test/java/org/apache/aurora/scheduler/thrift/Fixtures.java 
> 43e32eede27bbf26363a3fd1ca34ffe6f8c01a73 
>   
> src/test/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImplTest.java
>  6d0e9bc6a8040393875d4f0a88e8db9d6926a88b 
> 
> Diff: https://reviews.apache.org/r/54439/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew build -Pq
> e2e tests.
> 
> 
> Thanks,
> 
> Joshua Cohen
> 
>

Reply via email to