> On Dec. 7, 2016, 6:28 p.m., David McLaughlin wrote: > > src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java, lines > > 200-202 > > <https://reviews.apache.org/r/54439/diff/3/?file=1578834#file1578834line200> > > > > Why do we only sample active updates, seems like we could miss data > > points? Especially for small updates. > > Joshua Cohen wrote: > My thinking was that the vast majority of updates in the store will be > completed hours or days ago, so there's no need to consider them when > calculating the mttu. You're right, this does mean that we might lose some > data points for tasks that moved to `ASSIGNED` in the same > `SLA_REFRESH_INTERVAL` (defaults to one minute) in which the entire update > completed. > > For reference, some general stats from one of our clusters: currently at > off-peak hours, .02% of all updates in the update store are active. It's hard > to say with certainty, historically how many updates were active at any given > time. But anecdotatlly it's a small fraction of the total number of updates > in the store, generously speaking I'd say 1-2%. That being the case, by > including only active updates in the calculation, we reduce the work to be > done by anywhere from 98 to 99.98 percent. > > I feel like this is a fair trade off to make, but I'm not steadfast in > that opinion. > > Santhosh Kumar Shanmugham wrote: > We can add a storage method that will give all the `InstanceUpdateEvent`s > during the last `SLA_REFRESH_INTERVAL` and use that to determine the > `activeUpdates` that will be looked into, this can give a much more accurate > value.
I think that would filter out updates that are currently active but have not have an instance event in the past `SLA_REFRESH_INTERVAL`. A trivial example would be an update that processes batches of one instance where each instance takes more than a minute to update. - Joshua ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/54439/#review158363 ----------------------------------------------------------- On Dec. 8, 2016, 9:40 p.m., Joshua Cohen wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/54439/ > ----------------------------------------------------------- > > (Updated Dec. 8, 2016, 9:40 p.m.) > > > Review request for Aurora, David McLaughlin, Mehrdad Nurolahzade, and > Santhosh Kumar Shanmugham. > > > Repository: aurora > > > Description > ------- > > The metric is calculated from the time of the `INSTANCE_UPDATING` event to > the subsequent `ASSIGNED` event for the task with the same instance id that > matches the desired task config from the update details. > > My original approach to this involved converting `GroupType` and > `AlgorithmType` from enums (which cannot be generic) to static classes > (which, of course, can). This allowed me to avoid unnecessarily passing > update details to the `calculate` method of `SlaAlgorithm` since it's ignored > in all but the one, new case. However, that ended up being a lot of churn, > and since it turns out we need both the task details and the update details > to calculate this metric, I went with the below approach. If anyone feels > strongly, I could go back to generics and create an container class that's > gives access to both the tasks and update details. > > > Diffs > ----- > > src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java > 9a56cda809fbbcb07e6dd12c7a0feb272542491d > src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java > 5d8d5bd8f705770979f284d26d2e932aabe707e5 > src/main/java/org/apache/aurora/scheduler/sla/SlaGroup.java > 6fbd4e962b3bb6eeb0831c810a321478fd52172c > src/test/java/org/apache/aurora/scheduler/sla/MetricCalculatorTest.java > 953b65f28a585375e36e305dea6f9f94f99abc93 > src/test/java/org/apache/aurora/scheduler/sla/SlaAlgorithmTest.java > 2e719ac6b7aea86faa22deff2cc6b5f73135761c > src/test/java/org/apache/aurora/scheduler/sla/SlaModuleTest.java > 341e346e794c9cf9a2789b8799f38fff900ec9b3 > src/test/java/org/apache/aurora/scheduler/sla/SlaTestUtil.java > 78f440f7546de9ed6842cb51db02b3bddc9a74ff > > src/test/java/org/apache/aurora/scheduler/storage/testing/StorageTestUtil.java > 21d26b3930ea965487b2dec48a48a98677ba022b > src/test/java/org/apache/aurora/scheduler/thrift/Fixtures.java > 43e32eede27bbf26363a3fd1ca34ffe6f8c01a73 > > src/test/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImplTest.java > 6d0e9bc6a8040393875d4f0a88e8db9d6926a88b > > Diff: https://reviews.apache.org/r/54439/diff/ > > > Testing > ------- > > ./gradlew build -Pq > e2e tests. > > > Thanks, > > Joshua Cohen > >