Re: Review Request 54439: Add support for an mttu metric (median time to updated)

2016-12-09 Thread Santhosh Kumar Shanmugham


> On Dec. 7, 2016, 10:28 a.m., David McLaughlin wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java, lines 
> > 200-202
> > 
> >
> > Why do we only sample active updates? It seems like we could miss data 
> > points, especially for small updates.
> 
> Joshua Cohen wrote:
> My thinking was that the vast majority of updates in the store will have 
> completed hours or days ago, so there's no need to consider them when 
> calculating the mttu. You're right, this does mean that we might lose some 
> data points for tasks that moved to `ASSIGNED` in the same 
> `SLA_REFRESH_INTERVAL` (defaults to one minute) in which the entire update 
> completed.
> 
> For reference, some general stats from one of our clusters: currently, at 
> off-peak hours, .02% of all updates in the update store are active. It's hard 
> to say with certainty how many updates were active at any given time 
> historically, but anecdotally it's a small fraction of the total number of 
> updates in the store; generously speaking, I'd say 1-2%. That being the case, 
> by including only active updates in the calculation, we reduce the work to be 
> done by anywhere from 98 to 99.98 percent.
> 
> I feel like this is a fair tradeoff to make, but I'm not steadfast in 
> that opinion.
> 
> Santhosh Kumar Shanmugham wrote:
> We can add a storage method that will give all the `InstanceUpdateEvent`s 
> during the last `SLA_REFRESH_INTERVAL` and use that to determine the 
> `activeUpdates` that will be looked into; this can give a much more accurate 
> value.
> 
> Joshua Cohen wrote:
> I think that would filter out updates that are currently active but have 
> not had an instance event in the past `SLA_REFRESH_INTERVAL`. A trivial 
> example would be an update that processes batches of one instance where each 
> instance takes more than a minute to update.
> 
> Santhosh Kumar Shanmugham wrote:
> I am talking about this part of the code.
> 
>  .filter(taskEvent -> taskEvent.getStatus() == ASSIGNED
>   && timeFrame.contains(taskEvent.getTimestamp()))
>   
> I think I misspoke about the event type; it is a `TaskEvent`.
> 
> Joshua Cohen wrote:
> I'm not sure I follow? The discussion above was related to whether we 
> should be filtering completed updates, or whether we should iterate all 
> updates in the store. Querying for only task events active in the last 
> `SLA_REFRESH_INTERVAL` wouldn't be helpful for a few reasons:
> 
> 1) We already have the full list of tasks in `MetricCalculator` for use 
> in other SLA calculations.
> 2) By the time we iterate task events to find the `ASSIGNED` event, we've 
> already iterated the update details.
> 3) The number of task events at this point will be small... maybe 5 or 6 
> tops.
> 
> Is there something I'm missing?

I feel that the current logic has correctness issues: it can miss an 
arbitrary number of data points, which makes the metric unreliable. My 
suggestion would lead to a metric that is reliable, at the expense of possibly 
more work.


- Santhosh Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54439/#review158363
---


On Dec. 8, 2016, 1:40 p.m., Joshua Cohen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54439/
> ---
> 
> (Updated Dec. 8, 2016, 1:40 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Mehrdad Nurolahzade, and 
> Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> The metric is calculated from the time of the `INSTANCE_UPDATING` event to 
> the subsequent `ASSIGNED` event for the task with the same instance id that 
> matches the desired task config from the update details.
> 
> My original approach to this involved converting `GroupType` and 
> `AlgorithmType` from enums (which cannot be generic) to static classes 
> (which, of course, can). This allowed me to avoid unnecessarily passing 
> update details to the `calculate` method of `SlaAlgorithm` since it's ignored 
> in all but the one, new case. However, that ended up being a lot of churn, 
> and since it turns out we need both the task details and the update details 
> to calculate this metric, I went with the below approach. If anyone feels 
> strongly, I could go back to generics and create a container class that 
> gives access to both the tasks and update details.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java 
> 
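The review request defines the metric as the time from an instance's `INSTANCE_UPDATING` event to the subsequent `ASSIGNED` event of the task, with the same instance id, that runs the update's desired config. A minimal Java sketch of that calculation follows. It assumes simplified, hypothetical stand-in types (`MttuSketch`, `TaskEvent`, plain maps keyed by instance id) rather than Aurora's actual storage or SLA classes, assumes the task events were already filtered to tasks matching the desired config, and averages the two middle samples when the count is even, which may differ from how the scheduler computes its medians.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.OptionalDouble;
import java.util.OptionalLong;

// Hypothetical sketch of the MTTU calculation; these types are stand-ins, not Aurora's classes.
final class MttuSketch {
  /** Simplified task state change: a status name and when it happened. */
  static final class TaskEvent {
    final String status;
    final long timestampMs;

    TaskEvent(String status, long timestampMs) {
      this.status = status;
      this.timestampMs = timestampMs;
    }
  }

  /** Time from INSTANCE_UPDATING to the first ASSIGNED event that follows it, if any. */
  static OptionalLong timeToUpdated(long updatingTsMs, List<TaskEvent> events) {
    return events.stream()
        .filter(e -> "ASSIGNED".equals(e.status) && e.timestampMs > updatingTsMs)
        .mapToLong(e -> e.timestampMs - updatingTsMs)
        .min();
  }

  /**
   * updatingTsByInstance: instance id -> INSTANCE_UPDATING timestamp.
   * eventsByInstance: instance id -> events of the task running the desired config
   * (assumed to be filtered by config already).
   */
  static OptionalDouble medianTimeToUpdated(
      Map<Integer, Long> updatingTsByInstance, Map<Integer, List<TaskEvent>> eventsByInstance) {
    List<Long> samples = new ArrayList<>();
    for (Map.Entry<Integer, Long> entry : updatingTsByInstance.entrySet()) {
      List<TaskEvent> events =
          eventsByInstance.getOrDefault(entry.getKey(), Collections.emptyList());
      timeToUpdated(entry.getValue(), events).ifPresent(d -> samples.add(d));
    }
    if (samples.isEmpty()) {
      return OptionalDouble.empty();
    }
    Collections.sort(samples);
    int mid = samples.size() / 2;
    return samples.size() % 2 == 1
        ? OptionalDouble.of(samples.get(mid))
        : OptionalDouble.of((samples.get(mid - 1) + samples.get(mid)) / 2.0);
  }
}
```

Instances without a matching `ASSIGNED` event simply contribute no sample in this sketch, rather than being counted as zero.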

Re: Review Request 54439: Add support for an mttu metric (median time to updated)

2016-12-09 Thread Joshua Cohen


> On Dec. 7, 2016, 6:28 p.m., David McLaughlin wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java, lines 
> > 200-202
> > 
> >
> > Why do we only sample active updates? It seems like we could miss data 
> > points, especially for small updates.
> 
> Joshua Cohen wrote:
> My thinking was that the vast majority of updates in the store will have 
> completed hours or days ago, so there's no need to consider them when 
> calculating the mttu. You're right, this does mean that we might lose some 
> data points for tasks that moved to `ASSIGNED` in the same 
> `SLA_REFRESH_INTERVAL` (defaults to one minute) in which the entire update 
> completed.
> 
> For reference, some general stats from one of our clusters: currently, at 
> off-peak hours, .02% of all updates in the update store are active. It's hard 
> to say with certainty how many updates were active at any given time 
> historically, but anecdotally it's a small fraction of the total number of 
> updates in the store; generously speaking, I'd say 1-2%. That being the case, 
> by including only active updates in the calculation, we reduce the work to be 
> done by anywhere from 98 to 99.98 percent.
> 
> I feel like this is a fair tradeoff to make, but I'm not steadfast in 
> that opinion.
> 
> Santhosh Kumar Shanmugham wrote:
> We can add a storage method that will give all the `InstanceUpdateEvent`s 
> during the last `SLA_REFRESH_INTERVAL` and use that to determine the 
> `activeUpdates` that will be looked into; this can give a much more accurate 
> value.
> 
> Joshua Cohen wrote:
> I think that would filter out updates that are currently active but have 
> not had an instance event in the past `SLA_REFRESH_INTERVAL`. A trivial 
> example would be an update that processes batches of one instance where each 
> instance takes more than a minute to update.
> 
> Santhosh Kumar Shanmugham wrote:
> I am talking about this part of the code.
> 
>  .filter(taskEvent -> taskEvent.getStatus() == ASSIGNED
>   && timeFrame.contains(taskEvent.getTimestamp()))
>   
> I think I misspoke about the event type; it is a `TaskEvent`.

I'm not sure I follow? The discussion above was related to whether we should be 
filtering completed updates, or whether we should iterate all updates in the 
store. Querying for only task events active in the last `SLA_REFRESH_INTERVAL` 
wouldn't be helpful for a few reasons:

1) We already have the full list of tasks in `MetricCalculator` for use in 
other SLA calculations.
2) By the time we iterate task events to find the `ASSIGNED` event, we've 
already iterated the update details.
3) The number of task events at this point will be small... maybe 5 or 6 tops.

Is there something I'm missing?


- Joshua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54439/#review158363
---


On Dec. 8, 2016, 9:40 p.m., Joshua Cohen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54439/
> ---
> 
> (Updated Dec. 8, 2016, 9:40 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Mehrdad Nurolahzade, and 
> Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 

Re: Review Request 54439: Add support for an mttu metric (median time to updated)

2016-12-09 Thread Joshua Cohen


> On Dec. 7, 2016, 6:28 p.m., David McLaughlin wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java, lines 
> > 200-202
> > 
> >
> > Why do we only sample active updates? It seems like we could miss data 
> > points, especially for small updates.
> 
> Joshua Cohen wrote:
> My thinking was that the vast majority of updates in the store will have 
> completed hours or days ago, so there's no need to consider them when 
> calculating the mttu. You're right, this does mean that we might lose some 
> data points for tasks that moved to `ASSIGNED` in the same 
> `SLA_REFRESH_INTERVAL` (defaults to one minute) in which the entire update 
> completed.
> 
> For reference, some general stats from one of our clusters: currently, at 
> off-peak hours, .02% of all updates in the update store are active. It's hard 
> to say with certainty how many updates were active at any given time 
> historically, but anecdotally it's a small fraction of the total number of 
> updates in the store; generously speaking, I'd say 1-2%. That being the case, 
> by including only active updates in the calculation, we reduce the work to be 
> done by anywhere from 98 to 99.98 percent.
> 
> I feel like this is a fair tradeoff to make, but I'm not steadfast in 
> that opinion.
> 
> Santhosh Kumar Shanmugham wrote:
> We can add a storage method that will give all the `InstanceUpdateEvent`s 
> during the last `SLA_REFRESH_INTERVAL` and use that to determine the 
> `activeUpdates` that will be looked into; this can give a much more accurate 
> value.

I think that would filter out updates that are currently active but have not 
had an instance event in the past `SLA_REFRESH_INTERVAL`. A trivial example 
would be an update that processes batches of one instance where each instance 
takes more than a minute to update.
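To make the counterexample concrete, here is a small Java sketch under simplified assumptions: refresh ticks every `SLA_REFRESH_INTERVAL` (taken as 60,000 ms, the one-minute default mentioned in this thread), a selection window of (tick - interval, tick], and a still-active update whose instance events arrive roughly every 150 s because each single-instance batch takes longer than a minute. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: checks, at each refresh tick, whether an update had any
// instance event in the window (tick - interval, tick].
final class SlowUpdateSketch {
  /** True if any instance event falls in (tickMs - intervalMs, tickMs]. */
  static boolean hasRecentInstanceEvent(List<Long> eventTimesMs, long tickMs, long intervalMs) {
    for (long t : eventTimesMs) {
      if (t > tickMs - intervalMs && t <= tickMs) {
        return true;
      }
    }
    return false;
  }

  /** Ticks at which a still-active update would be left out by the recent-event filter. */
  static List<Long> skippedTicks(List<Long> eventTimesMs, List<Long> ticksMs, long intervalMs) {
    List<Long> skipped = new ArrayList<>();
    for (long tick : ticksMs) {
      if (!hasRecentInstanceEvent(eventTimesMs, tick, intervalMs)) {
        skipped.add(tick);
      }
    }
    return skipped;
  }
}
```

With instance events at 10 s, 160 s and 310 s and ticks every minute from 60 s to 360 s, this model skips the update at the 120 s, 240 s and 300 s ticks even though it is in progress throughout.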


- Joshua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54439/#review158363
---


On Dec. 8, 2016, 9:40 p.m., Joshua Cohen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54439/
> ---
> 
> (Updated Dec. 8, 2016, 9:40 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Mehrdad Nurolahzade, and 
> Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> The metric is calculated from the time of the `INSTANCE_UPDATING` event to 
> the subsequent `ASSIGNED` event for the task with the same instance id that 
> matches the desired task config from the update details.
> 
> My original approach to this involved converting `GroupType` and 
> `AlgorithmType` from enums (which cannot be generic) to static classes 
> (which, of course, can). This allowed me to avoid unnecessarily passing 
> update details to the `calculate` method of `SlaAlgorithm` since it's ignored 
> in all but the one, new case. However, that ended up being a lot of churn, 
> and since it turns out we need both the task details and the update details 
> to calculate this metric, I went with the below approach. If anyone feels 
> strongly, I could go back to generics and create a container class that 
> gives access to both the tasks and update details.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java 
> 9a56cda809fbbcb07e6dd12c7a0feb272542491d 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java 
> 5d8d5bd8f705770979f284d26d2e932aabe707e5 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaGroup.java 
> 6fbd4e962b3bb6eeb0831c810a321478fd52172c 
>   src/test/java/org/apache/aurora/scheduler/sla/MetricCalculatorTest.java 
> 953b65f28a585375e36e305dea6f9f94f99abc93 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaAlgorithmTest.java 
> 2e719ac6b7aea86faa22deff2cc6b5f73135761c 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaModuleTest.java 
> 341e346e794c9cf9a2789b8799f38fff900ec9b3 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaTestUtil.java 
> 78f440f7546de9ed6842cb51db02b3bddc9a74ff 
>   
> src/test/java/org/apache/aurora/scheduler/storage/testing/StorageTestUtil.java
>  21d26b3930ea965487b2dec48a48a98677ba022b 
>   src/test/java/org/apache/aurora/scheduler/thrift/Fixtures.java 
> 43e32eede27bbf26363a3fd1ca34ffe6f8c01a73 
>   
> src/test/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImplTest.java
>  6d0e9bc6a8040393875d4f0a88e8db9d6926a88b 
> 
> Diff: https://reviews.apache.org/r/54439/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew build -Pq
> e2e tests.
> 
> 
> Thanks,
> 
> Joshua Cohen
> 
>



Re: Review Request 54439: Add support for an mttu metric (median time to updated)

2016-12-09 Thread Santhosh Kumar Shanmugham


> On Dec. 7, 2016, 10:28 a.m., David McLaughlin wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java, lines 
> > 200-202
> > 
> >
> > Why do we only sample active updates? It seems like we could miss data 
> > points, especially for small updates.
> 
> Joshua Cohen wrote:
> My thinking was that the vast majority of updates in the store will have 
> completed hours or days ago, so there's no need to consider them when 
> calculating the mttu. You're right, this does mean that we might lose some 
> data points for tasks that moved to `ASSIGNED` in the same 
> `SLA_REFRESH_INTERVAL` (defaults to one minute) in which the entire update 
> completed.
> 
> For reference, some general stats from one of our clusters: currently, at 
> off-peak hours, .02% of all updates in the update store are active. It's hard 
> to say with certainty how many updates were active at any given time 
> historically, but anecdotally it's a small fraction of the total number of 
> updates in the store; generously speaking, I'd say 1-2%. That being the case, 
> by including only active updates in the calculation, we reduce the work to be 
> done by anywhere from 98 to 99.98 percent.
> 
> I feel like this is a fair tradeoff to make, but I'm not steadfast in 
> that opinion.

We can add a storage method that will give all the `InstanceUpdateEvent`s 
during the last `SLA_REFRESH_INTERVAL` and use that to determine the 
`activeUpdates` that will be looked into; this can give a much more accurate 
value.
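A rough sketch of what this proposal could look like, using hypothetical in-memory stand-ins (`UpdateSummary`, `touchedSince`) instead of a real storage method, since no such query is shown in the review. It compares selecting only currently active updates with selecting every update that had an instance event after the start of the last refresh interval; the latter also picks up an update that started and finished between two refreshes.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch; UpdateSummary stands in for whatever the store would return.
final class RecentUpdatesSketch {
  static final class UpdateSummary {
    final String id;
    final boolean active;
    final long lastInstanceEventMs;

    UpdateSummary(String id, boolean active, long lastInstanceEventMs) {
      this.id = id;
      this.active = active;
      this.lastInstanceEventMs = lastInstanceEventMs;
    }
  }

  /** Selection under review: only updates that are still active. */
  static List<String> activeOnly(List<UpdateSummary> updates) {
    return updates.stream().filter(u -> u.active).map(u -> u.id).collect(Collectors.toList());
  }

  /** Proposed selection: any update with an instance event after sinceMs. */
  static List<String> touchedSince(List<UpdateSummary> updates, long sinceMs) {
    return updates.stream()
        .filter(u -> u.lastInstanceEventMs > sinceMs)
        .map(u -> u.id)
        .collect(Collectors.toList());
  }
}
```

For example, with the last refresh interval starting at 60 s, a "quick" update whose final instance event was at 90 s is already inactive, so `activeOnly` drops it while `touchedSince` keeps it.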


- Santhosh Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54439/#review158363
---


On Dec. 8, 2016, 1:40 p.m., Joshua Cohen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54439/
> ---
> 
> (Updated Dec. 8, 2016, 1:40 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Mehrdad Nurolahzade, and 
> Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 



Re: Review Request 54439: Add support for an mttu metric (median time to updated)

2016-12-09 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54439/#review158651
---




src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java (line 153)


Can `event.getTimestamp() < pendingTs`?



src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java (line 190)


We can make `JobUpdateAction.INSTANCE_UPDATING` a parameter to the method 
and can use this same logic to track MTTRB (median time to rollback).
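One way to read this suggestion, sketched in Java: the lookup of the per-instance start timestamp takes the update action as an argument, so the same downstream logic could produce MTTU from `INSTANCE_UPDATING` and MTTRB from `INSTANCE_ROLLING_BACK`. The `Action` enum below is a local stand-in that borrows those value names from Aurora's `JobUpdateAction`; the event class and method name are hypothetical, and picking the latest matching event is an assumption.

```java
import java.util.List;
import java.util.OptionalLong;

// Hypothetical sketch; Action mirrors a few JobUpdateAction names but is a local enum.
final class ActionParamSketch {
  enum Action { INSTANCE_UPDATING, INSTANCE_ROLLING_BACK, INSTANCE_UPDATED }

  static final class InstanceEvent {
    final int instanceId;
    final Action action;
    final long timestampMs;

    InstanceEvent(int instanceId, Action action, long timestampMs) {
      this.instanceId = instanceId;
      this.action = action;
      this.timestampMs = timestampMs;
    }
  }

  /** Latest timestamp at which the given action started for an instance, if any. */
  static OptionalLong startOf(List<InstanceEvent> events, int instanceId, Action action) {
    return events.stream()
        .filter(e -> e.instanceId == instanceId && e.action == action)
        .mapToLong(e -> e.timestampMs)
        .max();
  }
}
```

The returned start time would then feed the same "first matching `ASSIGNED` after start" step, so only the action argument differs between the two metrics.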



src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java (line 199)


`taskEvent.getTimestamp() > start`?

Elongating the `timeFrame` might pull in cases where a task has the following 
event lifecycle:

ASSIGNED (t1) -> INSTANCE_UPDATING (t2) -> PENDING (t3) -> ASSIGNED (t4).

MTTU => t1 - t2 < 0
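The lifecycle above can be reproduced with a small Java sketch. It compares a filter that only checks the `ASSIGNED` status and a (possibly widened) time window, similar in spirit to the `timeFrame.contains(...)` filter discussed in this review, with one that additionally requires the event to be strictly after the `INSTANCE_UPDATING` timestamp. The types and names are hypothetical stand-ins, and picking the earliest qualifying event is an assumption.

```java
import java.util.List;
import java.util.OptionalLong;

// Hypothetical sketch contrasting a window-only filter with a strict lower bound.
final class StrictBoundSketch {
  static final class TaskEvent {
    final String status;
    final long timestampMs;

    TaskEvent(String status, long timestampMs) {
      this.status = status;
      this.timestampMs = timestampMs;
    }
  }

  /** Earliest ASSIGNED inside [windowStartMs, windowEndMs], measured from updatingTsMs. */
  static OptionalLong windowOnly(
      long updatingTsMs, List<TaskEvent> events, long windowStartMs, long windowEndMs) {
    return events.stream()
        .filter(e -> "ASSIGNED".equals(e.status)
            && e.timestampMs >= windowStartMs && e.timestampMs <= windowEndMs)
        .mapToLong(e -> e.timestampMs - updatingTsMs)
        .min();
  }

  /** Same, but the ASSIGNED event must also come strictly after updatingTsMs. */
  static OptionalLong strictlyAfterStart(
      long updatingTsMs, List<TaskEvent> events, long windowStartMs, long windowEndMs) {
    return events.stream()
        .filter(e -> "ASSIGNED".equals(e.status)
            && e.timestampMs >= windowStartMs && e.timestampMs <= windowEndMs
            && e.timestampMs > updatingTsMs)
        .mapToLong(e -> e.timestampMs - updatingTsMs)
        .min();
  }
}
```

With t1 = 100, t2 = 200, t3 = 250 and t4 = 300, a window covering all four events yields -100 (t1 - t2) for the window-only filter and 100 (t4 - t2) once the strict bound is added.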


- Santhosh Kumar Shanmugham


On Dec. 8, 2016, 1:40 p.m., Joshua Cohen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54439/
> ---
> 
> (Updated Dec. 8, 2016, 1:40 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Mehrdad Nurolahzade, and 
> Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 



Re: Review Request 54439: Add support for an mttu metric (median time to updated)

2016-12-08 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54439/#review158575
---



Please remove me from the reviewers list. I have no time to review this.

I support this metric and look forward to using it once it is landed.

+1 to Serb's comment about adding a changelog entry.

Bonus points for updating the metrics docs.

- Zameer Manji


On Dec. 7, 2016, 9:50 a.m., Joshua Cohen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54439/
> ---
> 
> (Updated Dec. 7, 2016, 9:50 a.m.)
> 
> 
> Review request for Aurora, Mehrdad Nurolahzade, Santhosh Kumar Shanmugham, 
> and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 



Re: Review Request 54439: Add support for an mttu metric (median time to updated)

2016-12-08 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54439/#review158573
---



LGTM. Please add:

* a changelog entry
* a short section to our metrics docs 
https://github.com/apache/aurora/blob/master/docs/features/sla-metrics.md

- Stephan Erb


On Dec. 7, 2016, 6:50 p.m., Joshua Cohen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54439/
> ---
> 
> (Updated Dec. 7, 2016, 6:50 p.m.)
> 
> 
> Review request for Aurora, Mehrdad Nurolahzade, Santhosh Kumar Shanmugham, 
> and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 



Re: Review Request 54439: Add support for an mttu metric (median time to updated)

2016-12-08 Thread Mehrdad Nurolahzade


> On Dec. 7, 2016, 8:42 a.m., Mehrdad Nurolahzade wrote:
> > A general side note: SLA metrics calculation is currently the most 
> > expensive CPU-bound operation handled by the scheduler (it can take as much 
> > as 50% of the master's CPU cycles). The calculators seem like a good fit for 
> > micro-benchmarking with JMH.
> 
> Joshua Cohen wrote:
> Do you think that's a blocker to shipping this metric, or just a general 
> nice-to-have?
> 
> Mehrdad Nurolahzade wrote:
> Not a blocker, definitely a nice-to-have.

I created a ticket for benchmarking the SLA metrics calculations: 
https://issues.apache.org/jira/browse/AURORA-1854


- Mehrdad


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54439/#review158343
---


On Dec. 7, 2016, 9:50 a.m., Joshua Cohen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54439/
> ---
> 
> (Updated Dec. 7, 2016, 9:50 a.m.)
> 
> 
> Review request for Aurora, Mehrdad Nurolahzade, Santhosh Kumar Shanmugham, 
> and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 



Re: Review Request 54439: Add support for an mttu metric (median time to updated)

2016-12-08 Thread Joshua Cohen


> On Dec. 7, 2016, 6:28 p.m., David McLaughlin wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java, lines 
> > 200-202
> > 
> >
> > Why do we only sample active updates? It seems like we could miss data 
> > points, especially for small updates.

My thinking was that the vast majority of updates in the store will have 
completed hours or days ago, so there's no need to consider them when 
calculating the mttu. You're right, this does mean that we might lose some data 
points for tasks that moved to `ASSIGNED` in the same `SLA_REFRESH_INTERVAL` 
(defaults to one minute) in which the entire update completed.

For reference, some general stats from one of our clusters: currently, at 
off-peak hours, 0.02% of all updates in the update store are active. It's hard 
to say with certainty how many updates were active at any given time 
historically, but anecdotally it's a small fraction of the total number of 
updates in the store; generously speaking, I'd say 1-2%. That being the case, 
by including only active updates in the calculation, we reduce the work to be 
done by anywhere from 98 to 99.98 percent.
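A minimal sketch of the trade-off described above, assuming the calculator's cost scales linearly with the number of updates it examines. The `Update` class and its two-valued `State` enum are hypothetical simplifications, not Aurora's storage entities or its real update statuses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model; Aurora's real update entities and statuses differ.
final class ActiveUpdateSketch {
  enum State { ACTIVE, TERMINAL }

  static final class Update {
    final String id;
    final State state;

    Update(String id, State state) {
      this.id = id;
      this.state = state;
    }
  }

  /** Keeps only the updates that are still in progress. */
  static List<Update> activeOnly(List<Update> all) {
    List<Update> active = new ArrayList<>();
    for (Update u : all) {
      if (u.state == State.ACTIVE) {
        active.add(u);
      }
    }
    return active;
  }

  /** Percentage of updates skipped, i.e. work avoided if cost is linear in updates examined. */
  static double percentSkipped(int total, int active) {
    return 100.0 * (total - active) / total;
  }
}
```

With 2 active updates out of 100, `percentSkipped(100, 2)` is 98.0; with 2 out of 10,000 (0.02%), it is about 99.98, which is the range quoted above.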

I feel like this is a fair trade-off to make, but I'm not steadfast in that 
opinion.


- Joshua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54439/#review158363
---



Re: Review Request 54439: Add support for an mttu metric (median time to updated)

2016-12-07 Thread David McLaughlin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54439/#review158363
---




src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java (lines 200 - 202)


Why do we only sample active updates? It seems like we could miss data points, 
especially for small updates.


- David McLaughlin



Re: Review Request 54439: Add support for an mttu metric (median time to updated)

2016-12-07 Thread Mehrdad Nurolahzade

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54439/#review158360
---




src/test/java/org/apache/aurora/scheduler/sla/SlaAlgorithmTest.java (line 261)


I find the calculation logic in some of these test cases difficult to 
follow. 

Could you add comments explaining, in plain math, the justification behind 
the hard-coded expected values?


- Mehrdad Nurolahzade



Re: Review Request 54439: Add support for an mttu metric (median time to updated)

2016-12-07 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54439/#review158357
---


Ship it!




Master (91ddb07) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot



Re: Review Request 54439: Add support for an mttu metric (median time to updated)

2016-12-07 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54439/
---

(Updated Dec. 7, 2016, 5:50 p.m.)


Review request for Aurora, Mehrdad Nurolahzade, Santhosh Kumar Shanmugham, and 
Zameer Manji.


Changes
---

Add a task index to the mttu calculation to avoid iterating over all tasks for 
every job update.


Repository: aurora


Description
---

The metric is calculated from the time of the `INSTANCE_UPDATING` event to the 
subsequent `ASSIGNED` event for the task with the same instance id that matches 
the desired task config from the update details.
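A minimal sketch of the calculation described above, under simplifying assumptions: `UpdatingEvent` and `Assignment` are hypothetical stand-ins for Aurora's instance update events and scheduled tasks, the desired task config is reduced to a string, instances that have not yet reached `ASSIGNED` are simply skipped, and an even-sized sample uses the mean of the two middle values as its median. It uses a linear scan for clarity and is not the implementation in the diff.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model; not Aurora's update event or scheduled task entities.
final class MttuSketch {
  /** The updater started updating this instance (INSTANCE_UPDATING). */
  static final class UpdatingEvent {
    final int instanceId;
    final long timestampMs;

    UpdatingEvent(int instanceId, long timestampMs) {
      this.instanceId = instanceId;
      this.timestampMs = timestampMs;
    }
  }

  /** A task for this instance reached ASSIGNED while running the given config. */
  static final class Assignment {
    final int instanceId;
    final String config;
    final long assignedMs;

    Assignment(int instanceId, String config, long assignedMs) {
      this.instanceId = instanceId;
      this.config = config;
      this.assignedMs = assignedMs;
    }
  }

  /**
   * For each updating event, the delay until the earliest later ASSIGNED transition
   * of a task with the same instance id and the desired config.
   */
  static List<Long> timesToUpdated(
      List<UpdatingEvent> events, List<Assignment> assignments, String desiredConfig) {
    List<Long> durations = new ArrayList<>();
    for (UpdatingEvent e : events) {
      long best = Long.MAX_VALUE;
      for (Assignment a : assignments) {
        boolean matches = a.instanceId == e.instanceId && a.config.equals(desiredConfig);
        if (matches && a.assignedMs >= e.timestampMs) {
          best = Math.min(best, a.assignedMs - e.timestampMs);
        }
      }
      if (best != Long.MAX_VALUE) {
        durations.add(best); // instances not yet ASSIGNED contribute no sample
      }
    }
    return durations;
  }

  /** Median of the samples; the mean of the two middle values for an even count. */
  static double median(List<Long> samples) {
    if (samples.isEmpty()) {
      throw new IllegalArgumentException("no samples");
    }
    List<Long> sorted = new ArrayList<>(samples);
    Collections.sort(sorted);
    int mid = sorted.size() / 2;
    if (sorted.size() % 2 == 1) {
      return sorted.get(mid);
    }
    return (sorted.get(mid - 1) + sorted.get(mid)) / 2.0;
  }
}
```

For example, with `INSTANCE_UPDATING` at 1000 ms and 2000 ms for instances 0 and 1, and matching `ASSIGNED` transitions at 1500 ms and 2600 ms, the samples are 500 ms and 600 ms and the median is 550 ms; an `ASSIGNED` task still running the old config is ignored.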

My original approach to this involved converting `GroupType` and 
`AlgorithmType` from enums (which cannot be generic) to static classes (which, 
of course, can). This allowed me to avoid unnecessarily passing update details 
to the `calculate` method of `SlaAlgorithm` since it's ignored in all but the 
one, new case. However, that ended up being a lot of churn, and since it turns 
out we need both the task details and the update details to calculate this 
metric, I went with the below approach. If anyone feels strongly, I could go 
back to generics and create a container class that gives access to both the 
tasks and update details.
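A rough sketch of the generic alternative mentioned above, with hypothetical names throughout (`Algorithm`, `TasksAndUpdates`, and two toy metrics; none of this is Aurora's `SlaAlgorithm` API). Because a Java enum cannot declare type parameters, each algorithm becomes a value of a generic interface, and only the update-aware metric takes the container type.

```java
import java.util.List;

// Hypothetical types; strings stand in for real task and update entities.
final class GenericAlgorithmSketch {
  /** Generic over its input, which an enum constant could not be. */
  interface Algorithm<T> {
    double calculate(T input);
  }

  /** Container giving one algorithm access to both tasks and update details. */
  static final class TasksAndUpdates {
    final List<String> tasks;
    final List<String> updates;

    TasksAndUpdates(List<String> tasks, List<String> updates) {
      this.tasks = tasks;
      this.updates = updates;
    }
  }

  // Existing task-only algorithms keep their narrow input type...
  static final Algorithm<List<String>> TASK_COUNT = tasks -> tasks.size();

  // ...while only the update-aware metric asks for the container (toy formula).
  static final Algorithm<TasksAndUpdates> UPDATES_PER_TASK =
      in -> in.tasks.isEmpty() ? 0.0 : (double) in.updates.size() / in.tasks.size();
}
```

The trade-off is the one the description names: re-typing the existing algorithms causes churn, whereas passing update details to every algorithm keeps signatures uniform at the cost of an argument most of them ignore.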


Diffs (updated)
-

  src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java 
9a56cda809fbbcb07e6dd12c7a0feb272542491d 
  src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java 
5d8d5bd8f705770979f284d26d2e932aabe707e5 
  src/main/java/org/apache/aurora/scheduler/sla/SlaGroup.java 
6fbd4e962b3bb6eeb0831c810a321478fd52172c 
  src/test/java/org/apache/aurora/scheduler/sla/MetricCalculatorTest.java 
953b65f28a585375e36e305dea6f9f94f99abc93 
  src/test/java/org/apache/aurora/scheduler/sla/SlaAlgorithmTest.java 
2e719ac6b7aea86faa22deff2cc6b5f73135761c 
  src/test/java/org/apache/aurora/scheduler/sla/SlaModuleTest.java 
341e346e794c9cf9a2789b8799f38fff900ec9b3 
  src/test/java/org/apache/aurora/scheduler/sla/SlaTestUtil.java 
78f440f7546de9ed6842cb51db02b3bddc9a74ff 
  
src/test/java/org/apache/aurora/scheduler/storage/testing/StorageTestUtil.java 
21d26b3930ea965487b2dec48a48a98677ba022b 
  src/test/java/org/apache/aurora/scheduler/thrift/Fixtures.java 
43e32eede27bbf26363a3fd1ca34ffe6f8c01a73 
  
src/test/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImplTest.java 
6d0e9bc6a8040393875d4f0a88e8db9d6926a88b 

Diff: https://reviews.apache.org/r/54439/diff/


Testing
---

./gradlew build -Pq
e2e tests.


Thanks,

Joshua Cohen



Re: Review Request 54439: Add support for an mttu metric (median time to updated)

2016-12-07 Thread Mehrdad Nurolahzade


> On Dec. 7, 2016, 8:42 a.m., Mehrdad Nurolahzade wrote:
> > A general side note: SLA metrics calculation is currently the most 
> > expensive CPU-bound operation handled by the scheduler (it can take as much 
> > as 50% of the master's CPU cycles). The calculators seem like a good fit for 
> > micro-benchmarking with JMH.
> 
> Joshua Cohen wrote:
> Do you think that's a blocker to shipping this metric, or just a general 
> nice-to-have?

Not a blocker, definitely a nice-to-have.


- Mehrdad


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54439/#review158343
---


On Dec. 7, 2016, 6:36 a.m., Joshua Cohen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54439/
> ---
> 
> (Updated Dec. 7, 2016, 6:36 a.m.)
> 
> 
> Review request for Aurora, Mehrdad Nurolahzade, Santhosh Kumar Shanmugham, 
> and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> The metric is calculated from the time of the `INSTANCE_UPDATING` event to 
> the subsequent `ASSIGNED` event for the task with the same instance id that 
> matches the desired task config from the update details.
> 
> My original approach to this involved converting `GroupType` and 
> `AlgorithmType` from enums (which cannot be generic) to static classes 
> (which, of course, can). This allowed me to avoid unnecessarily passing 
> update details to the `calculate` method of `SlaAlgorithm` since it's ignored 
> in all but the one, new case. However, that ended up being a lot of churn, 
> and since it turns out we need both the task details and the update details 
> to calculate this metric, I went with the below approach. If anyone feels 
> strongly, I could go back to generics and create a container class that 
> gives access to both the tasks and update details.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java 
> 9a56cda809fbbcb07e6dd12c7a0feb272542491d 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java 
> 5d8d5bd8f705770979f284d26d2e932aabe707e5 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaGroup.java 
> 6fbd4e962b3bb6eeb0831c810a321478fd52172c 
>   src/test/java/org/apache/aurora/scheduler/sla/MetricCalculatorTest.java 
> 953b65f28a585375e36e305dea6f9f94f99abc93 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaAlgorithmTest.java 
> 2e719ac6b7aea86faa22deff2cc6b5f73135761c 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaModuleTest.java 
> 341e346e794c9cf9a2789b8799f38fff900ec9b3 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaTestUtil.java 
> 78f440f7546de9ed6842cb51db02b3bddc9a74ff 
>   
> src/test/java/org/apache/aurora/scheduler/storage/testing/StorageTestUtil.java
>  21d26b3930ea965487b2dec48a48a98677ba022b 
>   src/test/java/org/apache/aurora/scheduler/thrift/Fixtures.java 
> 43e32eede27bbf26363a3fd1ca34ffe6f8c01a73 
>   
> src/test/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImplTest.java
>  6d0e9bc6a8040393875d4f0a88e8db9d6926a88b 
> 
> Diff: https://reviews.apache.org/r/54439/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew build -Pq
> e2e tests.
> 
> 
> Thanks,
> 
> Joshua Cohen
> 
>



Re: Review Request 54439: Add support for an mttu metric (median time to updated)

2016-12-07 Thread Joshua Cohen


> On Dec. 7, 2016, 4:42 p.m., Mehrdad Nurolahzade wrote:
> > A general side note: SLA metrics calculation is currently the most 
> > expensive CPU-bound operation handled by the scheduler (it can take as much 
> > as 50% of the master's CPU cycles). The calculators seem like a good fit for 
> > micro-benchmarking with JMH.

Do you think that's a blocker to shipping this metric, or just a general 
nice-to-have?


> On Dec. 7, 2016, 4:42 p.m., Mehrdad Nurolahzade wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java, lines 
> > 189-193
> > 
> >
> > This lookup seems like something that could be improved by an index, 
> > i.e., a mapping from `instanceId -> IScheduledTask`.

Yeah, I agree. I debated adding that when I was writing it; thanks for the 
nudge.


- Joshua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54439/#review158343
---



Re: Review Request 54439: Add support for an mttu metric (median time to updated)

2016-12-07 Thread Mehrdad Nurolahzade

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54439/#review158343
---



A general side note: SLA metrics calculation is currently the most expensive 
CPU-bound operation handled by the scheduler (it can take as much as 50% of the 
master's CPU cycles). The calculators seem like a good fit for micro-benchmarking 
with JMH.


src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java (lines 188 - 192)


This lookup seems like something that could be improved by an index, i.e., a 
mapping from `instanceId -> IScheduledTask`.
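A minimal sketch of the suggested index, assuming a hypothetical `Task` class that stands in for `IScheduledTask` and keeps only a task id and an instance id. The index maps each instance id to a list rather than a single task, in case more than one task shares an instance id (for example, a terminated task alongside its replacement). Building it once per calculation turns each per-instance lookup from a linear scan into a hash lookup.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for IScheduledTask with only the fields the lookup needs.
final class TaskIndexSketch {
  static final class Task {
    final String taskId;
    final int instanceId;

    Task(String taskId, int instanceId) {
      this.taskId = taskId;
      this.instanceId = instanceId;
    }
  }

  /** Linear scan: O(tasks) per lookup, repeated for every instance of every update. */
  static List<Task> scan(List<Task> tasks, int instanceId) {
    List<Task> matches = new ArrayList<>();
    for (Task t : tasks) {
      if (t.instanceId == instanceId) {
        matches.add(t);
      }
    }
    return matches;
  }

  /** Builds the instanceId -> tasks index once, in a single pass over the tasks. */
  static Map<Integer, List<Task>> index(List<Task> tasks) {
    Map<Integer, List<Task>> byInstance = new HashMap<>();
    for (Task t : tasks) {
      byInstance.computeIfAbsent(t.instanceId, k -> new ArrayList<>()).add(t);
    }
    return byInstance;
  }

  /** Expected O(1) lookup; an unknown instance id yields an empty list. */
  static List<Task> lookup(Map<Integer, List<Task>> index, int instanceId) {
    return index.getOrDefault(instanceId, Collections.<Task>emptyList());
  }
}
```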


- Mehrdad Nurolahzade



Re: Review Request 54439: Add support for an mttu metric (median time to updated)

2016-12-06 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54439/#review158216
---


Ship it!




Master (91ddb07) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Dec. 6, 2016, 8:58 p.m., Joshua Cohen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54439/
> ---
> 
> (Updated Dec. 6, 2016, 8:58 p.m.)
> 
> 
> Review request for Aurora, Mehrdad Nurolahzade, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> The metric is calculated from the time of the `INSTANCE_UPDATING` event to 
> the subsequent `ASSIGNED` event for the task with the same instance id that 
> matches the desired task config from the update details.
> 
> My original approach to this involved converting `GroupType` and 
> `AlgorithmType` from enums (which cannot be generic) to static classes 
> (which, of course, can). This allowed me to avoid unnecessarily passing 
> update details to the `calculate` method of `SlaAlgorithm` since it's ignored 
> in all but the one, new case. However, that ended up being a lot of churn, 
> and since it turns out we need both the task details and the update details 
> to calculate this metric, I went with the below approach. If anyone feels 
> strongly, I could go back to generics and create a container class that 
> gives access to both the tasks and update details.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java 
> 9a56cda809fbbcb07e6dd12c7a0feb272542491d 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java 
> 5d8d5bd8f705770979f284d26d2e932aabe707e5 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaGroup.java 
> 6fbd4e962b3bb6eeb0831c810a321478fd52172c 
>   src/test/java/org/apache/aurora/scheduler/sla/MetricCalculatorTest.java 
> 953b65f28a585375e36e305dea6f9f94f99abc93 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaAlgorithmTest.java 
> 2e719ac6b7aea86faa22deff2cc6b5f73135761c 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaModuleTest.java 
> 341e346e794c9cf9a2789b8799f38fff900ec9b3 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaTestUtil.java 
> 78f440f7546de9ed6842cb51db02b3bddc9a74ff 
>   
> src/test/java/org/apache/aurora/scheduler/storage/testing/StorageTestUtil.java
>  21d26b3930ea965487b2dec48a48a98677ba022b 
>   src/test/java/org/apache/aurora/scheduler/thrift/Fixtures.java 
> 43e32eede27bbf26363a3fd1ca34ffe6f8c01a73 
>   
> src/test/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImplTest.java
>  6d0e9bc6a8040393875d4f0a88e8db9d6926a88b 
> 
> Diff: https://reviews.apache.org/r/54439/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew build -Pq
> e2e tests.
> 
> 
> Thanks,
> 
> Joshua Cohen
> 
>



Re: Review Request 54439: Add support for an mttu metric (median time to updated)

2016-12-06 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54439/#review158214
---



Master (91ddb07) is red with this patch.
  ./build-support/jenkins/build.sh

:commons:processResources
:commons:classes
:commons:jar
:compileJava/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/storage/log/WriteAheadStorage.java:74:
 Note: Wrote forwarder 
org.apache.aurora.scheduler.storage.log.WriteAheadStorageForwarder
@Forward({
^
Note: Writing 
file:/home/jenkins/jenkins-slave/workspace/AuroraBot/dist/classes/main/org/apache/aurora/common/args/apt/cmdline.arg.info.txt.2
Note: Writing 
file:/home/jenkins/jenkins-slave/workspace/AuroraBot/dist/classes/main/META-INF/compiler/resource-mappings/org.apache.aurora.common.args.apt.CmdLineProcessor

:generateBuildProperties
:processResources
:classes
:jar
:startScripts
:distTar
:distZip
:assemble
:compileJmhJavaNote: 
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/jmh/java/org/apache/aurora/benchmark/fakes/FakeSchedulerDriver.java
 uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.

:processJmhResources UP-TO-DATE
:jmhClasses
:checkstyleJmh
:jsHint
:checkstyleMain[ant:checkstyle] [ERROR] 
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java:21:8:
 Unused import - java.util.stream.Stream. [UnusedImports]
[ant:checkstyle] [ERROR] 
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java:42:8:
 Unused import - org.apache.aurora.scheduler.storage.entities.IAssignedTask. 
[UnusedImports]
 FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':checkstyleMain'.
> Checkstyle rule violations were found. See the report at: 
> file:///home/jenkins/jenkins-slave/workspace/AuroraBot/dist/reports/checkstyle/main.html

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug 
option to get more log output.

BUILD FAILED

Total time: 1 mins 13.85 secs


I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Dec. 6, 2016, 8:49 p.m., Joshua Cohen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54439/
> ---
> 
> (Updated Dec. 6, 2016, 8:49 p.m.)
> 
> 
> Review request for Aurora, Mehrdad Nurolahzade, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> The metric is calculated from the time of the `INSTANCE_UPDATING` event to 
> the subsequent `ASSIGNED` event for the task with the same instance id that 
> matches the desired task config from the update details.
> 
> My original approach to this involved converting `GroupType` and 
> `AlgorithmType` from enums (which cannot be generic) to static classes 
> (which, of course, can). This allowed me to avoid unnecessarily passing 
> update details to the `calculate` method of `SlaAlgorithm` since it's ignored 
> in all but the one, new case. However, that ended up being a lot of churn, 
> and since it turns out we need both the task details and the update details 
> to calculate this metric, I went with the below approach. If anyone feels 
> strongly, I could go back to generics and create a container class that 
> gives access to both the tasks and update details.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java 
> 9a56cda809fbbcb07e6dd12c7a0feb272542491d 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java 
> 5d8d5bd8f705770979f284d26d2e932aabe707e5 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaGroup.java 
> 6fbd4e962b3bb6eeb0831c810a321478fd52172c 
>   src/test/java/org/apache/aurora/scheduler/sla/MetricCalculatorTest.java 
> 953b65f28a585375e36e305dea6f9f94f99abc93 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaAlgorithmTest.java 
> 2e719ac6b7aea86faa22deff2cc6b5f73135761c 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaModuleTest.java 
> 341e346e794c9cf9a2789b8799f38fff900ec9b3 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaTestUtil.java 
> 78f440f7546de9ed6842cb51db02b3bddc9a74ff 
>   
> src/test/java/org/apache/aurora/scheduler/storage/testing/StorageTestUtil.java
>  21d26b3930ea965487b2dec48a48a98677ba022b 
>   src/test/java/org/apache/aurora/scheduler/thrift/Fixtures.java 
> 43e32eede27bbf26363a3fd1ca34ffe6f8c01a73 
>   
> src/test/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImplTest.java
>  6d0e9bc6a8040393875d4f0a88e8db9d6926a88b 
> 
> Diff: https://reviews.apache.org/r/54439/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew build -Pq
> e2e tests.
> 
> 
>