Re: Review Request 66192: [WIP] Variable group size updates

2018-05-08 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66192/#review202718
---



Master (805a53f) is red with this patch.
  ./build-support/jenkins/build.sh

[ant:checkstyle] [ERROR] 
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java:804:13:
 'if' is not followed by whitespace. [WhitespaceAround]
[ant:checkstyle] [ERROR] 
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java:816:7:
 'if' is not followed by whitespace. [WhitespaceAround]
[ant:checkstyle] [ERROR] 
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/updater/UpdateFactory.java:122:
 'case' child have incorrect indentation level 10, expected level should be 8. 
[Indentation]
[ant:checkstyle] [ERROR] 
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/updater/UpdateFactory.java:123:
 'block' child have incorrect indentation level 12, expected level should be 
10. [Indentation]
[ant:checkstyle] [ERROR] 
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/updater/UpdateFactory.java:125:
 'block' child have incorrect indentation level 12, expected level should be 
10. [Indentation]
[ant:checkstyle] [ERROR] 
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/updater/UpdateFactory.java:127:
 'case' child have incorrect indentation level 10, expected level should be 8. 
[Indentation]
[ant:checkstyle] [ERROR] 
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/updater/UpdateFactory.java:128:
 'block' child have incorrect indentation level 12, expected level should be 
10. [Indentation]
[ant:checkstyle] [ERROR] 
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/updater/UpdateFactory.java:129:
 'block' child have incorrect indentation level 12, expected level should be 
10. [Indentation]
[ant:checkstyle] [ERROR] 
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/updater/UpdateFactory.java:131:
 'case' child have incorrect indentation level 10, expected level should be 8. 
[Indentation]
[ant:checkstyle] [ERROR] 
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/updater/UpdateFactory.java:132:
 'case' child have incorrect indentation level 10, expected level should be 8. 
[Indentation]
[ant:checkstyle] [ERROR] 
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/updater/UpdateFactory.java:135:
 'block' child have incorrect indentation level 12, expected level should be 
10. [Indentation]
[ant:checkstyle] [ERROR] 
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/updater/strategy/ActiveLimitedStrategy.java:20:8:
 Unused import - com.google.common.collect.FluentIterable. [UnusedImports]
[ant:checkstyle] [ERROR] 
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/updater/strategy/ActiveLimitedStrategy.java:22:
 Wrong order for 'java.util.stream.Collectors' import. Order should be: java, 
javax, scala, com, net, org. Each group should be separated by a single 
blank line. [ImportOrder]
[ant:checkstyle] [ERROR] 
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/updater/strategy/VariableBatchStrategy.java:19:
 Wrong order for 'java.util.List' import. Order should be: java, javax, scala, 
com, net, org. Each group should be separated by a single blank line. 
[ImportOrder]
[ant:checkstyle] [ERROR] 
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/updater/strategy/VariableBatchStrategy.java:25:
 Wrong order for 'java.util.stream.Collectors' import. Order should be: java, 
javax, scala, com, net, org. Each group should be separated by a single 
blank line. [ImportOrder]
[ant:checkstyle] [ERROR] 
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/updater/strategy/VariableBatchStrategy.java:48:
 Line is longer than 100 characters (found 101). [LineLength]
[ant:checkstyle] [ERROR] 
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/updater/strategy/VariableBatchStrategy.java:51:
 Line is longer than 100 characters (found 109). [LineLength]
[ant:checkstyle] [ERROR] 
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/updater/strategy/VariableBatchStrategy.java:65:11:
 Redundant 'final' modifier. [RedundantModifier]
[ant:checkstyle] [ERROR] 

Re: Review Request 66716: [WIP] Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-08 Thread David McLaughlin


> On May 8, 2018, 4:30 p.m., David McLaughlin wrote:
> > src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java
> > Lines 110 (patched)
> > 
> >
> > I'm wondering if defaultSlaPolicy needs to be supplied by the client, 
> > or configured on the server? Is there a situation where client would ever 
> > supply different values depending on the situation?
> 
> Santhosh Kumar Shanmugham wrote:
> As of today dedicated clusters can supply their own SLA requirements. 
> Setting a global SLA requirement (which can be less stricter) can be a 
> surprise for operators. 
> 
> But my initial approach was to set a server default as well.

That sounds reasonable.


> On May 8, 2018, 4:30 p.m., David McLaughlin wrote:
> > src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java
> > Lines 407 (patched)
> > 
> >
> > I'm struggling to understand how maintenace request presence is related 
> > to the SLA. 
> > 
> > Isn't there a race condition where any hosts in DRAINING when the 
> > Scheduler first starts up (after this patch) would have no maintenance 
> > requests? Also, doesn't the maintenance request stay present *until* the 
> > SLA is satisfied?
> > 
> > This may be another argument for having a default policy in the server.
> 
> Santhosh Kumar Shanmugham wrote:
> After this patch has been rolled out, this should never happen. 
> 
> Added this to log (we can change the message) and export metric to 
> account for exactly the situation you mentioned while rolling out the patch 
> (for debugging). It is safe to ingore drains like this after the scheduler 
> startup, since all drains are driven from clients today. So the clients will 
> retry the drain -> which will create the maintenance request object -> which 
> will unblock the drains. 
> 
> Considering the `dedicated` owners user-case (mentioned above), I dropped 
> the scheduler level config. We can bring it back if you feel strongly about 
> it.

That makes sense. Please call this out in the RELEASE-NOTES (semi-related: we 
should make sure to update all other documentaton that would be affected by 
this change).


- David


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66716/#review202652
---


On May 8, 2018, 3:49 p.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66716/
> ---
> 
> (Updated May 8, 2018, 3:49 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> `Tasks` can specify custom SLA requirements as part of
> their `TaskConfig`. One of the new features is the ability
> to specify an external coordinator that can ACK/NACK
> maintenance requests for tasks. This will be hugely
> beneficial for onboarding services that cannot satisfactorily
> specify SLA in terms of running instances.
> 
> Maintenance requests are driven from the Scheduler to
> improve management of nodes in the cluster.
> 
> 
> Note to reviewers:
> - Test coverage is minimal at this point. Expect more coverage soon in the 
> next diff.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ef754e32172e7490a47a13e7b526f243ffa3efeb 
>   api/src/main/thrift/org/apache/aurora/gen/storage.thrift 
> b79e2045ccda05d5058565f81988dfe33feea8f1 
>   src/main/java/org/apache/aurora/scheduler/app/AppModule.java 
> ffc07443fae9e5216a5333ae305f75aa9b452a0c 
>   src/main/java/org/apache/aurora/scheduler/config/CliOptions.java 
> a2fb0393ba47e876c4c8c63e3ed27ebe42cb6ca3 
>   
> src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceModule.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandler.java 
> 3b4df55a05873e79aae206b117cbc753fa3abb94 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java 
> 25ed474289f369e74c24e999ad97ed6810c9fd5e 
>   src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java 
> f58c66aaebe8d31913d67a05add0f3d6054e88d1 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> 0e0f90b670bbbcd6cb3aa302ce4a9abfe70ea979 
>   src/main/java/org/apache/aurora/scheduler/storage/HostMaintenanceStore.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/Storage.java 
> da5534f886e032ca5a182f3704aa335ff680b258 
>   
> 

Re: Review Request 66192: [WIP] Variable group size updates

2018-05-08 Thread Renan DelValle

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66192/
---

(Updated May 8, 2018, 4:26 p.m.)


Review request for Aurora, David McLaughlin, Jordan Ly, Santhosh Kumar 
Shanmugham, and Stephan Erb.


Changes
---

* Variable update group strategy now determines what step it is in dynamically 
-- no more writing to storage.
* Surface are greatly reduced and potential path towards merging Batch and 
Variable Batch were created.
* Currently works for the following cases:
- Update results in more instances.
- Update only updates existing instances.
* Does not work in its current form when the update results in less instances 
(will address this in the next iteration).
* Introduced an optional JobUpdateStrategyType which allows for explicitly 
setting which strategy to use in order to avoid confusion. If this field is not 
set, the type is determined to be QUEUE or BATCH depending on 
waitFortBatchCompletion for backwards compatibility. The plan is to deprecate 
waitForBatchCompletion and updateGroupSize depending on feedback.
* Backwards compatible by using optional fields and using old fields to 
populate the new ones.

Thanks for all the feedback so far, very much appreciated!


Repository: aurora


Description
---

Adding support for variable group sizes when executing an update.

Design doc for this change is here: 
https://docs.google.com/document/d/1xGk4ueH8YlmJCk6hQJh85u4to4M1VQD0l630IOchvgY/edit#heading=h.lg3hty82f5cz

I opted for the path of least resistance with regards to the Thrift changes as 
I didn't see any benefit in making the larger changes required to make the 
interfaces a bit more flexible.

Requesting feedback on these changes and the approach from the community before 
I proceed.

Tests will be added after the community approves of the direciton and approach.

Note to reviewers: Changes made in ActiveLimitedStrategy.java were made to move 
towards getting rid of FluentIterable. I figured since I was touching that 
code, it wouldn't hurt to test the Java 8 equivalent of it. I can get rid of 
the change here and make it in a separate patch if desired.


Diffs (updated)
-

  api/src/main/thrift/org/apache/aurora/gen/api.thrift 
ef754e32172e7490a47a13e7b526f243ffa3efeb 
  
src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java 
9fc0416086dd3eb2e2f4e8f659da59fcdea2b22b 
  src/main/java/org/apache/aurora/scheduler/updater/UpdateFactory.java 
3992aa77fc305adc390a4aaeb1d3939d6241ddbd 
  
src/main/java/org/apache/aurora/scheduler/updater/strategy/ActiveLimitedStrategy.java
 855ea9c20788b51695b7eff5ac0970f0d52a9546 
  
src/main/java/org/apache/aurora/scheduler/updater/strategy/VariableBatchStrategy.java
 PRE-CREATION 


Diff: https://reviews.apache.org/r/66192/diff/2/

Changes: https://reviews.apache.org/r/66192/diff/1-2/


Testing
---


Thanks,

Renan DelValle



Re: Review Request 66716: [WIP] Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-08 Thread Santhosh Kumar Shanmugham


> On May 8, 2018, 9:30 a.m., David McLaughlin wrote:
> > src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceModule.java
> > Lines 35 (patched)
> > 
> >
> > Missing javadoc.

Done.


> On May 8, 2018, 9:30 a.m., David McLaughlin wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java
> > Lines 64 (patched)
> > 
> >
> > Missing javadoc.

Done.


> On May 8, 2018, 9:30 a.m., David McLaughlin wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java
> > Lines 149 (patched)
> > 
> >
> > Delete blank line.

Done.


> On May 8, 2018, 9:30 a.m., David McLaughlin wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java
> > Lines 194-195 (patched)
> > 
> >
> > Would be good to add comments (either here or in the javadoc) 
> > explaining the use of locks per coordinator-URL.

Done.


> On May 8, 2018, 9:30 a.m., David McLaughlin wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java
> > Lines 206 (patched)
> > 
> >
> > To avoid having to dive into logs to see who is blocking host 
> > maintenance, should we export metrics for the job keys that are causing the 
> > errors too? Here and in the catch block below.

Done.


> On May 8, 2018, 9:30 a.m., David McLaughlin wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java
> > Lines 141 (patched)
> > 
> >
> > You probably want multiple threads with a 1 minute (default) 
> > coordinator timeout and explicit locking, right?

Yes, that makes sense. I will reuse the number of parallel coordinator 
transaction as the thread pool limit.


> On May 8, 2018, 9:30 a.m., David McLaughlin wrote:
> > src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java
> > Lines 54-59 (original), 78-83 (patched)
> > 
> >
> > This (or the Impl) needs updated with a high-level overview of the new 
> > logic (the async nature of maintenance, etc.)

Done.


> On May 8, 2018, 9:30 a.m., David McLaughlin wrote:
> > src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java
> > Lines 110 (patched)
> > 
> >
> > I'm wondering if defaultSlaPolicy needs to be supplied by the client, 
> > or configured on the server? Is there a situation where client would ever 
> > supply different values depending on the situation?

As of today dedicated clusters can supply their own SLA requirements. Setting a 
global SLA requirement (which can be less stricter) can be a surprise for 
operators. 

But my initial approach was to set a server default as well.


> On May 8, 2018, 9:30 a.m., David McLaughlin wrote:
> > src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java
> > Line 131 (original), 192 (patched)
> > 
> >
> > Revert to single-line.

Done.


> On May 8, 2018, 9:30 a.m., David McLaughlin wrote:
> > src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java
> > Lines 407 (patched)
> > 
> >
> > I'm struggling to understand how maintenace request presence is related 
> > to the SLA. 
> > 
> > Isn't there a race condition where any hosts in DRAINING when the 
> > Scheduler first starts up (after this patch) would have no maintenance 
> > requests? Also, doesn't the maintenance request stay present *until* the 
> > SLA is satisfied?
> > 
> > This may be another argument for having a default policy in the server.

After this patch has been rolled out, this should never happen. 

Added this to log (we can change the message) and export metric to account for 
exactly the situation you mentioned while rolling out the patch (for 
debugging). It is safe to ingore drains like this after the scheduler startup, 
since all drains are driven from clients today. So the clients will retry the 
drain -> which will create the maintenance request object -> which will unblock 
the drains. 

Considering the `dedicated` owners user-case (mentioned above), I dropped the 
scheduler level config. We can bring it back if you feel strongly about it.


- Santhosh Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66716/#review202652

Re: Review Request 66716: [WIP] Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-08 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66716/#review202665
---


Ship it!




Master (805a53f) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On May 8, 2018, 8:49 a.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66716/
> ---
> 
> (Updated May 8, 2018, 8:49 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> `Tasks` can specify custom SLA requirements as part of
> their `TaskConfig`. One of the new features is the ability
> to specify an external coordinator that can ACK/NACK
> maintenance requests for tasks. This will be hugely
> beneficial for onboarding services that cannot satisfactorily
> specify SLA in terms of running instances.
> 
> Maintenance requests are driven from the Scheduler to
> improve management of nodes in the cluster.
> 
> 
> Note to reviewers:
> - Test coverage is minimal at this point. Expect more coverage soon in the 
> next diff.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ef754e32172e7490a47a13e7b526f243ffa3efeb 
>   api/src/main/thrift/org/apache/aurora/gen/storage.thrift 
> b79e2045ccda05d5058565f81988dfe33feea8f1 
>   src/main/java/org/apache/aurora/scheduler/app/AppModule.java 
> ffc07443fae9e5216a5333ae305f75aa9b452a0c 
>   src/main/java/org/apache/aurora/scheduler/config/CliOptions.java 
> a2fb0393ba47e876c4c8c63e3ed27ebe42cb6ca3 
>   
> src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceModule.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandler.java 
> 3b4df55a05873e79aae206b117cbc753fa3abb94 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java 
> 25ed474289f369e74c24e999ad97ed6810c9fd5e 
>   src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java 
> f58c66aaebe8d31913d67a05add0f3d6054e88d1 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> 0e0f90b670bbbcd6cb3aa302ce4a9abfe70ea979 
>   src/main/java/org/apache/aurora/scheduler/storage/HostMaintenanceStore.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/Storage.java 
> da5534f886e032ca5a182f3704aa335ff680b258 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/DurableStorage.java
>  f1fdc275d3958a36bbe79110d70dfeba640a948a 
>   src/main/java/org/apache/aurora/scheduler/storage/durability/Loader.java 
> 10864f122eff5027c88d835baae6de483d960218 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/WriteRecorder.java
>  8d70cae35289a9e36142bab288cf0c9398ebd2d4 
>   src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotService.java 
> b30de881eafa3226fdc32383b4e9bfd33ca912a5 
>   src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotterImpl.java 
> 4b52be02001e704f4b1a5f447226ac8c2386e3fd 
>   
> src/main/java/org/apache/aurora/scheduler/storage/mem/MemHostMaintenanceStore.java
>  PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorage.java 
> 9f324b010db7e351e98b257d8fc8fecfeac81268 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorageModule.java 
> edcea09b4d206cfddb642074237b031ad71cff13 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  9fc0416086dd3eb2e2f4e8f659da59fcdea2b22b 
>   src/main/python/apache/aurora/config/schema/base.py 
> a629bcd1261e5959da0a8458a55545d4e2c2a7a5 
>   src/main/python/apache/aurora/config/thrift.py 
> 6d2dde6e964daa68bf6f0e5bbbffecc5bd8c0431 
>   src/main/python/apache/aurora/executor/executor_vars.py 
> 561f9452aedda4cc695c84a2a850bdd7e1d65dec 
>   src/test/java/org/apache/aurora/scheduler/app/SchedulerIT.java 
> 63c338e5bbdf60de0fba8d68c6613904abb93fa8 
>   src/test/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> 778148a7c033cba9004954cabc33a2b1d003dccf 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> e66ec116112df164106598d9ff0bc9e8f465e44f 
>   
> src/test/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandlerTest.java 
> c6163bbabc7e7748f167b679893a93f58e4ef1ac 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaModuleTest.java 
> d37e7a07e9258bc8c0758bf50aece5b79025126b 
>   
> src/test/java/org/apache/aurora/scheduler/state/MaintenanceControllerImplTest.java
>  770846e84e9980ea3dbf9e1c46b0d45c5488c5b3 
>   src/test/java/org/apache/aurora/scheduler/storage/backup/RecoveryTest.java 
> 

Re: Review Request 66716: [WIP] Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-08 Thread David McLaughlin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66716/#review202652
---



Looks good, mostly minor feedback.


src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceModule.java
Lines 35 (patched)


Missing javadoc.



src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java
Lines 64 (patched)


Missing javadoc.



src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java
Lines 149 (patched)


Delete blank line.



src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java
Lines 194-195 (patched)


Would be good to add comments (either here or in the javadoc) explaining 
the use of locks per coordinator-URL.



src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java
Lines 206 (patched)


To avoid having to dive into logs to see who is blocking host maintenance, 
should we export metrics for the job keys that are causing the errors too? Here 
and in the catch block below.



src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java
Lines 141 (patched)


You probably want multiple threads with a 1 minute (default) coordinator 
timeout and explicit locking, right?



src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java
Lines 54-59 (original), 78-83 (patched)


This (or the Impl) needs updated with a high-level overview of the new 
logic (the async nature of maintenance, etc.)



src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java
Lines 110 (patched)


I'm wondering if defaultSlaPolicy needs to be supplied by the client, or 
configured on the server? Is there a situation where client would ever supply 
different values depending on the situation?



src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java
Line 131 (original), 192 (patched)


Revert to single-line.



src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java
Lines 407 (patched)


I'm struggling to understand how maintenace request presence is related to 
the SLA. 

Isn't there a race condition where any hosts in DRAINING when the Scheduler 
first starts up (after this patch) would have no maintenance requests? Also, 
doesn't the maintenance request stay present *until* the SLA is satisfied?

This may be another argument for having a default policy in the server.


- David McLaughlin


On May 8, 2018, 3:49 p.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66716/
> ---
> 
> (Updated May 8, 2018, 3:49 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> `Tasks` can specify custom SLA requirements as part of
> their `TaskConfig`. One of the new features is the ability
> to specify an external coordinator that can ACK/NACK
> maintenance requests for tasks. This will be hugely
> beneficial for onboarding services that cannot satisfactorily
> specify SLA in terms of running instances.
> 
> Maintenance requests are driven from the Scheduler to
> improve management of nodes in the cluster.
> 
> 
> Note to reviewers:
> - Test coverage is minimal at this point. Expect more coverage soon in the 
> next diff.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ef754e32172e7490a47a13e7b526f243ffa3efeb 
>   api/src/main/thrift/org/apache/aurora/gen/storage.thrift 
> b79e2045ccda05d5058565f81988dfe33feea8f1 
>   src/main/java/org/apache/aurora/scheduler/app/AppModule.java 
> ffc07443fae9e5216a5333ae305f75aa9b452a0c 
>   src/main/java/org/apache/aurora/scheduler/config/CliOptions.java 
> a2fb0393ba47e876c4c8c63e3ed27ebe42cb6ca3 
>   
> src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceModule.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandler.java 
> 3b4df55a05873e79aae206b117cbc753fa3abb94 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java 
> 25ed474289f369e74c24e999ad97ed6810c9fd5e 
>   src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java 
> 

Re: Review Request 66716: [WIP] Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-08 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66716/#review202657
---



@ReviewBot retry

- Santhosh Kumar Shanmugham


On May 8, 2018, 8:49 a.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66716/
> ---
> 
> (Updated May 8, 2018, 8:49 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> `Tasks` can specify custom SLA requirements as part of
> their `TaskConfig`. One of the new features is the ability
> to specify an external coordinator that can ACK/NACK
> maintenance requests for tasks. This will be hugely
> beneficial for onboarding services that cannot satisfactorily
> specify SLA in terms of running instances.
> 
> Maintenance requests are driven from the Scheduler to
> improve management of nodes in the cluster.
> 
> 
> Note to reviewers:
> - Test coverage is minimal at this point. Expect more coverage soon in the 
> next diff.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ef754e32172e7490a47a13e7b526f243ffa3efeb 
>   api/src/main/thrift/org/apache/aurora/gen/storage.thrift 
> b79e2045ccda05d5058565f81988dfe33feea8f1 
>   src/main/java/org/apache/aurora/scheduler/app/AppModule.java 
> ffc07443fae9e5216a5333ae305f75aa9b452a0c 
>   src/main/java/org/apache/aurora/scheduler/config/CliOptions.java 
> a2fb0393ba47e876c4c8c63e3ed27ebe42cb6ca3 
>   
> src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceModule.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandler.java 
> 3b4df55a05873e79aae206b117cbc753fa3abb94 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java 
> 25ed474289f369e74c24e999ad97ed6810c9fd5e 
>   src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java 
> f58c66aaebe8d31913d67a05add0f3d6054e88d1 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> 0e0f90b670bbbcd6cb3aa302ce4a9abfe70ea979 
>   src/main/java/org/apache/aurora/scheduler/storage/HostMaintenanceStore.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/Storage.java 
> da5534f886e032ca5a182f3704aa335ff680b258 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/DurableStorage.java
>  f1fdc275d3958a36bbe79110d70dfeba640a948a 
>   src/main/java/org/apache/aurora/scheduler/storage/durability/Loader.java 
> 10864f122eff5027c88d835baae6de483d960218 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/WriteRecorder.java
>  8d70cae35289a9e36142bab288cf0c9398ebd2d4 
>   src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotService.java 
> b30de881eafa3226fdc32383b4e9bfd33ca912a5 
>   src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotterImpl.java 
> 4b52be02001e704f4b1a5f447226ac8c2386e3fd 
>   
> src/main/java/org/apache/aurora/scheduler/storage/mem/MemHostMaintenanceStore.java
>  PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorage.java 
> 9f324b010db7e351e98b257d8fc8fecfeac81268 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorageModule.java 
> edcea09b4d206cfddb642074237b031ad71cff13 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  9fc0416086dd3eb2e2f4e8f659da59fcdea2b22b 
>   src/main/python/apache/aurora/config/schema/base.py 
> a629bcd1261e5959da0a8458a55545d4e2c2a7a5 
>   src/main/python/apache/aurora/config/thrift.py 
> 6d2dde6e964daa68bf6f0e5bbbffecc5bd8c0431 
>   src/main/python/apache/aurora/executor/executor_vars.py 
> 561f9452aedda4cc695c84a2a850bdd7e1d65dec 
>   src/test/java/org/apache/aurora/scheduler/app/SchedulerIT.java 
> 63c338e5bbdf60de0fba8d68c6613904abb93fa8 
>   src/test/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> 778148a7c033cba9004954cabc33a2b1d003dccf 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> e66ec116112df164106598d9ff0bc9e8f465e44f 
>   
> src/test/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandlerTest.java 
> c6163bbabc7e7748f167b679893a93f58e4ef1ac 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaModuleTest.java 
> d37e7a07e9258bc8c0758bf50aece5b79025126b 
>   
> src/test/java/org/apache/aurora/scheduler/state/MaintenanceControllerImplTest.java
>  770846e84e9980ea3dbf9e1c46b0d45c5488c5b3 
>   src/test/java/org/apache/aurora/scheduler/storage/backup/RecoveryTest.java 
> ba03ff94bb5fee2b09a6660a9ad759cece7449f1 
>   
> src/test/java/org/apache/aurora/scheduler/storage/durability/DurableStorageTest.java
>  

Re: Review Request 66716: [WIP] Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-08 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66716/#review202651
---



Master (805a53f) is red with this patch.
  ./build-support/jenkins/build.sh

:startScripts
:distTar
:distZip
:assemble
:compileTestJavaNote: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.

:processTestResources
:testClasses
:compileJmhJavaNote: 
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/jmh/java/org/apache/aurora/benchmark/fakes/FakeSchedulerDriver.java
 uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.

:processJmhResources NO-SOURCE
:jmhClasses
:checkstyleJmh
:checkstyleMain[ant:checkstyle] [ERROR] 
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/config/CliOptions.java:42:
 Wrong order for 'org.apache.aurora.scheduler.maintenance.MaintenanceModule' 
import. Order should be: java, javax, scala, com, net, org. Each group 
should be separated by a single blank line. [ImportOrder]
[ant:checkstyle] [ERROR] 
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceModule.java:1:
 Line does not match expected header line of '^\/\*\*$'. [RegexpHeader]
[ant:checkstyle] [ERROR] 
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandler.java:45:
 Wrong order for 
'org.apache.aurora.scheduler.maintenance.MaintenanceController' import. Order 
should be: java, javax, scala, com, net, org. Each group should be 
separated by a single blank line. [ImportOrder]
[ant:checkstyle] [ERROR] 
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java:18:8:
 Unused import - java.util.Collection. [UnusedImports]
[ant:checkstyle] [ERROR] 
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java:163:
 '.' have incorrect indentation level 6, expected level should be 8. 
[Indentation]
[ant:checkstyle] [ERROR] 
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java:164:
 '.' have incorrect indentation level 6, expected level should be 8. 
[Indentation]
[ant:checkstyle] [ERROR] 
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java:165:
 '.' have incorrect indentation level 6, expected level should be 8. 
[Indentation]
[ant:checkstyle] [ERROR] 
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java:84:
 Wrong order for 
'org.apache.aurora.scheduler.maintenance.MaintenanceController' import. Order 
should be: java, javax, scala, com, net, org. Each group should be 
separated by a single blank line. [ImportOrder]
 FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':checkstyleMain'.
> Checkstyle rule violations were found. See the report at: 
> file:///home/jenkins/jenkins-slave/workspace/AuroraBot/dist/reports/checkstyle/main.html

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug 
option to get more log output.

* Get more help at https://help.gradle.org

BUILD FAILED in 5m 20s
32 actionable tasks: 26 executed, 6 up-to-date


I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On May 8, 2018, 8:49 a.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66716/
> ---
> 
> (Updated May 8, 2018, 8:49 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> `Tasks` can specify custom SLA requirements as part of
> their `TaskConfig`. One of the new features is the ability
> to specify an external coordinator that can ACK/NACK
> maintenance requests for tasks. This will be hugely
> beneficial for onboarding services that cannot satisfactorily
> specify SLA in terms of running instances.
> 
> Maintenance requests are driven from the Scheduler to
> improve management of nodes in the cluster.
> 
> 
> Note to reviewers:
> - Test coverage is minimal at this point. Expect more coverage soon in the 
> next diff.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ef754e32172e7490a47a13e7b526f243ffa3efeb 
>   api/src/main/thrift/org/apache/aurora/gen/storage.thrift 
> 

Re: Review Request 66716: [WIP] Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-08 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66716/
---

(Updated May 8, 2018, 8:49 a.m.)


Review request for Aurora, David McLaughlin, Jordan Ly, and Stephan Erb.


Changes
---

Style fixes.


Repository: aurora


Description
---

`Tasks` can specify custom SLA requirements as part of
their `TaskConfig`. One of the new features is the ability
to specify an external coordinator that can ACK/NACK
maintenance requests for tasks. This will be hugely
beneficial for onboarding services that cannot satisfactorily
specify SLA in terms of running instances.

Maintenance requests are driven from the Scheduler to
improve management of nodes in the cluster.


Note to reviewers:
- Test coverage is minimal at this point. Expect more coverage soon in the next 
diff.


Diffs (updated)
-

  api/src/main/thrift/org/apache/aurora/gen/api.thrift 
ef754e32172e7490a47a13e7b526f243ffa3efeb 
  api/src/main/thrift/org/apache/aurora/gen/storage.thrift 
b79e2045ccda05d5058565f81988dfe33feea8f1 
  src/main/java/org/apache/aurora/scheduler/app/AppModule.java 
ffc07443fae9e5216a5333ae305f75aa9b452a0c 
  src/main/java/org/apache/aurora/scheduler/config/CliOptions.java 
a2fb0393ba47e876c4c8c63e3ed27ebe42cb6ca3 
  src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceModule.java 
PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandler.java 
3b4df55a05873e79aae206b117cbc753fa3abb94 
  src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java 
25ed474289f369e74c24e999ad97ed6810c9fd5e 
  src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java 
f58c66aaebe8d31913d67a05add0f3d6054e88d1 
  src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
0e0f90b670bbbcd6cb3aa302ce4a9abfe70ea979 
  src/main/java/org/apache/aurora/scheduler/storage/HostMaintenanceStore.java 
PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/storage/Storage.java 
da5534f886e032ca5a182f3704aa335ff680b258 
  
src/main/java/org/apache/aurora/scheduler/storage/durability/DurableStorage.java
 f1fdc275d3958a36bbe79110d70dfeba640a948a 
  src/main/java/org/apache/aurora/scheduler/storage/durability/Loader.java 
10864f122eff5027c88d835baae6de483d960218 
  
src/main/java/org/apache/aurora/scheduler/storage/durability/WriteRecorder.java 
8d70cae35289a9e36142bab288cf0c9398ebd2d4 
  src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotService.java 
b30de881eafa3226fdc32383b4e9bfd33ca912a5 
  src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotterImpl.java 
4b52be02001e704f4b1a5f447226ac8c2386e3fd 
  
src/main/java/org/apache/aurora/scheduler/storage/mem/MemHostMaintenanceStore.java
 PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorage.java 
9f324b010db7e351e98b257d8fc8fecfeac81268 
  src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorageModule.java 
edcea09b4d206cfddb642074237b031ad71cff13 
  
src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java 
9fc0416086dd3eb2e2f4e8f659da59fcdea2b22b 
  src/main/python/apache/aurora/config/schema/base.py 
a629bcd1261e5959da0a8458a55545d4e2c2a7a5 
  src/main/python/apache/aurora/config/thrift.py 
6d2dde6e964daa68bf6f0e5bbbffecc5bd8c0431 
  src/main/python/apache/aurora/executor/executor_vars.py 
561f9452aedda4cc695c84a2a850bdd7e1d65dec 
  src/test/java/org/apache/aurora/scheduler/app/SchedulerIT.java 
63c338e5bbdf60de0fba8d68c6613904abb93fa8 
  src/test/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
778148a7c033cba9004954cabc33a2b1d003dccf 
  src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
e66ec116112df164106598d9ff0bc9e8f465e44f 
  src/test/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandlerTest.java 
c6163bbabc7e7748f167b679893a93f58e4ef1ac 
  src/test/java/org/apache/aurora/scheduler/sla/SlaModuleTest.java 
d37e7a07e9258bc8c0758bf50aece5b79025126b 
  
src/test/java/org/apache/aurora/scheduler/state/MaintenanceControllerImplTest.java
 770846e84e9980ea3dbf9e1c46b0d45c5488c5b3 
  src/test/java/org/apache/aurora/scheduler/storage/backup/RecoveryTest.java 
ba03ff94bb5fee2b09a6660a9ad759cece7449f1 
  
src/test/java/org/apache/aurora/scheduler/storage/durability/DurableStorageTest.java
 3dd9ce4039b223cb6156462d089f7062a1cde772 
  
src/test/java/org/apache/aurora/scheduler/storage/durability/WriteRecorderTest.java
 27c8c829cd1e417dd5e60a8e9415331ca4a7c918 
  src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotterImplIT.java 
be07361a27afefa21cc2ba76ce82531a418d9814 
  
src/test/java/org/apache/aurora/scheduler/storage/testing/StorageTestUtil.java 
d59118be13342da9003b0bcb97e12e477d9edf8f 
  
src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java
 

Re: Review Request 66716: [WIP] Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-08 Thread Santhosh Kumar Shanmugham


> On May 3, 2018, 10:43 a.m., Jordan Ly wrote:
> > src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java
> > Lines 144-150 (patched)
> > 
> >
> > If bound in a private module, you don't need to have a `@Named` 
> > annotation (I don't think it is used anywhere else).
> > 
> > Additionally, maybe use `@Qualifier` instead (e.g. 
> > https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/async/AsyncModule.java#L60-L62).
> >  Not entirely sure what the difference is myself but this is used more 
> > throughout the project.

I would prefer namespacing the argument even if they are inside a private 
module. I will use `@Qualifier` that looks cleaner.


> On May 3, 2018, 10:43 a.m., Jordan Ly wrote:
> > src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java
> > Line 157 (original), 203 (patched)
> > 
> >
> > This name confused me a bit. For me, it implies there will be a 
> > long-running method continually doing something but this is a one-time 
> > thing.
> > 
> > Maybe rename to `checkForDrainingTasks` or something?

Done


> On May 3, 2018, 10:43 a.m., Jordan Ly wrote:
> > src/main/java/org/apache/aurora/scheduler/state/SlaManager.java
> > Lines 153-155 (patched)
> > 
> >
> > Do we need to immediately return after this?
> > 
> > If I am going to force the work to be done, do I need to even call this 
> > method?

I want to invoke the required in one-place so it is easier to follow.


- Santhosh Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66716/#review202363
---


On May 8, 2018, 8:28 a.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66716/
> ---
> 
> (Updated May 8, 2018, 8:28 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> `Tasks` can specify custom SLA requirements as part of
> their `TaskConfig`. One of the new features is the ability
> to specify an external coordinator that can ACK/NACK
> maintenance requests for tasks. This will be hugely
> beneficial for onboarding services that cannot satisfactorily
> specify SLA in terms of running instances.
> 
> Maintenance requests are driven from the Scheduler to
> improve management of nodes in the cluster.
> 
> 
> Note to reviewers:
> - Test coverage is minimal at this point. Expect more coverage soon in the 
> next diff.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ef754e32172e7490a47a13e7b526f243ffa3efeb 
>   api/src/main/thrift/org/apache/aurora/gen/storage.thrift 
> b79e2045ccda05d5058565f81988dfe33feea8f1 
>   src/main/java/org/apache/aurora/scheduler/app/AppModule.java 
> ffc07443fae9e5216a5333ae305f75aa9b452a0c 
>   src/main/java/org/apache/aurora/scheduler/config/CliOptions.java 
> a2fb0393ba47e876c4c8c63e3ed27ebe42cb6ca3 
>   
> src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceModule.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandler.java 
> 3b4df55a05873e79aae206b117cbc753fa3abb94 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java 
> 25ed474289f369e74c24e999ad97ed6810c9fd5e 
>   src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java 
> f58c66aaebe8d31913d67a05add0f3d6054e88d1 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> 0e0f90b670bbbcd6cb3aa302ce4a9abfe70ea979 
>   src/main/java/org/apache/aurora/scheduler/storage/HostMaintenanceStore.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/Storage.java 
> da5534f886e032ca5a182f3704aa335ff680b258 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/DurableStorage.java
>  f1fdc275d3958a36bbe79110d70dfeba640a948a 
>   src/main/java/org/apache/aurora/scheduler/storage/durability/Loader.java 
> 10864f122eff5027c88d835baae6de483d960218 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/WriteRecorder.java
>  8d70cae35289a9e36142bab288cf0c9398ebd2d4 
>   src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotService.java 
> b30de881eafa3226fdc32383b4e9bfd33ca912a5 
>   src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotterImpl.java 
> 4b52be02001e704f4b1a5f447226ac8c2386e3fd 
>   
> 

Re: Review Request 66716: [WIP] Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-08 Thread Santhosh Kumar Shanmugham


> On May 7, 2018, 2:12 p.m., Jordan Ly wrote:
> > src/main/java/org/apache/aurora/scheduler/state/SlaManager.java
> > Lines 124-130 (patched)
> > 
> >
> > I believe that you will want to only look at the lastest task event 
> > only if it is RUNNING.
> > 
> > This will take into account tasks that are KILLING/PREEMPTING which 
> > would break SLA.

Changed it so that we only look for RUNNING tasks when computing running time 
SLA. This will be more conservative than how we are doing SLA calculations in 
the admin client.


> On May 7, 2018, 2:12 p.m., Jordan Ly wrote:
> > src/main/java/org/apache/aurora/scheduler/state/SlaManager.java
> > Lines 138 (patched)
> > 
> >
> > Can you elaborate more on what this line does?

This should change to exclude the task that is currently under consideration to 
simulate the SLA without that task. Updated.


- Santhosh Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66716/#review202594
---


On May 8, 2018, 8:28 a.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66716/
> ---
> 
> (Updated May 8, 2018, 8:28 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> `Tasks` can specify custom SLA requirements as part of
> their `TaskConfig`. One of the new features is the ability
> to specify an external coordinator that can ACK/NACK
> maintenance requests for tasks. This will be hugely
> beneficial for onboarding services that cannot satisfactorily
> specify SLA in terms of running instances.
> 
> Maintenance requests are driven from the Scheduler to
> improve management of nodes in the cluster.
> 
> 
> Note to reviewers:
> - Test coverage is minimal at this point. Expect more coverage soon in the 
> next diff.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ef754e32172e7490a47a13e7b526f243ffa3efeb 
>   api/src/main/thrift/org/apache/aurora/gen/storage.thrift 
> b79e2045ccda05d5058565f81988dfe33feea8f1 
>   src/main/java/org/apache/aurora/scheduler/app/AppModule.java 
> ffc07443fae9e5216a5333ae305f75aa9b452a0c 
>   src/main/java/org/apache/aurora/scheduler/config/CliOptions.java 
> a2fb0393ba47e876c4c8c63e3ed27ebe42cb6ca3 
>   
> src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceModule.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandler.java 
> 3b4df55a05873e79aae206b117cbc753fa3abb94 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java 
> 25ed474289f369e74c24e999ad97ed6810c9fd5e 
>   src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java 
> f58c66aaebe8d31913d67a05add0f3d6054e88d1 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> 0e0f90b670bbbcd6cb3aa302ce4a9abfe70ea979 
>   src/main/java/org/apache/aurora/scheduler/storage/HostMaintenanceStore.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/Storage.java 
> da5534f886e032ca5a182f3704aa335ff680b258 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/DurableStorage.java
>  f1fdc275d3958a36bbe79110d70dfeba640a948a 
>   src/main/java/org/apache/aurora/scheduler/storage/durability/Loader.java 
> 10864f122eff5027c88d835baae6de483d960218 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/WriteRecorder.java
>  8d70cae35289a9e36142bab288cf0c9398ebd2d4 
>   src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotService.java 
> b30de881eafa3226fdc32383b4e9bfd33ca912a5 
>   src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotterImpl.java 
> 4b52be02001e704f4b1a5f447226ac8c2386e3fd 
>   
> src/main/java/org/apache/aurora/scheduler/storage/mem/MemHostMaintenanceStore.java
>  PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorage.java 
> 9f324b010db7e351e98b257d8fc8fecfeac81268 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorageModule.java 
> edcea09b4d206cfddb642074237b031ad71cff13 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  9fc0416086dd3eb2e2f4e8f659da59fcdea2b22b 
>   src/main/python/apache/aurora/config/schema/base.py 
> a629bcd1261e5959da0a8458a55545d4e2c2a7a5 
>   src/main/python/apache/aurora/config/thrift.py 
> 6d2dde6e964daa68bf6f0e5bbbffecc5bd8c0431 
>   src/main/python/apache/aurora/executor/executor_vars.py 
> 

Re: Review Request 66716: [WIP] Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-08 Thread Santhosh Kumar Shanmugham


> On May 3, 2018, 3:26 p.m., Reza Motamedi wrote:
> > api/src/main/thrift/org/apache/aurora/gen/api.thrift
> > Lines 248 (patched)
> > 
> >
> > Should the comment be updated to match the struct in resolution?
> > 
> > Ms is milli secs, right?

Done.


> On May 3, 2018, 3:26 p.m., Reza Motamedi wrote:
> > api/src/main/thrift/org/apache/aurora/gen/api.thrift
> > Lines 1241 (patched)
> > 
> >
> > s/of the host/of the hosts

Done.


- Santhosh Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66716/#review202386
---


On May 8, 2018, 8:28 a.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66716/
> ---
> 
> (Updated May 8, 2018, 8:28 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> `Tasks` can specify custom SLA requirements as part of
> their `TaskConfig`. One of the new features is the ability
> to specify an external coordinator that can ACK/NACK
> maintenance requests for tasks. This will be hugely
> beneficial for onboarding services that cannot satisfactorily
> specify SLA in terms of running instances.
> 
> Maintenance requests are driven from the Scheduler to
> improve management of nodes in the cluster.
> 
> 
> Note to reviewers:
> - Test coverage is minimal at this point. Expect more coverage soon in the 
> next diff.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ef754e32172e7490a47a13e7b526f243ffa3efeb 
>   api/src/main/thrift/org/apache/aurora/gen/storage.thrift 
> b79e2045ccda05d5058565f81988dfe33feea8f1 
>   src/main/java/org/apache/aurora/scheduler/app/AppModule.java 
> ffc07443fae9e5216a5333ae305f75aa9b452a0c 
>   src/main/java/org/apache/aurora/scheduler/config/CliOptions.java 
> a2fb0393ba47e876c4c8c63e3ed27ebe42cb6ca3 
>   
> src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceModule.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandler.java 
> 3b4df55a05873e79aae206b117cbc753fa3abb94 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java 
> 25ed474289f369e74c24e999ad97ed6810c9fd5e 
>   src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java 
> f58c66aaebe8d31913d67a05add0f3d6054e88d1 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> 0e0f90b670bbbcd6cb3aa302ce4a9abfe70ea979 
>   src/main/java/org/apache/aurora/scheduler/storage/HostMaintenanceStore.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/Storage.java 
> da5534f886e032ca5a182f3704aa335ff680b258 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/DurableStorage.java
>  f1fdc275d3958a36bbe79110d70dfeba640a948a 
>   src/main/java/org/apache/aurora/scheduler/storage/durability/Loader.java 
> 10864f122eff5027c88d835baae6de483d960218 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/WriteRecorder.java
>  8d70cae35289a9e36142bab288cf0c9398ebd2d4 
>   src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotService.java 
> b30de881eafa3226fdc32383b4e9bfd33ca912a5 
>   src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotterImpl.java 
> 4b52be02001e704f4b1a5f447226ac8c2386e3fd 
>   
> src/main/java/org/apache/aurora/scheduler/storage/mem/MemHostMaintenanceStore.java
>  PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorage.java 
> 9f324b010db7e351e98b257d8fc8fecfeac81268 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorageModule.java 
> edcea09b4d206cfddb642074237b031ad71cff13 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  9fc0416086dd3eb2e2f4e8f659da59fcdea2b22b 
>   src/main/python/apache/aurora/config/schema/base.py 
> a629bcd1261e5959da0a8458a55545d4e2c2a7a5 
>   src/main/python/apache/aurora/config/thrift.py 
> 6d2dde6e964daa68bf6f0e5bbbffecc5bd8c0431 
>   src/main/python/apache/aurora/executor/executor_vars.py 
> 561f9452aedda4cc695c84a2a850bdd7e1d65dec 
>   src/test/java/org/apache/aurora/scheduler/app/SchedulerIT.java 
> 63c338e5bbdf60de0fba8d68c6613904abb93fa8 
>   src/test/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> 778148a7c033cba9004954cabc33a2b1d003dccf 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> e66ec116112df164106598d9ff0bc9e8f465e44f 
>   
> 

Re: Review Request 66716: [WIP] Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-08 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66716/
---

(Updated May 8, 2018, 8:28 a.m.)


Review request for Aurora, David McLaughlin, Jordan Ly, and Stephan Erb.


Changes
---

Refactor to separate into different modules and update more interface 
requirements for SLAManager.


Repository: aurora


Description
---

`Tasks` can specify custom SLA requirements as part of
their `TaskConfig`. One of the new features is the ability
to specify an external coordinator that can ACK/NACK
maintenance requests for tasks. This will be hugely
beneficial for onboarding services that cannot satisfactorily
specify SLA in terms of running instances.

Maintenance requests are driven from the Scheduler to
improve management of nodes in the cluster.


Note to reviewers:
- Test coverage is minimal at this point. Expect more coverage soon in the next 
diff.


Diffs (updated)
-

  api/src/main/thrift/org/apache/aurora/gen/api.thrift 
ef754e32172e7490a47a13e7b526f243ffa3efeb 
  api/src/main/thrift/org/apache/aurora/gen/storage.thrift 
b79e2045ccda05d5058565f81988dfe33feea8f1 
  src/main/java/org/apache/aurora/scheduler/app/AppModule.java 
ffc07443fae9e5216a5333ae305f75aa9b452a0c 
  src/main/java/org/apache/aurora/scheduler/config/CliOptions.java 
a2fb0393ba47e876c4c8c63e3ed27ebe42cb6ca3 
  src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceModule.java 
PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandler.java 
3b4df55a05873e79aae206b117cbc753fa3abb94 
  src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java 
25ed474289f369e74c24e999ad97ed6810c9fd5e 
  src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java 
f58c66aaebe8d31913d67a05add0f3d6054e88d1 
  src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
0e0f90b670bbbcd6cb3aa302ce4a9abfe70ea979 
  src/main/java/org/apache/aurora/scheduler/storage/HostMaintenanceStore.java 
PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/storage/Storage.java 
da5534f886e032ca5a182f3704aa335ff680b258 
  
src/main/java/org/apache/aurora/scheduler/storage/durability/DurableStorage.java
 f1fdc275d3958a36bbe79110d70dfeba640a948a 
  src/main/java/org/apache/aurora/scheduler/storage/durability/Loader.java 
10864f122eff5027c88d835baae6de483d960218 
  
src/main/java/org/apache/aurora/scheduler/storage/durability/WriteRecorder.java 
8d70cae35289a9e36142bab288cf0c9398ebd2d4 
  src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotService.java 
b30de881eafa3226fdc32383b4e9bfd33ca912a5 
  src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotterImpl.java 
4b52be02001e704f4b1a5f447226ac8c2386e3fd 
  
src/main/java/org/apache/aurora/scheduler/storage/mem/MemHostMaintenanceStore.java
 PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorage.java 
9f324b010db7e351e98b257d8fc8fecfeac81268 
  src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorageModule.java 
edcea09b4d206cfddb642074237b031ad71cff13 
  
src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java 
9fc0416086dd3eb2e2f4e8f659da59fcdea2b22b 
  src/main/python/apache/aurora/config/schema/base.py 
a629bcd1261e5959da0a8458a55545d4e2c2a7a5 
  src/main/python/apache/aurora/config/thrift.py 
6d2dde6e964daa68bf6f0e5bbbffecc5bd8c0431 
  src/main/python/apache/aurora/executor/executor_vars.py 
561f9452aedda4cc695c84a2a850bdd7e1d65dec 
  src/test/java/org/apache/aurora/scheduler/app/SchedulerIT.java 
63c338e5bbdf60de0fba8d68c6613904abb93fa8 
  src/test/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
778148a7c033cba9004954cabc33a2b1d003dccf 
  src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
e66ec116112df164106598d9ff0bc9e8f465e44f 
  src/test/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandlerTest.java 
c6163bbabc7e7748f167b679893a93f58e4ef1ac 
  src/test/java/org/apache/aurora/scheduler/sla/SlaModuleTest.java 
d37e7a07e9258bc8c0758bf50aece5b79025126b 
  
src/test/java/org/apache/aurora/scheduler/state/MaintenanceControllerImplTest.java
 770846e84e9980ea3dbf9e1c46b0d45c5488c5b3 
  src/test/java/org/apache/aurora/scheduler/storage/backup/RecoveryTest.java 
ba03ff94bb5fee2b09a6660a9ad759cece7449f1 
  
src/test/java/org/apache/aurora/scheduler/storage/durability/DurableStorageTest.java
 3dd9ce4039b223cb6156462d089f7062a1cde772 
  
src/test/java/org/apache/aurora/scheduler/storage/durability/WriteRecorderTest.java
 27c8c829cd1e417dd5e60a8e9415331ca4a7c918 
  src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotterImplIT.java 
be07361a27afefa21cc2ba76ce82531a418d9814 
  
src/test/java/org/apache/aurora/scheduler/storage/testing/StorageTestUtil.java 
d59118be13342da9003b0bcb97e12e477d9edf8f