> On May 25, 2018, 8:50 a.m., Stephan Erb wrote: > > LGTM! Especially thanks for the thorough documentation. > > > > I have just a bunch of questions below. Feel free to address those as you > > see fit. I am away for the next few days I don't want to block the patch > > any further.
Thanks for the review Stephan! Much appreciated. > On May 25, 2018, 8:50 a.m., Stephan Erb wrote: > > docs/features/sla-requirements.md > > Lines 115 (patched) > > <https://reviews.apache.org/r/66716/diff/10/?file=2027918#file2027918line115> > > > > If we are unsure about some details of the interface, we could consider > > calling it experimental for now. Good point. I will call it out as experimental. > On May 25, 2018, 8:50 a.m., Stephan Erb wrote: > > docs/features/sla-requirements.md > > Lines 130 (patched) > > <https://reviews.apache.org/r/66716/diff/11/?file=2028985#file2028985line130> > > > > Having a GET request with a body is somewhat strange (see > > https://stackoverflow.com/questions/978061/http-get-with-request-body). I > > am wondering if we should make this a POST instead? I have been back-and-forth on this. I am taking your suggestion and changing this to POST with a body and no query params similar to `WebHooks`. > On May 25, 2018, 8:50 a.m., Stephan Erb wrote: > > docs/features/sla-requirements.md > > Lines 131 (patched) > > <https://reviews.apache.org/r/66716/diff/11/?file=2028985#file2028985line131> > > > > Do we really need the task parameter if everything is encoded in the > > json anyway? I was going to get rid of it, but then realized that the `cluster` is not really available via the `IScheduledTask` and is hidden inside the `ExecutorConfig` json string, which made it hard for me to parse it out for my vagrant example. This critical info is needed if a `Coordinator` needs to handle maintenance across multiple clusters. I will keep the `task` param in, since it can be a short-hand most customers. Only those who require additional information (such as the `hostname` and `ports` info as you asked earlier) need to dive into the body to figure those out. > On May 25, 2018, 8:50 a.m., Stephan Erb wrote: > > examples/vagrant/systemd/aurora-scheduler.service > > Lines 63 (patched) > > <https://reviews.apache.org/r/66716/diff/11/?file=2028989#file2028989line63> > > > > Is this change related to the patch? Apparently, we have a situation where we have to wait for the next offer-cycle after an `aurora_admin host_activate` to get tasks to schedule on it. Something to do with `HostOffers`. I am not a 100% certain on the issue, but the effect is consistent. This is added to ensure that the `sla-aware maintenance` `end-to-end test` does not have to wait a full 5-min to get tasks to run again on the only agent in the vagrant cluster. I will add comment to explain this. I will open a JIRA to track this as well. > On May 25, 2018, 8:50 a.m., Stephan Erb wrote: > > src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java > > Lines 237 (patched) > > <https://reviews.apache.org/r/66716/diff/11/?file=2028992#file2028992line238> > > > > The `preferred` tier exists in the default `tiers.json` but might not > > be available in general. > > > > Maybe just say that "Tier X does not support SlaPolicy". Done. > On May 25, 2018, 8:50 a.m., Stephan Erb wrote: > > src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java > > Lines 395-406 (patched) > > <https://reviews.apache.org/r/66716/diff/11/?file=2028995#file2028995line395> > > > > From a perspective of "single level of abstraction" it looks weird that > > `askCoordinatorThenAct` handles many exceptions but not the > > `InterruptedException`. > > > > I guess you had a reason for that. What am I missing? One of the previous diffs used a timeout on acquiring the coordinator locks. This is irrelevant now dropping it. I will go over the changes once again and resolve any intellj suggested improvements. I was flying with the `check-style` in `gradle`, apparently it is not enough. > On May 25, 2018, 8:50 a.m., Stephan Erb wrote: > > src/main/python/apache/aurora/admin/host_maintenance.py > > Line 173 (original), 173 (patched) > > <https://reviews.apache.org/r/66716/diff/11/?file=2029002#file2029002line173> > > > > Should we already call out the old maintenance mechanism as deprecated > > in the RELEASE-NOTES? My thinking was to replace the `host_drain` logic with the one under `sla_host_drain` instead. So we will deprecate `sla_host_drain` in the next release and replace the logic for `host_drain`? Putting `host_drain` under deprecation, nevertheless. We can discuss this further when you are back and happy to address any concerns. - Santhosh Kumar ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/66716/#review203807 ----------------------------------------------------------- On May 25, 2018, 2:14 p.m., Santhosh Kumar Shanmugham wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/66716/ > ----------------------------------------------------------- > > (Updated May 25, 2018, 2:14 p.m.) > > > Review request for Aurora, David McLaughlin, Jordan Ly, Renan DelValle, and > Stephan Erb. > > > Bugs: AURORA-1978 > https://issues.apache.org/jira/browse/AURORA-1978 > > > Repository: aurora > > > Description > ------- > > `Tasks` can specify custom SLA requirements as part of > their `TaskConfig`. One of the new features is the ability > to specify an external coordinator that can ACK/NACK > maintenance requests for tasks. This will be hugely > beneficial for onboarding services that cannot satisfactorily > specify SLA in terms of running instances. > > Maintenance requests are driven from the Scheduler to > improve management of nodes in the cluster. > > > Diffs > ----- > > RELEASE-NOTES.md 5e1f9940a7974e212140b7e5304695afa7f96e78 > api/src/main/thrift/org/apache/aurora/gen/api.thrift > ff48000d613ceef3e03586b94944d13275fb127c > docs/README.md 166bf1ce240474f0a181e023439cfbfbe7363822 > docs/features/sla-requirements.md PRE-CREATION > docs/operations/configuration.md 85a6fab54e03d52e42ba7d4ff47ab93f5b8293ee > docs/reference/configuration.md d4b869b938105ba301fc88d41019af2f1707f6f4 > docs/reference/scheduler-configuration.md > a659cfac974059b04ef5593286011decbb7f9110 > examples/vagrant/systemd/aurora-scheduler.service > 57e4bba858672f8da94eaa0499f8e5f3347ab982 > src/main/java/org/apache/aurora/scheduler/app/AppModule.java > ffc07443fae9e5216a5333ae305f75aa9b452a0c > src/main/java/org/apache/aurora/scheduler/config/CliOptions.java > a2fb0393ba47e876c4c8c63e3ed27ebe42cb6ca3 > > src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java > 4073229b74d0e0e7fd31552bd96894ceb8a0971a > > src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceModule.java > PRE-CREATION > src/main/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandler.java > 3b4df55a05873e79aae206b117cbc753fa3abb94 > src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java PRE-CREATION > src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java > 25ed474289f369e74c24e999ad97ed6810c9fd5e > src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java > f58c66aaebe8d31913d67a05add0f3d6054e88d1 > src/main/java/org/apache/aurora/scheduler/state/StateModule.java > 0e0f90b670bbbcd6cb3aa302ce4a9abfe70ea979 > src/main/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImpl.java > e88cad6cf12312512e6840329db7ca7134ceaae6 > > src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java > 9fc0416086dd3eb2e2f4e8f659da59fcdea2b22b > src/main/python/apache/aurora/admin/admin_util.py > 8240e8093160623b4c30dd212a88b8e122fd9856 > src/main/python/apache/aurora/admin/host_maintenance.py > 83fc2b6ece40d3436cc7de7a034f95224235fcfd > src/main/python/apache/aurora/admin/maintenance.py > 942a237f47a6e0416bbaf244278685477e0f407d > src/main/python/apache/aurora/client/api/__init__.py > f6fd1dd6d7c2bdd5bca3037f501b36badab78c75 > src/main/python/apache/aurora/client/cli/context.py > 06b194114a7f44a61943e0932973e71b53f239b4 > src/main/python/apache/aurora/client/cli/jobs.py > 536d04a21d32c4e586dc943a6f9b0ad0143354a3 > src/test/java/org/apache/aurora/scheduler/app/SchedulerIT.java > 63c338e5bbdf60de0fba8d68c6613904abb93fa8 > src/test/java/org/apache/aurora/scheduler/base/TaskTestUtil.java > 8c1f5ce6d7eb94ec4e0302bfd41318bd0797a1a5 > src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java > e66ec116112df164106598d9ff0bc9e8f465e44f > > src/test/java/org/apache/aurora/scheduler/configuration/ConfigurationManagerTest.java > 749ffeac6cb851f32bba7606390203d7a046a0e6 > src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java > 0fabb3370713e57d417adbd2af9e24a4d515c60a > src/test/java/org/apache/aurora/scheduler/cron/quartz/QuartzTestUtil.java > b7dcf3af366c9def63165dc9bef998ab5e95ed49 > > src/test/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandlerTest.java > c6163bbabc7e7748f167b679893a93f58e4ef1ac > src/test/java/org/apache/aurora/scheduler/sla/SlaManagerTest.java > PRE-CREATION > src/test/java/org/apache/aurora/scheduler/sla/SlaModuleTest.java > d37e7a07e9258bc8c0758bf50aece5b79025126b > > src/test/java/org/apache/aurora/scheduler/state/MaintenanceControllerImplTest.java > 770846e84e9980ea3dbf9e1c46b0d45c5488c5b3 > > src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java > 2cf66d8154ad3795989ee9026e45af1be509f244 > src/test/java/org/apache/aurora/scheduler/thrift/ThriftIT.java > 40851c419e4d62e6545959eebc0ce144fdecc697 > > src/test/java/org/apache/aurora/scheduler/thrift/aop/MockDecoratedThrift.java > d412090c292691305f01bccd1596fb0f6bb003ad > src/test/python/apache/aurora/admin/test_maintenance.py > ca0239b157f9f9053821af0328b9448703386cd4 > src/test/python/apache/aurora/api_util.py > 3fc9b478cc9aada0503e8ed8698a37b4ed926cdd > src/test/python/apache/aurora/client/api/test_scheduler_client.py > f2a2eae1539f7f6dff6855e4122cc41c6cbb0f7b > src/test/python/apache/aurora/client/cli/test_add.py > b22b9f72fbddb553bfc33b1bd8e10636a8d887a6 > src/test/python/apache/aurora/client/cli/test_kill.py > 0e859dc8618a044b2a4a6f73f45cab4a7ffcce4e > src/test/sh/org/apache/aurora/e2e/http_example.py > ba7d11429b5f3945a1fdf1808105b11e6ef78420 > src/test/sh/org/apache/aurora/e2e/partition_aware.aurora > 7ea9fadefcb4846cfe4922e11febec74c75f15db > src/test/sh/org/apache/aurora/e2e/sla_coordinator.py PRE-CREATION > src/test/sh/org/apache/aurora/e2e/sla_policy.aurora PRE-CREATION > src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh > 888efe4e990913d81335f1f3e2c9b6473de7bee8 > ui/src/main/js/components/TaskConfigSummary.js > 64880f4bd5c5358287ef481df455f6355fedd7d6 > > > Diff: https://reviews.apache.org/r/66716/diff/12/ > > > Testing > ------- > > ./build-support/jenkins/build.sh > ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh > > > Thanks, > > Santhosh Kumar Shanmugham > >
