> On April 20, 2018, 3:31 p.m., Jordan Ly wrote: > > api/src/main/thrift/org/apache/aurora/gen/api.thrift > > Lines 263 (patched) > > <https://reviews.apache.org/r/66716/diff/1/?file=2006757#file2006757line263> > > > > Is this per request? Will end users be able to do partial ACKs?
Turned out to be not very useful when using locks. We can revisit if this is needed. Dropping this now. > On April 20, 2018, 3:31 p.m., Jordan Ly wrote: > > src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java > > Line 131 (original), 148-149 (patched) > > <https://reviews.apache.org/r/66716/diff/1/?file=2006759#file2006759line152> > > > > nit: make a single line (looks like it could fit, but ignore if it > > actually can't) Done. > On April 20, 2018, 3:31 p.m., Jordan Ly wrote: > > src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java > > Lines 213-217 (original), 221-240 (patched) > > <https://reviews.apache.org/r/66716/diff/1/?file=2006759#file2006759line241> > > > > Does it make sense to/is it possible to combine these? E.g. force > > drains would call with a defaultSlaPolicy of none or something I was really trying to avoid touching the old code. But have since changed my mind and refactored it to unify logic. > On April 20, 2018, 3:31 p.m., Jordan Ly wrote: > > src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java > > Lines 214-216 (original), 222-224 (patched) > > <https://reviews.apache.org/r/66716/diff/1/?file=2006759#file2006759line242> > > > > Should this also create `HostMaintenanceRequest` for consistency? I was trying not to touch the current flow at all. However the refactor addressed this. > On April 20, 2018, 3:31 p.m., Jordan Ly wrote: > > src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java > > Lines 231 (patched) > > <https://reviews.apache.org/r/66716/diff/1/?file=2006759#file2006759line251> > > > > What is the behavior if there is an existing host maintenance request? > > Do we deny the new one or overwrite the old one? There can be only one maintenance request per host at any time. So we will simply overwrite it. If the current one is actively handled, the new one will just be a no-op. > On April 20, 2018, 3:31 p.m., Jordan Ly wrote: > > src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java > > Lines 318-326 (patched) > > <https://reviews.apache.org/r/66716/diff/1/?file=2006759#file2006759line338> > > > > Can we make this `@Timed` as well? Done. > On April 20, 2018, 3:31 p.m., Jordan Ly wrote: > > src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java > > Lines 325 (patched) > > <https://reviews.apache.org/r/66716/diff/1/?file=2006759#file2006759line345> > > > > Why is `checkSla` always `true`? I was trying to differentiate `drain` vs `slaDrain`. Refactored out the old logic to use `HostMaintenanceRequest` as well. > On April 20, 2018, 3:31 p.m., Jordan Ly wrote: > > src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java > > Lines 328-331 (patched) > > <https://reviews.apache.org/r/66716/diff/1/?file=2006759#file2006759line348> > > > > Should this be customizable? I am pretty sure it is fine if it is not, > > but wanted to call it out. Sure. Done. > On April 20, 2018, 3:31 p.m., Jordan Ly wrote: > > src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java > > Lines 337-340 (patched) > > <https://reviews.apache.org/r/66716/diff/1/?file=2006759#file2006759line357> > > > > Seems odd to have `if not checkSla, call slaManager to drain the task` > > > > maybe slaManager should behave like a consumer, executing the > > statechange only if SLA check passes? Refactored SlaManager with the interface agreed upon below, and now we have a single `checkSlaThenAct` as the API into `SlaManager`. > On April 20, 2018, 3:31 p.m., Jordan Ly wrote: > > src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java > > Lines 342-345 (patched) > > <https://reviews.apache.org/r/66716/diff/1/?file=2006759#file2006759line362> > > > > This should never happen if `checkSla` is `true`, correct? > > > > Could we add a metric here as well? Done. > On April 20, 2018, 3:31 p.m., Jordan Ly wrote: > > src/main/java/org/apache/aurora/scheduler/state/SlaManager.java > > Lines 60-61 (patched) > > <https://reviews.apache.org/r/66716/diff/1/?file=2006760#file2006760line60> > > > > I think that this makes more sense in `MaintenanceController` Agree. Done. > On April 20, 2018, 3:31 p.m., Jordan Ly wrote: > > src/main/java/org/apache/aurora/scheduler/state/SlaManager.java > > Lines 100 (patched) > > <https://reviews.apache.org/r/66716/diff/1/?file=2006760#file2006760line100> > > > > mostly for my curiosity but why 100? This should be the max number of coordinators that can exist in a cluster. Should be parametrized. > On April 20, 2018, 3:31 p.m., Jordan Ly wrote: > > src/main/java/org/apache/aurora/scheduler/state/SlaManager.java > > Lines 148-174 (patched) > > <https://reviews.apache.org/r/66716/diff/1/?file=2006760#file2006760line148> > > > > Mentioned above, but I think that the concept of `drain` should remain > > in `MaintenanceController` and not leak into `SlaManager`. > > > > Instead of `drain`, this could be something like `checkSlaAndExecute` > > which would be a consumer that checks the SLA then executes an action > > provided by the caller. I am thinking about this in terms of my proposed > > SLA-aware updates -- I would be able to utilize this interface as well. Discussed offline a lot about the interface to make it flexible for SLA-aware updates and we decided to go with a `checkSlaThenAct` which takes a `Storage.MutateWork` argument. - Santhosh Kumar ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/66716/#review201642 ----------------------------------------------------------- On May 1, 2018, 2:19 p.m., Santhosh Kumar Shanmugham wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/66716/ > ----------------------------------------------------------- > > (Updated May 1, 2018, 2:19 p.m.) > > > Review request for Aurora, David McLaughlin, Jordan Ly, and Stephan Erb. > > > Repository: aurora > > > Description > ------- > > `Tasks` can specify custom SLA requirements as part of > their `TaskConfig`. One of the new features is the ability > to specify an external coordinator that can ACK/NACK > maintenance requests for tasks. This will be hugely > beneficial for onboarding services that cannot satisfactorily > specify SLA in terms of running instances. > > Maintenance requests are driven from the Scheduler to > improve management of nodes in the cluster. > > > Note to reviewers: > - Test coverage is minimal at this point. Expect more coverage soon in the > next diff. > > > Diffs > ----- > > api/src/main/thrift/org/apache/aurora/gen/api.thrift > ef754e32172e7490a47a13e7b526f243ffa3efeb > api/src/main/thrift/org/apache/aurora/gen/storage.thrift > b79e2045ccda05d5058565f81988dfe33feea8f1 > src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java > f58c66aaebe8d31913d67a05add0f3d6054e88d1 > src/main/java/org/apache/aurora/scheduler/state/SlaManager.java > PRE-CREATION > src/main/java/org/apache/aurora/scheduler/state/StateModule.java > 0e0f90b670bbbcd6cb3aa302ce4a9abfe70ea979 > src/main/java/org/apache/aurora/scheduler/storage/HostMaintenanceStore.java > PRE-CREATION > src/main/java/org/apache/aurora/scheduler/storage/Storage.java > da5534f886e032ca5a182f3704aa335ff680b258 > > src/main/java/org/apache/aurora/scheduler/storage/durability/DurableStorage.java > f1fdc275d3958a36bbe79110d70dfeba640a948a > src/main/java/org/apache/aurora/scheduler/storage/durability/Loader.java > 10864f122eff5027c88d835baae6de483d960218 > > src/main/java/org/apache/aurora/scheduler/storage/durability/WriteRecorder.java > 8d70cae35289a9e36142bab288cf0c9398ebd2d4 > src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotService.java > b30de881eafa3226fdc32383b4e9bfd33ca912a5 > src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotterImpl.java > 4b52be02001e704f4b1a5f447226ac8c2386e3fd > > src/main/java/org/apache/aurora/scheduler/storage/mem/MemHostMaintenanceStore.java > PRE-CREATION > src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorage.java > 9f324b010db7e351e98b257d8fc8fecfeac81268 > src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorageModule.java > edcea09b4d206cfddb642074237b031ad71cff13 > > src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java > 9fc0416086dd3eb2e2f4e8f659da59fcdea2b22b > src/main/python/apache/aurora/config/schema/base.py > a629bcd1261e5959da0a8458a55545d4e2c2a7a5 > src/main/python/apache/aurora/config/thrift.py > 6d2dde6e964daa68bf6f0e5bbbffecc5bd8c0431 > src/main/python/apache/aurora/executor/executor_vars.py > 561f9452aedda4cc695c84a2a850bdd7e1d65dec > src/test/java/org/apache/aurora/scheduler/base/TaskTestUtil.java > 778148a7c033cba9004954cabc33a2b1d003dccf > src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java > e66ec116112df164106598d9ff0bc9e8f465e44f > > src/test/java/org/apache/aurora/scheduler/state/MaintenanceControllerImplTest.java > 770846e84e9980ea3dbf9e1c46b0d45c5488c5b3 > src/test/java/org/apache/aurora/scheduler/storage/backup/RecoveryTest.java > ba03ff94bb5fee2b09a6660a9ad759cece7449f1 > > src/test/java/org/apache/aurora/scheduler/storage/durability/DurableStorageTest.java > 3dd9ce4039b223cb6156462d089f7062a1cde772 > > src/test/java/org/apache/aurora/scheduler/storage/durability/WriteRecorderTest.java > 27c8c829cd1e417dd5e60a8e9415331ca4a7c918 > > src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotterImplIT.java > be07361a27afefa21cc2ba76ce82531a418d9814 > > src/test/java/org/apache/aurora/scheduler/storage/testing/StorageTestUtil.java > d59118be13342da9003b0bcb97e12e477d9edf8f > > src/test/java/org/apache/aurora/scheduler/thrift/aop/MockDecoratedThrift.java > d412090c292691305f01bccd1596fb0f6bb003ad > src/test/python/apache/aurora/api_util.py > 3fc9b478cc9aada0503e8ed8698a37b4ed926cdd > src/test/python/apache/aurora/client/api/test_scheduler_client.py > f2a2eae1539f7f6dff6855e4122cc41c6cbb0f7b > src/test/python/apache/aurora/client/cli/test_inspect.py > e4f43d0573c7862adc9bc679f4cea40cc76eac38 > src/test/python/apache/aurora/config/test_thrift.py > 8e1d0e177959af12b97bdd1cd47845b72bc12fe1 > > src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/removeHostMaintenanceRequest > PRE-CREATION > > src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveCronJob > 88e1c36a1aa2d192b95963f7aa36e243a447e4af > > src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveHostMaintenanceRequest > PRE-CREATION > > src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveJobUpdate > 32fdcdacde58345cdd6c4b449b82c0c90c2b2aae > > src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveTasks > 4323031ec6bd128576c2a43ebc11f04a9f046e2f > > src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/read-compatible/16-saveHostMaintenanceRequest > PRE-CREATION > > src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/read-compatible/17-removeHostMaintenanceRequest > PRE-CREATION > > > Diff: https://reviews.apache.org/r/66716/diff/2/ > > > Testing > ------- > > ./build-support/jenkins/build.sh > > > Thanks, > > Santhosh Kumar Shanmugham > >
