----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/66716/#review202652 -----------------------------------------------------------
Looks good, mostly minor feedback. src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceModule.java Lines 35 (patched) <https://reviews.apache.org/r/66716/#comment284591> Missing javadoc. src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java Lines 64 (patched) <https://reviews.apache.org/r/66716/#comment284594> Missing javadoc. src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java Lines 149 (patched) <https://reviews.apache.org/r/66716/#comment284595> Delete blank line. src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java Lines 194-195 (patched) <https://reviews.apache.org/r/66716/#comment284596> Would be good to add comments (either here or in the javadoc) explaining the use of locks per coordinator-URL. src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java Lines 206 (patched) <https://reviews.apache.org/r/66716/#comment284597> To avoid having to dive into logs to see who is blocking host maintenance, should we export metrics for the job keys that are causing the errors too? Here and in the catch block below. src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java Lines 141 (patched) <https://reviews.apache.org/r/66716/#comment284599> You probably want multiple threads with a 1 minute (default) coordinator timeout and explicit locking, right? src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java Lines 54-59 (original), 78-83 (patched) <https://reviews.apache.org/r/66716/#comment284586> This (or the Impl) needs updated with a high-level overview of the new logic (the async nature of maintenance, etc.) src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java Lines 110 (patched) <https://reviews.apache.org/r/66716/#comment284582> I'm wondering if defaultSlaPolicy needs to be supplied by the client, or configured on the server? Is there a situation where client would ever supply different values depending on the situation? src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java Line 131 (original), 192 (patched) <https://reviews.apache.org/r/66716/#comment284583> Revert to single-line. src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java Lines 407 (patched) <https://reviews.apache.org/r/66716/#comment284589> I'm struggling to understand how maintenace request presence is related to the SLA. Isn't there a race condition where any hosts in DRAINING when the Scheduler first starts up (after this patch) would have no maintenance requests? Also, doesn't the maintenance request stay present *until* the SLA is satisfied? This may be another argument for having a default policy in the server. - David McLaughlin On May 8, 2018, 3:49 p.m., Santhosh Kumar Shanmugham wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/66716/ > ----------------------------------------------------------- > > (Updated May 8, 2018, 3:49 p.m.) > > > Review request for Aurora, David McLaughlin, Jordan Ly, and Stephan Erb. > > > Repository: aurora > > > Description > ------- > > `Tasks` can specify custom SLA requirements as part of > their `TaskConfig`. One of the new features is the ability > to specify an external coordinator that can ACK/NACK > maintenance requests for tasks. This will be hugely > beneficial for onboarding services that cannot satisfactorily > specify SLA in terms of running instances. > > Maintenance requests are driven from the Scheduler to > improve management of nodes in the cluster. > > > Note to reviewers: > - Test coverage is minimal at this point. Expect more coverage soon in the > next diff. > > > Diffs > ----- > > api/src/main/thrift/org/apache/aurora/gen/api.thrift > ef754e32172e7490a47a13e7b526f243ffa3efeb > api/src/main/thrift/org/apache/aurora/gen/storage.thrift > b79e2045ccda05d5058565f81988dfe33feea8f1 > src/main/java/org/apache/aurora/scheduler/app/AppModule.java > ffc07443fae9e5216a5333ae305f75aa9b452a0c > src/main/java/org/apache/aurora/scheduler/config/CliOptions.java > a2fb0393ba47e876c4c8c63e3ed27ebe42cb6ca3 > > src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceModule.java > PRE-CREATION > src/main/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandler.java > 3b4df55a05873e79aae206b117cbc753fa3abb94 > src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java PRE-CREATION > src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java > 25ed474289f369e74c24e999ad97ed6810c9fd5e > src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java > f58c66aaebe8d31913d67a05add0f3d6054e88d1 > src/main/java/org/apache/aurora/scheduler/state/StateModule.java > 0e0f90b670bbbcd6cb3aa302ce4a9abfe70ea979 > src/main/java/org/apache/aurora/scheduler/storage/HostMaintenanceStore.java > PRE-CREATION > src/main/java/org/apache/aurora/scheduler/storage/Storage.java > da5534f886e032ca5a182f3704aa335ff680b258 > > src/main/java/org/apache/aurora/scheduler/storage/durability/DurableStorage.java > f1fdc275d3958a36bbe79110d70dfeba640a948a > src/main/java/org/apache/aurora/scheduler/storage/durability/Loader.java > 10864f122eff5027c88d835baae6de483d960218 > > src/main/java/org/apache/aurora/scheduler/storage/durability/WriteRecorder.java > 8d70cae35289a9e36142bab288cf0c9398ebd2d4 > src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotService.java > b30de881eafa3226fdc32383b4e9bfd33ca912a5 > src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotterImpl.java > 4b52be02001e704f4b1a5f447226ac8c2386e3fd > > src/main/java/org/apache/aurora/scheduler/storage/mem/MemHostMaintenanceStore.java > PRE-CREATION > src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorage.java > 9f324b010db7e351e98b257d8fc8fecfeac81268 > src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorageModule.java > edcea09b4d206cfddb642074237b031ad71cff13 > > src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java > 9fc0416086dd3eb2e2f4e8f659da59fcdea2b22b > src/main/python/apache/aurora/config/schema/base.py > a629bcd1261e5959da0a8458a55545d4e2c2a7a5 > src/main/python/apache/aurora/config/thrift.py > 6d2dde6e964daa68bf6f0e5bbbffecc5bd8c0431 > src/main/python/apache/aurora/executor/executor_vars.py > 561f9452aedda4cc695c84a2a850bdd7e1d65dec > src/test/java/org/apache/aurora/scheduler/app/SchedulerIT.java > 63c338e5bbdf60de0fba8d68c6613904abb93fa8 > src/test/java/org/apache/aurora/scheduler/base/TaskTestUtil.java > 778148a7c033cba9004954cabc33a2b1d003dccf > src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java > e66ec116112df164106598d9ff0bc9e8f465e44f > > src/test/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandlerTest.java > c6163bbabc7e7748f167b679893a93f58e4ef1ac > src/test/java/org/apache/aurora/scheduler/sla/SlaModuleTest.java > d37e7a07e9258bc8c0758bf50aece5b79025126b > > src/test/java/org/apache/aurora/scheduler/state/MaintenanceControllerImplTest.java > 770846e84e9980ea3dbf9e1c46b0d45c5488c5b3 > src/test/java/org/apache/aurora/scheduler/storage/backup/RecoveryTest.java > ba03ff94bb5fee2b09a6660a9ad759cece7449f1 > > src/test/java/org/apache/aurora/scheduler/storage/durability/DurableStorageTest.java > 3dd9ce4039b223cb6156462d089f7062a1cde772 > > src/test/java/org/apache/aurora/scheduler/storage/durability/WriteRecorderTest.java > 27c8c829cd1e417dd5e60a8e9415331ca4a7c918 > > src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotterImplIT.java > be07361a27afefa21cc2ba76ce82531a418d9814 > > src/test/java/org/apache/aurora/scheduler/storage/testing/StorageTestUtil.java > d59118be13342da9003b0bcb97e12e477d9edf8f > > src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java > 2cf66d8154ad3795989ee9026e45af1be509f244 > src/test/java/org/apache/aurora/scheduler/thrift/ThriftIT.java > 40851c419e4d62e6545959eebc0ce144fdecc697 > > src/test/java/org/apache/aurora/scheduler/thrift/aop/MockDecoratedThrift.java > d412090c292691305f01bccd1596fb0f6bb003ad > src/test/python/apache/aurora/api_util.py > 3fc9b478cc9aada0503e8ed8698a37b4ed926cdd > src/test/python/apache/aurora/client/api/test_scheduler_client.py > f2a2eae1539f7f6dff6855e4122cc41c6cbb0f7b > src/test/python/apache/aurora/client/cli/test_inspect.py > e4f43d0573c7862adc9bc679f4cea40cc76eac38 > src/test/python/apache/aurora/config/test_thrift.py > 8e1d0e177959af12b97bdd1cd47845b72bc12fe1 > > src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/removeHostMaintenanceRequest > PRE-CREATION > > src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveCronJob > 88e1c36a1aa2d192b95963f7aa36e243a447e4af > > src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveHostMaintenanceRequest > PRE-CREATION > > src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveJobUpdate > 32fdcdacde58345cdd6c4b449b82c0c90c2b2aae > > src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveTasks > 4323031ec6bd128576c2a43ebc11f04a9f046e2f > > src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/read-compatible/16-saveHostMaintenanceRequest > PRE-CREATION > > src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/read-compatible/17-removeHostMaintenanceRequest > PRE-CREATION > > > Diff: https://reviews.apache.org/r/66716/diff/4/ > > > Testing > ------- > > ./build-support/jenkins/build.sh > > > Thanks, > > Santhosh Kumar Shanmugham > >
