> On May 8, 2018, 9:30 a.m., David McLaughlin wrote: > > src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceModule.java > > Lines 35 (patched) > > <https://reviews.apache.org/r/66716/diff/4/?file=2018105#file2018105line35> > > > > Missing javadoc.
Done. > On May 8, 2018, 9:30 a.m., David McLaughlin wrote: > > src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java > > Lines 64 (patched) > > <https://reviews.apache.org/r/66716/diff/4/?file=2018107#file2018107line64> > > > > Missing javadoc. Done. > On May 8, 2018, 9:30 a.m., David McLaughlin wrote: > > src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java > > Lines 149 (patched) > > <https://reviews.apache.org/r/66716/diff/4/?file=2018107#file2018107line149> > > > > Delete blank line. Done. > On May 8, 2018, 9:30 a.m., David McLaughlin wrote: > > src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java > > Lines 194-195 (patched) > > <https://reviews.apache.org/r/66716/diff/4/?file=2018107#file2018107line194> > > > > Would be good to add comments (either here or in the javadoc) > > explaining the use of locks per coordinator-URL. Done. > On May 8, 2018, 9:30 a.m., David McLaughlin wrote: > > src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java > > Lines 206 (patched) > > <https://reviews.apache.org/r/66716/diff/4/?file=2018107#file2018107line206> > > > > To avoid having to dive into logs to see who is blocking host > > maintenance, should we export metrics for the job keys that are causing the > > errors too? Here and in the catch block below. Done. > On May 8, 2018, 9:30 a.m., David McLaughlin wrote: > > src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java > > Lines 141 (patched) > > <https://reviews.apache.org/r/66716/diff/4/?file=2018108#file2018108line142> > > > > You probably want multiple threads with a 1 minute (default) > > coordinator timeout and explicit locking, right? Yes, that makes sense. I will reuse the number of parallel coordinator transaction as the thread pool limit. > On May 8, 2018, 9:30 a.m., David McLaughlin wrote: > > src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java > > Lines 54-59 (original), 78-83 (patched) > > <https://reviews.apache.org/r/66716/diff/4/?file=2018109#file2018109line79> > > > > This (or the Impl) needs updated with a high-level overview of the new > > logic (the async nature of maintenance, etc.) Done. > On May 8, 2018, 9:30 a.m., David McLaughlin wrote: > > src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java > > Lines 110 (patched) > > <https://reviews.apache.org/r/66716/diff/4/?file=2018109#file2018109line111> > > > > I'm wondering if defaultSlaPolicy needs to be supplied by the client, > > or configured on the server? Is there a situation where client would ever > > supply different values depending on the situation? As of today dedicated clusters can supply their own SLA requirements. Setting a global SLA requirement (which can be less stricter) can be a surprise for operators. But my initial approach was to set a server default as well. > On May 8, 2018, 9:30 a.m., David McLaughlin wrote: > > src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java > > Line 131 (original), 192 (patched) > > <https://reviews.apache.org/r/66716/diff/4/?file=2018109#file2018109line193> > > > > Revert to single-line. Done. > On May 8, 2018, 9:30 a.m., David McLaughlin wrote: > > src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java > > Lines 407 (patched) > > <https://reviews.apache.org/r/66716/diff/4/?file=2018109#file2018109line421> > > > > I'm struggling to understand how maintenace request presence is related > > to the SLA. > > > > Isn't there a race condition where any hosts in DRAINING when the > > Scheduler first starts up (after this patch) would have no maintenance > > requests? Also, doesn't the maintenance request stay present *until* the > > SLA is satisfied? > > > > This may be another argument for having a default policy in the server. After this patch has been rolled out, this should never happen. Added this to log (we can change the message) and export metric to account for exactly the situation you mentioned while rolling out the patch (for debugging). It is safe to ingore drains like this after the scheduler startup, since all drains are driven from clients today. So the clients will retry the drain -> which will create the maintenance request object -> which will unblock the drains. Considering the `dedicated` owners user-case (mentioned above), I dropped the scheduler level config. We can bring it back if you feel strongly about it. - Santhosh Kumar ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/66716/#review202652 ----------------------------------------------------------- On May 8, 2018, 8:49 a.m., Santhosh Kumar Shanmugham wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/66716/ > ----------------------------------------------------------- > > (Updated May 8, 2018, 8:49 a.m.) > > > Review request for Aurora, David McLaughlin, Jordan Ly, and Stephan Erb. > > > Repository: aurora > > > Description > ------- > > `Tasks` can specify custom SLA requirements as part of > their `TaskConfig`. One of the new features is the ability > to specify an external coordinator that can ACK/NACK > maintenance requests for tasks. This will be hugely > beneficial for onboarding services that cannot satisfactorily > specify SLA in terms of running instances. > > Maintenance requests are driven from the Scheduler to > improve management of nodes in the cluster. > > > Note to reviewers: > - Test coverage is minimal at this point. Expect more coverage soon in the > next diff. > > > Diffs > ----- > > api/src/main/thrift/org/apache/aurora/gen/api.thrift > ef754e32172e7490a47a13e7b526f243ffa3efeb > api/src/main/thrift/org/apache/aurora/gen/storage.thrift > b79e2045ccda05d5058565f81988dfe33feea8f1 > src/main/java/org/apache/aurora/scheduler/app/AppModule.java > ffc07443fae9e5216a5333ae305f75aa9b452a0c > src/main/java/org/apache/aurora/scheduler/config/CliOptions.java > a2fb0393ba47e876c4c8c63e3ed27ebe42cb6ca3 > > src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceModule.java > PRE-CREATION > src/main/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandler.java > 3b4df55a05873e79aae206b117cbc753fa3abb94 > src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java PRE-CREATION > src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java > 25ed474289f369e74c24e999ad97ed6810c9fd5e > src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java > f58c66aaebe8d31913d67a05add0f3d6054e88d1 > src/main/java/org/apache/aurora/scheduler/state/StateModule.java > 0e0f90b670bbbcd6cb3aa302ce4a9abfe70ea979 > src/main/java/org/apache/aurora/scheduler/storage/HostMaintenanceStore.java > PRE-CREATION > src/main/java/org/apache/aurora/scheduler/storage/Storage.java > da5534f886e032ca5a182f3704aa335ff680b258 > > src/main/java/org/apache/aurora/scheduler/storage/durability/DurableStorage.java > f1fdc275d3958a36bbe79110d70dfeba640a948a > src/main/java/org/apache/aurora/scheduler/storage/durability/Loader.java > 10864f122eff5027c88d835baae6de483d960218 > > src/main/java/org/apache/aurora/scheduler/storage/durability/WriteRecorder.java > 8d70cae35289a9e36142bab288cf0c9398ebd2d4 > src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotService.java > b30de881eafa3226fdc32383b4e9bfd33ca912a5 > src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotterImpl.java > 4b52be02001e704f4b1a5f447226ac8c2386e3fd > > src/main/java/org/apache/aurora/scheduler/storage/mem/MemHostMaintenanceStore.java > PRE-CREATION > src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorage.java > 9f324b010db7e351e98b257d8fc8fecfeac81268 > src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorageModule.java > edcea09b4d206cfddb642074237b031ad71cff13 > > src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java > 9fc0416086dd3eb2e2f4e8f659da59fcdea2b22b > src/main/python/apache/aurora/config/schema/base.py > a629bcd1261e5959da0a8458a55545d4e2c2a7a5 > src/main/python/apache/aurora/config/thrift.py > 6d2dde6e964daa68bf6f0e5bbbffecc5bd8c0431 > src/main/python/apache/aurora/executor/executor_vars.py > 561f9452aedda4cc695c84a2a850bdd7e1d65dec > src/test/java/org/apache/aurora/scheduler/app/SchedulerIT.java > 63c338e5bbdf60de0fba8d68c6613904abb93fa8 > src/test/java/org/apache/aurora/scheduler/base/TaskTestUtil.java > 778148a7c033cba9004954cabc33a2b1d003dccf > src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java > e66ec116112df164106598d9ff0bc9e8f465e44f > > src/test/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandlerTest.java > c6163bbabc7e7748f167b679893a93f58e4ef1ac > src/test/java/org/apache/aurora/scheduler/sla/SlaModuleTest.java > d37e7a07e9258bc8c0758bf50aece5b79025126b > > src/test/java/org/apache/aurora/scheduler/state/MaintenanceControllerImplTest.java > 770846e84e9980ea3dbf9e1c46b0d45c5488c5b3 > src/test/java/org/apache/aurora/scheduler/storage/backup/RecoveryTest.java > ba03ff94bb5fee2b09a6660a9ad759cece7449f1 > > src/test/java/org/apache/aurora/scheduler/storage/durability/DurableStorageTest.java > 3dd9ce4039b223cb6156462d089f7062a1cde772 > > src/test/java/org/apache/aurora/scheduler/storage/durability/WriteRecorderTest.java > 27c8c829cd1e417dd5e60a8e9415331ca4a7c918 > > src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotterImplIT.java > be07361a27afefa21cc2ba76ce82531a418d9814 > > src/test/java/org/apache/aurora/scheduler/storage/testing/StorageTestUtil.java > d59118be13342da9003b0bcb97e12e477d9edf8f > > src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java > 2cf66d8154ad3795989ee9026e45af1be509f244 > src/test/java/org/apache/aurora/scheduler/thrift/ThriftIT.java > 40851c419e4d62e6545959eebc0ce144fdecc697 > > src/test/java/org/apache/aurora/scheduler/thrift/aop/MockDecoratedThrift.java > d412090c292691305f01bccd1596fb0f6bb003ad > src/test/python/apache/aurora/api_util.py > 3fc9b478cc9aada0503e8ed8698a37b4ed926cdd > src/test/python/apache/aurora/client/api/test_scheduler_client.py > f2a2eae1539f7f6dff6855e4122cc41c6cbb0f7b > src/test/python/apache/aurora/client/cli/test_inspect.py > e4f43d0573c7862adc9bc679f4cea40cc76eac38 > src/test/python/apache/aurora/config/test_thrift.py > 8e1d0e177959af12b97bdd1cd47845b72bc12fe1 > > src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/removeHostMaintenanceRequest > PRE-CREATION > > src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveCronJob > 88e1c36a1aa2d192b95963f7aa36e243a447e4af > > src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveHostMaintenanceRequest > PRE-CREATION > > src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveJobUpdate > 32fdcdacde58345cdd6c4b449b82c0c90c2b2aae > > src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveTasks > 4323031ec6bd128576c2a43ebc11f04a9f046e2f > > src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/read-compatible/16-saveHostMaintenanceRequest > PRE-CREATION > > src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/read-compatible/17-removeHostMaintenanceRequest > PRE-CREATION > > > Diff: https://reviews.apache.org/r/66716/diff/4/ > > > Testing > ------- > > ./build-support/jenkins/build.sh > > > Thanks, > > Santhosh Kumar Shanmugham > >
