> On July 9, 2018, 2:04 p.m., Santhosh Kumar Shanmugham wrote: > > src/main/java/org/apache/aurora/scheduler/updater/SlaKillController.java > > Lines 140 (patched) > > <https://reviews.apache.org/r/67696/diff/5/?file=2054728#file2054728line140> > > > > Repeating my comment from internal review - > > > > Technically this is can be any kind of work. Should we call this class > > `SlaAwareInstanceUpdater`? Since this has more to do with actually update > > logic than kill itself. And perhaps the method can become > > `startSlaAwareUpdate`? > > > > The job of this class is around checking safety and scheduling the > > supplied work to be evaulated. Naming the class based on it usage rather > > than its function seems weird to me. > > Jordan Ly wrote: > This would be ideal but I think there are some small parts of the logic > that changing the name would be misleading towards. > > For example: > > ``` > storeProvider > .getTaskStore() > .fetchTask(taskId) > .filter(task -> isKillable(task.getStatus())) > .ifPresent(task -> { > incrementJobStatCounter(killAttemptsByJob, > SLA_KILL_ATTEMPT, instance.getJobKey()); > slaManager.checkSlaThenAct( > task, > instructions.getDesiredState().getTask().getSlaPolicy(), > slaStoreProvider -> performKill( > slaStoreProvider, > instance, > key, > status, > killCommand), > ImmutableMap.of(), > // If the task is not assigned, force the update since > it does not affect the > // SLA. For example, if a task is THROTTLED or PENDING, > we probably don't care > // if the update replaces it with a new instance. > !SLAVE_ASSIGNED_STATES.contains(task.getStatus())); > ``` > > Within the SLA aware code, we have kill-specific checks in order to > determine if we should continue with the action (or if it should be a NOOP), > or if we should force the kill regardless (the task would not affect SLA). > Changing the name would require a refactoring for a more complex interface > but I am not sure if we would ever use it for anything else. > > I am leaning towards keeping the current name and functionality right now > until another sla-aware update use case comes in and we have a better idea on > how we can refactor the current abstraction (from kill to generic, which I > don't think should be too difficult but I'm sure I'll regret those words...)
These checks can be moved into the `killCommand`. I think the abstraction is not perfect as it is today. > On July 9, 2018, 2:04 p.m., Santhosh Kumar Shanmugham wrote: > > src/test/java/org/apache/aurora/scheduler/updater/SlaKillControllerTest.java > > Lines 180 (patched) > > <https://reviews.apache.org/r/67696/diff/5/?file=2054739#file2054739line180> > > > > Drop `_` here and everywhere. `SLA_CHECKING` sounds like a valid event > > that is definied in code. > > Jordan Ly wrote: > Can you elaborate on this? Did you mean drop `_MESSAGE`? I added > `_MESSAGE` because it is not a true "event" in the sense that it piggybacks > on `INSTANCE_UPDATING` or `INSTANCE_ROLLING_BACK` with a message showing > progress. Can use `SLA_CHECKING_MESSAGE` or simply `sla checking event is added`. I started searching for an event type of `SLA_CHECKING` after reading this comment, which is a little confusing. - Santhosh Kumar ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/67696/#review205866 ----------------------------------------------------------- On July 9, 2018, 7:22 p.m., Jordan Ly wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/67696/ > ----------------------------------------------------------- > > (Updated July 9, 2018, 7:22 p.m.) > > > Review request for Aurora, David McLaughlin, Daniel Knightly, Renan DelValle, > Santhosh Kumar Shanmugham, and Stephan Erb. > > > Repository: aurora > > > Description > ------- > > This patch enables SLA-aware updates. > > Following https://reviews.apache.org/r/66716/, tasks may now specify custom > SLA policies that will be respected by the scheduler during maintenance. This > patch integrates into the same system to allow users to specify if they want > their updates to also respect SLA. Please see > https://docs.google.com/document/d/1lCoDyoX26qrGrptrgO7vJHqYR_L2CBRGFIywsAd8uQo/edit?usp=sharing > for a more detailed description. > > This patch adds two optional Thrift fields, `slaAware` to `JobUpdateSettings` > and `message` to `JobInstanceUpdateEvent`. These should be forward and > backwards compatible. > > > Diffs > ----- > > RELEASE-NOTES.md edc081f502370190597ad028f3275cdfd572f5ca > api/src/main/thrift/org/apache/aurora/gen/api.thrift > 7265b11103aa12743c42355163ae64e98e965d7f > docs/features/job-updates.md b52eb35a1de9da40f8a00ef0b905df30069029d3 > docs/reference/configuration.md acab4c58d9ab3c04d156fed3636e77aed6d1faf4 > docs/reference/scheduler-configuration.md > 805e516689be019101f7c220c89fd9c391bb93b3 > src/main/java/org/apache/aurora/scheduler/base/Tasks.java > 2e13aacf576e648d9fffe989e4fc05c8954e72d8 > src/main/java/org/apache/aurora/scheduler/pruning/TaskHistoryPruner.java > 9aa51c3637df74cca088bd65c5539e1ebb8e5f0d > > src/main/java/org/apache/aurora/scheduler/updater/InstanceActionHandler.java > 9fa68b2dd55b4e4f5436356c1b94af1393967679 > src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java > a002d955c3bc7b7c39da5e130e8c10c536bdcebd > > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java > ec577cccb86914ebd679ca235103f79dd7e7b79d > src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java > f2d33fb9ab6bd2c3ff199ab03dc75b1d6d618f3a > src/main/java/org/apache/aurora/scheduler/updater/SlaKillController.java > PRE-CREATION > src/main/java/org/apache/aurora/scheduler/updater/UpdaterModule.java > 74ee1745e1fd7c308e2bdfa46aeb18a7ecfe14d2 > src/main/java/org/apache/aurora/scheduler/updater/Updates.java > f949fd54f524780672167e12fcadf268da08e679 > src/main/python/apache/aurora/client/api/updater_util.py > 4e3986220aaa4c9b138394b962120b176185af12 > src/main/python/apache/aurora/config/schema/base.py > 7baded79acdf863670afc183d740dcad602490c2 > src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java > dcf58896f1e866c0369ba1b78060236e98d9d46b > > src/test/java/org/apache/aurora/scheduler/storage/AbstractJobUpdateStoreTest.java > 3a06a451da4ef3acccb33b5495b9fae141557148 > > src/test/java/org/apache/aurora/scheduler/testing/FakeScheduledExecutor.java > 0aea369d8a8f75291de9691b6d61f3d48895507c > src/test/java/org/apache/aurora/scheduler/updater/AddTaskTest.java > 43f857d893a54e19e71b36f2f06fef3a3ef6e874 > src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java > 5667a1b59681a6de87149d7161d760bff5da3818 > src/test/java/org/apache/aurora/scheduler/updater/KillTaskTest.java > 2c27ec7136ff22a3570a4ec278c73f7ee310f628 > > src/test/java/org/apache/aurora/scheduler/updater/SlaKillControllerTest.java > PRE-CREATION > src/test/python/apache/aurora/client/cli/test_inspect.py > 2baba2aa55865ec298a4c9e5af3952b56cb9a910 > > src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveJobInstanceUpdateEvent > 48902d39b9d2cbeae1a52180669aba8349e4dd65 > > src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveJobUpdate > 08dfa5b9a67989083a0d405ce8100698a4d096ae > ui/src/main/js/components/UpdateInstanceEvents.js > 8351f2c4c256e27625d94b70842be0e91065a551 > ui/src/main/js/components/UpdateSettings.js > d756f5916fd8c39dcbb5578ee5eda198f807f458 > > > Diff: https://reviews.apache.org/r/67696/diff/7/ > > > Testing > ------- > > Added unit tests, `./gradlew test`. > > Tested at scale with over 10,000 SLA-aware instance update events occuring > concurrently. Scheduler stability did not seem to be affected. > > > Thanks, > > Jordan Ly > >