> On July 9, 2018, 9:04 p.m., Santhosh Kumar Shanmugham wrote:
> > end-to-end test for this feature?

I think that the integration test should be sufficient for this feature 
(similar to coordinated updates which has fairly comprehensive integration 
tests but no e2e tests).


> On July 9, 2018, 9:04 p.m., Santhosh Kumar Shanmugham wrote:
> > docs/reference/scheduler-configuration.md
> > Lines 162 (patched)
> > <https://reviews.apache.org/r/67696/diff/5/?file=2054721#file2054721line162>
> >
> >     s/The number of/The maximum number of/
> >     
> >     This is only used for SLA-aware updates, should we name it so that they 
> > are grouped together with the other settings?

Good point, done.


> On July 9, 2018, 9:04 p.m., Santhosh Kumar Shanmugham wrote:
> > docs/reference/scheduler-configuration.md
> > Lines 227-230 (patched)
> > <https://reviews.apache.org/r/67696/diff/5/?file=2054721#file2054721line227>
> >
> >     `sla_aware_kill_retry_min_delay` will be more readable. Same for the 
> > max delay.

Done.


> On July 9, 2018, 9:04 p.m., Santhosh Kumar Shanmugham wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/SlaKillController.java
> > Lines 56 (patched)
> > <https://reviews.apache.org/r/67696/diff/5/?file=2054728#file2054728line56>
> >
> >     s/SLA passes/SLA passes or update is cancelled/

Added "or the update changes state." which is clarified in the next sentence.


> On July 9, 2018, 9:04 p.m., Santhosh Kumar Shanmugham wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/SlaKillController.java
> > Lines 140 (patched)
> > <https://reviews.apache.org/r/67696/diff/5/?file=2054728#file2054728line140>
> >
> >     Repeating my comment from internal review - 
> >     
> >     Technically this is can be any kind of work. Should we call this class 
> > `SlaAwareInstanceUpdater`? Since this has more to do with actually update 
> > logic than kill itself. And perhaps the method can become 
> > `startSlaAwareUpdate`?
> >     
> >     The job of this class is around checking safety and scheduling the 
> > supplied work to be evaulated. Naming the class based on it usage rather 
> > than its function seems weird to me.

This would be ideal but I think there are some small parts of the logic that 
changing the name would be misleading towards.

For example:

```
        storeProvider
            .getTaskStore()
            .fetchTask(taskId)
            .filter(task -> isKillable(task.getStatus()))
            .ifPresent(task -> {
              incrementJobStatCounter(killAttemptsByJob, SLA_KILL_ATTEMPT, 
instance.getJobKey());
              slaManager.checkSlaThenAct(
                  task,
                  instructions.getDesiredState().getTask().getSlaPolicy(),
                  slaStoreProvider -> performKill(
                      slaStoreProvider,
                      instance,
                      key,
                      status,
                      killCommand),
                  ImmutableMap.of(),
                  // If the task is not assigned, force the update since it 
does not affect the
                  // SLA. For example, if a task is THROTTLED or PENDING, we 
probably don't care
                  // if the update replaces it with a new instance.
                  !SLAVE_ASSIGNED_STATES.contains(task.getStatus()));
```

Within the SLA aware code, we have kill-specific checks in order to determine 
if we should continue with the action (or if it should be a NOOP), or if we 
should force the kill regardless (the task would not affect SLA). Changing the 
name would require a refactoring for a more complex interface but I am not sure 
if we would ever use it for anything else.

I am leaning towards keeping the current name and functionality right now until 
another sla-aware update use case comes in and we have a better idea on how we 
can refactor the current abstraction (from kill to generic, which I don't think 
should be too difficult but I'm sure I'll regret those words...)


> On July 9, 2018, 9:04 p.m., Santhosh Kumar Shanmugham wrote:
> > src/test/java/org/apache/aurora/scheduler/updater/SlaKillControllerTest.java
> > Lines 78 (patched)
> > <https://reviews.apache.org/r/67696/diff/5/?file=2054739#file2054739line78>
> >
> >     test case when sla kill is a noop (task is already killed for a 
> > different reason) - is this already covered?

This is covered at the end of `testSlaKillRetry`.


> On July 9, 2018, 9:04 p.m., Santhosh Kumar Shanmugham wrote:
> > src/test/java/org/apache/aurora/scheduler/updater/SlaKillControllerTest.java
> > Lines 180 (patched)
> > <https://reviews.apache.org/r/67696/diff/5/?file=2054739#file2054739line180>
> >
> >     Drop `_` here and everywhere. `SLA_CHECKING` sounds like a valid event 
> > that is definied in code.

Can you elaborate on this? Did you mean drop `_MESSAGE`? I added `_MESSAGE` 
because it is not a true "event" in the sense that it piggybacks on 
`INSTANCE_UPDATING` or `INSTANCE_ROLLING_BACK` with a message showing progress.


- Jordan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67696/#review205866
-----------------------------------------------------------


On July 9, 2018, 6:20 p.m., Jordan Ly wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67696/
> -----------------------------------------------------------
> 
> (Updated July 9, 2018, 6:20 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Daniel Knightly, Renan DelValle, 
> Santhosh Kumar Shanmugham, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> This patch enables SLA-aware updates.
> 
> Following https://reviews.apache.org/r/66716/, tasks may now specify custom 
> SLA policies that will be respected by the scheduler during maintenance. This 
> patch integrates into the same system to allow users to specify if they want 
> their updates to also respect SLA. Please see 
> https://docs.google.com/document/d/1lCoDyoX26qrGrptrgO7vJHqYR_L2CBRGFIywsAd8uQo/edit?usp=sharing
>  for a more detailed description.
> 
> This patch adds two optional Thrift fields, `slaAware` to `JobUpdateSettings` 
> and `message` to `JobInstanceUpdateEvent`. These should be forward and 
> backwards compatible.
> 
> 
> Diffs
> -----
> 
>   RELEASE-NOTES.md edc081f502370190597ad028f3275cdfd572f5ca 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> 7265b11103aa12743c42355163ae64e98e965d7f 
>   docs/features/job-updates.md b52eb35a1de9da40f8a00ef0b905df30069029d3 
>   docs/reference/configuration.md acab4c58d9ab3c04d156fed3636e77aed6d1faf4 
>   docs/reference/scheduler-configuration.md 
> 805e516689be019101f7c220c89fd9c391bb93b3 
>   src/main/java/org/apache/aurora/scheduler/base/Tasks.java 
> 2e13aacf576e648d9fffe989e4fc05c8954e72d8 
>   src/main/java/org/apache/aurora/scheduler/pruning/TaskHistoryPruner.java 
> 9aa51c3637df74cca088bd65c5539e1ebb8e5f0d 
>   
> src/main/java/org/apache/aurora/scheduler/updater/InstanceActionHandler.java 
> 9fa68b2dd55b4e4f5436356c1b94af1393967679 
>   src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java 
> a002d955c3bc7b7c39da5e130e8c10c536bdcebd 
>   
> src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
>  ec577cccb86914ebd679ca235103f79dd7e7b79d 
>   src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 
> f2d33fb9ab6bd2c3ff199ab03dc75b1d6d618f3a 
>   src/main/java/org/apache/aurora/scheduler/updater/SlaKillController.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/updater/UpdaterModule.java 
> 74ee1745e1fd7c308e2bdfa46aeb18a7ecfe14d2 
>   src/main/java/org/apache/aurora/scheduler/updater/Updates.java 
> f949fd54f524780672167e12fcadf268da08e679 
>   src/main/python/apache/aurora/client/api/updater_util.py 
> 4e3986220aaa4c9b138394b962120b176185af12 
>   src/main/python/apache/aurora/config/schema/base.py 
> 7baded79acdf863670afc183d740dcad602490c2 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> dcf58896f1e866c0369ba1b78060236e98d9d46b 
>   
> src/test/java/org/apache/aurora/scheduler/storage/AbstractJobUpdateStoreTest.java
>  3a06a451da4ef3acccb33b5495b9fae141557148 
>   
> src/test/java/org/apache/aurora/scheduler/testing/FakeScheduledExecutor.java 
> 0aea369d8a8f75291de9691b6d61f3d48895507c 
>   src/test/java/org/apache/aurora/scheduler/updater/AddTaskTest.java 
> 43f857d893a54e19e71b36f2f06fef3a3ef6e874 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 
> 5667a1b59681a6de87149d7161d760bff5da3818 
>   src/test/java/org/apache/aurora/scheduler/updater/KillTaskTest.java 
> 2c27ec7136ff22a3570a4ec278c73f7ee310f628 
>   
> src/test/java/org/apache/aurora/scheduler/updater/SlaKillControllerTest.java 
> PRE-CREATION 
>   src/test/python/apache/aurora/client/cli/test_inspect.py 
> 2baba2aa55865ec298a4c9e5af3952b56cb9a910 
>   
> src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveJobInstanceUpdateEvent
>  48902d39b9d2cbeae1a52180669aba8349e4dd65 
>   
> src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveJobUpdate
>  08dfa5b9a67989083a0d405ce8100698a4d096ae 
>   ui/src/main/js/components/UpdateInstanceEvents.js 
> 8351f2c4c256e27625d94b70842be0e91065a551 
>   ui/src/main/js/components/UpdateSettings.js 
> d756f5916fd8c39dcbb5578ee5eda198f807f458 
> 
> 
> Diff: https://reviews.apache.org/r/67696/diff/5/
> 
> 
> Testing
> -------
> 
> Added unit tests, `./gradlew test`.
> 
> Tested at scale with over 10,000 SLA-aware instance update events occuring 
> concurrently. Scheduler stability did not seem to be affected.
> 
> 
> Thanks,
> 
> Jordan Ly
> 
>

Reply via email to