> On Aug. 30, 2016, 9:25 p.m., Joshua Cohen wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java, 
> > line 143
> > <https://reviews.apache.org/r/51536/diff/1/?file=1488791#file1488791line143>
> >
> >     We should add a constant to api.thrift for `health` and use it here as 
> > well as in the executor.
> 
> Stephan Erb wrote:
>     We need to look for a more general solution here. The presence of a 
> health port does not account for the potential existence of a 
> `.healthchecksnooze` file disabling health checks.
> 
> Kai Huang wrote:
>     Thanks, that's a good point. We could combine this logic into "executor 
> capabilities" as Joshua mentioned.

I disagree that we should account for the `.healthchecksnooze` for this 
feature. This approach has always been considered extremely risky and "proceed 
at your own risk" way to debug end user issues. If someone drops a 
`.healthchecksnooze` _after_ task entring `STARTING` and _before_ it reaches 
`RUNNING` I believe the following should be enough to address such case (from 
design doc):

> 2. Instance fails to respond OK and the initial_interval_secs expires ? task 
> moves into FAILED.


> On Aug. 30, 2016, 9:25 p.m., Joshua Cohen wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java, 
> > line 130
> > <https://reviews.apache.org/r/51536/diff/1/?file=1488791#file1488791line130>
> >
> >     Doing this based simply on the presence of a string feels brittle to 
> > me. Would it make sense to define some basic form of "executor 
> > capabilities" passing? Even if it's something simple like a comma delimited 
> > list of constants that we define in api.thrift (so that both the scheduler 
> > and the executor can reuse the values)?
> 
> Kai Huang wrote:
>     Do you mean adding executor capabilities as one attribute of TaskEvent in 
> api.thrift? 
>     
>     I think on the executor side , we can put (isHealthCheckEnabled() && 
> isExecutorUpdated()) into one attribute called "watch_sec_skippable", and 
> send it to the scheduler.
> 
> Zameer Manji wrote:
>     "I think on the executor side , we can put (isHealthCheckEnabled() && 
> isExecutorUpdated()) into one attribute called "watch_sec_skippable", and 
> send it to the scheduler."
>     
>     We currently have no communication via the executor and scheduler outside 
> of status updates. If you want to do this, please discuss this idea on the 
> mailing list. The design doc makes no reference to executor capabilities or 
> any communication between the executor and scheduler.

Joshua, how did you see the "executor capabilities" working here? As Zameer 
pointed out, there is currently no way for us to communicate real-time executor 
capabilities to the scheduler. I don't think adding this type of info to every 
status update justifies the amount of work and additional storage space that 
would require. Besides, we'd have to deprecate this capability and remove from 
the schema in the next release or two. 

I agree relying on the presence of the message may appear brittle at first but 
as far as I can see it's the most straightforward approach that will not 
require any special handling or deprecation followups. We are in full control 
of the executor and it's guaranteed through code the message will remain absent 
for any V-n versions running out there.


- Maxim


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51536/#review147358
-----------------------------------------------------------


On Aug. 30, 2016, 8:52 p.m., Kai Huang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51536/
> -----------------------------------------------------------
> 
> (Updated Aug. 30, 2016, 8:52 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> - Scheduler updater will not use watch_sec if health check is enabled
> - Add unit tests for successful updates.
> 
> This feature intends to improve reliability and performance of the Aurora 
> scheduler job updater by relying on health check status rather than 
> watch_secs timeout when deciding an individual instance update state. 
> 
> See this epic: https://issues.apache.org/jira/browse/AURORA-894 
> and the design doc: 
> https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
>  for more details and background.
> 
> 
> Assumptions on executor change:
> The executor starts health check at STARTING, if a successful health check is 
> performed before initial_interval_sec expires, the executor will sends a 
> status message for RUNNING.
> 
> What I try to implement:
> vCurrent updater will infer the executor version from the presence of a 
> status message sent by the executor during RUNNING status update. 
> 
> For vPrev executor:
>     The vCurrent updater will use watch_sec for instance update, this is 
> independent from the health check.
> For vCurrent executor:
>     If health check is disabled on vCurrent executor, the vCurrent updater 
> will use watch_sec for instance update.
>     If health check is enabled on vCurrent executor, the vCurrent updater 
> will not use watch_sec for instance update.
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java 
> c129896d8cd54abd2634e2a339c27921042b0162 
>   src/test/java/org/apache/aurora/scheduler/updater/InstanceUpdaterTest.java 
> c78c7fbd7d600586136863c99ce3d7387895efee 
> 
> Diff: https://reviews.apache.org/r/51536/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew build
> 
> ./gradlew :test --tests 
> "org.apache.aurora.scheduler.updater.InstanceUpdaterTest"
> 
> ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Kai Huang
> 
>

Reply via email to