> On Sept. 8, 2016, 9:13 p.m., Joshua Cohen wrote:
> > src/main/python/apache/aurora/executor/aurora_executor.py, lines 204-211
> > <https://reviews.apache.org/r/51742/diff/1/?file=1494643#file1494643line204>
> >
> >     I see a couple of problems with this method:
> >     
> >     1. The `status_result` argument passed in by `StatusManager#run` is not 
> > a `TaskState`, it's a 
> > [StatusResult](https://github.com/apache/aurora/blob/master/src/main/python/apache/aurora/executor/common/status_checker.py#L23-L50)
> >  which has a `status` property whose value is a `TaskState`. So I believe 
> > your first check will always fail in a real world scenario?
> >     2. If we receive multiple invocations of this callback for healthy 
> > tasks, the second callback will result in the healthy task being killed 
> > (since `self.task_running` will be `True`, the initial condition will 
> > evaluate to `False` and we'll land in the shutdown branch).
> >     
> >     These two issues combined make me wary of shipping these changes 
> > without the corresponding changes further down the stack, as it makes it 
> > hard to evaluate their actual effect. I appreciate trying to break the 
> > review up into manageable chunks, but in this case I think it's having the 
> > opposite effect of causing reviewers to have to envision the contents of 
> > future reviews to understand the impact of the changes in this review. On 
> > top of that, the doubling up of sending the `TASK_RUNNING` update while the 
> > code is in the in-between state also makes me nervous.
> >     
> >     We also definitely need test coverage for this callback.
> 
> Kai Huang wrote:
>     Thanks for the valid feedback. I'll work on them.
> 
> Kai Huang wrote:
>     Given that _status_result_handler is an instance method of Aurora 
> executor, the unit test of _status_result_handler is kind of interwined in 
> the following tests in test_thermos_executor.py:
>     test_shutdown
>     test_task_health_check_ok
>     test_task_health_failed, etc.
>     
>     The complexity results from the attribute "task_running" in Aurora 
> Executor, which makes it hard to unit test this function. I'll think about if 
> there is other way than relying on the "task_running" field in 
> aurora_executor.

Since the major blocker here is that we don't want to trigger the callback to 
handle TASK_RUNNING status more than once, I suggest delegating the "check 
once" logic from Aurora Executor into status manager, because that will make 
the unit tests much easier.


- Kai


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51742/#review148258
-----------------------------------------------------------


On Sept. 8, 2016, 8:07 p.m., Kai Huang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51742/
> -----------------------------------------------------------
> 
> (Updated Sept. 8, 2016, 8:07 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji.
> 
> 
> Bugs: AURORA-1225
>     https://issues.apache.org/jira/browse/AURORA-1225
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Modify the callback function passed to StatusManager to handle more 
> StatusResult.
> 
> Currently, Aurora executor passes a callback to a status manager. The status 
> manager periodically polls the status of HealthChecker. When a non-empty 
> status is deteted, the callback function will shut down the status manager. 
> 
> In health check driven updates, the health checker will change its status to 
> "TASK_RUNNING" when a successful required number of health checks is reached. 
> Therefore, we need to modify the callback function, so that it won't shutdown 
> the status manager if the task is in non-exit state like "TASK_RUNNING".
> 
> Please see this epic: https://issues.apache.org/jira/browse/AURORA-1225
> and the design doc: 
> https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
>  for more details and background.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 
> ce5ef680f01831cd89fced8969ae3246c7f60cfd 
>   src/main/python/apache/aurora/executor/status_manager.py 
> 228a99a05f339e21cd7e769a42b9b2276e7bc3fc 
>   src/test/python/apache/aurora/executor/test_status_manager.py 
> ce4679ba1aa7b42cf0115c943d84663030182d23 
> 
> Diff: https://reviews.apache.org/r/51742/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew build
> 
> ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Kai Huang
> 
>

Reply via email to