----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/55901/#review164919 -----------------------------------------------------------
Fix it, then Ship it! LGTM modulo possible collision with the previous check on timeout. Please confirm such collision is not possible. src/checks/health_checker.cpp (lines 338 - 342) <https://reviews.apache.org/r/55901/#comment236747> How about `checkResult = commandCheckViaAgent ? nestedCommandHealthCheck() : commandHealthCheck();` ? src/checks/health_checker.cpp (line 555) <https://reviews.apache.org/r/55901/#comment236753> s/until/until after/ ? src/checks/health_checker.cpp (lines 565 - 567) <https://reviews.apache.org/r/55901/#comment236752> This should probably be indented, no? src/checks/health_checker.cpp (lines 575 - 580) <https://reviews.apache.org/r/55901/#comment236757> After thinking some time I think I understand why we treat any HTTP status code other than 200 as health check failure. I think it still makes sense to capture this in a comment explaining that if the agent cannot launch a nested container for _any_ reason, it is considered a health check failure. We should also mention this when we update the documentaiton. src/checks/health_checker.cpp (line 615) <https://reviews.apache.org/r/55901/#comment236759> s/this future is completed//we finish processing current timeout./ It's not clear what "this future" means. src/checks/health_checker.cpp (lines 623 - 628) <https://reviews.apache.org/r/55901/#comment236762> I think what you're trying to say is that it is fine to complete the future returned from this call if `waitForNestedContainer` fails, because 1) debug containers are not checkpointed and 2) `waitForNestedContainer` fails iff agent fails as well hence cleaning up the container. Could you please rephrase the comment mentioning both your expectation (no wait retry is fine) and reasoning? src/checks/health_checker.cpp (line 635) <https://reviews.apache.org/r/55901/#comment236774> Could you please specify what this function returns? Status or exit code? src/checks/health_checker.cpp (lines 653 - 654) <https://reviews.apache.org/r/55901/#comment236765> Why not `defer`? src/checks/health_checker.cpp (lines 654 - 670) <https://reviews.apache.org/r/55901/#comment236763> I would extract this lambda into a separate callback for readability. src/checks/health_checker.cpp (line 657) <https://reviews.apache.org/r/55901/#comment236771> Maybe saying "Agent returned" instead of "Received"? src/checks/health_checker.cpp (line 666) <https://reviews.apache.org/r/55901/#comment236764> please `CHECK` that `response->has_wait_nested_container()`. src/checks/health_checker.cpp (lines 666 - 670) <https://reviews.apache.org/r/55901/#comment236766> ``` return ( response->wait_nested_container().has_exit_status() ? response->wait_nested_container().exit_status() : None()); ``` src/checks/health_checker.cpp (line 667) <https://reviews.apache.org/r/55901/#comment236770> What do you get from the agent call: status or exit code directly? It's probably not that important here, but it is for checks. - Alexander Rukletsov On Feb. 8, 2017, 1:27 p.m., Gastón Kleiman wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/55901/ > ----------------------------------------------------------- > > (Updated Feb. 8, 2017, 1:27 p.m.) > > > Review request for mesos, Alexander Rukletsov, Anand Mazumdar, haosdent > huang, and Vinod Kone. > > > Bugs: MESOS-6280 > https://issues.apache.org/jira/browse/MESOS-6280 > > > Repository: mesos > > > Description > ------- > > Added support for command health checks to the default executor. > > > Diffs > ----- > > src/checks/health_checker.hpp 95da1ff7dd6b222a93076633eb3757ec9aa43cf6 > src/checks/health_checker.cpp 58380dc18896f659aa9c4fb4bb567a55bba97f6b > src/launcher/default_executor.cpp e63cf153831088851863d0956455a024e9bc172a > src/tests/health_check_tests.cpp 7b6a803a28b2e4f6c27e9a0c4f668350ec2d5a81 > > Diff: https://reviews.apache.org/r/55901/diff/ > > > Testing > ------- > > Introduced a new test: `HealthCheckTest.DefaultExecutorCmdHealthCheck`. It > passes on Linux, but not on macOS. > > > Thanks, > > Gastón Kleiman > >