Re: Review Request 53610: Added health checks documentation.

Neil Conway Tue, 22 Nov 2016 12:35:06 -0800

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53610/#review156617
-----------------------------------------------------------





docs/health-checks.md (line 9)
<https://reviews.apache.org/r/53610/#comment226792>

    "e.g.," not "e.g."



docs/health-checks.md (line 12)
<https://reviews.apache.org/r/53610/#comment226794>

    Rather than saying "health check their tasks out-of-band", I'd say:
    
    "some frameworks implement their own logic for checking the health of their 
tasks. This is typically done by having the framework scheduler send a "ping" 
request (e.g., via HTTP) to the host where the task is running and arranging 
for the task or executor to respond to the ping."
    
    "though" => "Although"



docs/health-checks.md (line 19)
<https://reviews.apache.org/r/53610/#comment226795>

    The phrase "incorporating network failures in health check information is 
not always desirable" is vague. What is the specific concern here?



docs/health-checks.md (line 21)
<https://reviews.apache.org/r/53610/#comment226796>

    Isn't a major advantage of Mesos-native health checks that you avoid the 
scalability problems of having a single scheduler handle the health checks for 
a potentially large number of tasks?



docs/health-checks.md (line 23)
<https://reviews.apache.org/r/53610/#comment226807>

    I think this would benefit from some more discussion of the high-level 
architecture of Mesos-native health checks. For example:
    
    * the traditional "scheduler health check" pattern involves a single 
scheduler node and a collection of agents; Mesos-native health checks run on 
the agent. This improves scalability but means that detecting network faults is 
a separate concern.
    * when a task fails Mesos-native health checks, what happens to it? how 
does the framework scheduler learn about this?
    * what happens if a task is running on a partitioned agent -- will it still 
be health-checked? If those health-checks fail, will the task be terminated?
    
    Some of this is discussed below, but I think it would be better to briefly 
discuss it at the beginning of the document to set context for what follows.



docs/health-checks.md (line 26)
<https://reviews.apache.org/r/53610/#comment226797>

    s/, as well as provides/. Mesos 1.2.0 also provides/



docs/health-checks.md (line 27)
<https://reviews.apache.org/r/53610/#comment226798>

    "implementations for"



docs/health-checks.md (line 33)
<https://reviews.apache.org/r/53610/#comment226799>

    "This technique allows detecting and reporting process crashes, but ..."



docs/health-checks.md (line 46)
<https://reviews.apache.org/r/53610/#comment226800>

    s/nor/or/



docs/health-checks.md (line 56)
<https://reviews.apache.org/r/53610/#comment226801>

    "to honor the `HealthCheck` field in `TaskInfo`"
    
    I'd also strike "and to implement health checks" as redundant.



docs/health-checks.md (line 58)
<https://reviews.apache.org/r/53610/#comment226802>

    "the reference implementation for"



docs/health-checks.md (line 65)
<https://reviews.apache.org/r/53610/#comment226806>

    "The command is" -> "A command health check specifies an arbitrary command 
that is used to validate the health of the task. The executor launches the 
command and inspects its exit status: `0` is treated as success (the task is 
healthy), while any other exit status is interpreted to mean the task is 
unhealthy."
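
    For illustration, the exit-status rule in the suggested wording can be 
sketched as below. This is a minimal sketch, not the executor's actual code: 
the function name `command_health_check`, the timeout parameter, and the 
choice to treat a timeout or a launch failure as unhealthy are all 
assumptions made for the example.

```python
import subprocess
import sys


def command_health_check(argv, timeout_seconds=5.0):
    """Run argv and return True only if it exits with status 0.

    Hypothetical helper: the name, the timeout, and the handling of
    timeouts and launch failures are assumptions for this sketch.
    """
    try:
        result = subprocess.run(
            argv,
            timeout=timeout_seconds,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except (subprocess.TimeoutExpired, OSError):
        # Assumption: a command that hangs or cannot be launched
        # is reported as unhealthy.
        return False
    # Exit status 0 means healthy; any other status means unhealthy.
    return result.returncode == 0


if __name__ == "__main__":
    # Use the current interpreter as a portable stand-in for a check command.
    print(command_health_check([sys.executable, "-c", "raise SystemExit(0)"]))
    print(command_health_check([sys.executable, "-c", "raise SystemExit(3)"]))
```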



docs/health-checks.md (line 98)
<https://reviews.apache.org/r/53610/#comment226808>

    "e.g.,"



docs/health-checks.md (line 202)
<https://reviews.apache.org/r/53610/#comment226809>

    Can we elaborate here -- that means a task that has failed health checks 
will typically be `RUNNING` with `healthy == false`? Is it possible to see 
other task states where the `healthy` field is set to false?



docs/health-checks.md (line 206)
<https://reviews.apache.org/r/53610/#comment226810>

    "all unhealthy status updates"
    
    "as well as the first healthy update"
    
    "i.e., when the task has started, or after one or more unhealthy updates 
have occurred"
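
    The forwarding rule described by the three phrases above (every unhealthy 
update, plus a healthy update only when the task has just started or the 
previous result was unhealthy) could be sketched as follows. The function name 
`updates_to_send` and the use of plain booleans for check results are 
assumptions for the example, not the executor's implementation.

```python
from typing import Iterable, Iterator, Optional


def updates_to_send(results: Iterable[bool]) -> Iterator[bool]:
    """Yield the health results that would be forwarded as status updates.

    Hypothetical sketch: True means a healthy check result, False an
    unhealthy one.
    """
    last: Optional[bool] = None  # None: no check has completed yet
    for healthy in results:
        # Forward every unhealthy result. Forward a healthy result only
        # if it is the first result or it follows an unhealthy one.
        if not healthy or last is not True:
            yield healthy
        last = healthy


if __name__ == "__main__":
    print(list(updates_to_send([True, True, False, False, True, True])))
```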



docs/health-checks.md (line 208)
<https://reviews.apache.org/r/53610/#comment226811>

    s/opt for/use/



docs/health-checks.md (line 254)
<https://reviews.apache.org/r/53610/#comment226814>

    I wouldn't use an exclamation point here.



docs/health-checks.md (line 263)
<https://reviews.apache.org/r/53610/#comment226815>

    "large value"



docs/health-checks.md (line 264)
<https://reviews.apache.org/r/53610/#comment226816>

    'introduce a "global" policy'



docs/health-checks.md (line 267)
<https://reviews.apache.org/r/53610/#comment226817>

    Why do they have to listen on all interfaces? Listening on 127.0.0.1 
as well as whatever service interface/address they require should be 
sufficient, no?
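
    As a sketch of the alternative raised in this question, a task could bind 
one listening socket per required address instead of binding to all 
interfaces. The helper name `listen_on` is hypothetical, and the service 
address in the usage note is a placeholder. Whether loopback plus the service 
address is actually sufficient for the health checks is the open question 
above, not something this example settles.

```python
import socket


def listen_on(addresses, port=0):
    """Open one listening TCP socket per address (hypothetical helper).

    port=0 lets the OS pick a free port, which keeps the sketch runnable;
    a real task would use its configured port.
    """
    sockets = []
    for address in addresses:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((address, port))
        sock.listen()
        sockets.append(sock)
    return sockets


if __name__ == "__main__":
    # Only loopback is bound here so the sketch runs anywhere.
    for sock in listen_on(["127.0.0.1"]):
        print(sock.getsockname())
        sock.close()
```

    A task following this pattern would pass both "127.0.0.1" and its own 
service address to `listen_on`, rather than binding to 0.0.0.0.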


- Neil Conway


On Nov. 20, 2016, 6:52 p.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53610/
> -----------------------------------------------------------
> 
> (Updated Nov. 20, 2016, 6:52 p.m.)
> 
> 
> Review request for mesos, Gastón Kleiman, haosdent huang, Neil Conway, and 
> Till Toenshoff.
> 
> 
> Bugs: MESOS-5597
>     https://issues.apache.org/jira/browse/MESOS-5597
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   docs/health-checks.md PRE-CREATION 
>   docs/home.md a5811480de050352dca6c0f7e4e64d3d2351c2d5 
> 
> Diff: https://reviews.apache.org/r/53610/diff/
> 
> 
> Testing
> -------
> 
> https://gist.github.com/rukletsov/7200c36b2fd1e81f78f2583e68b31fd1
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>
