-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53610/#review156617
-----------------------------------------------------------

docs/health-checks.md (line 9)
<https://reviews.apache.org/r/53610/#comment226792>

    "e.g.," not "e.g."

docs/health-checks.md (line 12)
<https://reviews.apache.org/r/53610/#comment226794>

    Rather than saying "health check their tasks out-of-band", I'd say: "some frameworks implement their own logic for checking the health of their tasks. This is typically done by having the framework scheduler send a "ping" request (e.g., via HTTP) to the host where the task is running and arranging for the task or executor to respond to the ping."

    "though" => "Although"

docs/health-checks.md (line 19)
<https://reviews.apache.org/r/53610/#comment226795>

    The phrase "incorporating network failures in health check information is not always desirable" is vague. What is the specific concern here?

docs/health-checks.md (line 21)
<https://reviews.apache.org/r/53610/#comment226796>

    Isn't a major advantage of Mesos-native health checks that you avoid the scalability problems of having a single scheduler handle the health checks for a potentially large number of tasks?

docs/health-checks.md (line 23)
<https://reviews.apache.org/r/53610/#comment226807>

    I think this would benefit from some more discussion of the high-level architecture of Mesos-native health checks. For example:

    * The traditional "scheduler health check" pattern involves a single scheduler node and a collection of agents; Mesos-native health checks run on the agent. This improves scalability but means that detecting network faults is a separate concern.
    * When a task fails Mesos-native health checks, what happens to it? How does the framework scheduler learn about this?
    * What happens if a task is running on a partitioned agent -- will it still be health-checked? If those health checks fail, will the task be terminated?

    Some of this is discussed below, but I think it would be better to briefly discuss it at the beginning of the document to set context for what follows.

docs/health-checks.md (line 26)
<https://reviews.apache.org/r/53610/#comment226797>

    s/, as well as provides/. Mesos 1.2.0 also provides/

docs/health-checks.md (line 27)
<https://reviews.apache.org/r/53610/#comment226798>

    "implementations for"

docs/health-checks.md (line 33)
<https://reviews.apache.org/r/53610/#comment226799>

    "This technique allows detecting and reporting process crashes, but ..."

docs/health-checks.md (line 46)
<https://reviews.apache.org/r/53610/#comment226800>

    s/nor/or/

docs/health-checks.md (line 56)
<https://reviews.apache.org/r/53610/#comment226801>

    "to honor the `HealthCheck` field in `TaskInfo`"

    I'd also strike "and to implement health checks" as redundant.

docs/health-checks.md (line 58)
<https://reviews.apache.org/r/53610/#comment226802>

    "the reference implementation for"

docs/health-checks.md (line 65)
<https://reviews.apache.org/r/53610/#comment226806>

    "The command is" -> "A command health check specifies an arbitrary command that is used to validate the health of the task. The executor launches the command and inspects its exit status: `0` is treated as success (the task is healthy), while any other exit status is interpreted to mean the task is unhealthy."

docs/health-checks.md (line 98)
<https://reviews.apache.org/r/53610/#comment226808>

    "e.g.,"

docs/health-checks.md (line 202)
<https://reviews.apache.org/r/53610/#comment226809>

    Can we elaborate here -- does that mean a task that has failed health checks will typically be `RUNNING` with `healthy == false`? Is it possible to see other task states where the `healthy` field is set to false?

docs/health-checks.md (line 206)
<https://reviews.apache.org/r/53610/#comment226810>

    "all unhealthy status updates"

    "as well as the first healthy update"

    "i.e., when the task has started, or after one or more unhealthy updates have occurred"

docs/health-checks.md (line 208)
<https://reviews.apache.org/r/53610/#comment226811>

    s/opt for/use/

docs/health-checks.md (line 254)
<https://reviews.apache.org/r/53610/#comment226814>

    I wouldn't use an exclamation point here.

docs/health-checks.md (line 263)
<https://reviews.apache.org/r/53610/#comment226815>

    "large value"

docs/health-checks.md (line 264)
<https://reviews.apache.org/r/53610/#comment226816>

    'introduce a "global" policy'

docs/health-checks.md (line 267)
<https://reviews.apache.org/r/53610/#comment226817>

    Why do they have to listen on all interfaces? That is, listening on 127.0.0.1 as well as whatever service interface/address they require should be sufficient, no?


- Neil Conway


On Nov. 20, 2016, 6:52 p.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53610/
> -----------------------------------------------------------
> 
> (Updated Nov. 20, 2016, 6:52 p.m.)
> 
> 
> Review request for mesos, Gastón Kleiman, haosdent huang, Neil Conway, and Till Toenshoff.
> 
> 
> Bugs: MESOS-5597
>     https://issues.apache.org/jira/browse/MESOS-5597
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   docs/health-checks.md PRE-CREATION
>   docs/home.md a5811480de050352dca6c0f7e4e64d3d2351c2d5
> 
> Diff: https://reviews.apache.org/r/53610/diff/
> 
> 
> Testing
> -------
> 
> https://gist.github.com/rukletsov/7200c36b2fd1e81f78f2583e68b31fd1
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>
