> Not sure what was going on with health-checks in 0.24.0.

0.24.1 should work.
> Do any of you know which host the path
> "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
> should exist on? It definitely doesn't exist on the slave, hence execution failing.

Did you set MESOS_LAUNCHER_DIR/--launcher_dir incorrectly before? We get
mesos-health-check from MESOS_LAUNCHER_DIR/--launcher_dir, or use the same
directory as mesos-docker-executor.

On Thu, Oct 8, 2015 at 10:46 AM, Jay Taylor <[email protected]> wrote:

> Maybe I spoke too soon.
>
> Now the checks are attempting to run, however the STDERR is not looking
> good. I've added some debugging to the error message output to show the
> path, argv, and envp variables:
>
> STDOUT:
>
>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>> --stop_timeout="0ns"
>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>> --stop_timeout="0ns"
>> Registered docker executor on mesos-worker2a
>> Starting task app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>> Launching health check process:
>> /tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check
>> --executor=(1)@192.168.225.59:43917
>> --health_check_json={"command":{"shell":true,"value":"docker exec mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc sh -c \" exit 1 \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>> --task_id=app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>> Health check process launched at pid: 3012
>
> STDERR:
>
>> I1008 02:17:28.870434 2770 exec.cpp:134] Version: 0.26.0
>> I1008 02:17:28.871860 2778 exec.cpp:208] Executor registered on slave 16b49e90-6852-4c91-8e70-d89c54f25668-S1
>> WARNING: Your kernel does not support swap limit capabilities, memory limited without swap.
>> ABORT: (src/subprocess.cpp:180): Failed to os::execvpe in childMain
>> (path.c_str()='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>> argv='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>> envp=''): No such file or directory
>> *** Aborted at 1444270649 (unix time) try "date -d @1444270649" if you are using GNU date ***
>> PC: @ 0x7f4a37ec6cc9 (unknown)
>> *** SIGABRT (@0xbc4) received by PID 3012 (TID 0x7f4a2f9f6700) from PID 3012; stack trace: ***
>> @ 0x7f4a38265340 (unknown)
>> @ 0x7f4a37ec6cc9 (unknown)
>> @ 0x7f4a37eca0d8 (unknown)
>> @ 0x4191e2 _Abort()
>> @ 0x41921c _Abort()
>> @ 0x7f4a39dc2768 process::childMain()
>> @ 0x7f4a39dc4f59 std::_Function_handler<>::_M_invoke()
>> @ 0x7f4a39dc24fc process::defaultClone()
>> @ 0x7f4a39dc34fb process::subprocess()
>> @ 0x43cc9c mesos::internal::docker::DockerExecutorProcess::launchHealthCheck()
>> @ 0x7f4a39d924f4 process::ProcessManager::resume()
>> @ 0x7f4a39d92827 _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
>> @ 0x7f4a38a47e40 (unknown)
>> @ 0x7f4a3825d182 start_thread
>> @ 0x7f4a37f8a47d (unknown)
>
> Do any of you know which host the path
> "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
> should exist on? It definitely doesn't exist on the slave, hence
> execution failing.
>
> This is with current master, git hash
> 5058fac1083dc91bca54d33c26c810c17ad95dd1.
>
>> commit 5058fac1083dc91bca54d33c26c810c17ad95dd1
>> Author: Anand Mazumdar <[email protected]>
>> Date: Tue Oct 6 17:37:41 2015 -0700
>
> -Jay
>
> On Wed, Oct 7, 2015 at 5:23 PM, Jay Taylor <[email protected]> wrote:
>
>> Update:
>>
>> I used https://github.com/deric/mesos-deb-packaging to compile and
>> package the latest master (0.26.x) and deployed it to the cluster, and now
>> health checks are working as advertised in both Marathon and my own
>> framework! Not sure what was going on with health-checks in 0.24.0..
>>
>> Anyways, thanks again for your help Haosdent!
>>
>> Cheers,
>> Jay
>>
>> On Wed, Oct 7, 2015 at 12:53 PM, Jay Taylor <[email protected]> wrote:
>>
>>> Hi Haosdent,
>>>
>>> Can you share your Marathon POST request that results in Mesos executing
>>> the health checks?
>>>
>>> Since we can reference the Marathon framework, I've been doing some
>>> digging around.
>>>
>>> Here are the details of my setup and findings:
>>>
>>> I put a few small hacks in Marathon:
>>>
>>> (1) Added com.googlecode.protobuf.format to Marathon's dependencies
>>>
>>> (2) Edited the following files so TaskInfo is dumped as JSON to /tmp/X
>>> in both the TaskFactory as well as right before the task is sent to Mesos
>>> via driver.launchTasks:
>>>
>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala:
>>>
>>>> $ git diff src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala
>>>> @@ -25,6 +25,12 @@ class DefaultTaskFactory @Inject() (
>>>>
>>>>      new TaskBuilder(app, taskIdUtil.newTaskId, config).buildIfMatches(offer, runningTasks).map {
>>>>        case (taskInfo, ports) =>
>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>> +        import java.io._
>>>> +        val bw = new BufferedWriter(new FileWriter(new File("/tmp/taskjson1-" + taskInfo.getTaskId.getValue)))
>>>> +        bw.write(JsonFormat.printToString(taskInfo))
>>>> +        bw.write("\n")
>>>> +        bw.close()
>>>>          CreatedTask(
>>>>            taskInfo,
>>>>            MarathonTasks.makeTask(
>>>
>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala:
>>>
>>>> $ git diff src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala
>>>> @@ -24,6 +24,16 @@ private[launcher] class TaskLauncherImpl(
>>>>    override def launchTasks(offerID: OfferID, taskInfos: Seq[TaskInfo]): Boolean = {
>>>>      val launched = withDriver(s"launchTasks($offerID)") { driver =>
>>>>        import scala.collection.JavaConverters._
>>>> +      var i = 0
>>>> +      for (i <- 0 to taskInfos.length - 1) {
>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>> +        import java.io._
>>>> +        val file = new File("/tmp/taskJson2-" + i.toString() + "-" + taskInfos(i).getTaskId.getValue)
>>>> +        val bw = new BufferedWriter(new FileWriter(file))
>>>> +        bw.write(JsonFormat.printToString(taskInfos(i)))
>>>> +        bw.write("\n")
>>>> +        bw.close()
>>>> +      }
>>>>        driver.launchTasks(Collections.singleton(offerID), taskInfos.asJava)
>>>>      }
>>>
>>> Then I built and deployed the hacked Marathon and restarted the marathon
>>> service.
>>>
>>> Next I created the app via the Marathon API ("hello app" is a container
>>> with a simple hello-world ruby app running on 0.0.0.0:8000)
>>>
>>>> curl http://mesos-primary1a:8080/v2/groups -XPOST -H'Content-Type: application/json' -d'
>>>> {
>>>>   "id": "/app-81-1-hello-app",
>>>>   "apps": [
>>>>     {
>>>>       "id": "/app-81-1-hello-app/web-v11",
>>>>       "container": {
>>>>         "type": "DOCKER",
>>>>         "docker": {
>>>>           "image": "docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>           "network": "BRIDGE",
>>>>           "portMappings": [
>>>>             {
>>>>               "containerPort": 8000,
>>>>               "hostPort": 0,
>>>>               "protocol": "tcp"
>>>>             }
>>>>           ]
>>>>         }
>>>>       },
>>>>       "env": {
>>>>       },
>>>>       "healthChecks": [
>>>>         {
>>>>           "protocol": "COMMAND",
>>>>           "command": {"value": "exit 1"},
>>>>           "gracePeriodSeconds": 10,
>>>>           "intervalSeconds": 10,
>>>>           "timeoutSeconds": 10,
>>>>           "maxConsecutiveFailures": 3
>>>>         }
>>>>       ],
>>>>       "instances": 1,
>>>>       "cpus": 1,
>>>>       "mem": 512
>>>>     }
>>>>   ]
>>>> }'
>>>
>>>> $ ls /tmp/
>>>> taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>> taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>
>>> Do they match?
>>>
>>>> $ md5sum /tmp/task*
>>>> 1b5115997e78e2611654059249d99578  /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>> 1b5115997e78e2611654059249d99578  /tmp/taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>
>>> Yes, so I am confident this is the information being sent across the
>>> wire to Mesos.
>>>
>>> Do they contain any health-check information?
>>>
>>>> $ cat /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>> {
>>>>   "name":"web-v11.app-81-1-hello-app",
>>>>   "task_id":{
>>>>     "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>   },
>>>>   "slave_id":{
>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>   },
>>>>   "resources":[
>>>>     {
>>>>       "name":"cpus",
>>>>       "type":"SCALAR",
>>>>       "scalar":{
>>>>         "value":1.0
>>>>       },
>>>>       "role":"*"
>>>>     },
>>>>     {
>>>>       "name":"mem",
>>>>       "type":"SCALAR",
>>>>       "scalar":{
>>>>         "value":512.0
>>>>       },
>>>>       "role":"*"
>>>>     },
>>>>     {
>>>>       "name":"ports",
>>>>       "type":"RANGES",
>>>>       "ranges":{
>>>>         "range":[
>>>>           {
>>>>             "begin":31641,
>>>>             "end":31641
>>>>           }
>>>>         ]
>>>>       },
>>>>       "role":"*"
>>>>     }
>>>>   ],
>>>>   "command":{
>>>>     "environment":{
>>>>       "variables":[
>>>>         {
>>>>           "name":"PORT_8000",
>>>>           "value":"31641"
>>>>         },
>>>>         {
>>>>           "name":"MARATHON_APP_VERSION",
>>>>           "value":"2015-10-07T19:35:08.386Z"
>>>>         },
>>>>         {
>>>>           "name":"HOST",
>>>>           "value":"mesos-worker1a"
>>>>         },
>>>>         {
>>>>           "name":"MARATHON_APP_DOCKER_IMAGE",
>>>>           "value":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966"
>>>>         },
>>>>         {
>>>>           "name":"MESOS_TASK_ID",
>>>>           "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>         },
>>>>         {
>>>>           "name":"PORT",
>>>>           "value":"31641"
>>>>         },
>>>>         {
>>>>           "name":"PORTS",
>>>>           "value":"31641"
>>>>         },
>>>>         {
>>>>           "name":"MARATHON_APP_ID",
>>>>           "value":"/app-81-1-hello-app/web-v11"
>>>>         },
>>>>         {
>>>>           "name":"PORT0",
>>>>           "value":"31641"
>>>>         }
>>>>       ]
>>>>     },
>>>>     "shell":false
>>>>   },
>>>>   "container":{
>>>>     "type":"DOCKER",
>>>>     "docker":{
>>>>       "image":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>       "network":"BRIDGE",
>>>>       "port_mappings":[
>>>>         {
>>>>           "host_port":31641,
>>>>           "container_port":8000,
>>>>           "protocol":"tcp"
>>>>         }
>>>>       ],
>>>>       "privileged":false,
>>>>       "force_pull_image":false
>>>>     }
>>>>   }
>>>> }
>>>
>>> No, I don't see anything about any health check.
>>>
>>> Mesos STDOUT for the launched task:
>>>
>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>> --logbufsecs="0" --logging_level="INFO"
>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>> --stop_timeout="0ns"
>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>> --logbufsecs="0" --logging_level="INFO"
>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>> --stop_timeout="0ns"
>>>> Registered docker executor on mesos-worker1a
>>>> Starting task app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>
>>> And STDERR:
>>>
>>>> I1007 19:35:08.790743 4612 exec.cpp:134] Version: 0.24.0
>>>> I1007 19:35:08.793416 4619 exec.cpp:208] Executor registered on slave 20150924-210922-1608624320-5050-1792-S1
>>>> WARNING: Your kernel does not support swap limit capabilities, memory limited without swap.
>>>
>>> Again, nothing about any health checks.
>>>
>>> Any ideas of other things to try or what I could be missing? Can't say
>>> either way about the Mesos health-check system working or not if Marathon
>>> won't put the health-check into the task it sends to Mesos.
>>>
>>> Thanks for all your help!
>>>
>>> Best,
>>> Jay
>>>
>>> On Tue, Oct 6, 2015 at 11:24 PM, haosdent <[email protected]> wrote:
>>>
>>>> Maybe you could post your executor stdout/stderr so that we can tell
>>>> whether the health check is running or not.
>>>>
>>>> On Wed, Oct 7, 2015 at 2:15 PM, haosdent <[email protected]> wrote:
>>>>
>>>>> Marathon also uses the Mesos health check. When I use a health check, I
>>>>> can see log lines like this in the executor stdout.
>>>>>
>>>>> ```
>>>>> Registered docker executor on xxxxx
>>>>> Starting task test-health-check.822a5fd2-6cba-11e5-b5ce-0a0027000000
>>>>> Launching health check process:
>>>>> /home/haosdent/mesos/build/src/.libs/mesos-health-check --executor=xxxx
>>>>> Health check process launched at pid: 9895
>>>>> Received task health update, healthy: true
>>>>> ```
>>>>>
>>>>> On Wed, Oct 7, 2015 at 12:51 PM, Jay Taylor <[email protected]> wrote:
>>>>>
>>>>>> I am using my own framework, and the full task info I'm using is
>>>>>> posted earlier in this thread. Do you happen to know if Marathon uses
>>>>>> Mesos's health checks for its health check system?
>>>>>>
>>>>>> On Oct 6, 2015, at 9:01 PM, haosdent <[email protected]> wrote:
>>>>>>
>>>>>> Yes, the health check is launched through its definition in the
>>>>>> TaskInfo. Do you launch your task through Marathon? I could test it
>>>>>> on my side.
>>>>>>
>>>>>> On Wed, Oct 7, 2015 at 11:56 AM, Jay Taylor <[email protected]> wrote:
>>>>>>
>>>>>>> Precisely, and there are none of those statements. Are you or
>>>>>>> others confident health-checks are part of the code path when defined
>>>>>>> via task info for docker container tasks? Going through the code, I wasn't
>>>>>>> able to find the linkage for anything other than health-checks triggered
>>>>>>> through a custom executor.
>>>>>>>
>>>>>>> With that being said it is a pretty good sized code base and I'm not
>>>>>>> very familiar with it, so my analysis thus far has by no means been
>>>>>>> exhaustive.
>>>>>>>
>>>>>>> On Oct 6, 2015, at 8:41 PM, haosdent <[email protected]> wrote:
>>>>>>>
>>>>>>> When the health check launches, there will be a log line like this in
>>>>>>> your executor stdout:
>>>>>>> ```
>>>>>>> Health check process launched at pid xxx
>>>>>>> ```
>>>>>>>
>>>>>>> On Wed, Oct 7, 2015 at 11:37 AM, Jay Taylor <[email protected]> wrote:
>>>>>>>
>>>>>>>> I'm happy to try this, however wouldn't there be output in the logs
>>>>>>>> with the string "health" or "Health" if the health-check were active?
>>>>>>>> None of my master or slave logs contain the string..
>>>>>>>>
>>>>>>>> On Oct 6, 2015, at 7:45 PM, haosdent <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Could you use "exit 1" instead of "sleep 5" to see whether an
>>>>>>>> unhealthy status shows up in your task stdout/stderr?
>>>>>>>>
>>>>>>>> On Wed, Oct 7, 2015 at 10:38 AM, Jay Taylor <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> My current version is 0.24.1.
>>>>>>>>>
>>>>>>>>> On Tue, Oct 6, 2015 at 7:30 PM, haosdent <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Yes, Adam also helped commit it to 0.23.1 and 0.24.1:
>>>>>>>>>> https://github.com/apache/mesos/commit/8c0ed92de3925d4312429bfba01b9b1ccbcbbef0
>>>>>>>>>> https://github.com/apache/mesos/commit/09e367cd69aa39c156c9326d44f4a7b829ba3db7
>>>>>>>>>> Are you using one of these versions?
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 7, 2015 at 10:26 AM, haosdent <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> I remember that 0.23.1 and 0.24.1 contain this backport, let me
>>>>>>>>>>> double check.
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:01 AM, Jay Taylor <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Oops- Now I see you already said it's in master. I'll look
>>>>>>>>>>>> there :)
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks again!
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:59 PM, Jay Taylor <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Great, thanks for the quick reply Tim!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Do you know if there is a branch I can checkout to test it out?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:54 PM, Timothy Chen <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Jay,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We just added health check support for docker tasks that's in
>>>>>>>>>>>>>> master but not yet released. It will run docker exec with the
>>>>>>>>>>>>>> command you provided as health checks.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It should be in the next release.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Tim
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Oct 6, 2015, at 6:49 PM, Jay Taylor <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Does Mesos support health checks for docker image tasks?
>>>>>>>>>>>>>> Mesos seems to be ignoring the TaskInfo.HealthCheck field for me.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Example TaskInfo JSON received back from Mesos:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>   "name":"hello-app.web.v3",
>>>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>>>     "value":"hello-app_web-v3.fc05a1a5-1e06-4e61-9879-be0d97cd3eec"
>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>         "value":0.1
>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>         "value":256
>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>>>       "type":1,
>>>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>>>           {
>>>>>>>>>>>>>>>             "begin":31002,
>>>>>>>>>>>>>>>             "end":31002
>>>>>>>>>>>>>>>           }
>>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>>>     "container":{
>>>>>>>>>>>>>>>       "image":"docker-services1a:5000/test/app-81-1-hello-app-103"
>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>>>     "type":1,
>>>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>>>       "image":"docker-services1a:5000/gig1/app-81-1-hello-app-103",
>>>>>>>>>>>>>>>       "network":2,
>>>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>           "host_port":31002,
>>>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>>>       "parameters":[],
>>>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>   "health_check":{
>>>>>>>>>>>>>>>     "delay_seconds":5,
>>>>>>>>>>>>>>>     "interval_seconds":10,
>>>>>>>>>>>>>>>     "timeout_seconds":10,
>>>>>>>>>>>>>>>     "consecutive_failures":3,
>>>>>>>>>>>>>>>     "grace_period_seconds":0,
>>>>>>>>>>>>>>>     "command":{
>>>>>>>>>>>>>>>       "shell":true,
>>>>>>>>>>>>>>>       "value":"sleep 5",
>>>>>>>>>>>>>>>       "user":"root"
>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have searched all machines and containers to see if they
>>>>>>>>>>>>>> ever run the command (in this case `sleep 5`), but have not
>>>>>>>>>>>>>> found any indication that it is being executed.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In the mesos src code the health-checks are invoked from
>>>>>>>>>>>>>> src/launcher/executor.cpp CommandExecutorProcess::launchTask.
>>>>>>>>>>>>>> Does this mean that health-checks are only supported for custom
>>>>>>>>>>>>>> executors and not for docker tasks?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What I am trying to accomplish is to have the 0/non-zero
>>>>>>>>>>>>>> exit-status of a health-check command translate to task health.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>> Jay

--
Best Regards,
Haosdent Huang
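
For anyone else hitting the same "Failed to os::execvpe ... No such file or directory" abort: below is a minimal sketch, in Python rather than Mesos's own C++, of the lookup order described at the top of this thread (MESOS_LAUNCHER_DIR/--launcher_dir first, then the directory that holds mesos-docker-executor). It is not the Mesos implementation. The function name `find_health_check_binary` and the fallback directory `/usr/libexec/mesos` are assumptions made for illustration, so substitute whatever your packaging actually uses.

```python
import os
from typing import Optional

# Binary name taken from the executor log output quoted in this thread.
HEALTH_CHECK_BINARY = "mesos-health-check"


def find_health_check_binary(launcher_dir: Optional[str],
                             executor_dir: str) -> Optional[str]:
    """Return the first existing, executable mesos-health-check path, or None.

    Hypothetical helper: it checks launcher_dir first (if set), then the
    directory assumed to contain mesos-docker-executor.
    """
    candidates = []
    if launcher_dir:
        candidates.append(os.path.join(launcher_dir, HEALTH_CHECK_BINARY))
    candidates.append(os.path.join(executor_dir, HEALTH_CHECK_BINARY))
    for path in candidates:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None


if __name__ == "__main__":
    # Assumed fallback location; adjust to match your install.
    found = find_health_check_binary(os.environ.get("MESOS_LAUNCHER_DIR"),
                                     "/usr/libexec/mesos")
    print(found or "mesos-health-check not found in any candidate directory")
```

Running this on the slave, with the same environment as the slave process, should show quickly whether the binary is reachable from the configured launcher directory or whether the executor is falling back to a directory, such as the task sandbox, that does not contain it.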

