>Not sure what was going on with health-checks in 0.24.0.
0.24.1 should work.

>Do any of you know which host the path
"/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
should exist on? It definitely doesn't exist on the slave, hence execution
failing.

Did you set MESOS_LAUNCHER_DIR/--launcher_dir incorrectly before? We get
mesos-health-check from MESOS_LAUNCHER_DIR/--launcher_dir, or use the same
directory as mesos-docker-executor.
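
To check this on the slave, a small sketch like the one below might help. It
assumes the lookup order described above (launcher dir first, then the
directory holding mesos-docker-executor). The function names and the example
executor path are hypothetical and are not taken from the Mesos source.

```python
import os


def candidate_health_check_paths(launcher_dir, docker_executor_path):
    """List the places where mesos-health-check would be expected.

    Assumption: the binary is looked up in the launcher dir first, then
    next to mesos-docker-executor. Hypothetical helper, not part of Mesos.
    """
    candidates = []
    if launcher_dir:
        candidates.append(os.path.join(launcher_dir, "mesos-health-check"))
    if docker_executor_path:
        executor_dir = os.path.dirname(os.path.abspath(docker_executor_path))
        candidates.append(os.path.join(executor_dir, "mesos-health-check"))
    return candidates


def find_health_check(launcher_dir, docker_executor_path):
    """Return the first candidate that exists and is executable, else None."""
    for path in candidate_health_check_paths(launcher_dir, docker_executor_path):
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None


if __name__ == "__main__":
    # MESOS_LAUNCHER_DIR is the variable discussed above; the executor path
    # below is only an example location and will differ per installation.
    launcher_dir = os.environ.get("MESOS_LAUNCHER_DIR", "")
    print(find_health_check(launcher_dir,
                            "/usr/libexec/mesos/mesos-docker-executor"))
```

If this prints None on the slave, the binary is in neither location, which
would be consistent with the "No such file or directory" abort.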

On Thu, Oct 8, 2015 at 10:46 AM, Jay Taylor <[email protected]> wrote:

> Maybe I spoke too soon.
>
> Now the checks are attempting to run, however the STDERR is not looking
> good.  I've added some debugging to the error message output to show the
> path, argv, and envp variables:
>
> STDOUT:
>
> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>> --stop_timeout="0ns"
>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>> --stop_timeout="0ns"
>> Registered docker executor on mesos-worker2a
>> Starting task
>> app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>> Launching health check process:
>> /tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check
>> --executor=(1)@192.168.225.59:43917
>> --health_check_json={"command":{"shell":true,"value":"docker exec
>> mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc
>> sh -c \" exit 1
>> \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>> --task_id=app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>> Health check process launched at pid: 3012
>
>
> STDERR:
>
> I1008 02:17:28.870434 2770 exec.cpp:134] Version: 0.26.0
>> I1008 02:17:28.871860 2778 exec.cpp:208] Executor registered on slave
>> 16b49e90-6852-4c91-8e70-d89c54f25668-S1
>> WARNING: Your kernel does not support swap limit capabilities, memory
>> limited without swap.
>> ABORT: (src/subprocess.cpp:180): Failed to os::execvpe in childMain
>> (path.c_str()='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>> argv='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>> envp=''): No such file or directory*** Aborted at 1444270649 (unix time)
>> try "date -d @1444270649" if you are using GNU date ***
>> PC: @ 0x7f4a37ec6cc9 (unknown)
>> *** SIGABRT (@0xbc4) received by PID 3012 (TID 0x7f4a2f9f6700) from PID
>> 3012; stack trace: ***
>> @ 0x7f4a38265340 (unknown)
>> @ 0x7f4a37ec6cc9 (unknown)
>> @ 0x7f4a37eca0d8 (unknown)
>> @ 0x4191e2 _Abort()
>> @ 0x41921c _Abort()
>> @ 0x7f4a39dc2768 process::childMain()
>> @ 0x7f4a39dc4f59 std::_Function_handler<>::_M_invoke()
>> @ 0x7f4a39dc24fc process::defaultClone()
>> @ 0x7f4a39dc34fb process::subprocess()
>> @ 0x43cc9c
>> mesos::internal::docker::DockerExecutorProcess::launchHealthCheck()
>> @ 0x7f4a39d924f4 process::ProcessManager::resume()
>> @ 0x7f4a39d92827
>> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
>> @ 0x7f4a38a47e40 (unknown)
>> @ 0x7f4a3825d182 start_thread
>> @ 0x7f4a37f8a47d (unknown)
>
>
> Do any of you know which host the path 
> "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
> should exist on? It definitely doesn't exist on the slave, hence
> execution failing.
>
> This is with current master, git hash
> 5058fac1083dc91bca54d33c26c810c17ad95dd1.
>
> commit 5058fac1083dc91bca54d33c26c810c17ad95dd1
>> Author: Anand Mazumdar <[email protected]>
>> Date:   Tue Oct 6 17:37:41 2015 -0700
>
>
> -Jay
>
> On Wed, Oct 7, 2015 at 5:23 PM, Jay Taylor <[email protected]> wrote:
>
>> Update:
>>
>> I used https://github.com/deric/mesos-deb-packaging to compile and
>> package the latest master (0.26.x) and deployed it to the cluster, and now
>> health checks are working as advertised in both Marathon and my own
>> framework!  Not sure what was going on with health-checks in 0.24.0..
>>
>> Anyways, thanks again for your help Haosdent!
>>
>> Cheers,
>> Jay
>>
>> On Wed, Oct 7, 2015 at 12:53 PM, Jay Taylor <[email protected]> wrote:
>>
>>> Hi Haosdent,
>>>
>>> Can you share your Marathon POST request that results in Mesos executing
>>> the health checks?
>>>
>>> Since we can reference the Marathon framework, I've been doing some
>>> digging around.
>>>
>>> Here are the details of my setup and findings:
>>>
>>> I put a few small hacks in Marathon:
>>>
>>> (1) Added com.googlecode.protobuf.format to Marathon's dependencies
>>>
>>> (2) Edited the following files so TaskInfo is dumped as JSON to /tmp/X
>>> both in the TaskFactory and right before the task is sent to Mesos
>>> via driver.launchTasks:
>>>
>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala:
>>>
>>> $ git diff
>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala
>>>> @@ -25,6 +25,12 @@ class DefaultTaskFactory @Inject() (
>>>>
>>>>      new TaskBuilder(app, taskIdUtil.newTaskId,
>>>> config).buildIfMatches(offer, runningTasks).map {
>>>>        case (taskInfo, ports) =>
>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>> +        import java.io._
>>>> +        val bw = new BufferedWriter(new FileWriter(new
>>>> File("/tmp/taskjson1-" + taskInfo.getTaskId.getValue)))
>>>> +        bw.write(JsonFormat.printToString(taskInfo))
>>>> +        bw.write("\n")
>>>> +        bw.close()
>>>>          CreatedTask(
>>>>            taskInfo,
>>>>            MarathonTasks.makeTask(
>>>
>>>
>>>
>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala:
>>>
>>> $ git diff
>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala
>>>> @@ -24,6 +24,16 @@ private[launcher] class TaskLauncherImpl(
>>>>    override def launchTasks(offerID: OfferID, taskInfos:
>>>> Seq[TaskInfo]): Boolean = {
>>>>      val launched = withDriver(s"launchTasks($offerID)") { driver =>
>>>>        import scala.collection.JavaConverters._
>>>> +      var i = 0
>>>> +      for (i <- 0 to taskInfos.length - 1) {
>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>> +        import java.io._
>>>> +        val file = new File("/tmp/taskJson2-" + i.toString() + "-" +
>>>> taskInfos(i).getTaskId.getValue)
>>>> +        val bw = new BufferedWriter(new FileWriter(file))
>>>> +        bw.write(JsonFormat.printToString(taskInfos(i)))
>>>> +        bw.write("\n")
>>>> +        bw.close()
>>>> +      }
>>>>        driver.launchTasks(Collections.singleton(offerID),
>>>> taskInfos.asJava)
>>>>      }
>>>
>>>
>>> Then I built and deployed the hacked Marathon and restarted the marathon
>>> service.
>>>
>>> Next I created the app via the Marathon API ("hello app" is a container
>>> with a simple hello-world ruby app running on 0.0.0.0:8000)
>>>
>>> curl http://mesos-primary1a:8080/v2/groups -XPOST -H'Content-Type:
>>>> application/json' -d'
>>>> {
>>>>   "id": "/app-81-1-hello-app",
>>>>   "apps": [
>>>>     {
>>>>       "id": "/app-81-1-hello-app/web-v11",
>>>>       "container": {
>>>>         "type": "DOCKER",
>>>>         "docker": {
>>>>           "image":
>>>> "docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>           "network": "BRIDGE",
>>>>           "portMappings": [
>>>>             {
>>>>               "containerPort": 8000,
>>>>               "hostPort": 0,
>>>>               "protocol": "tcp"
>>>>             }
>>>>           ]
>>>>         }
>>>>       },
>>>>       "env": {
>>>>
>>>>       },
>>>>       "healthChecks": [
>>>>         {
>>>>           "protocol": "COMMAND",
>>>>           "command": {"value": "exit 1"},
>>>>           "gracePeriodSeconds": 10,
>>>>           "intervalSeconds": 10,
>>>>           "timeoutSeconds": 10,
>>>>           "maxConsecutiveFailures": 3
>>>>         }
>>>>       ],
>>>>       "instances": 1,
>>>>       "cpus": 1,
>>>>       "mem": 512
>>>>     }
>>>>   ]
>>>> }
>>>
>>>
>>> $ ls /tmp/
>>>>
>>>> taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>
>>>> taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>
>>>
>>> Do they match?
>>>
>>> $ md5sum /tmp/task*
>>>> 1b5115997e78e2611654059249d99578
>>>>  
>>>> /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>> 1b5115997e78e2611654059249d99578
>>>>  
>>>> /tmp/taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>
>>>
>>> Yes, so I am confident this is the information being sent across the
>>> wire to Mesos.
>>>
>>> Do they contain any health-check information?
>>>
>>> $ cat
>>>> /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>> {
>>>>   "name":"web-v11.app-81-1-hello-app",
>>>>   "task_id":{
>>>>
>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>   },
>>>>   "slave_id":{
>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>   },
>>>>   "resources":[
>>>>     {
>>>>       "name":"cpus",
>>>>       "type":"SCALAR",
>>>>       "scalar":{
>>>>         "value":1.0
>>>>       },
>>>>       "role":"*"
>>>>     },
>>>>     {
>>>>       "name":"mem",
>>>>       "type":"SCALAR",
>>>>       "scalar":{
>>>>         "value":512.0
>>>>       },
>>>>       "role":"*"
>>>>     },
>>>>     {
>>>>       "name":"ports",
>>>>       "type":"RANGES",
>>>>       "ranges":{
>>>>         "range":[
>>>>           {
>>>>             "begin":31641,
>>>>             "end":31641
>>>>           }
>>>>         ]
>>>>       },
>>>>       "role":"*"
>>>>     }
>>>>   ],
>>>>   "command":{
>>>>     "environment":{
>>>>       "variables":[
>>>>         {
>>>>           "name":"PORT_8000",
>>>>           "value":"31641"
>>>>         },
>>>>         {
>>>>           "name":"MARATHON_APP_VERSION",
>>>>           "value":"2015-10-07T19:35:08.386Z"
>>>>         },
>>>>         {
>>>>           "name":"HOST",
>>>>           "value":"mesos-worker1a"
>>>>         },
>>>>         {
>>>>           "name":"MARATHON_APP_DOCKER_IMAGE",
>>>>
>>>> "value":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966"
>>>>         },
>>>>         {
>>>>           "name":"MESOS_TASK_ID",
>>>>
>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>         },
>>>>         {
>>>>           "name":"PORT",
>>>>           "value":"31641"
>>>>         },
>>>>         {
>>>>           "name":"PORTS",
>>>>           "value":"31641"
>>>>         },
>>>>         {
>>>>           "name":"MARATHON_APP_ID",
>>>>           "value":"/app-81-1-hello-app/web-v11"
>>>>         },
>>>>         {
>>>>           "name":"PORT0",
>>>>           "value":"31641"
>>>>         }
>>>>       ]
>>>>     },
>>>>     "shell":false
>>>>   },
>>>>   "container":{
>>>>     "type":"DOCKER",
>>>>     "docker":{
>>>>
>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>       "network":"BRIDGE",
>>>>       "port_mappings":[
>>>>         {
>>>>           "host_port":31641,
>>>>           "container_port":8000,
>>>>           "protocol":"tcp"
>>>>         }
>>>>       ],
>>>>       "privileged":false,
>>>>       "force_pull_image":false
>>>>     }
>>>>   }
>>>> }
>>>
>>>
>>> No, I don't see anything about any health check.
>>>
>>> Mesos STDOUT for the launched task:
>>>
>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>> --logbufsecs="0" --logging_level="INFO"
>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>> --stop_timeout="0ns"
>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>> --logbufsecs="0" --logging_level="INFO"
>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>> --stop_timeout="0ns"
>>>> Registered docker executor on mesos-worker1a
>>>> Starting task
>>>> app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>
>>>
>>> And STDERR:
>>>
>>> I1007 19:35:08.790743  4612 exec.cpp:134] Version: 0.24.0
>>>> I1007 19:35:08.793416  4619 exec.cpp:208] Executor registered on slave
>>>> 20150924-210922-1608624320-5050-1792-S1
>>>> WARNING: Your kernel does not support swap limit capabilities, memory
>>>> limited without swap.
>>>
>>>
>>> Again, nothing about any health checks.
>>>
>>> Any ideas of other things to try or what I could be missing?  Can't say
>>> either way about the Mesos health-check system working or not if Marathon
>>> won't put the health-check into the task it sends to Mesos.
>>>
>>> Thanks for all your help!
>>>
>>> Best,
>>> Jay
>>>
>>>
>>>
>>>>
>>> On Tue, Oct 6, 2015 at 11:24 PM, haosdent <[email protected]> wrote:
>>>
>>>> Maybe you could post your executor stdout/stderr so that we can see
>>>> whether the health check is running or not.
>>>>
>>>> On Wed, Oct 7, 2015 at 2:15 PM, haosdent <[email protected]> wrote:
>>>>
>>>>> Marathon also uses the Mesos health check. When I use a health check, I
>>>>> see log output like this in the executor stdout.
>>>>>
>>>>> ```
>>>>> Registered docker executor on xxxxx
>>>>> Starting task test-health-check.822a5fd2-6cba-11e5-b5ce-0a0027000000
>>>>> Launching health check process:
>>>>> /home/haosdent/mesos/build/src/.libs/mesos-health-check --executor=xxxx
>>>>> Health check process launched at pid: 9895
>>>>> Received task health update, healthy: true
>>>>> ```
>>>>>
>>>>> On Wed, Oct 7, 2015 at 12:51 PM, Jay Taylor <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> I am using my own framework, and the full task info I'm using is
>>>>>> posted earlier in this thread.  Do you happen to know if Marathon uses
>>>>>> Mesos's health checks for its health check system?
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Oct 6, 2015, at 9:01 PM, haosdent <[email protected]> wrote:
>>>>>>
>>>>>> Yes, the health check is launched through its definition in TaskInfo. Do
>>>>>> you launch your task through Marathon? I could test it on my side.
>>>>>>
>>>>>> On Wed, Oct 7, 2015 at 11:56 AM, Jay Taylor <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Precisely, and there are none of those statements.  Are you or
>>>>>>> others confident health-checks are part of the code path when defined
>>>>>>> via task info for docker container tasks?  Going through the code, I wasn't
>>>>>>> able to find the linkage for anything other than health-checks triggered
>>>>>>> through a custom executor.
>>>>>>>
>>>>>>> With that being said it is a pretty good sized code base and I'm not
>>>>>>> very familiar with it, so my analysis this far has by no means been
>>>>>>> exhaustive.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Oct 6, 2015, at 8:41 PM, haosdent <[email protected]> wrote:
>>>>>>>
>>>>>>> When the health check launches, there should be a log line like this in
>>>>>>> your executor stdout
>>>>>>> ```
>>>>>>> Health check process launched at pid xxx
>>>>>>> ```
>>>>>>>
>>>>>>> On Wed, Oct 7, 2015 at 11:37 AM, Jay Taylor <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I'm happy to try this, however wouldn't there be output in the logs
>>>>>>>> with the string "health" or "Health" if the health-check were active?
>>>>>>>> None of my master or slave logs contain the string..
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Oct 6, 2015, at 7:45 PM, haosdent <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Could you use "exit 1" instead of "sleep 5" to see whether an
>>>>>>>> unhealthy status shows up in your task stdout/stderr?
>>>>>>>>
>>>>>>>> On Wed, Oct 7, 2015 at 10:38 AM, Jay Taylor <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> My current version is 0.24.1.
>>>>>>>>>
>>>>>>>>> On Tue, Oct 6, 2015 at 7:30 PM, haosdent <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Yes, Adam also helped commit it to 0.23.1 and 0.24.1
>>>>>>>>>> https://github.com/apache/mesos/commit/8c0ed92de3925d4312429bfba01b9b1ccbcbbef0
>>>>>>>>>>
>>>>>>>>>> https://github.com/apache/mesos/commit/09e367cd69aa39c156c9326d44f4a7b829ba3db7
>>>>>>>>>> Are you using one of these versions?
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 7, 2015 at 10:26 AM, haosdent <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> I remember 0.23.1 and 0.24.1 contain this backport; let me
>>>>>>>>>>> double check.
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:01 AM, Jay Taylor <[email protected]
>>>>>>>>>>> > wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Oops- Now I see you already said it's in master.  I'll look
>>>>>>>>>>>> there :)
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks again!
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:59 PM, Jay Taylor <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Great, thanks for the quick reply Tim!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Do you know if there is a branch I can checkout to test it out?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:54 PM, Timothy Chen <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Jay,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We just added health check support for docker tasks that's in
>>>>>>>>>>>>>> master but not yet released. It will run docker exec with the
>>>>>>>>>>>>>> command you provided as health checks.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It should be in the next release.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Tim
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Oct 6, 2015, at 6:49 PM, Jay Taylor <[email protected]>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Does Mesos support health checks for docker image tasks?
>>>>>>>>>>>>>> Mesos seems to be ignoring the TaskInfo.HealthCheck field for me.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Example TaskInfo JSON received back from Mesos:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>   "name":"hello-app.web.v3",
>>>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>>>     "value":"hello-app_web-v3.fc05a1a5-1e06-4e61-9879-be0d97cd3eec"
>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>         "value":0.1
>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>         "value":256
>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>>>       "type":1,
>>>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>>>           {
>>>>>>>>>>>>>>>             "begin":31002,
>>>>>>>>>>>>>>>             "end":31002
>>>>>>>>>>>>>>>           }
>>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>>>     "container":{
>>>>>>>>>>>>>>>       "image":"docker-services1a:5000/test/app-81-1-hello-app-103"
>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>>>     "type":1,
>>>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>>>       "image":"docker-services1a:5000/gig1/app-81-1-hello-app-103",
>>>>>>>>>>>>>>>       "network":2,
>>>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>           "host_port":31002,
>>>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>>>       "parameters":[],
>>>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>   "health_check":{
>>>>>>>>>>>>>>>     "delay_seconds":5,
>>>>>>>>>>>>>>>     "interval_seconds":10,
>>>>>>>>>>>>>>>     "timeout_seconds":10,
>>>>>>>>>>>>>>>     "consecutive_failures":3,
>>>>>>>>>>>>>>>     "grace_period_seconds":0,
>>>>>>>>>>>>>>>     "command":{
>>>>>>>>>>>>>>>       "shell":true,
>>>>>>>>>>>>>>>       "value":"sleep 5",
>>>>>>>>>>>>>>>       "user":"root"
>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have searched all machines and containers to see if they
>>>>>>>>>>>>>> ever run the command (in this case `sleep 5`), but have not
>>>>>>>>>>>>>> found any indication that it is being executed.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In the mesos src code the health-checks are invoked from
>>>>>>>>>>>>>> src/launcher/executor.cpp CommandExecutorProcess::launchTask.
>>>>>>>>>>>>>> Does this mean that health-checks are only supported for custom
>>>>>>>>>>>>>> executors and not for docker tasks?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What I am trying to accomplish is to have the 0/non-zero
>>>>>>>>>>>>>> exit-status of a health-check command translate to task health.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Best Regards,
>>>>>>>>>>> Haosdent Huang
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Best Regards,
>>>>>>>>>> Haosdent Huang
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best Regards,
>>>>>>>> Haosdent Huang
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Best Regards,
>>>>>>> Haosdent Huang
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best Regards,
>>>>>> Haosdent Huang
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best Regards,
>>>>> Haosdent Huang
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards,
>>>> Haosdent Huang
>>>>
>>>
>>>
>>
>


-- 
Best Regards,
Haosdent Huang
