I have never changed MESOS_LAUNCHER_DIR/--launcher_dir before.

I just tried setting both the env var and the flag on the slaves, and have
determined that the env var is not present at the point where it is checked,
in src/docker/executor.cpp at line 573:

    const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR");
    string path =
      envPath.isSome() ? envPath.get()
                       : os::realpath(Path(argv[0]).dirname()).get();
    cout << "MESOS_LAUNCHER_DIR: envpath.isSome()->"
         << (envPath.isSome() ? "yes" : "no") << endl;
    cout << "MESOS_LAUNCHER_DIR: path='" << path << "'" << endl;
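
For anyone following along, here is a minimal standalone sketch of that lookup, using only POSIX and the standard library instead of stout. The helper names `dirnameOf` and `resolveLauncherDir` are made up for illustration and are not Mesos functions. It shows why a missing env var combined with a bare argv[0] of "mesos-docker-executor" would resolve to the current working directory, which I am assuming is the sandbox by the time the executor runs:

```cpp
#include <stdlib.h>  // realpath, free (POSIX)

#include <cstdlib>   // std::getenv
#include <string>

// Hypothetical helper: directory part of a path, "." when there is none.
std::string dirnameOf(const std::string& path)
{
  const std::string::size_type slash = path.rfind('/');
  if (slash == std::string::npos) return ".";
  if (slash == 0) return "/";
  return path.substr(0, slash);
}

// Hypothetical helper mirroring the lookup quoted above: prefer the
// MESOS_LAUNCHER_DIR env var, otherwise fall back to the resolved
// directory of argv[0].
std::string resolveLauncherDir(const std::string& argv0)
{
  if (const char* env = std::getenv("MESOS_LAUNCHER_DIR")) {
    return env;
  }

  const std::string dir = dirnameOf(argv0);
  char* resolved = ::realpath(dir.c_str(), nullptr);
  if (resolved == nullptr) {
    return dir;  // Could not resolve; return the unresolved directory.
  }

  const std::string result(resolved);
  ::free(resolved);
  return result;
}
```

With the variable unset, `resolveLauncherDir("mesos-docker-executor")` returns the process's working directory, which would match the sandbox path in the task output if that assumption holds.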


I exported the MESOS_LAUNCHER_DIR env var and verified that it is correctly
propagated up to the point where mesos-slave is launched:

    $ cat /etc/default/mesos-slave
    export MESOS_MASTER="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos"
    export MESOS_CONTAINERIZERS="mesos,docker"
    export MESOS_EXECUTOR_REGISTRATION_TIMEOUT="5mins"
    export MESOS_PORT="5050"
    export MESOS_LAUNCHER_DIR="/usr/libexec/mesos"


TASK OUTPUT:


    MESOS_LAUNCHER_DIR: envpath.isSome()->no
    MESOS_LAUNCHER_DIR: path='/tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad'
    Registered docker executor on mesos-worker2a
    Starting task hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
    Launching health check process:
      /tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad/mesos-health-check
      --executor=(1)@192.168.225.59:44523
      --health_check_json={"command":{"shell":true,"value":"docker exec mesos-61373c0e-7349-4173-ab8d-9d7b260e8a30-S1.41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad sh -c \" \/bin\/bash \""},"consecutive_failures":3,"delay_seconds":5.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
      --task_id=hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
    Health check process launched at pid: 2519


The env var is not propagated when the docker executor is launched
in src/slave/containerizer/docker.cpp around line 903:

    vector<string> argv;
    argv.push_back("mesos-docker-executor");
    // Construct the mesos-docker-executor using the "name" we gave the
    // container (to distinguish it from Docker containers not created
    // by Mesos).
    Try<Subprocess> s = subprocess(
        path::join(flags.launcher_dir, "mesos-docker-executor"),
        argv,
        Subprocess::PIPE(),
        Subprocess::PATH(path::join(container->directory, "stdout")),
        Subprocess::PATH(path::join(container->directory, "stderr")),
        dockerFlags(flags, container->name(), container->directory),
        environment,
        lambda::bind(&setup, container->directory));
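
To illustrate the mechanism I suspect is at play, here is a small POSIX sketch, not libprocess code. It assumes that an explicitly supplied environment becomes the child's entire environment, as with `execve`. The function name `childSeesLauncherDir` is hypothetical, and the sketch relies on `/bin/sh` being present:

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <string>
#include <vector>

// Hypothetical helper: starts /bin/sh with exactly the environment in `env`
// (entries of the form "NAME=value") and reports whether the child sees a
// non-empty MESOS_LAUNCHER_DIR.
bool childSeesLauncherDir(const std::vector<std::string>& env)
{
  std::vector<char*> envp;
  for (const std::string& entry : env) {
    envp.push_back(const_cast<char*>(entry.c_str()));
  }
  envp.push_back(nullptr);

  const char* argv[] = {
    "sh", "-c", "test -n \"$MESOS_LAUNCHER_DIR\"", nullptr};

  const pid_t pid = fork();
  if (pid < 0) {
    return false;  // fork failed.
  }

  if (pid == 0) {
    // Child: the parent's environment is NOT inherited here, only `envp`.
    execve("/bin/sh", const_cast<char* const*>(argv), envp.data());
    _exit(127);  // Only reached if execve failed.
  }

  int status = 0;
  if (waitpid(pid, &status, 0) < 0) {
    return false;
  }
  return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Even when the parent has MESOS_LAUNCHER_DIR exported, the child only sees it if the variable is listed in `env`, which would be consistent with the `envpath.isSome()->no` output above.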


A little above that, we can see that the environment is set up with the
env vars defined by the container task.

See src/slave/containerizer/docker.cpp around line 871:

    // Include any enviroment variables from ExecutorInfo.
    foreach (const Environment::Variable& variable,
             container->executor.command().environment().variables()) {
      environment[variable.name()] = variable.value();
    }
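
One possible direction, sketched here with plain standard-library types rather than the Mesos protobufs, would be to add the slave's launcher dir to that map before launching the executor. This is only an illustration of the idea and not an actual Mesos patch; `buildExecutorEnvironment` and its parameters are hypothetical names:

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: builds the executor environment from the task's
// variables, then adds MESOS_LAUNCHER_DIR from the slave's configured
// launcher dir if the task did not already define it.
std::map<std::string, std::string> buildExecutorEnvironment(
    const std::vector<std::pair<std::string, std::string>>& taskVariables,
    const std::string& launcherDir)
{
  std::map<std::string, std::string> environment;

  // Same idea as the foreach loop above: copy the task-defined variables.
  for (const auto& variable : taskVariables) {
    environment[variable.first] = variable.second;
  }

  // emplace() leaves an existing key untouched, so a task-supplied value
  // takes precedence over the slave's launcher dir.
  environment.emplace("MESOS_LAUNCHER_DIR", launcherDir);

  return environment;
}
```

Whether the task's value or the slave's value should win is a design choice; the sketch lets the task override, but the opposite order may be safer in practice.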


Should I file a JIRA for this?  Have I overlooked anything?


On Wed, Oct 7, 2015 at 8:11 PM, haosdent <[email protected]> wrote:

> >Not sure what was going on with health-checks in 0.24.0.
> 0.24.1 should work.
>
> >Do any of you know which host the path
> "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
> should exist on? It definitely doesn't exist on the slave, hence execution
> failing.
>
> Did you set MESOS_LAUNCHER_DIR/--launcher_dir incorrectly before? We look
> up mesos-health-check via MESOS_LAUNCHER_DIR/--launcher_dir, or otherwise
> use the same directory as mesos-docker-executor.
>
> On Thu, Oct 8, 2015 at 10:46 AM, Jay Taylor <[email protected]> wrote:
>
>> Maybe I spoke too soon.
>>
>> Now the checks are attempting to run, however the STDERR is not looking
>> good.  I've added some debugging to the error message output to show the
>> path, argv, and envp variables:
>>
>> STDOUT:
>>
>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>> --stop_timeout="0ns"
>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false"
>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO"
>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>> --stop_timeout="0ns"
>>> Registered docker executor on mesos-worker2a
>>> Starting task
>>> app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>> Launching health check process:
>>> /tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check
>>> --executor=(1)@192.168.225.59:43917
>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>> mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc
>>> sh -c \" exit 1
>>> \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>>> --task_id=app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>> Health check process launched at pid: 3012
>>
>>
>> STDERR:
>>
>> I1008 02:17:28.870434 2770 exec.cpp:134] Version: 0.26.0
>>> I1008 02:17:28.871860 2778 exec.cpp:208] Executor registered on slave
>>> 16b49e90-6852-4c91-8e70-d89c54f25668-S1
>>> WARNING: Your kernel does not support swap limit capabilities, memory
>>> limited without swap.
>>> ABORT: (src/subprocess.cpp:180): Failed to os::execvpe in childMain
>>> (path.c_str()='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>> argv='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>> envp=''): No such file or directory
>>> *** Aborted at 1444270649 (unix time) try "date -d @1444270649" if you
>>> are using GNU date ***
>>> PC: @ 0x7f4a37ec6cc9 (unknown)
>>> *** SIGABRT (@0xbc4) received by PID 3012 (TID 0x7f4a2f9f6700) from PID
>>> 3012; stack trace: ***
>>> @ 0x7f4a38265340 (unknown)
>>> @ 0x7f4a37ec6cc9 (unknown)
>>> @ 0x7f4a37eca0d8 (unknown)
>>> @ 0x4191e2 _Abort()
>>> @ 0x41921c _Abort()
>>> @ 0x7f4a39dc2768 process::childMain()
>>> @ 0x7f4a39dc4f59 std::_Function_handler<>::_M_invoke()
>>> @ 0x7f4a39dc24fc process::defaultClone()
>>> @ 0x7f4a39dc34fb process::subprocess()
>>> @ 0x43cc9c
>>> mesos::internal::docker::DockerExecutorProcess::launchHealthCheck()
>>> @ 0x7f4a39d924f4 process::ProcessManager::resume()
>>> @ 0x7f4a39d92827
>>> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
>>> @ 0x7f4a38a47e40 (unknown)
>>> @ 0x7f4a3825d182 start_thread
>>> @ 0x7f4a37f8a47d (unknown)
>>
>>
>> Do any of you know which host the path 
>> "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>> should exist on? It definitely doesn't exist on the slave, hence
>> execution failing.
>>
>> This is with current master, git hash
>> 5058fac1083dc91bca54d33c26c810c17ad95dd1.
>>
>> commit 5058fac1083dc91bca54d33c26c810c17ad95dd1
>>> Author: Anand Mazumdar <[email protected]>
>>> Date:   Tue Oct 6 17:37:41 2015 -0700
>>
>>
>> -Jay
>>
>> On Wed, Oct 7, 2015 at 5:23 PM, Jay Taylor <[email protected]> wrote:
>>
>>> Update:
>>>
>>> I used https://github.com/deric/mesos-deb-packaging to compile and
>>> package the latest master (0.26.x) and deployed it to the cluster, and now
>>> health checks are working as advertised in both Marathon and my own
>>> framework!  Not sure what was going on with health-checks in 0.24.0..
>>>
>>> Anyways, thanks again for your help Haosdent!
>>>
>>> Cheers,
>>> Jay
>>>
>>> On Wed, Oct 7, 2015 at 12:53 PM, Jay Taylor <[email protected]> wrote:
>>>
>>>> Hi Haosdent,
>>>>
>>>> Can you share your Marathon POST request that results in Mesos
>>>> executing the health checks?
>>>>
>>>> Since we can reference the Marathon framework, I've been doing some
>>>> digging around.
>>>>
>>>> Here are the details of my setup and findings:
>>>>
>>>> I put a few small hacks in Marathon:
>>>>
>>>> (1) Added com.googlecode.protobuf.format to Marathon's dependencies
>>>>
>>>> (2) Edited the following files so TaskInfo is dumped as JSON to /tmp/X
>>>> in both the TaskFactory as well as right before the task is sent to Mesos
>>>> via driver.launchTasks:
>>>>
>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala:
>>>>
>>>> $ git diff
>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala
>>>>> @@ -25,6 +25,12 @@ class DefaultTaskFactory @Inject() (
>>>>>
>>>>>      new TaskBuilder(app, taskIdUtil.newTaskId,
>>>>> config).buildIfMatches(offer, runningTasks).map {
>>>>>        case (taskInfo, ports) =>
>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>> +        import java.io._
>>>>> +        val bw = new BufferedWriter(new FileWriter(new
>>>>> File("/tmp/taskjson1-" + taskInfo.getTaskId.getValue)))
>>>>> +        bw.write(JsonFormat.printToString(taskInfo))
>>>>> +        bw.write("\n")
>>>>> +        bw.close()
>>>>>          CreatedTask(
>>>>>            taskInfo,
>>>>>            MarathonTasks.makeTask(
>>>>
>>>>
>>>>
>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala:
>>>>
>>>> $ git diff
>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala
>>>>> @@ -24,6 +24,16 @@ private[launcher] class TaskLauncherImpl(
>>>>>    override def launchTasks(offerID: OfferID, taskInfos:
>>>>> Seq[TaskInfo]): Boolean = {
>>>>>      val launched = withDriver(s"launchTasks($offerID)") { driver =>
>>>>>        import scala.collection.JavaConverters._
>>>>> +      var i = 0
>>>>> +      for (i <- 0 to taskInfos.length - 1) {
>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>> +        import java.io._
>>>>> +        val file = new File("/tmp/taskJson2-" + i.toString() + "-" +
>>>>> taskInfos(i).getTaskId.getValue)
>>>>> +        val bw = new BufferedWriter(new FileWriter(file))
>>>>> +        bw.write(JsonFormat.printToString(taskInfos(i)))
>>>>> +        bw.write("\n")
>>>>> +        bw.close()
>>>>> +      }
>>>>>        driver.launchTasks(Collections.singleton(offerID),
>>>>> taskInfos.asJava)
>>>>>      }
>>>>
>>>>
>>>> Then I built and deployed the hacked Marathon and restarted the
>>>> marathon service.
>>>>
>>>> Next I created the app via the Marathon API ("hello app" is a container
>>>> with a simple hello-world ruby app running on 0.0.0.0:8000)
>>>>
>>>> curl http://mesos-primary1a:8080/v2/groups -XPOST -H'Content-Type: application/json' -d'
>>>>> {
>>>>>   "id": "/app-81-1-hello-app",
>>>>>   "apps": [
>>>>>     {
>>>>>       "id": "/app-81-1-hello-app/web-v11",
>>>>>       "container": {
>>>>>         "type": "DOCKER",
>>>>>         "docker": {
>>>>>           "image":
>>>>> "docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>           "network": "BRIDGE",
>>>>>           "portMappings": [
>>>>>             {
>>>>>               "containerPort": 8000,
>>>>>               "hostPort": 0,
>>>>>               "protocol": "tcp"
>>>>>             }
>>>>>           ]
>>>>>         }
>>>>>       },
>>>>>       "env": {
>>>>>
>>>>>       },
>>>>>       "healthChecks": [
>>>>>         {
>>>>>           "protocol": "COMMAND",
>>>>>           "command": {"value": "exit 1"},
>>>>>           "gracePeriodSeconds": 10,
>>>>>           "intervalSeconds": 10,
>>>>>           "timeoutSeconds": 10,
>>>>>           "maxConsecutiveFailures": 3
>>>>>         }
>>>>>       ],
>>>>>       "instances": 1,
>>>>>       "cpus": 1,
>>>>>       "mem": 512
>>>>>     }
>>>>>   ]
>>>>> }'
>>>>
>>>>
>>>> $ ls /tmp/
>>>>> taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>> taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>
>>>>
>>>> Do they match?
>>>>
>>>> $ md5sum /tmp/task*
>>>>> 1b5115997e78e2611654059249d99578  /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>> 1b5115997e78e2611654059249d99578  /tmp/taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>
>>>>
>>>> Yes, so I am confident this is the information being sent across the
>>>> wire to Mesos.
>>>>
>>>> Do they contain any health-check information?
>>>>
>>>> $ cat /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>> {
>>>>>   "name":"web-v11.app-81-1-hello-app",
>>>>>   "task_id":{
>>>>>     "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>   },
>>>>>   "slave_id":{
>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>   },
>>>>>   "resources":[
>>>>>     {
>>>>>       "name":"cpus",
>>>>>       "type":"SCALAR",
>>>>>       "scalar":{
>>>>>         "value":1.0
>>>>>       },
>>>>>       "role":"*"
>>>>>     },
>>>>>     {
>>>>>       "name":"mem",
>>>>>       "type":"SCALAR",
>>>>>       "scalar":{
>>>>>         "value":512.0
>>>>>       },
>>>>>       "role":"*"
>>>>>     },
>>>>>     {
>>>>>       "name":"ports",
>>>>>       "type":"RANGES",
>>>>>       "ranges":{
>>>>>         "range":[
>>>>>           {
>>>>>             "begin":31641,
>>>>>             "end":31641
>>>>>           }
>>>>>         ]
>>>>>       },
>>>>>       "role":"*"
>>>>>     }
>>>>>   ],
>>>>>   "command":{
>>>>>     "environment":{
>>>>>       "variables":[
>>>>>         {
>>>>>           "name":"PORT_8000",
>>>>>           "value":"31641"
>>>>>         },
>>>>>         {
>>>>>           "name":"MARATHON_APP_VERSION",
>>>>>           "value":"2015-10-07T19:35:08.386Z"
>>>>>         },
>>>>>         {
>>>>>           "name":"HOST",
>>>>>           "value":"mesos-worker1a"
>>>>>         },
>>>>>         {
>>>>>           "name":"MARATHON_APP_DOCKER_IMAGE",
>>>>>           "value":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966"
>>>>>         },
>>>>>         {
>>>>>           "name":"MESOS_TASK_ID",
>>>>>           "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>         },
>>>>>         {
>>>>>           "name":"PORT",
>>>>>           "value":"31641"
>>>>>         },
>>>>>         {
>>>>>           "name":"PORTS",
>>>>>           "value":"31641"
>>>>>         },
>>>>>         {
>>>>>           "name":"MARATHON_APP_ID",
>>>>>           "value":"/app-81-1-hello-app/web-v11"
>>>>>         },
>>>>>         {
>>>>>           "name":"PORT0",
>>>>>           "value":"31641"
>>>>>         }
>>>>>       ]
>>>>>     },
>>>>>     "shell":false
>>>>>   },
>>>>>   "container":{
>>>>>     "type":"DOCKER",
>>>>>     "docker":{
>>>>>       "image":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>       "network":"BRIDGE",
>>>>>       "port_mappings":[
>>>>>         {
>>>>>           "host_port":31641,
>>>>>           "container_port":8000,
>>>>>           "protocol":"tcp"
>>>>>         }
>>>>>       ],
>>>>>       "privileged":false,
>>>>>       "force_pull_image":false
>>>>>     }
>>>>>   }
>>>>> }
>>>>
>>>>
>>>> No, I don't see anything about any health check.
>>>>
>>>> Mesos STDOUT for the launched task:
>>>>
>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>> --stop_timeout="0ns"
>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>> --stop_timeout="0ns"
>>>>> Registered docker executor on mesos-worker1a
>>>>> Starting task
>>>>> app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>
>>>>
>>>> And STDERR:
>>>>
>>>> I1007 19:35:08.790743  4612 exec.cpp:134] Version: 0.24.0
>>>>> I1007 19:35:08.793416  4619 exec.cpp:208] Executor registered on slave
>>>>> 20150924-210922-1608624320-5050-1792-S1
>>>>> WARNING: Your kernel does not support swap limit capabilities, memory
>>>>> limited without swap.
>>>>
>>>>
>>>> Again, nothing about any health checks.
>>>>
>>>> Any ideas of other things to try or what I could be missing?  Can't say
>>>> either way about the Mesos health-check system working or not if Marathon
>>>> won't put the health-check into the task it sends to Mesos.
>>>>
>>>> Thanks for all your help!
>>>>
>>>> Best,
>>>> Jay
>>>>
>>>>
>>>>
>>>>>
>>>> On Tue, Oct 6, 2015 at 11:24 PM, haosdent <[email protected]> wrote:
>>>>
>>>>> Maybe you could post your executor stdout/stderr so that we can tell
>>>>> whether the health check is running or not.
>>>>>
>>>>> On Wed, Oct 7, 2015 at 2:15 PM, haosdent <[email protected]> wrote:
>>>>>
>>>>>> Marathon also uses the Mesos health check. When I use a health check,
>>>>>> I can see log output like this in the executor stdout.
>>>>>>
>>>>>> ```
>>>>>> Registered docker executor on xxxxx
>>>>>> Starting task test-health-check.822a5fd2-6cba-11e5-b5ce-0a0027000000
>>>>>> Launching health check process:
>>>>>> /home/haosdent/mesos/build/src/.libs/mesos-health-check --executor=xxxx
>>>>>> Health check process launched at pid: 9895
>>>>>> Received task health update, healthy: true
>>>>>> ```
>>>>>>
>>>>>> On Wed, Oct 7, 2015 at 12:51 PM, Jay Taylor <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> I am using my own framework, and the full task info I'm using is
>>>>>>> posted earlier in this thread.  Do you happen to know if Marathon uses
>>>>>>> Mesos's health checks for its health check system?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Oct 6, 2015, at 9:01 PM, haosdent <[email protected]> wrote:
>>>>>>>
>>>>>>> Yes, the health check is launched through its definition in the
>>>>>>> TaskInfo. Do you launch your task through Marathon? I could test it
>>>>>>> on my side.
>>>>>>>
>>>>>>> On Wed, Oct 7, 2015 at 11:56 AM, Jay Taylor <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Precisely, and there are none of those statements.  Are you or
>>>>>>>> others confident health-checks are part of the code path when defined
>>>>>>>> via task info for docker container tasks?  Going through the code, I
>>>>>>>> wasn't able to find the linkage for anything other than health-checks
>>>>>>>> triggered through a custom executor.
>>>>>>>>
>>>>>>>> With that being said it is a pretty good sized code base and I'm
>>>>>>>> not very familiar with it, so my analysis this far has by no means been
>>>>>>>> exhaustive.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Oct 6, 2015, at 8:41 PM, haosdent <[email protected]> wrote:
>>>>>>>>
>>>>>>>> When the health check launches, there will be a log line like this
>>>>>>>> in your executor stdout
>>>>>>>> ```
>>>>>>>> Health check process launched at pid xxx
>>>>>>>> ```
>>>>>>>>
>>>>>>>> On Wed, Oct 7, 2015 at 11:37 AM, Jay Taylor <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I'm happy to try this, however wouldn't there be output in the
>>>>>>>>> logs with the string "health" or "Health" if the health-check were
>>>>>>>>> active? None of my master or slave logs contain the string.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Oct 6, 2015, at 7:45 PM, haosdent <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Could you use "exit 1" instead of "sleep 5" to see whether an
>>>>>>>>> unhealthy status shows up in your task stdout/stderr?
>>>>>>>>>
>>>>>>>>> On Wed, Oct 7, 2015 at 10:38 AM, Jay Taylor <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> My current version is 0.24.1.
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 6, 2015 at 7:30 PM, haosdent <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Yes, Adam also helped commit it to 0.23.1 and 0.24.1:
>>>>>>>>>>> https://github.com/apache/mesos/commit/8c0ed92de3925d4312429bfba01b9b1ccbcbbef0
>>>>>>>>>>> https://github.com/apache/mesos/commit/09e367cd69aa39c156c9326d44f4a7b829ba3db7
>>>>>>>>>>> Are you using one of these versions?
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:26 AM, haosdent <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I remember that 0.23.1 and 0.24.1 contain this backport, let me
>>>>>>>>>>>> double check.
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:01 AM, Jay Taylor <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Oops- Now I see you already said it's in master.  I'll look
>>>>>>>>>>>>> there :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks again!
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:59 PM, Jay Taylor <[email protected]>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Great, thanks for the quick reply Tim!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Do you know if there is a branch I can checkout to test it
>>>>>>>>>>>>>> out?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:54 PM, Timothy Chen <
>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Jay,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We just added health check support for docker tasks that's
>>>>>>>>>>>>>>> in master but not yet released. It will run docker exec with
>>>>>>>>>>>>>>> the command you provided as health checks.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It should be in the next release.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Tim
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Oct 6, 2015, at 6:49 PM, Jay Taylor <[email protected]>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Does Mesos support health checks for docker image tasks?
>>>>>>>>>>>>>>> Mesos seems to be ignoring the TaskInfo.HealthCheck field for me.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Example TaskInfo JSON received back from Mesos:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>   "name":"hello-app.web.v3",
>>>>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>>>>     "value":"hello-app_web-v3.fc05a1a5-1e06-4e61-9879-be0d97cd3eec"
>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>         "value":0.1
>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>         "value":256
>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>>>>       "type":1,
>>>>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>>>>           {
>>>>>>>>>>>>>>>>             "begin":31002,
>>>>>>>>>>>>>>>>             "end":31002
>>>>>>>>>>>>>>>>           }
>>>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>>>>     "container":{
>>>>>>>>>>>>>>>>       "image":"docker-services1a:5000/test/app-81-1-hello-app-103"
>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>>>>     "type":1,
>>>>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>>>>       "image":"docker-services1a:5000/gig1/app-81-1-hello-app-103",
>>>>>>>>>>>>>>>>       "network":2,
>>>>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>           "host_port":31002,
>>>>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>>>>       "parameters":[],
>>>>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>   "health_check":{
>>>>>>>>>>>>>>>>     "delay_seconds":5,
>>>>>>>>>>>>>>>>     "interval_seconds":10,
>>>>>>>>>>>>>>>>     "timeout_seconds":10,
>>>>>>>>>>>>>>>>     "consecutive_failures":3,
>>>>>>>>>>>>>>>>     "grace_period_seconds":0,
>>>>>>>>>>>>>>>>     "command":{
>>>>>>>>>>>>>>>>       "shell":true,
>>>>>>>>>>>>>>>>       "value":"sleep 5",
>>>>>>>>>>>>>>>>       "user":"root"
>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have searched all machines and containers to see if they
>>>>>>>>>>>>>>> ever run the command (in this case `sleep 5`), but have not
>>>>>>>>>>>>>>> found any indication that it is being executed.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In the mesos src code the health-checks are invoked from
>>>>>>>>>>>>>>> src/launcher/executor.cpp CommandExecutorProcess::launchTask.
>>>>>>>>>>>>>>> Does this mean that health-checks are only supported for custom
>>>>>>>>>>>>>>> executors and not for docker tasks?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What I am trying to accomplish is to have the 0/non-zero
>>>>>>>>>>>>>>> exit-status of a health-check command translate to task health.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>> Haosdent Huang
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Best Regards,
>>>>>>>>>>> Haosdent Huang
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Best Regards,
>>>>>>>>> Haosdent Huang
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best Regards,
>>>>>>>> Haosdent Huang
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Best Regards,
>>>>>>> Haosdent Huang
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best Regards,
>>>>>> Haosdent Huang
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best Regards,
>>>>> Haosdent Huang
>>>>>
>>>>
>>>>
>>>
>>
>
>
> --
> Best Regards,
> Haosdent Huang
>
