As far as I know, MESOS_LAUNCHER_DIR works by setting flags.launcher_dir, and mesos-docker-executor and mesos-health-check are looked up under that dir. Although the env var is not propagated, MESOS_LAUNCHER_DIR still works because flags.launcher_dir is read from it.
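To illustrate, the env-to-flag mapping works roughly like this standalone sketch (not the real stout flags API; `flagFromEnv` is a made-up name for illustration):

```cpp
// Illustrative sketch only (not the real stout flags code): a flag
// named "launcher_dir" is looked up in the environment under the
// prefixed name "MESOS_LAUNCHER_DIR".
#include <cctype>
#include <cstdlib>
#include <optional>
#include <string>

std::optional<std::string> flagFromEnv(const std::string& flagName) {
  std::string envName = "MESOS_";
  for (char c : flagName) {
    envName += static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
  }
  const char* value = std::getenv(envName.c_str());
  if (value == nullptr) {
    return std::nullopt;  // unset: the flag keeps its default value
  }
  return std::string(value);
}
```

So exporting MESOS_LAUNCHER_DIR before starting mesos-slave is equivalent to passing --launcher_dir on the command line.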
For example, I ran ``` export MESOS_LAUNCHER_DIR=/tmp ``` before starting mesos-slave, and when I launched the slave I could find this line in the slave log: ``` I1009 10:27:26.594599 1416 slave.cpp:203] Flags at startup: xxxxx --launcher_dir="/tmp" ``` From your log, I am not sure why your MESOS_LAUNCHER_DIR became the sandbox dir. Is MESOS_LAUNCHER_DIR overridden in one of your other scripts? On Fri, Oct 9, 2015 at 1:56 AM, Jay Taylor <[email protected]> wrote: > I haven't ever changed MESOS_LAUNCHER_DIR/--launcher_dir before. > > I just tried setting both the env var and flag on the slaves, and have > determined that the env var is not present when it is being checked > src/docker/executor.cpp @ line 573: > > const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR"); >> string path = >> envPath.isSome() ? envPath.get() >> : os::realpath(Path(argv[0]).dirname()).get(); >> cout << "MESOS_LAUNCHER_DIR: envpath.isSome()->" << (envPath.isSome() ? >> "yes" : "no") << endl; >> cout << "MESOS_LAUNCHER_DIR: path='" << path << "'" << endl; > > > Exported MESOS_LAUNCHER_DIR env var (and verified it is correctly > propagated along up to the point of mesos-slave launch): > > $ cat /etc/default/mesos-slave >> export >> MESOS_MASTER="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos" >> export MESOS_CONTAINERIZERS="mesos,docker" >> export MESOS_EXECUTOR_REGISTRATION_TIMEOUT="5mins" >> export MESOS_PORT="5050" >> export MESOS_LAUNCHER_DIR="/usr/libexec/mesos" > > > TASK OUTPUT: > > >> *MESOS_LAUNCHER_DIR: envpath.isSome()->no**MESOS_LAUNCHER_DIR: >> path='/tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad'* >> Registered docker executor on mesos-worker2a >> Starting task hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253 >> Launching health check process: >>
/tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad/mesos-health-check >> --executor=(1)@192.168.225.59:44523 >> --health_check_json={"command":{"shell":true,"value":"docker exec >> mesos-61373c0e-7349-4173-ab8d-9d7b260e8a30-S1.41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad >> sh -c \" \/bin\/bash >> \""},"consecutive_failures":3,"delay_seconds":5.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0} >> --task_id=hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253 >> Health check process launched at pid: 2519 > > > The env var is not propagated when the docker executor is launched > in src/slave/containerizer/docker.cpp around line 903: > > vector<string> argv; >> argv.push_back("mesos-docker-executor"); >> // Construct the mesos-docker-executor using the "name" we gave the >> // container (to distinguish it from Docker containers not created >> // by Mesos). >> Try<Subprocess> s = subprocess( >> path::join(flags.launcher_dir, "mesos-docker-executor"), >> argv, >> Subprocess::PIPE(), >> Subprocess::PATH(path::join(container->directory, "stdout")), >> Subprocess::PATH(path::join(container->directory, "stderr")), >> dockerFlags(flags, container->name(), container->directory), >> environment, >> lambda::bind(&setup, container->directory)); > > > A little ways above, we can see the environment is set up with the container > task's defined env vars. > > See src/slave/containerizer/docker.cpp around line 871: > > // Include any enviroment variables from ExecutorInfo. >> foreach (const Environment::Variable& variable, >> container->executor.command().environment().variables()) { >> environment[variable.name()] = variable.value(); >> } > > > Should I file a JIRA for this? Have I overlooked anything?
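For reference, the resolution logic in src/docker/executor.cpp boils down to something like this standalone sketch (std::filesystem standing in for stout's os::getenv/os::realpath/Path helpers; the function names here are illustrative, not Mesos code):

```cpp
// Standalone sketch of the launcher-dir resolution: prefer the
// MESOS_LAUNCHER_DIR env var, otherwise fall back to the directory of
// the executor binary itself (argv[0]).
#include <cstdlib>
#include <filesystem>
#include <optional>
#include <string>

std::optional<std::string> getEnvVar(const std::string& name) {
  const char* value = std::getenv(name.c_str());
  return value == nullptr ? std::optional<std::string>()
                          : std::optional<std::string>(value);
}

// When the slave does not propagate the env var, the fallback for a
// Docker task is the sandbox directory, where mesos-health-check does
// not exist -- which matches the bogus path in the task output above.
std::string resolveLauncherDir(const std::string& argv0) {
  std::optional<std::string> envPath = getEnvVar("MESOS_LAUNCHER_DIR");
  if (envPath.has_value()) {
    return *envPath;
  }
  return std::filesystem::path(argv0).parent_path().string();
}
```

One plausible fix (hypothetical, pending a JIRA) would be for docker.cpp to add something like `environment["MESOS_LAUNCHER_DIR"] = flags.launcher_dir;` before the subprocess call, so the env-var branch is taken in the executor.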
> > > On Wed, Oct 7, 2015 at 8:11 PM, haosdent <[email protected]> wrote: > >> >Not sure what was going on with health-checks in 0.24.0. >> 0.24.1 should work. >> >> >Do any of you know which host the path >> "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check" >> should exist on? It definitely doesn't exist on the slave, hence execution >> failing. >> >> Did you set MESOS_LAUNCHER_DIR/--launcher_dir incorrectly before? We get >> mesos-health-check from MESOS_LAUNCHER_DIR/--launcher_dir, or else use the same >> dir as mesos-docker-executor. >> >> On Thu, Oct 8, 2015 at 10:46 AM, Jay Taylor <[email protected]> wrote: >> >>> Maybe I spoke too soon. >>> >>> Now the checks are attempting to run, however the STDERR is not looking >>> good. I've added some debugging to the error message output to show the >>> path, argv, and envp variables: >>> >>> STDOUT: >>> >>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc" >>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" >>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" >>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" >>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc" >>>> --stop_timeout="0ns" >>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc" >>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" >>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" >>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" >>>>
--sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc" >>>> --stop_timeout="0ns" >>>> Registered docker executor on mesos-worker2a >>>> Starting task >>>> app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0 >>>> Launching health check process: >>>> /tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check >>>> --executor=(1)@192.168.225.59:43917 >>>> --health_check_json={"command":{"shell":true,"value":"docker exec >>>> mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc >>>> sh -c \" exit 1 >>>> \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0} >>>> --task_id=app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0 >>>> Health check process launched at pid: 3012 >>> >>> >>> STDERR: >>> >>> I1008 02:17:28.870434 2770 exec.cpp:134] Version: 0.26.0 >>>> I1008 02:17:28.871860 2778 exec.cpp:208] Executor registered on slave >>>> 16b49e90-6852-4c91-8e70-d89c54f25668-S1 >>>> WARNING: Your kernel does not support swap limit capabilities, memory >>>> limited without swap. 
>>>> ABORT: (src/subprocess.cpp:180): Failed to os::execvpe in childMain >>>> (path.c_str()='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check', >>>> argv='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check', >>>> envp=''): No such file or directory*** Aborted at 1444270649 (unix time) >>>> try "date -d @1444270649" if you are using GNU date *** >>>> PC: @ 0x7f4a37ec6cc9 (unknown) >>>> *** SIGABRT (@0xbc4) received by PID 3012 (TID 0x7f4a2f9f6700) from PID >>>> 3012; stack trace: *** >>>> @ 0x7f4a38265340 (unknown) >>>> @ 0x7f4a37ec6cc9 (unknown) >>>> @ 0x7f4a37eca0d8 (unknown) >>>> @ 0x4191e2 _Abort() >>>> @ 0x41921c _Abort() >>>> @ 0x7f4a39dc2768 process::childMain() >>>> @ 0x7f4a39dc4f59 std::_Function_handler<>::_M_invoke() >>>> @ 0x7f4a39dc24fc process::defaultClone() >>>> @ 0x7f4a39dc34fb process::subprocess() >>>> @ 0x43cc9c >>>> mesos::internal::docker::DockerExecutorProcess::launchHealthCheck() >>>> @ 0x7f4a39d924f4 process::ProcessManager::resume() >>>> @ 0x7f4a39d92827 >>>> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv >>>> @ 0x7f4a38a47e40 (unknown) >>>> @ 0x7f4a3825d182 start_thread >>>> @ 0x7f4a37f8a47d (unknown) >>> >>> >>> Do any of you know which host the path >>> "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check" >>> should exist on? 
It definitely doesn't exist on the slave, hence >>> execution failing. >>> >>> This is with current master, git hash >>> 5058fac1083dc91bca54d33c26c810c17ad95dd1. >>> >>> commit 5058fac1083dc91bca54d33c26c810c17ad95dd1 >>>> Author: Anand Mazumdar <[email protected]> >>>> Date: Tue Oct 6 17:37:41 2015 -0700 >>> >>> >>> -Jay >>> >>> On Wed, Oct 7, 2015 at 5:23 PM, Jay Taylor <[email protected]> wrote: >>> >>>> Update: >>>> >>>> I used https://github.com/deric/mesos-deb-packaging to compile and >>>> package the latest master (0.26.x) and deployed it to the cluster, and now >>>> health checks are working as advertised in both Marathon and my own >>>> framework! Not sure what was going on with health-checks in 0.24.0.. >>>> >>>> Anyways, thanks again for your help Haosdent! >>>> >>>> Cheers, >>>> Jay >>>> >>>> On Wed, Oct 7, 2015 at 12:53 PM, Jay Taylor <[email protected]> >>>> wrote: >>>> >>>>> Hi Haosdent, >>>>> >>>>> Can you share your Marathon POST request that results in Mesos >>>>> executing the health checks? >>>>> >>>>> Since we can reference the Marathon framework, I've been doing some >>>>> digging around. 
>>>>> >>>>> Here are the details of my setup and findings: >>>>> >>>>> I put a few small hacks in Marathon: >>>>> >>>>> (1) Added com.googlecode.protobuf.format to Marathon's dependencies >>>>> >>>>> (2) Edited the following files so TaskInfo is dumped as JSON to /tmp/X >>>>> in both the TaskFactory as well as right before the task is sent to Mesos >>>>> via driver.launchTasks: >>>>> >>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala: >>>>> >>>>> $ git diff >>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala >>>>>> @@ -25,6 +25,12 @@ class DefaultTaskFactory @Inject() ( >>>>>> >>>>>> new TaskBuilder(app, taskIdUtil.newTaskId, >>>>>> config).buildIfMatches(offer, runningTasks).map { >>>>>> case (taskInfo, ports) => >>>>>> + import com.googlecode.protobuf.format.JsonFormat >>>>>> + import java.io._ >>>>>> + val bw = new BufferedWriter(new FileWriter(new >>>>>> File("/tmp/taskjson1-" + taskInfo.getTaskId.getValue))) >>>>>> + bw.write(JsonFormat.printToString(taskInfo)) >>>>>> + bw.write("\n") >>>>>> + bw.close() >>>>>> CreatedTask( >>>>>> taskInfo, >>>>>> MarathonTasks.makeTask( >>>>> >>>>> >>>>> >>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala: >>>>> >>>>> $ git diff >>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala >>>>>> @@ -24,6 +24,16 @@ private[launcher] class TaskLauncherImpl( >>>>>> override def launchTasks(offerID: OfferID, taskInfos: >>>>>> Seq[TaskInfo]): Boolean = { >>>>>> val launched = withDriver(s"launchTasks($offerID)") { driver => >>>>>> import scala.collection.JavaConverters._ >>>>>> + var i = 0 >>>>>> + for (i <- 0 to taskInfos.length - 1) { >>>>>> + import com.googlecode.protobuf.format.JsonFormat >>>>>> + import java.io._ >>>>>> + val file = new File("/tmp/taskJson2-" + i.toString() + "-" + >>>>>> taskInfos(i).getTaskId.getValue) >>>>>> + val bw = new BufferedWriter(new FileWriter(file)) >>>>>> +
bw.write(JsonFormat.printToString(taskInfos(i))) >>>>>> + bw.write("\n") >>>>>> + bw.close() >>>>>> + } >>>>>> driver.launchTasks(Collections.singleton(offerID), >>>>>> taskInfos.asJava) >>>>>> } >>>>> >>>>> >>>>> Then I built and deployed the hacked Marathon and restarted the >>>>> marathon service. >>>>> >>>>> Next I created the app via the Marathon API ("hello app" is a >>>>> container with a simple hello-world ruby app running on 0.0.0.0:8000) >>>>> >>>>> curl http://mesos-primary1a:8080/v2/groups -XPOST -H'Content-Type: >>>>>> application/json' -d' >>>>>> { >>>>>> "id": "/app-81-1-hello-app", >>>>>> "apps": [ >>>>>> { >>>>>> "id": "/app-81-1-hello-app/web-v11", >>>>>> "container": { >>>>>> "type": "DOCKER", >>>>>> "docker": { >>>>>> "image": >>>>>> "docker-services1a:5000/gig1/app-81-1-hello-app-1444240966", >>>>>> "network": "BRIDGE", >>>>>> "portMappings": [ >>>>>> { >>>>>> "containerPort": 8000, >>>>>> "hostPort": 0, >>>>>> "protocol": "tcp" >>>>>> } >>>>>> ] >>>>>> } >>>>>> }, >>>>>> "env": { >>>>>> >>>>>> }, >>>>>> "healthChecks": [ >>>>>> { >>>>>> "protocol": "COMMAND", >>>>>> "command": {"value": "exit 1"}, >>>>>> "gracePeriodSeconds": 10, >>>>>> "intervalSeconds": 10, >>>>>> "timeoutSeconds": 10, >>>>>> "maxConsecutiveFailures": 3 >>>>>> } >>>>>> ], >>>>>> "instances": 1, >>>>>> "cpus": 1, >>>>>> "mem": 512 >>>>>> } >>>>>> ] >>>>>> } >>>>> >>>>> >>>>> $ ls /tmp/ >>>>>> >>>>>> taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0 >>>>>> >>>>>> taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0 >>>>> >>>>> >>>>> Do they match? 
>>>>> >>>>> $ md5sum /tmp/task* >>>>>> 1b5115997e78e2611654059249d99578 >>>>>> >>>>>> /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0 >>>>>> 1b5115997e78e2611654059249d99578 >>>>>> >>>>>> /tmp/taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0 >>>>> >>>>> >>>>> Yes, so I am confident this is the information being sent across the >>>>> wire to Mesos. >>>>> >>>>> Do they contain any health-check information? >>>>> >>>>> $ cat >>>>>> /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0 >>>>>> { >>>>>> "name":"web-v11.app-81-1-hello-app", >>>>>> "task_id":{ >>>>>> >>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0" >>>>>> }, >>>>>> "slave_id":{ >>>>>> "value":"20150924-210922-1608624320-5050-1792-S1" >>>>>> }, >>>>>> "resources":[ >>>>>> { >>>>>> "name":"cpus", >>>>>> "type":"SCALAR", >>>>>> "scalar":{ >>>>>> "value":1.0 >>>>>> }, >>>>>> "role":"*" >>>>>> }, >>>>>> { >>>>>> "name":"mem", >>>>>> "type":"SCALAR", >>>>>> "scalar":{ >>>>>> "value":512.0 >>>>>> }, >>>>>> "role":"*" >>>>>> }, >>>>>> { >>>>>> "name":"ports", >>>>>> "type":"RANGES", >>>>>> "ranges":{ >>>>>> "range":[ >>>>>> { >>>>>> "begin":31641, >>>>>> "end":31641 >>>>>> } >>>>>> ] >>>>>> }, >>>>>> "role":"*" >>>>>> } >>>>>> ], >>>>>> "command":{ >>>>>> "environment":{ >>>>>> "variables":[ >>>>>> { >>>>>> "name":"PORT_8000", >>>>>> "value":"31641" >>>>>> }, >>>>>> { >>>>>> "name":"MARATHON_APP_VERSION", >>>>>> "value":"2015-10-07T19:35:08.386Z" >>>>>> }, >>>>>> { >>>>>> "name":"HOST", >>>>>> "value":"mesos-worker1a" >>>>>> }, >>>>>> { >>>>>> "name":"MARATHON_APP_DOCKER_IMAGE", >>>>>> >>>>>> "value":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966" >>>>>> }, >>>>>> { >>>>>> "name":"MESOS_TASK_ID", >>>>>> >>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0" >>>>>> }, >>>>>> { >>>>>> "name":"PORT", >>>>>> "value":"31641" >>>>>> }, >>>>>> { >>>>>> "name":"PORTS", 
>>>>>> "value":"31641" >>>>>> }, >>>>>> { >>>>>> "name":"MARATHON_APP_ID", >>>>>> "value":"/app-81-1-hello-app/web-v11" >>>>>> }, >>>>>> { >>>>>> "name":"PORT0", >>>>>> "value":"31641" >>>>>> } >>>>>> ] >>>>>> }, >>>>>> "shell":false >>>>>> }, >>>>>> "container":{ >>>>>> "type":"DOCKER", >>>>>> "docker":{ >>>>>> >>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966", >>>>>> "network":"BRIDGE", >>>>>> "port_mappings":[ >>>>>> { >>>>>> "host_port":31641, >>>>>> "container_port":8000, >>>>>> "protocol":"tcp" >>>>>> } >>>>>> ], >>>>>> "privileged":false, >>>>>> "force_pull_image":false >>>>>> } >>>>>> } >>>>>> } >>>>> >>>>> >>>>> No, I don't see anything about any health check. >>>>> >>>>> Mesos STDOUT for the launched task: >>>>> >>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da" >>>>>> --docker="docker" --help="false" --initialize_driver_logging="true" >>>>>> --logbufsecs="0" --logging_level="INFO" >>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" >>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da" >>>>>> --stop_timeout="0ns" >>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da" >>>>>> --docker="docker" --help="false" --initialize_driver_logging="true" >>>>>> --logbufsecs="0" --logging_level="INFO" >>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" >>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da" >>>>>> --stop_timeout="0ns" >>>>>> Registered docker executor on mesos-worker1a >>>>>> Starting task >>>>>> 
app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0 >>>>> >>>>> >>>>> And STDERR: >>>>> >>>>> I1007 19:35:08.790743 4612 exec.cpp:134] Version: 0.24.0 >>>>>> I1007 19:35:08.793416 4619 exec.cpp:208] Executor registered on >>>>>> slave 20150924-210922-1608624320-5050-1792-S1 >>>>>> WARNING: Your kernel does not support swap limit capabilities, memory >>>>>> limited without swap. >>>>> >>>>> >>>>> Again, nothing about any health checks. >>>>> >>>>> Any ideas of other things to try or what I could be missing? Can't >>>>> say either way about the Mesos health-check system working or not if >>>>> Marathon won't put the health-check into the task it sends to Mesos. >>>>> >>>>> Thanks for all your help! >>>>> >>>>> Best, >>>>> Jay >>>>> >>>>> >>>>> >>>>>> >>>>> On Tue, Oct 6, 2015 at 11:24 PM, haosdent <[email protected]> wrote: >>>>> >>>>>> Maybe you could post your executor stdout/stderr so that we could >>>>>> know whether the health check is running or not. >>>>>> >>>>>> On Wed, Oct 7, 2015 at 2:15 PM, haosdent <[email protected]> wrote: >>>>>> >>>>>>> Marathon also uses the Mesos health check. When I use a health check, I >>>>>>> can see a log like this in the executor stdout. >>>>>>> >>>>>>> ``` >>>>>>> Registered docker executor on xxxxx >>>>>>> Starting task test-health-check.822a5fd2-6cba-11e5-b5ce-0a0027000000 >>>>>>> Launching health check process: >>>>>>> /home/haosdent/mesos/build/src/.libs/mesos-health-check --executor=xxxx >>>>>>> Health check process launched at pid: 9895 >>>>>>> Received task health update, healthy: true >>>>>>> ``` >>>>>>> >>>>>>> On Wed, Oct 7, 2015 at 12:51 PM, Jay Taylor <[email protected]> >>>>>>> wrote: >>>>>>> >>>>>>>> I am using my own framework, and the full task info I'm using is >>>>>>>> posted earlier in this thread. Do you happen to know if Marathon uses >>>>>>>> Mesos's health checks for its health check system?
>>>>>>>> >>>>>>>> >>>>>>>> On Oct 6, 2015, at 9:01 PM, haosdent <[email protected]> wrote: >>>>>>>> >>>>>>>> Yes, launch the health check through its definition in TaskInfo. Do >>>>>>>> you launch your task through Marathon? I could test it on my side. >>>>>>>> >>>>>>>> On Wed, Oct 7, 2015 at 11:56 AM, Jay Taylor <[email protected]> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Precisely, and there are none of those statements. Are you or >>>>>>>>> others confident health-checks are part of the code path when defined >>>>>>>>> via >>>>>>>>> task info for docker container tasks? Going through the code, I >>>>>>>>> wasn't >>>>>>>>> able to find the linkage for anything other than health-checks >>>>>>>>> triggered >>>>>>>>> through a custom executor. >>>>>>>>> >>>>>>>>> With that being said, it is a pretty good sized code base and I'm >>>>>>>>> not very familiar with it, so my analysis thus far has by no means >>>>>>>>> been >>>>>>>>> exhaustive. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Oct 6, 2015, at 8:41 PM, haosdent <[email protected]> wrote: >>>>>>>>> >>>>>>>>> When the health check launches, it produces a log like this in your >>>>>>>>> executor stdout: >>>>>>>>> ``` >>>>>>>>> Health check process launched at pid xxx >>>>>>>>> ``` >>>>>>>>> >>>>>>>>> On Wed, Oct 7, 2015 at 11:37 AM, Jay Taylor <[email protected]> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> I'm happy to try this, however wouldn't there be output in the >>>>>>>>>> logs with the string "health" or "Health" if the health-check were >>>>>>>>>> active? >>>>>>>>>> None of my master or slave logs contain the string.. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Oct 6, 2015, at 7:45 PM, haosdent <[email protected]> wrote: >>>>>>>>>> >>>>>>>>>> Could you use "exit 1" instead of "sleep 5" to see whether you can >>>>>>>>>> see the unhealthy status in your task stdout/stderr?
>>>>>>>>>> >>>>>>>>>> On Wed, Oct 7, 2015 at 10:38 AM, Jay Taylor <[email protected]> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> My current version is 0.24.1. >>>>>>>>>>> >>>>>>>>>>> On Tue, Oct 6, 2015 at 7:30 PM, haosdent <[email protected]> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Yes, Adam also helped commit it to 0.23.1 and 0.24.1: >>>>>>>>>>>> https://github.com/apache/mesos/commit/8c0ed92de3925d4312429bfba01b9b1ccbcbbef0 >>>>>>>>>>>> >>>>>>>>>>>> https://github.com/apache/mesos/commit/09e367cd69aa39c156c9326d44f4a7b829ba3db7 >>>>>>>>>>>> Are you using one of these versions? >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:26 AM, haosdent <[email protected]> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> I remember 0.23.1 and 0.24.1 contain this backport, let me >>>>>>>>>>>>> double check. >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:01 AM, Jay Taylor < >>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Oops- Now I see you already said it's in master. I'll look >>>>>>>>>>>>>> there :) >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks again! >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:59 PM, Jay Taylor <[email protected] >>>>>>>>>>>>>> > wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Great, thanks for the quick reply Tim! >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Do you know if there is a branch I can check out to test it >>>>>>>>>>>>>>> out? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:54 PM, Timothy Chen < >>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi Jay, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> We just added health check support for docker tasks that's >>>>>>>>>>>>>>>> in master but not yet released. It will run docker exec with >>>>>>>>>>>>>>>> the command >>>>>>>>>>>>>>>> you provided as health checks. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> It should be in the next release. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Tim >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Oct 6, 2015, at 6:49 PM, Jay Taylor <[email protected]> >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Does Mesos support health checks for docker image tasks? >>>>>>>>>>>>>>>> Mesos seems to be ignoring the TaskInfo.HealthCheck field for >>>>>>>>>>>>>>>> me. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Example TaskInfo JSON received back from Mesos: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "name":"hello-app.web.v3", >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "task_id":{ >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "value":"hello-app_web-v3.fc05a1a5-1e06-4e61-9879-be0d97cd3eec" >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> }, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "slave_id":{ >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "value":"20150924-210922-1608624320-5050-1792-S1" >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> }, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "resources":[ >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "name":"cpus", >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "type":0, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "scalar":{ >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "value":0.1 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> }, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "name":"mem", >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "type":0, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "scalar":{ >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "value":256 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> }, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "name":"ports", >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "type":1, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "ranges":{ >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "range":[ >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "begin":31002, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "end":31002 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> } 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ] >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ], >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "command":{ >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "container":{ >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "image":"docker-services1a:5000/test/app-81-1-hello-app-103" >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> }, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "shell":false >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> }, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "container":{ >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "type":1, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "docker":{ >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-103", >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "network":2, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "port_mappings":[ >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "host_port":31002, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "container_port":8000, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "protocol":"tcp" >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ], >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "privileged":false, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "parameters":[], >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "force_pull_image":false >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> }, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "health_check":{ >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "delay_seconds":5, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "interval_seconds":10, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "timeout_seconds":10, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "consecutive_failures":3, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "grace_period_seconds":0, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "command":{ >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "shell":true, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "value":"sleep 5", >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> "user":"root" >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I have searched all machines and containers to see if they >>>>>>>>>>>>>>>> ever run the command (in this case `sleep 5`), but have not >>>>>>>>>>>>>>>> found any >>>>>>>>>>>>>>>> indication that it is being executed. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> In the mesos src code the health-checks are invoked from >>>>>>>>>>>>>>>> src/launcher/executor.cpp CommandExecutorProcess::launchTask. >>>>>>>>>>>>>>>> Does this >>>>>>>>>>>>>>>> mean that health-checks are only supported for custom >>>>>>>>>>>>>>>> executors and not for >>>>>>>>>>>>>>>> docker tasks? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> What I am trying to accomplish is to have the 0/non-zero >>>>>>>>>>>>>>>> exit-status of a health-check command translate to task health. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks! >>>>>>>>>>>>>>>> Jay >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Best Regards, >>>>>>>>>>>>> Haosdent Huang >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Best Regards, >>>>>>>>>>>> Haosdent Huang >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Best Regards, >>>>>>>>>> Haosdent Huang >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Best Regards, >>>>>>>>> Haosdent Huang >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Best Regards, >>>>>>>> Haosdent Huang >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Best Regards, >>>>>>> Haosdent Huang >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Best Regards, >>>>>> Haosdent Huang >>>>>> >>>>> >>>>> >>>> >>> >> >> >> -- >> Best Regards, >> Haosdent Huang >> > > -- Best Regards, Haosdent Huang

