For flags sent to the executor from the containerizer, each flag is stringified and becomes a command-line parameter when the executor is launched.
You can see this in https://github.com/apache/mesos/blob/master/src/slave/containerizer/docker.cpp#L279-L288 But for launcher_dir, the executor gets it from `argv[0]`, as you mentioned above. ``` string path = envPath.isSome() ? envPath.get() : os::realpath(Path(argv[0]).dirname()).get(); ``` So I want to figure out why your argv[0] would become the sandbox dir, not "/usr/libexec/mesos". On Fri, Oct 9, 2015 at 12:03 PM, Jay Taylor <[email protected]> wrote: > I see. And then how are the flags sent to the executor? > > > > On Oct 8, 2015, at 8:56 PM, haosdent <[email protected]> wrote: > > Yes. The related code is located in > https://github.com/apache/mesos/blob/master/src/slave/main.cpp#L123 > > In fact, environment variables starting with MESOS_ are loaded as flag > variables. > > https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/flags/flags.hpp#L52 > > On Fri, Oct 9, 2015 at 11:33 AM, Jay Taylor <[email protected]> wrote: > >> One question for you haosdent- >> >> You mentioned that the flags.launcher_dir should propagate to the docker >> executor all the way up the chain. Can you show me where this logic is in >> the codebase? I didn't see where that was happening and would like to >> understand the mechanism. >> >> Thanks! >> Jay >> >> >> >> On Oct 8, 2015, at 8:29 PM, Jay Taylor <[email protected]> wrote: >> >> Maybe tomorrow I will build a fresh cluster from scratch to see if the >> broken behavior experienced today still persists. >> >> On Oct 8, 2015, at 7:52 PM, haosdent <[email protected]> wrote: >> >> As far as I know, MESOS_LAUNCHER_DIR works by setting flags.launcher_dir, >> which is the dir where mesos-docker-executor and mesos-health-check are >> found. Although the env var is not propagated, MESOS_LAUNCHER_DIR still >> works because flags.launcher_dir is read from it. >> >> For example, because I ran >> ``` >> export MESOS_LAUNCHER_DIR=/tmp >> ``` >> before starting mesos-slave.
So when I launched the slave, I could find this log >> in the slave log >> ``` >> I1009 10:27:26.594599 1416 slave.cpp:203] Flags at startup: >> xxxxx --launcher_dir="/tmp" >> ``` >> >> And from your log, I am not sure why your MESOS_LAUNCHER_DIR became the >> sandbox dir. Is it because MESOS_LAUNCHER_DIR is overridden in your other >> scripts? >> >> >> On Fri, Oct 9, 2015 at 1:56 AM, Jay Taylor <[email protected]> wrote: >> >>> I haven't ever changed MESOS_LAUNCHER_DIR/--launcher_dir before. >>> >>> I just tried setting both the env var and flag on the slaves, and have >>> determined that the env var is not present when it is being checked in >>> src/docker/executor.cpp @ line 573: >>> >>> const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR"); >>>> string path = >>>> envPath.isSome() ? envPath.get() >>>> : os::realpath(Path(argv[0]).dirname()).get(); >>>> cout << "MESOS_LAUNCHER_DIR: envpath.isSome()->" << (envPath.isSome() >>>> ? "yes" : "no") << endl; >>>> cout << "MESOS_LAUNCHER_DIR: path='" << path << "'" << endl; >>> >>> >>> Exported MESOS_LAUNCHER_DIR env var (and verified it is correctly >>> propagated up to the point of mesos-slave launch): >>> >>> $ cat /etc/default/mesos-slave >>>> export >>>> MESOS_MASTER="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos" >>>> export MESOS_CONTAINERIZERS="mesos,docker" >>>> export MESOS_EXECUTOR_REGISTRATION_TIMEOUT="5mins" >>>> export MESOS_PORT="5050" >>>> export MESOS_LAUNCHER_DIR="/usr/libexec/mesos" >>> >>> >>> TASK OUTPUT: >>> >>> >>>> *MESOS_LAUNCHER_DIR: envpath.isSome()->no**MESOS_LAUNCHER_DIR: >>>> path='/tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad'* >>>> Registered docker executor on mesos-worker2a >>>> Starting task hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253 >>>> Launching health check process: >>>>
/tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad/mesos-health-check >>>> --executor=(1)@192.168.225.59:44523 >>>> --health_check_json={"command":{"shell":true,"value":"docker exec >>>> mesos-61373c0e-7349-4173-ab8d-9d7b260e8a30-S1.41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad >>>> sh -c \" \/bin\/bash >>>> \""},"consecutive_failures":3,"delay_seconds":5.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0} >>>> --task_id=hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253 >>>> Health check process launched at pid: 2519 >>> >>> >>> The env var is not propagated when the docker executor is launched >>> in src/slave/containerizer/docker.cpp around line 903: >>> >>> vector<string> argv; >>>> argv.push_back("mesos-docker-executor"); >>>> // Construct the mesos-docker-executor using the "name" we gave the >>>> // container (to distinguish it from Docker containers not created >>>> // by Mesos). >>>> Try<Subprocess> s = subprocess( >>>> path::join(flags.launcher_dir, "mesos-docker-executor"), >>>> argv, >>>> Subprocess::PIPE(), >>>> Subprocess::PATH(path::join(container->directory, "stdout")), >>>> Subprocess::PATH(path::join(container->directory, "stderr")), >>>> dockerFlags(flags, container->name(), container->directory), >>>> environment, >>>> lambda::bind(&setup, container->directory)); >>> >>> >>> A little ways above we can see the environment is setup w/ the container >>> tasks defined env vars. >>> >>> See src/slave/containerizer/docker.cpp around line 871: >>> >>> // Include any enviroment variables from ExecutorInfo. >>>> foreach (const Environment::Variable& variable, >>>> container->executor.command().environment().variables()) { >>>> environment[variable.name()] = variable.value(); >>>> } >>> >>> >>> Should I file a JIRA for this? Have I overlooked anything? 
>>> >>> >>> On Wed, Oct 7, 2015 at 8:11 PM, haosdent <[email protected]> wrote: >>> >>>> >Not sure what was going on with health-checks in 0.24.0. >>>> 0.24.1 should work. >>>> >>>> >Do any of you know which host the path >>>> "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check" >>>> should exist on? It definitely doesn't exist on the slave, hence execution >>>> failing. >>>> >>>> Did you set MESOS_LAUNCHER_DIR/--launcher_dir incorrectly before? We >>>> get mesos-health-check from MESOS_LAUNCHER_DIR/--launcher_dir or use the >>>> same dir as mesos-docker-executor. >>>> >>>> On Thu, Oct 8, 2015 at 10:46 AM, Jay Taylor <[email protected]> >>>> wrote: >>>> >>>>> Maybe I spoke too soon. >>>>> >>>>> Now the checks are attempting to run, however the STDERR is not >>>>> looking good. I've added some debugging to the error message output to >>>>> show the path, argv, and envp variables: >>>>> >>>>> STDOUT: >>>>> >>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc" >>>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" >>>>>> --initialize_driver_logging="true" --logbufsecs="0" >>>>>> --logging_level="INFO" >>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" >>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc" >>>>>> --stop_timeout="0ns" >>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc" >>>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" >>>>>> --initialize_driver_logging="true" --logbufsecs="0" >>>>>>
--logging_level="INFO" >>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" >>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc" >>>>>> --stop_timeout="0ns" >>>>>> Registered docker executor on mesos-worker2a >>>>>> Starting task >>>>>> app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0 >>>>>> Launching health check process: >>>>>> /tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check >>>>>> --executor=(1)@192.168.225.59:43917 >>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec >>>>>> mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc >>>>>> sh -c \" exit 1 >>>>>> \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0} >>>>>> --task_id=app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0 >>>>>> Health check process launched at pid: 3012 >>>>> >>>>> >>>>> STDERR: >>>>> >>>>> I1008 02:17:28.870434 2770 exec.cpp:134] Version: 0.26.0 >>>>>> I1008 02:17:28.871860 2778 exec.cpp:208] Executor registered on slave >>>>>> 16b49e90-6852-4c91-8e70-d89c54f25668-S1 >>>>>> WARNING: Your kernel does not support swap limit capabilities, memory >>>>>> limited without swap. 
>>>>>> ABORT: (src/subprocess.cpp:180): Failed to os::execvpe in childMain >>>>>> (path.c_str()='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check', >>>>>> argv='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check', >>>>>> envp=''): No such file or directory*** Aborted at 1444270649 (unix time) >>>>>> try "date -d @1444270649" if you are using GNU date *** >>>>>> PC: @ 0x7f4a37ec6cc9 (unknown) >>>>>> *** SIGABRT (@0xbc4) received by PID 3012 (TID 0x7f4a2f9f6700) from >>>>>> PID 3012; stack trace: *** >>>>>> @ 0x7f4a38265340 (unknown) >>>>>> @ 0x7f4a37ec6cc9 (unknown) >>>>>> @ 0x7f4a37eca0d8 (unknown) >>>>>> @ 0x4191e2 _Abort() >>>>>> @ 0x41921c _Abort() >>>>>> @ 0x7f4a39dc2768 process::childMain() >>>>>> @ 0x7f4a39dc4f59 std::_Function_handler<>::_M_invoke() >>>>>> @ 0x7f4a39dc24fc process::defaultClone() >>>>>> @ 0x7f4a39dc34fb process::subprocess() >>>>>> @ 0x43cc9c >>>>>> mesos::internal::docker::DockerExecutorProcess::launchHealthCheck() >>>>>> @ 0x7f4a39d924f4 process::ProcessManager::resume() >>>>>> @ 0x7f4a39d92827 >>>>>> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv >>>>>> @ 0x7f4a38a47e40 (unknown) >>>>>> @ 0x7f4a3825d182 start_thread >>>>>> @ 0x7f4a37f8a47d (unknown) >>>>> >>>>> >>>>> Do any of you know which host the path >>>>> 
"/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check" >>>>> should exist on? It definitely doesn't exist on the slave, hence >>>>> execution failing. >>>>> >>>>> This is with current master, git hash >>>>> 5058fac1083dc91bca54d33c26c810c17ad95dd1. >>>>> >>>>> commit 5058fac1083dc91bca54d33c26c810c17ad95dd1 >>>>>> Author: Anand Mazumdar <[email protected]> >>>>>> Date: Tue Oct 6 17:37:41 2015 -0700 >>>>> >>>>> >>>>> -Jay >>>>> >>>>> On Wed, Oct 7, 2015 at 5:23 PM, Jay Taylor <[email protected]> >>>>> wrote: >>>>> >>>>>> Update: >>>>>> >>>>>> I used https://github.com/deric/mesos-deb-packaging to compile and >>>>>> package the latest master (0.26.x) and deployed it to the cluster, and >>>>>> now >>>>>> health checks are working as advertised in both Marathon and my own >>>>>> framework! Not sure what was going on with health-checks in 0.24.0.. >>>>>> >>>>>> Anyways, thanks again for your help Haosdent! >>>>>> >>>>>> Cheers, >>>>>> Jay >>>>>> >>>>>> On Wed, Oct 7, 2015 at 12:53 PM, Jay Taylor <[email protected]> >>>>>> wrote: >>>>>> >>>>>>> Hi Haosdent, >>>>>>> >>>>>>> Can you share your Marathon POST request that results in Mesos >>>>>>> executing the health checks? >>>>>>> >>>>>>> Since we can reference the Marathon framework, I've been doing some >>>>>>> digging around. 
>>>>>>> >>>>>>> Here are the details of my setup and findings: >>>>>>> >>>>>>> I put a few small hacks in Marathon: >>>>>>> >>>>>>> (1) Added com.googlecode.protobuf.format to Marathon's dependencies >>>>>>> >>>>>>> (2) Edited the following files so TaskInfo is dumped as JSON to >>>>>>> /tmp/X in both the TaskFactory as well as right before the task is sent >>>>>>> to >>>>>>> Mesos via driver.launchTasks: >>>>>>> >>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala: >>>>>>> >>>>>>> $ git diff >>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala >>>>>>>> @@ -25,6 +25,12 @@ class DefaultTaskFactory @Inject() ( >>>>>>>> >>>>>>>> new TaskBuilder(app, taskIdUtil.newTaskId, >>>>>>>> config).buildIfMatches(offer, runningTasks).map { >>>>>>>> case (taskInfo, ports) => >>>>>>>> + import com.googlecode.protobuf.format.JsonFormat >>>>>>>> + import java.io._ >>>>>>>> + val bw = new BufferedWriter(new FileWriter(new >>>>>>>> File("/tmp/taskjson1-" + taskInfo.getTaskId.getValue))) >>>>>>>> + bw.write(JsonFormat.printToString(taskInfo)) >>>>>>>> + bw.write("\n") >>>>>>>> + bw.close() >>>>>>>> CreatedTask( >>>>>>>> taskInfo, >>>>>>>> MarathonTasks.makeTask( >>>>>>> >>>>>>> >>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala: >>>>>>> >>>>>>> $ git diff >>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala >>>>>>>> @@ -24,6 +24,16 @@ private[launcher] class TaskLauncherImpl( >>>>>>>> override def launchTasks(offerID: OfferID, taskInfos: >>>>>>>> Seq[TaskInfo]): Boolean = { >>>>>>>> val launched = withDriver(s"launchTasks($offerID)") { driver => >>>>>>>> import scala.collection.JavaConverters._ >>>>>>>> + var i = 0 >>>>>>>> + for (i <- 0 to taskInfos.length - 1) { >>>>>>>> + import com.googlecode.protobuf.format.JsonFormat >>>>>>>> + import java.io._ >>>>>>>> + val file = new File("/tmp/taskJson2-" + i.toString() + "-" >>>>>>>> + taskInfos(i).getTaskId.getValue)
>>>>>>>> + val bw = new BufferedWriter(new FileWriter(file)) >>>>>>>> + bw.write(JsonFormat.printToString(taskInfos(i))) >>>>>>>> + bw.write("\n") >>>>>>>> + bw.close() >>>>>>>> + } >>>>>>>> driver.launchTasks(Collections.singleton(offerID), >>>>>>>> taskInfos.asJava) >>>>>>>> } >>>>>>> >>>>>>> >>>>>>> Then I built and deployed the hacked Marathon and restarted the >>>>>>> marathon service. >>>>>>> >>>>>>> Next I created the app via the Marathon API ("hello app" is a >>>>>>> container with a simple hello-world ruby app running on 0.0.0.0:8000 >>>>>>> ) >>>>>>> >>>>>>> curl http://mesos-primary1a:8080/v2/groups -XPOST -H'Content-Type: >>>>>>>> application/json' -d' >>>>>>>> { >>>>>>>> "id": "/app-81-1-hello-app", >>>>>>>> "apps": [ >>>>>>>> { >>>>>>>> "id": "/app-81-1-hello-app/web-v11", >>>>>>>> "container": { >>>>>>>> "type": "DOCKER", >>>>>>>> "docker": { >>>>>>>> "image": >>>>>>>> "docker-services1a:5000/gig1/app-81-1-hello-app-1444240966", >>>>>>>> "network": "BRIDGE", >>>>>>>> "portMappings": [ >>>>>>>> { >>>>>>>> "containerPort": 8000, >>>>>>>> "hostPort": 0, >>>>>>>> "protocol": "tcp" >>>>>>>> } >>>>>>>> ] >>>>>>>> } >>>>>>>> }, >>>>>>>> "env": { >>>>>>>> >>>>>>>> }, >>>>>>>> "healthChecks": [ >>>>>>>> { >>>>>>>> "protocol": "COMMAND", >>>>>>>> "command": {"value": "exit 1"}, >>>>>>>> "gracePeriodSeconds": 10, >>>>>>>> "intervalSeconds": 10, >>>>>>>> "timeoutSeconds": 10, >>>>>>>> "maxConsecutiveFailures": 3 >>>>>>>> } >>>>>>>> ], >>>>>>>> "instances": 1, >>>>>>>> "cpus": 1, >>>>>>>> "mem": 512 >>>>>>>> } >>>>>>>> ] >>>>>>>> } >>>>>>> >>>>>>> >>>>>>> $ ls /tmp/ >>>>>>>> >>>>>>>> taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0 >>>>>>>> >>>>>>>> taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0 >>>>>>> >>>>>>> >>>>>>> Do they match? 
>>>>>>> >>>>>>> $ md5sum /tmp/task* >>>>>>>> 1b5115997e78e2611654059249d99578 >>>>>>>> >>>>>>>> /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0 >>>>>>>> 1b5115997e78e2611654059249d99578 >>>>>>>> >>>>>>>> /tmp/taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0 >>>>>>> >>>>>>> >>>>>>> Yes, so I am confident this is the information being sent across the >>>>>>> wire to Mesos. >>>>>>> >>>>>>> Do they contain any health-check information? >>>>>>> >>>>>>> $ cat >>>>>>>> /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0 >>>>>>>> { >>>>>>>> "name":"web-v11.app-81-1-hello-app", >>>>>>>> "task_id":{ >>>>>>>> >>>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0" >>>>>>>> }, >>>>>>>> "slave_id":{ >>>>>>>> "value":"20150924-210922-1608624320-5050-1792-S1" >>>>>>>> }, >>>>>>>> "resources":[ >>>>>>>> { >>>>>>>> "name":"cpus", >>>>>>>> "type":"SCALAR", >>>>>>>> "scalar":{ >>>>>>>> "value":1.0 >>>>>>>> }, >>>>>>>> "role":"*" >>>>>>>> }, >>>>>>>> { >>>>>>>> "name":"mem", >>>>>>>> "type":"SCALAR", >>>>>>>> "scalar":{ >>>>>>>> "value":512.0 >>>>>>>> }, >>>>>>>> "role":"*" >>>>>>>> }, >>>>>>>> { >>>>>>>> "name":"ports", >>>>>>>> "type":"RANGES", >>>>>>>> "ranges":{ >>>>>>>> "range":[ >>>>>>>> { >>>>>>>> "begin":31641, >>>>>>>> "end":31641 >>>>>>>> } >>>>>>>> ] >>>>>>>> }, >>>>>>>> "role":"*" >>>>>>>> } >>>>>>>> ], >>>>>>>> "command":{ >>>>>>>> "environment":{ >>>>>>>> "variables":[ >>>>>>>> { >>>>>>>> "name":"PORT_8000", >>>>>>>> "value":"31641" >>>>>>>> }, >>>>>>>> { >>>>>>>> "name":"MARATHON_APP_VERSION", >>>>>>>> "value":"2015-10-07T19:35:08.386Z" >>>>>>>> }, >>>>>>>> { >>>>>>>> "name":"HOST", >>>>>>>> "value":"mesos-worker1a" >>>>>>>> }, >>>>>>>> { >>>>>>>> "name":"MARATHON_APP_DOCKER_IMAGE", >>>>>>>> >>>>>>>> "value":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966" >>>>>>>> }, >>>>>>>> { >>>>>>>> "name":"MESOS_TASK_ID", >>>>>>>> >>>>>>>> 
"value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0" >>>>>>>> }, >>>>>>>> { >>>>>>>> "name":"PORT", >>>>>>>> "value":"31641" >>>>>>>> }, >>>>>>>> { >>>>>>>> "name":"PORTS", >>>>>>>> "value":"31641" >>>>>>>> }, >>>>>>>> { >>>>>>>> "name":"MARATHON_APP_ID", >>>>>>>> "value":"/app-81-1-hello-app/web-v11" >>>>>>>> }, >>>>>>>> { >>>>>>>> "name":"PORT0", >>>>>>>> "value":"31641" >>>>>>>> } >>>>>>>> ] >>>>>>>> }, >>>>>>>> "shell":false >>>>>>>> }, >>>>>>>> "container":{ >>>>>>>> "type":"DOCKER", >>>>>>>> "docker":{ >>>>>>>> >>>>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966", >>>>>>>> "network":"BRIDGE", >>>>>>>> "port_mappings":[ >>>>>>>> { >>>>>>>> "host_port":31641, >>>>>>>> "container_port":8000, >>>>>>>> "protocol":"tcp" >>>>>>>> } >>>>>>>> ], >>>>>>>> "privileged":false, >>>>>>>> "force_pull_image":false >>>>>>>> } >>>>>>>> } >>>>>>>> } >>>>>>> >>>>>>> >>>>>>> No, I don't see anything about any health check. >>>>>>> >>>>>>> Mesos STDOUT for the launched task: >>>>>>> >>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da" >>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true" >>>>>>>> --logbufsecs="0" --logging_level="INFO" >>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" >>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da" >>>>>>>> --stop_timeout="0ns" >>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da" >>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true" >>>>>>>> --logbufsecs="0" --logging_level="INFO" >>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" >>>>>>>> 
--sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da" >>>>>>>> --stop_timeout="0ns" >>>>>>>> Registered docker executor on mesos-worker1a >>>>>>>> Starting task >>>>>>>> app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0 >>>>>>> >>>>>>> >>>>>>> And STDERR: >>>>>>> >>>>>>> I1007 19:35:08.790743 4612 exec.cpp:134] Version: 0.24.0 >>>>>>>> I1007 19:35:08.793416 4619 exec.cpp:208] Executor registered on >>>>>>>> slave 20150924-210922-1608624320-5050-1792-S1 >>>>>>>> WARNING: Your kernel does not support swap limit capabilities, >>>>>>>> memory limited without swap. >>>>>>> >>>>>>> >>>>>>> Again, nothing about any health checks. >>>>>>> >>>>>>> Any ideas of other things to try or what I could be missing? Can't >>>>>>> say either way about the Mesos health-check system working or not if >>>>>>> Marathon won't put the health-check into the task it sends to Mesos. >>>>>>> >>>>>>> Thanks for all your help! >>>>>>> >>>>>>> Best, >>>>>>> Jay >>>>>>> >>>>>>> >>>>>>> >>>>>>>> >>>>>>> On Tue, Oct 6, 2015 at 11:24 PM, haosdent <[email protected]> >>>>>>> wrote: >>>>>>> >>>>>>>> Maybe you could post your executor stdout/stderr so that we could >>>>>>>> know whether the health check is running or not. >>>>>>>> >>>>>>>> On Wed, Oct 7, 2015 at 2:15 PM, haosdent <[email protected]> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Marathon also uses Mesos health checks. When I use a health check, I >>>>>>>>> can see a log like this in the executor stdout.
>>>>>>>>> >>>>>>>>> ``` >>>>>>>>> Registered docker executor on xxxxx >>>>>>>>> Starting task >>>>>>>>> test-health-check.822a5fd2-6cba-11e5-b5ce-0a0027000000 >>>>>>>>> Launching health check process: >>>>>>>>> /home/haosdent/mesos/build/src/.libs/mesos-health-check >>>>>>>>> --executor=xxxx >>>>>>>>> Health check process launched at pid: 9895 >>>>>>>>> Received task health update, healthy: true >>>>>>>>> ``` >>>>>>>>> >>>>>>>>> On Wed, Oct 7, 2015 at 12:51 PM, Jay Taylor <[email protected]> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> I am using my own framework, and the full task info I'm using is >>>>>>>>>> posted earlier in this thread. Do you happen to know if Marathon >>>>>>>>>> uses >>>>>>>>>> Mesos's health checks for its health check system? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Oct 6, 2015, at 9:01 PM, haosdent <[email protected]> wrote: >>>>>>>>>> >>>>>>>>>> Yes, the health task is launched through its definition in the >>>>>>>>>> TaskInfo. Do you launch your task through Marathon? I could test it >>>>>>>>>> on my side. >>>>>>>>>> >>>>>>>>>> On Wed, Oct 7, 2015 at 11:56 AM, Jay Taylor <[email protected]> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Precisely, and there are none of those statements. Are you or >>>>>>>>>>> others confident health-checks are part of the code path when >>>>>>>>>>> defined via >>>>>>>>>>> task info for docker container tasks? Going through the code, I >>>>>>>>>>> wasn't >>>>>>>>>>> able to find the linkage for anything other than health-checks >>>>>>>>>>> triggered >>>>>>>>>>> through a custom executor. >>>>>>>>>>> >>>>>>>>>>> With that being said, it is a pretty good-sized code base and I'm >>>>>>>>>>> not very familiar with it, so my analysis thus far has by no means >>>>>>>>>>> been >>>>>>>>>>> exhaustive.
>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Oct 6, 2015, at 8:41 PM, haosdent <[email protected]> wrote: >>>>>>>>>>> >>>>>>>>>>> When the health check launches, there will be a log like this in >>>>>>>>>>> your executor stdout >>>>>>>>>>> ``` >>>>>>>>>>> Health check process launched at pid xxx >>>>>>>>>>> ``` >>>>>>>>>>> >>>>>>>>>>> On Wed, Oct 7, 2015 at 11:37 AM, Jay Taylor <[email protected] >>>>>>>>>>> > wrote: >>>>>>>>>>> >>>>>>>>>>>> I'm happy to try this, however wouldn't there be output in the >>>>>>>>>>>> logs with the string "health" or "Health" if the health-check were >>>>>>>>>>>> active? >>>>>>>>>>>> None of my master or slave logs contain the string.. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Oct 6, 2015, at 7:45 PM, haosdent <[email protected]> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> Could you use "exit 1" instead of "sleep 5" to see whether you >>>>>>>>>>>> can see an unhealthy status in your task stdout/stderr? >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:38 AM, Jay Taylor < >>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> My current version is 0.24.1. >>>>>>>>>>>>> >>>>>>>>>>>>> On Tue, Oct 6, 2015 at 7:30 PM, haosdent <[email protected]> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Yes, adam also helped commit it to 0.23.1 and 0.24.1 >>>>>>>>>>>>>> https://github.com/apache/mesos/commit/8c0ed92de3925d4312429bfba01b9b1ccbcbbef0 >>>>>>>>>>>>>> >>>>>>>>>>>>>> https://github.com/apache/mesos/commit/09e367cd69aa39c156c9326d44f4a7b829ba3db7 >>>>>>>>>>>>>> Are you using one of these versions? >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:26 AM, haosdent <[email protected] >>>>>>>>>>>>>> > wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> I remember 0.23.1 and 0.24.1 contain this backport, let me >>>>>>>>>>>>>>> double check.
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:01 AM, Jay Taylor < >>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Oops- Now I see you already said it's in master. I'll look >>>>>>>>>>>>>>>> there :) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks again! >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:59 PM, Jay Taylor < >>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Great, thanks for the quick reply Tim! >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Do you know if there is a branch I can checkout to test it >>>>>>>>>>>>>>>>> out? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:54 PM, Timothy Chen < >>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hi Jay, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> We just added health check support for docker tasks >>>>>>>>>>>>>>>>>> that's in master but not yet released. It will run docker >>>>>>>>>>>>>>>>>> exec with the >>>>>>>>>>>>>>>>>> command you provided as health checks. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> It should be in the next release. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks! >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Tim >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 6:49 PM, Jay Taylor < >>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Does Mesos support health checks for docker image tasks? >>>>>>>>>>>>>>>>>> Mesos seems to be ignoring the TaskInfo.HealthCheck field >>>>>>>>>>>>>>>>>> for me. 
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Example TaskInfo JSON received back from Mesos: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "name":"hello-app.web.v3", >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "task_id":{ >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> "value":"hello-app_web-v3.fc05a1a5-1e06-4e61-9879-be0d97cd3eec" >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> }, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "slave_id":{ >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "value":"20150924-210922-1608624320-5050-1792-S1" >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> }, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "resources":[ >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "name":"cpus", >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "type":0, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "scalar":{ >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "value":0.1 >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> }, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "name":"mem", >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "type":0, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "scalar":{ >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "value":256 >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> }, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "name":"ports", >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "type":1, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "ranges":{ >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "range":[ >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "begin":31002, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "end":31002 >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> ] >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> ], >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "command":{ 
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "container":{ >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> "image":"docker-services1a:5000/test/app-81-1-hello-app-103" >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> }, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "shell":false >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> }, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "container":{ >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "type":1, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "docker":{ >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-103", >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "network":2, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "port_mappings":[ >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "host_port":31002, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "container_port":8000, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "protocol":"tcp" >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> ], >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "privileged":false, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "parameters":[], >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "force_pull_image":false >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> }, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "health_check":{ >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "delay_seconds":5, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "interval_seconds":10, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "timeout_seconds":10, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "consecutive_failures":3, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "grace_period_seconds":0, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "command":{ >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "shell":true, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "value":"sleep 5", >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "user":"root" >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> } 
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I have searched all machines and containers to see if >>>>>>>>>>>>>>>>>> they ever run the command (in this case `sleep 5`), but have >>>>>>>>>>>>>>>>>> not found any >>>>>>>>>>>>>>>>>> indication that it is being executed. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> In the mesos src code the health-checks are invoked from >>>>>>>>>>>>>>>>>> src/launcher/executor.cpp >>>>>>>>>>>>>>>>>> CommandExecutorProcess::launchTask. Does this >>>>>>>>>>>>>>>>>> mean that health-checks are only supported for custom >>>>>>>>>>>>>>>>>> executors and not for >>>>>>>>>>>>>>>>>> docker tasks? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> What I am trying to accomplish is to have the 0/non-zero >>>>>>>>>>>>>>>>>> exit-status of a health-check command translate to task >>>>>>>>>>>>>>>>>> health. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks! >>>>>>>>>>>>>>>>>> Jay >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> Best Regards, >>>>>>>>>>>>>>> Haosdent Huang >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> Best Regards, >>>>>>>>>>>>>> Haosdent Huang >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Best Regards, >>>>>>>>>>>> Haosdent Huang >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Best Regards, >>>>>>>>>>> Haosdent Huang >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Best Regards, >>>>>>>>>> Haosdent Huang >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Best Regards, >>>>>>>>> Haosdent Huang >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Best Regards, >>>>>>>> Haosdent Huang >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>>> >>>> -- >>>> Best Regards, >>>> Haosdent Huang >>>> >>> >>> >> >> >> -- >> Best Regards, >> Haosdent Huang >> >> > > > -- > Best Regards, > Haosdent Huang > > 
-- Best Regards, Haosdent Huang

