Are those the stdout logs of the Agent? I ask because I don't see --launcher_dir set. However, if I look at one that is running off the same 0.24.1 package, this is what I see:
I1012 14:56:36.933856 1704 slave.cpp:191] Flags at startup: --appc_store_dir="/tmp/mesos/store/appc" --attributes="rack:r2d2;pod:demo,dev" --authenticatee="crammd5" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="docker,mesos" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --initialize_driver_logging="true" --ip="192.168.33.11" --isolation="cgroups/cpu,cgroups/mem" --launcher_dir="/usr/libexec/mesos" --log_dir="/var/local/mesos/logs/agent" --logbufsecs="0" --logging_level="INFO" --master="zk://192.168.33.1:2181/mesos/vagrant" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --resource_monitoring_interval="1secs" --resources="ports:[9000-10000];ephemeral_ports:[32768-57344]" --revocable_cpu_low_priority="true" --sandbox_directory="/var/local/sandbox" --strict="true" --switch_user="true" --version="false" --work_dir="/var/local/mesos/agent"

(This is run off the Vagrantfile at [0], in case you want to reproduce.) That agent is not run via the init command, though; I execute it manually via the `run-agent.sh` script in the same directory. I don't really think this matters, but I assume you also restarted the agent after making the config changes?
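A likely explanation for the sandbox path can be pieced together from the snippets quoted further down in this thread. The docker containerizer passes the bare name `mesos-docker-executor` as `argv[0]` (see the `argv.push_back(...)` excerpt from src/slave/containerizer/docker.cpp). The executor's environment carries only the ExecutorInfo variables, so `MESOS_LAUNCHER_DIR` is unset (Jay's debug output shows `envpath.isSome()->no`). The executor then falls back to `os::realpath(Path(argv[0]).dirname())`. The dirname of a bare name is `.`, and `.` resolves to the working directory. Assuming the `setup` hook changes into `container->directory`, as its argument suggests, that working directory is the sandbox.

The sketch below models only that fallback. `dirnameOf` and `resolveLauncherDir` are hypothetical names, not Mesos code. Unlike `os::realpath`, the sketch never touches the filesystem or resolves symlinks.

```cpp
#include <optional>
#include <string>

// POSIX-style dirname for the simple inputs used here: a bare name such as
// "mesos-docker-executor" has dirname ".", and "/a/b/c" has dirname "/a/b".
// Trailing slashes are not handled; this is only an illustration.
std::string dirnameOf(const std::string& path) {
  const std::string::size_type slash = path.find_last_of('/');
  if (slash == std::string::npos) return ".";
  if (slash == 0) return "/";
  return path.substr(0, slash);
}

// Hypothetical model of the fallback quoted from src/docker/executor.cpp:
//   path = envPath.isSome() ? envPath.get()
//                           : os::realpath(Path(argv[0]).dirname()).get();
// `cwd` stands in for the executor's working directory (assumed to be the
// sandbox). Nothing here resolves symlinks or checks that paths exist.
std::string resolveLauncherDir(const std::optional<std::string>& envPath,
                               const std::string& argv0,
                               const std::string& cwd) {
  if (envPath.has_value()) {
    return *envPath;  // MESOS_LAUNCHER_DIR wins when it is set.
  }
  const std::string dir = dirnameOf(argv0);  // never empty
  if (dir[0] == '/') {
    return dir;  // Absolute argv[0]: its directory is used as-is.
  }
  if (dir == ".") {
    return cwd;  // Bare name: "." resolves to the working directory.
  }
  return cwd + "/" + dir;  // Other relative paths resolve against cwd.
}
```

Take the case where `envPath` is unset and `argv0` is `"mesos-docker-executor"`. The result is just the working directory, which matches the `MESOS_LAUNCHER_DIR: path='/tmp/mesos/slaves/...'` output quoted later in the thread. If this reading is right, setting `--launcher_dir` on the agent only changes where the agent finds `mesos-docker-executor`. The executor's own lookup of `mesos-health-check` would stay unchanged.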
(And, for your own sanity, you can double-check the version by looking at the very head of the logs.)

--
*Marco Massenzio*
Distributed Systems Engineer
http://codetrips.com

On Mon, Oct 12, 2015 at 10:50 PM, Jay Taylor <[email protected]> wrote: > Hi Haosdent and Mesos friends, > > I've rebuilt the cluster from scratch and installed mesos 0.24.1 from the > mesosphere apt repo: > > $ dpkg -l | grep mesos > ii mesos 0.24.1-0.2.35.ubuntu1404 > amd64 Cluster resource manager with efficient resource isolation > > Then added the `launcher_dir` flag to /etc/mesos-slave/launcher_dir on the > slaves: > > mesos-worker1a:~$ cat /etc/mesos-slave/launcher_dir > /usr/libexec/mesos > > And yet the task health-checks are still being launched from the sandbox > directory like before! > > I've also tested setting the MESOS_LAUNCHER_DIR env var and got the > identical result (just as before on the cluster where many versions of > mesos had been installed): > > STDOUT: > > --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb" >> --docker="docker" --help="false" --initialize_driver_logging="true" >> --logbufsecs="0" --logging_level="INFO" >> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" >> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb" >> --stop_timeout="0ns" >> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb" >> --docker="docker" --help="false" --initialize_driver_logging="true" >> --logbufsecs="0" --logging_level="INFO" >> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" >>
--sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb" >> --stop_timeout="0ns" >> Registered docker executor on mesos-worker1a >> Starting task hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91 >> Launching health check process: >> /tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb/mesos-health-check >> --executor=(1)@192.168.225.58:48912 >> --health_check_json={"command":{"shell":true,"value":"docker exec >> mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb >> sh -c \" curl --silent --show-error --fail --tcp-nodelay --head -X GET >> --user-agent flux-capacitor-health-checker --max-time 1 http:\/\/ >> 127.0.0.1:8000 >> \""},"consecutive_failures":6,"delay_seconds":15,"grace_period_seconds":10,"interval_seconds":1,"timeout_seconds":1} >> --task_id=hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91 >> Health check process launched at pid: 11253 > > > > STDERR: > > --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb" >> --docker="docker" --help="false" --initialize_driver_logging="true" >> --logbufsecs="0" --logging_level="INFO" >> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" >> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb" >> --stop_timeout="0ns" >> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb" >> --docker="docker" --help="false" --initialize_driver_logging="true" >> 
--logbufsecs="0" --logging_level="INFO" >> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" >> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb" >> --stop_timeout="0ns" >> Registered docker executor on mesos-worker1a >> Starting task hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91 >> *Launching health check process: >> /tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb/mesos-health-check* >> --executor=(1)@192.168.225.58:48912 >> --health_check_json={"command":{"shell":true,"value":"docker exec >> mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb >> sh -c \" curl --silent --show-error --fail --tcp-nodelay --head -X GET >> --user-agent flux-capacitor-health-checker --max-time 1 http:\/\/ >> 127.0.0.1:8000 >> \""},"consecutive_failures":6,"delay_seconds":15,"grace_period_seconds":10,"interval_seconds":1,"timeout_seconds":1} >> --task_id=hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91 >> Health check process launched at pid: 11253 > > > Any ideas on where to go from here? Is there any additional information I > can provide? > > Thanks as always, > Jay > > > On Thu, Oct 8, 2015 at 9:23 PM, haosdent <[email protected]> wrote: > >> Flags sent to the executor from the containerizer are stringified and >> become command-line parameters when the executor is launched. >> >> You can see this in >> https://github.com/apache/mesos/blob/master/src/slave/containerizer/docker.cpp#L279-L288 >> >> But for launcher_dir, the executor gets it from `argv[0]`, as you >> mentioned above. >> ``` >> string path = >> envPath.isSome() ?
envPath.get() >> : os::realpath(Path(argv[0]).dirname()).get(); >> >> ``` >> So I want to figure out why, in your case, the path derived from argv[0] becomes the sandbox dir and not >> "/usr/libexec/mesos". >> >> On Fri, Oct 9, 2015 at 12:03 PM, Jay Taylor <[email protected]> wrote: >> >>> I see. And then how are the flags sent to the executor? >>> >>> >>> >>> On Oct 8, 2015, at 8:56 PM, haosdent <[email protected]> wrote: >>> >>> Yes. The related code is located in >>> https://github.com/apache/mesos/blob/master/src/slave/main.cpp#L123 >>> >>> In fact, environment variables starting with MESOS_ are loaded as flag >>> values. >>> >>> https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/flags/flags.hpp#L52 >>> >>> On Fri, Oct 9, 2015 at 11:33 AM, Jay Taylor <[email protected]> wrote: >>> >>>> One question for you, haosdent: >>>> >>>> You mentioned that the flags.launcher_dir should propagate to the >>>> docker executor all the way up the chain. Can you show me where this logic >>>> is in the codebase? I didn't see where that was happening and would like >>>> to understand the mechanism. >>>> >>>> Thanks! >>>> Jay >>>> >>>> >>>> >>>> On Oct 8, 2015, at 8:29 PM, Jay Taylor <[email protected]> wrote: >>>> >>>> Maybe tomorrow I will build a fresh cluster from scratch to see if the >>>> broken behavior experienced today still persists. >>>> >>>> On Oct 8, 2015, at 7:52 PM, haosdent <[email protected]> wrote: >>>> >>>> As far as I know, MESOS_LAUNCHER_DIR works by setting flags.launcher_dir, >>>> which is used to find mesos-docker-executor and mesos-health-check under this >>>> dir. Although the env var is not propagated, MESOS_LAUNCHER_DIR still >>>> works because flags.launcher_dir is read from it. >>>> >>>> For example, I ran >>>> ``` >>>> export MESOS_LAUNCHER_DIR=/tmp >>>> ``` >>>> before starting mesos-slave.
So when I launched the slave, I could find this log line >>>> in the slave log: >>>> ``` >>>> I1009 10:27:26.594599 1416 slave.cpp:203] Flags at startup: >>>> xxxxx --launcher_dir="/tmp" >>>> ``` >>>> >>>> And from your log, I'm not sure why your MESOS_LAUNCHER_DIR becomes the >>>> sandbox dir. Is it because MESOS_LAUNCHER_DIR is overridden in some of your other >>>> scripts? >>>> >>>> >>>> On Fri, Oct 9, 2015 at 1:56 AM, Jay Taylor <[email protected]> wrote: >>>> >>>>> I haven't ever changed MESOS_LAUNCHER_DIR/--launcher_dir before. >>>>> >>>>> I just tried setting both the env var and flag on the slaves, and have >>>>> determined that the env var is not present when it is being checked in >>>>> src/docker/executor.cpp @ line 573: >>>>> >>>>> const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR"); >>>>>> string path = >>>>>> envPath.isSome() ? envPath.get() >>>>>> : os::realpath(Path(argv[0]).dirname()).get(); >>>>>> cout << "MESOS_LAUNCHER_DIR: envpath.isSome()->" << >>>>>> (envPath.isSome() ? "yes" : "no") << endl; >>>>>> cout << "MESOS_LAUNCHER_DIR: path='" << path << "'" << endl; >>>>> >>>>> >>>>> Exported MESOS_LAUNCHER_DIR env var (and verified it is correctly >>>>> propagated up to the point of mesos-slave launch): >>>>> >>>>> $ cat /etc/default/mesos-slave >>>>>> export >>>>>> MESOS_MASTER="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos" >>>>>> export MESOS_CONTAINERIZERS="mesos,docker" >>>>>> export MESOS_EXECUTOR_REGISTRATION_TIMEOUT="5mins" >>>>>> export MESOS_PORT="5050" >>>>>> export MESOS_LAUNCHER_DIR="/usr/libexec/mesos" >>>>> >>>>> >>>>> TASK OUTPUT: >>>>> >>>>> >>>>>> *MESOS_LAUNCHER_DIR: envpath.isSome()->no* >>>>>> *MESOS_LAUNCHER_DIR: >>>>>> path='/tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad'* >>>>>> Registered docker executor on mesos-worker2a >>>>>> Starting task
hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253 >>>>>> Launching health check process: >>>>>> /tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad/mesos-health-check >>>>>> --executor=(1)@192.168.225.59:44523 >>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec >>>>>> mesos-61373c0e-7349-4173-ab8d-9d7b260e8a30-S1.41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad >>>>>> sh -c \" \/bin\/bash >>>>>> \""},"consecutive_failures":3,"delay_seconds":5.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0} >>>>>> --task_id=hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253 >>>>>> Health check process launched at pid: 2519 >>>>> >>>>> >>>>> The env var is not propagated when the docker executor is launched >>>>> in src/slave/containerizer/docker.cpp around line 903: >>>>> >>>>> vector<string> argv; >>>>>> argv.push_back("mesos-docker-executor"); >>>>>> // Construct the mesos-docker-executor using the "name" we gave the >>>>>> // container (to distinguish it from Docker containers not created >>>>>> // by Mesos). >>>>>> Try<Subprocess> s = subprocess( >>>>>> path::join(flags.launcher_dir, "mesos-docker-executor"), >>>>>> argv, >>>>>> Subprocess::PIPE(), >>>>>> Subprocess::PATH(path::join(container->directory, "stdout")), >>>>>> Subprocess::PATH(path::join(container->directory, "stderr")), >>>>>> dockerFlags(flags, container->name(), container->directory), >>>>>> environment, >>>>>> lambda::bind(&setup, container->directory)); >>>>> >>>>> >>>>> A little way above, we can see the environment is set up with the >>>>> container task's defined env vars. >>>>> >>>>> See src/slave/containerizer/docker.cpp around line 871: >>>>> >>>>> // Include any enviroment variables from ExecutorInfo.
>>>>>> foreach (const Environment::Variable& variable, >>>>>> container->executor.command().environment().variables()) { >>>>>> environment[variable.name()] = variable.value(); >>>>>> } >>>>> >>>>> >>>>> Should I file a JIRA for this? Have I overlooked anything? >>>>> >>>>> >>>>> On Wed, Oct 7, 2015 at 8:11 PM, haosdent <[email protected]> wrote: >>>>> >>>>>> >Not sure what was going on with health-checks in 0.24.0. >>>>>> 0.24.1 should work. >>>>>> >>>>>> >Do any of you know which host the path >>>>>> "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check" >>>>>> should exist on? It definitely doesn't exist on the slave, hence >>>>>> execution >>>>>> failing. >>>>>> >>>>>> Did you set MESOS_LAUNCHER_DIR/--launcher_dir incorrectly before? We >>>>>> get mesos-health-check from MESOS_LAUNCHER_DIR/--launcher_dir, or from the >>>>>> same dir as mesos-docker-executor. >>>>>> >>>>>> On Thu, Oct 8, 2015 at 10:46 AM, Jay Taylor <[email protected]> >>>>>> wrote: >>>>>> >>>>>>> Maybe I spoke too soon. >>>>>>> >>>>>>> Now the checks are attempting to run; however, the STDERR is not >>>>>>> looking good.
I've added some debugging to the error message output to >>>>>>> show the path, argv, and envp variables: >>>>>>> >>>>>>> STDOUT: >>>>>>> >>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc" >>>>>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" >>>>>>>> --initialize_driver_logging="true" --logbufsecs="0" >>>>>>>> --logging_level="INFO" >>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" >>>>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc" >>>>>>>> --stop_timeout="0ns" >>>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc" >>>>>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" >>>>>>>> --initialize_driver_logging="true" --logbufsecs="0" >>>>>>>> --logging_level="INFO" >>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" >>>>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc" >>>>>>>> --stop_timeout="0ns" >>>>>>>> Registered docker executor on mesos-worker2a >>>>>>>> Starting task >>>>>>>> app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0 >>>>>>>> Launching health check process: >>>>>>>> /tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check >>>>>>>> --executor=(1)@192.168.225.59:43917 >>>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec >>>>>>>> 
mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc >>>>>>>> sh -c \" exit 1 >>>>>>>> \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0} >>>>>>>> --task_id=app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0 >>>>>>>> Health check process launched at pid: 3012 >>>>>>> >>>>>>> >>>>>>> STDERR: >>>>>>> >>>>>>> I1008 02:17:28.870434 2770 exec.cpp:134] Version: 0.26.0 >>>>>>>> I1008 02:17:28.871860 2778 exec.cpp:208] Executor registered on >>>>>>>> slave 16b49e90-6852-4c91-8e70-d89c54f25668-S1 >>>>>>>> WARNING: Your kernel does not support swap limit capabilities, >>>>>>>> memory limited without swap. >>>>>>>> ABORT: (src/subprocess.cpp:180): Failed to os::execvpe in childMain >>>>>>>> (path.c_str()='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check', >>>>>>>> argv='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check', >>>>>>>> envp=''): No such file or directory*** Aborted at 1444270649 (unix >>>>>>>> time) >>>>>>>> try "date -d @1444270649" if you are using GNU date *** >>>>>>>> PC: @ 0x7f4a37ec6cc9 (unknown) >>>>>>>> *** SIGABRT (@0xbc4) received by PID 3012 (TID 0x7f4a2f9f6700) from >>>>>>>> PID 3012; stack trace: *** >>>>>>>> @ 0x7f4a38265340 (unknown) >>>>>>>> @ 0x7f4a37ec6cc9 (unknown) >>>>>>>> @ 0x7f4a37eca0d8 (unknown) >>>>>>>> @ 0x4191e2 _Abort() >>>>>>>> @ 0x41921c _Abort() >>>>>>>> @ 0x7f4a39dc2768 process::childMain() >>>>>>>> @ 0x7f4a39dc4f59 std::_Function_handler<>::_M_invoke() >>>>>>>> @ 0x7f4a39dc24fc process::defaultClone() >>>>>>>> @ 0x7f4a39dc34fb 
process::subprocess() >>>>>>>> @ 0x43cc9c >>>>>>>> mesos::internal::docker::DockerExecutorProcess::launchHealthCheck() >>>>>>>> @ 0x7f4a39d924f4 process::ProcessManager::resume() >>>>>>>> @ 0x7f4a39d92827 >>>>>>>> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv >>>>>>>> @ 0x7f4a38a47e40 (unknown) >>>>>>>> @ 0x7f4a3825d182 start_thread >>>>>>>> @ 0x7f4a37f8a47d (unknown) >>>>>>> >>>>>>> >>>>>>> Do any of you know which host the path >>>>>>> "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check" >>>>>>> should exist on? It definitely doesn't exist on the slave, hence >>>>>>> execution failing. >>>>>>> >>>>>>> This is with current master, git hash >>>>>>> 5058fac1083dc91bca54d33c26c810c17ad95dd1. >>>>>>> >>>>>>> commit 5058fac1083dc91bca54d33c26c810c17ad95dd1 >>>>>>>> Author: Anand Mazumdar <[email protected]> >>>>>>>> Date: Tue Oct 6 17:37:41 2015 -0700 >>>>>>> >>>>>>> >>>>>>> -Jay >>>>>>> >>>>>>> On Wed, Oct 7, 2015 at 5:23 PM, Jay Taylor <[email protected]> >>>>>>> wrote: >>>>>>> >>>>>>>> Update: >>>>>>>> >>>>>>>> I used https://github.com/deric/mesos-deb-packaging to compile and >>>>>>>> package the latest master (0.26.x) and deployed it to the cluster, and >>>>>>>> now >>>>>>>> health checks are working as advertised in both Marathon and my own >>>>>>>> framework! Not sure what was going on with health-checks in 0.24.0.. >>>>>>>> >>>>>>>> Anyways, thanks again for your help Haosdent! >>>>>>>> >>>>>>>> Cheers, >>>>>>>> Jay >>>>>>>> >>>>>>>> On Wed, Oct 7, 2015 at 12:53 PM, Jay Taylor <[email protected]> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Hi Haosdent, >>>>>>>>> >>>>>>>>> Can you share your Marathon POST request that results in Mesos >>>>>>>>> executing the health checks? 
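The fields of the health check line up across the thread. For the `app-81-1-hello-app` task, the executor printed a `health_check_json` above with `consecutive_failures` 3, `grace_period_seconds` 10.0 and `interval_seconds` 10.0. These mirror the Marathon `healthChecks` entry Jay posts below: `maxConsecutiveFailures` 3, `gracePeriodSeconds` 10 and `intervalSeconds` 10.

As a rough mental model of how the grace period and the failure threshold interact, here is a simplified C++ sketch. Failures inside the grace period are ignored, any success resets the streak, and the task is flagged for killing once the consecutive-failure count reaches the threshold. `HealthPolicy`, `CheckResult` and `killIndex` are hypothetical names. This is not the `mesos-health-check` implementation, whose exact grace-period rules may differ.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified consecutive-failure policy.
struct HealthPolicy {
  double gracePeriodSeconds;     // failures before this time are ignored
  unsigned consecutiveFailures;  // streak length that triggers a kill
};

struct CheckResult {
  double secondsSinceLaunch;  // when the check ran
  bool healthy;               // whether the check command exited 0
};

// Returns the index of the check that would trigger a kill, or -1 if none.
int killIndex(const HealthPolicy& policy,
              const std::vector<CheckResult>& results) {
  unsigned failures = 0;
  for (std::size_t i = 0; i < results.size(); ++i) {
    const CheckResult& r = results[i];
    if (r.healthy) {
      failures = 0;  // any success resets the streak
      continue;
    }
    if (r.secondsSinceLaunch < policy.gracePeriodSeconds) {
      continue;  // failure inside the grace period: not counted
    }
    if (++failures >= policy.consecutiveFailures) {
      return static_cast<int>(i);
    }
  }
  return -1;
}
```

Under this model, an `exit 1` check that runs every 10 s with a 10 s grace period and a threshold of 3 is ignored at t=0. It then counts failures at t=10, 20 and 30, and flags the task at the t=30 check.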
>>>>>>>>> >>>>>>>>> Since we can reference the Marathon framework, I've been doing >>>>>>>>> some digging around. >>>>>>>>> >>>>>>>>> Here are the details of my setup and findings: >>>>>>>>> >>>>>>>>> I put a few small hacks in Marathon: >>>>>>>>> >>>>>>>>> (1) Added com.googlecode.protobuf.format to Marathon's dependencies >>>>>>>>> >>>>>>>>> (2) Edited the following files so TaskInfo is dumped as JSON to >>>>>>>>> /tmp/X both in the TaskFactory and right before the task is >>>>>>>>> sent to >>>>>>>>> Mesos via driver.launchTasks: >>>>>>>>> >>>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala: >>>>>>>>> >>>>>>>>> $ git diff >>>>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala >>>>>>>>>> @@ -25,6 +25,12 @@ class DefaultTaskFactory @Inject() ( >>>>>>>>>> >>>>>>>>>> new TaskBuilder(app, taskIdUtil.newTaskId, >>>>>>>>>> config).buildIfMatches(offer, runningTasks).map { >>>>>>>>>> case (taskInfo, ports) => >>>>>>>>>> + import com.googlecode.protobuf.format.JsonFormat >>>>>>>>>> + import java.io._ >>>>>>>>>> + val bw = new BufferedWriter(new FileWriter(new >>>>>>>>>> File("/tmp/taskjson1-" + taskInfo.getTaskId.getValue))) >>>>>>>>>> + bw.write(JsonFormat.printToString(taskInfo)) >>>>>>>>>> + bw.write("\n") >>>>>>>>>> + bw.close() >>>>>>>>>> CreatedTask( >>>>>>>>>> taskInfo, >>>>>>>>>> MarathonTasks.makeTask( >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala: >>>>>>>>> >>>>>>>>> $ git diff >>>>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala >>>>>>>>>> @@ -24,6 +24,16 @@ private[launcher] class TaskLauncherImpl( >>>>>>>>>> override def launchTasks(offerID: OfferID, taskInfos: >>>>>>>>>> Seq[TaskInfo]): Boolean = { >>>>>>>>>> val launched = withDriver(s"launchTasks($offerID)") { driver >>>>>>>>>> => >>>>>>>>>> import scala.collection.JavaConverters._ >>>>>>>>>> + var i = 0 >>>>>>>>>> + for (i <- 0 to taskInfos.length -
1) { >>>>>>>>>> + import com.googlecode.protobuf.format.JsonFormat >>>>>>>>>> + import java.io._ >>>>>>>>>> + val file = new File("/tmp/taskJson2-" + i.toString() + >>>>>>>>>> "-" + taskInfos(i).getTaskId.getValue) >>>>>>>>>> + val bw = new BufferedWriter(new FileWriter(file)) >>>>>>>>>> + bw.write(JsonFormat.printToString(taskInfos(i))) >>>>>>>>>> + bw.write("\n") >>>>>>>>>> + bw.close() >>>>>>>>>> + } >>>>>>>>>> driver.launchTasks(Collections.singleton(offerID), >>>>>>>>>> taskInfos.asJava) >>>>>>>>>> } >>>>>>>>> >>>>>>>>> >>>>>>>>> Then I built and deployed the hacked Marathon and restarted the >>>>>>>>> marathon service. >>>>>>>>> >>>>>>>>> Next I created the app via the Marathon API ("hello app" is a >>>>>>>>> container with a simple hello-world ruby app running on >>>>>>>>> 0.0.0.0:8000) >>>>>>>>> >>>>>>>>> curl http://mesos-primary1a:8080/v2/groups -XPOST >>>>>>>>>> -H'Content-Type: application/json' -d' >>>>>>>>>> { >>>>>>>>>> "id": "/app-81-1-hello-app", >>>>>>>>>> "apps": [ >>>>>>>>>> { >>>>>>>>>> "id": "/app-81-1-hello-app/web-v11", >>>>>>>>>> "container": { >>>>>>>>>> "type": "DOCKER", >>>>>>>>>> "docker": { >>>>>>>>>> "image": >>>>>>>>>> "docker-services1a:5000/gig1/app-81-1-hello-app-1444240966", >>>>>>>>>> "network": "BRIDGE", >>>>>>>>>> "portMappings": [ >>>>>>>>>> { >>>>>>>>>> "containerPort": 8000, >>>>>>>>>> "hostPort": 0, >>>>>>>>>> "protocol": "tcp" >>>>>>>>>> } >>>>>>>>>> ] >>>>>>>>>> } >>>>>>>>>> }, >>>>>>>>>> "env": { >>>>>>>>>> >>>>>>>>>> }, >>>>>>>>>> "healthChecks": [ >>>>>>>>>> { >>>>>>>>>> "protocol": "COMMAND", >>>>>>>>>> "command": {"value": "exit 1"}, >>>>>>>>>> "gracePeriodSeconds": 10, >>>>>>>>>> "intervalSeconds": 10, >>>>>>>>>> "timeoutSeconds": 10, >>>>>>>>>> "maxConsecutiveFailures": 3 >>>>>>>>>> } >>>>>>>>>> ], >>>>>>>>>> "instances": 1, >>>>>>>>>> "cpus": 1, >>>>>>>>>> "mem": 512 >>>>>>>>>> } >>>>>>>>>> ] >>>>>>>>>> } >>>>>>>>> >>>>>>>>> >>>>>>>>> $ ls /tmp/ >>>>>>>>>> >>>>>>>>>> 
taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0 >>>>>>>>>> >>>>>>>>>> taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0 >>>>>>>>> >>>>>>>>> >>>>>>>>> Do they match? >>>>>>>>> >>>>>>>>> $ md5sum /tmp/task* >>>>>>>>>> 1b5115997e78e2611654059249d99578 >>>>>>>>>> >>>>>>>>>> /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0 >>>>>>>>>> 1b5115997e78e2611654059249d99578 >>>>>>>>>> >>>>>>>>>> /tmp/taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0 >>>>>>>>> >>>>>>>>> >>>>>>>>> Yes, so I am confident this is the information being sent across >>>>>>>>> the wire to Mesos. >>>>>>>>> >>>>>>>>> Do they contain any health-check information? >>>>>>>>> >>>>>>>>> $ cat >>>>>>>>>> /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0 >>>>>>>>>> { >>>>>>>>>> "name":"web-v11.app-81-1-hello-app", >>>>>>>>>> "task_id":{ >>>>>>>>>> >>>>>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0" >>>>>>>>>> }, >>>>>>>>>> "slave_id":{ >>>>>>>>>> "value":"20150924-210922-1608624320-5050-1792-S1" >>>>>>>>>> }, >>>>>>>>>> "resources":[ >>>>>>>>>> { >>>>>>>>>> "name":"cpus", >>>>>>>>>> "type":"SCALAR", >>>>>>>>>> "scalar":{ >>>>>>>>>> "value":1.0 >>>>>>>>>> }, >>>>>>>>>> "role":"*" >>>>>>>>>> }, >>>>>>>>>> { >>>>>>>>>> "name":"mem", >>>>>>>>>> "type":"SCALAR", >>>>>>>>>> "scalar":{ >>>>>>>>>> "value":512.0 >>>>>>>>>> }, >>>>>>>>>> "role":"*" >>>>>>>>>> }, >>>>>>>>>> { >>>>>>>>>> "name":"ports", >>>>>>>>>> "type":"RANGES", >>>>>>>>>> "ranges":{ >>>>>>>>>> "range":[ >>>>>>>>>> { >>>>>>>>>> "begin":31641, >>>>>>>>>> "end":31641 >>>>>>>>>> } >>>>>>>>>> ] >>>>>>>>>> }, >>>>>>>>>> "role":"*" >>>>>>>>>> } >>>>>>>>>> ], >>>>>>>>>> "command":{ >>>>>>>>>> "environment":{ >>>>>>>>>> "variables":[ >>>>>>>>>> { >>>>>>>>>> "name":"PORT_8000", >>>>>>>>>> "value":"31641" >>>>>>>>>> }, >>>>>>>>>> { >>>>>>>>>> "name":"MARATHON_APP_VERSION", >>>>>>>>>> 
"value":"2015-10-07T19:35:08.386Z" >>>>>>>>>> }, >>>>>>>>>> { >>>>>>>>>> "name":"HOST", >>>>>>>>>> "value":"mesos-worker1a" >>>>>>>>>> }, >>>>>>>>>> { >>>>>>>>>> "name":"MARATHON_APP_DOCKER_IMAGE", >>>>>>>>>> >>>>>>>>>> "value":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966" >>>>>>>>>> }, >>>>>>>>>> { >>>>>>>>>> "name":"MESOS_TASK_ID", >>>>>>>>>> >>>>>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0" >>>>>>>>>> }, >>>>>>>>>> { >>>>>>>>>> "name":"PORT", >>>>>>>>>> "value":"31641" >>>>>>>>>> }, >>>>>>>>>> { >>>>>>>>>> "name":"PORTS", >>>>>>>>>> "value":"31641" >>>>>>>>>> }, >>>>>>>>>> { >>>>>>>>>> "name":"MARATHON_APP_ID", >>>>>>>>>> "value":"/app-81-1-hello-app/web-v11" >>>>>>>>>> }, >>>>>>>>>> { >>>>>>>>>> "name":"PORT0", >>>>>>>>>> "value":"31641" >>>>>>>>>> } >>>>>>>>>> ] >>>>>>>>>> }, >>>>>>>>>> "shell":false >>>>>>>>>> }, >>>>>>>>>> "container":{ >>>>>>>>>> "type":"DOCKER", >>>>>>>>>> "docker":{ >>>>>>>>>> >>>>>>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966", >>>>>>>>>> "network":"BRIDGE", >>>>>>>>>> "port_mappings":[ >>>>>>>>>> { >>>>>>>>>> "host_port":31641, >>>>>>>>>> "container_port":8000, >>>>>>>>>> "protocol":"tcp" >>>>>>>>>> } >>>>>>>>>> ], >>>>>>>>>> "privileged":false, >>>>>>>>>> "force_pull_image":false >>>>>>>>>> } >>>>>>>>>> } >>>>>>>>>> } >>>>>>>>> >>>>>>>>> >>>>>>>>> No, I don't see anything about any health check. 
>>>>>>>>> >>>>>>>>> Mesos STDOUT for the launched task: >>>>>>>>> >>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da" >>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true" >>>>>>>>>> --logbufsecs="0" --logging_level="INFO" >>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" >>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da" >>>>>>>>>> --stop_timeout="0ns" >>>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da" >>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true" >>>>>>>>>> --logbufsecs="0" --logging_level="INFO" >>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" >>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da" >>>>>>>>>> --stop_timeout="0ns" >>>>>>>>>> Registered docker executor on mesos-worker1a >>>>>>>>>> Starting task >>>>>>>>>> app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0 >>>>>>>>> >>>>>>>>> >>>>>>>>> And STDERR: >>>>>>>>> >>>>>>>>> I1007 19:35:08.790743 4612 exec.cpp:134] Version: 0.24.0 >>>>>>>>>> I1007 19:35:08.793416 4619 exec.cpp:208] Executor registered on >>>>>>>>>> slave 20150924-210922-1608624320-5050-1792-S1 >>>>>>>>>> WARNING: Your kernel does not support swap limit capabilities, >>>>>>>>>> memory limited without swap. >>>>>>>>> >>>>>>>>> >>>>>>>>> Again, nothing about any health checks. >>>>>>>>> >>>>>>>>> Any ideas of other things to try or what I could be missing? 
>>>>>>>>> Can't say either way about the Mesos health-check system working or >>>>>>>>> not if >>>>>>>>> Marathon won't put the health-check into the task it sends to Mesos. >>>>>>>>> >>>>>>>>> Thanks for all your help! >>>>>>>>> >>>>>>>>> Best, >>>>>>>>> Jay >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>> On Tue, Oct 6, 2015 at 11:24 PM, haosdent <[email protected]> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Maybe you could post your executor stdout/stderr so that we could >>>>>>>>>> know whether the health check is running or not. >>>>>>>>>> >>>>>>>>>> On Wed, Oct 7, 2015 at 2:15 PM, haosdent <[email protected]> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Marathon also uses Mesos health checks. When I use a health check, I >>>>>>>>>>> see log lines like this in the executor stdout: >>>>>>>>>>> >>>>>>>>>>> ``` >>>>>>>>>>> Registered docker executor on xxxxx >>>>>>>>>>> Starting task >>>>>>>>>>> test-health-check.822a5fd2-6cba-11e5-b5ce-0a0027000000 >>>>>>>>>>> Launching health check process: >>>>>>>>>>> /home/haosdent/mesos/build/src/.libs/mesos-health-check >>>>>>>>>>> --executor=xxxx >>>>>>>>>>> Health check process launched at pid: 9895 >>>>>>>>>>> Received task health update, healthy: true >>>>>>>>>>> ``` >>>>>>>>>>> >>>>>>>>>>> On Wed, Oct 7, 2015 at 12:51 PM, Jay Taylor <[email protected] >>>>>>>>>>> > wrote: >>>>>>>>>>> >>>>>>>>>>>> I am using my own framework, and the full task info I'm using >>>>>>>>>>>> is posted earlier in this thread. Do you happen to know if >>>>>>>>>>>> Marathon uses >>>>>>>>>>>> Mesos's health checks for its health check system? >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Oct 6, 2015, at 9:01 PM, haosdent <[email protected]> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> Yes, the health check is launched from its definition in the TaskInfo. >>>>>>>>>>>> Do you launch your task through Marathon? I can test it on my >>>>>>>>>>>> side.
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:56 AM, Jay Taylor wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Precisely, and there are none of those statements. Are you or others confident health-checks are part of the code path when defined via task info for docker container tasks? Going through the code, I wasn't able to find the linkage for anything other than health-checks triggered through a custom executor.
>>>>>>>>>>>>>
>>>>>>>>>>>>> With that being said, it is a pretty good-sized code base and I'm not very familiar with it, so my analysis thus far has by no means been exhaustive.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Oct 6, 2015, at 8:41 PM, haosdent wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> When the health check launches, there should be a log line like this in your executor stdout:
>>>>>>>>>>>>> ```
>>>>>>>>>>>>> Health check process launched at pid xxx
>>>>>>>>>>>>> ```
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:37 AM, Jay Taylor wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm happy to try this; however, wouldn't there be output in the logs with the string "health" or "Health" if the health-check were active? None of my master or slave logs contain the string.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Oct 6, 2015, at 7:45 PM, haosdent wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Could you use "exit 1" instead of "sleep 5" to see whether you get an unhealthy status in your task stdout/stderr?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:38 AM, Jay Taylor wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> My current version is 0.24.1.
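haosdent's reply states that the backport is in 0.23.1 and 0.24.1. A quick cross-check is therefore the version the executor itself reports on stderr. The Python sketch below parses that line and applies the version rule as stated in this thread. Two parts of that rule are assumptions: that 0.25.0 is the "next release" that includes the change, and that later patch releases on the 0.23 and 0.24 branches keep the backport.

```python
import re
from typing import Optional, Tuple

Version = Tuple[int, int, int]

# The executor logs its version on stderr, e.g.
# "I1007 19:35:08.790743 4612 exec.cpp:134] Version: 0.24.0".
VERSION_RE = re.compile(r"Version: (\d+)\.(\d+)\.(\d+)")


def executor_version(stderr_text: str) -> Optional[Version]:
    """Extract (major, minor, patch) from the executor's stderr, if present."""
    m = VERSION_RE.search(stderr_text)
    return tuple(int(part) for part in m.groups()) if m else None


def has_docker_health_checks(version: Version) -> bool:
    """Rule as described in this thread, plus the assumptions noted above:
    in 0.25.0 and later, backported to 0.23.1 and 0.24.1."""
    major, minor, patch = version
    if (major, minor) >= (0, 25):
        return True
    if (major, minor) in ((0, 23), (0, 24)):
        return patch >= 1
    return False
```

The STDERR posted earlier in this thread contains `Version: 0.24.0`. For that line, `executor_version` returns `(0, 24, 0)` and `has_docker_health_checks` returns `False` under these assumptions.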
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 7:30 PM, haosdent wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yes, Adam also helped commit it to 0.23.1 and 0.24.1:
>>>>>>>>>>>>>>>> https://github.com/apache/mesos/commit/8c0ed92de3925d4312429bfba01b9b1ccbcbbef0
>>>>>>>>>>>>>>>> https://github.com/apache/mesos/commit/09e367cd69aa39c156c9326d44f4a7b829ba3db7
>>>>>>>>>>>>>>>> Are you using one of these versions?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:26 AM, haosdent wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I remember 0.23.1 and 0.24.1 contain this backport; let me double check.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:01 AM, Jay Taylor wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Oops, now I see you already said it's in master. I'll look there :)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks again!
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:59 PM, Jay Taylor wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Great, thanks for the quick reply, Tim!
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Do you know if there is a branch I can check out to test it out?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:54 PM, Timothy Chen wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi Jay,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> We just added health check support for docker tasks that's in master but not yet released. It will run docker exec with the command you provided as health checks.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> It should be in the next release.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Tim
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 6:49 PM, Jay Taylor wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Does Mesos support health checks for docker image tasks? Mesos seems to be ignoring the TaskInfo.HealthCheck field for me.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Example TaskInfo JSON received back from Mesos:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>   "name":"hello-app.web.v3",
>>>>>>>>>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>>>>>>>>>     "value":"hello-app_web-v3.fc05a1a5-1e06-4e61-9879-be0d97cd3eec"
>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>         "value":0.1
>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>         "value":256
>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>>>>>>>>>       "type":1,
>>>>>>>>>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>>>>>>>>>           {
>>>>>>>>>>>>>>>>>>>>>             "begin":31002,
>>>>>>>>>>>>>>>>>>>>>             "end":31002
>>>>>>>>>>>>>>>>>>>>>           }
>>>>>>>>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>>>>>>>>>     "container":{
>>>>>>>>>>>>>>>>>>>>>       "image":"docker-services1a:5000/test/app-81-1-hello-app-103"
>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>>>>>>>>>     "type":1,
>>>>>>>>>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>>>>>>>>>       "image":"docker-services1a:5000/gig1/app-81-1-hello-app-103",
>>>>>>>>>>>>>>>>>>>>>       "network":2,
>>>>>>>>>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>>           "host_port":31002,
>>>>>>>>>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>>>>>>>>>       "parameters":[],
>>>>>>>>>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>   "health_check":{
>>>>>>>>>>>>>>>>>>>>>     "delay_seconds":5,
>>>>>>>>>>>>>>>>>>>>>     "interval_seconds":10,
>>>>>>>>>>>>>>>>>>>>>     "timeout_seconds":10,
>>>>>>>>>>>>>>>>>>>>>     "consecutive_failures":3,
>>>>>>>>>>>>>>>>>>>>>     "grace_period_seconds":0,
>>>>>>>>>>>>>>>>>>>>>     "command":{
>>>>>>>>>>>>>>>>>>>>>       "shell":true,
>>>>>>>>>>>>>>>>>>>>>       "value":"sleep 5",
>>>>>>>>>>>>>>>>>>>>>       "user":"root"
>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I have searched all machines and containers to see if they ever run the command (in this case `sleep 5`), but have not found any indication that it is being executed.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> In the mesos src code the health-checks are invoked from src/launcher/executor.cpp CommandExecutorProcess::launchTask. Does this mean that health-checks are only supported for custom executors and not for docker tasks?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> What I am trying to accomplish is to have the 0/non-zero exit-status of a health-check command translate to task health.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>> Haosdent Huang
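Jay's stated goal is for the 0/non-zero exit status of a health-check command to translate into task health. The Python simulation below is a rough model of how the `health_check` fields in the TaskInfo above could combine. Its rules are assumptions chosen for illustration, not Mesos's implementation:

- Checks start after `delay_seconds` and repeat every `interval_seconds`.
- A timeout counts as a failure.
- Failures before the first success are ignored while still inside `grace_period_seconds`.
- The task is marked for killing once `consecutive_failures` is reached.

`HealthCheckPolicy`, `run_check` and `evaluate` are hypothetical names.

```python
import subprocess
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class HealthCheckPolicy:
    # Field names mirror the "health_check" JSON in this thread; how they are
    # applied in evaluate() is an illustrative assumption, not Mesos's code.
    delay_seconds: float = 5
    interval_seconds: float = 10
    timeout_seconds: float = 10
    consecutive_failures: int = 3
    grace_period_seconds: float = 0


def run_check(command: str, timeout: float) -> int:
    """Run a shell command and return its exit code; a timeout counts as 1."""
    try:
        return subprocess.run(["sh", "-c", command], timeout=timeout).returncode
    except subprocess.TimeoutExpired:
        return 1


def evaluate(policy: HealthCheckPolicy,
             exit_codes: Sequence[int]) -> List[Tuple[float, str]]:
    """Map check exit codes to (seconds since launch, status) events.

    Status is "healthy", "unhealthy", "ignored" (a failure inside the grace
    period before any success) or "kill" (consecutive_failures reached).
    """
    events: List[Tuple[float, str]] = []
    failures = 0
    seen_success = False
    for i, code in enumerate(exit_codes):
        t = policy.delay_seconds + i * policy.interval_seconds
        if code == 0:
            # Exit status 0 means healthy and resets the failure streak.
            failures = 0
            seen_success = True
            events.append((t, "healthy"))
            continue
        if not seen_success and t < policy.grace_period_seconds:
            events.append((t, "ignored"))
            continue
        failures += 1
        if failures >= policy.consecutive_failures:
            events.append((t, "kill"))
            break
        events.append((t, "unhealthy"))
    return events
```

haosdent suggested swapping `sleep 5` for `exit 1`, which makes every check fail. With the thread's settings, `evaluate(HealthCheckPolicy(), [run_check("exit 1", 10)] * 3)` then yields two unhealthy events followed by a kill at 25 seconds.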

