On Mon, Oct 12, 2015 at 11:26 PM, Marco Massenzio <[email protected]> wrote:
> Are those the stdout logs of the Agent? Because I don't see the > --launcher-dir set, however, if I look into one that is running off the > same 0.24.1 package, this is what I see: > > I1012 14:56:36.933856 1704 slave.cpp:191] Flags at startup: > --appc_store_dir="/tmp/mesos/store/appc" > --attributes="rack:r2d2;pod:demo,dev" --authenticatee="crammd5" > --cgroups_cpu_enable_pids_and_tids_count="false" > --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" > --cgroups_limit_swap="false" --cgroups_root="mesos" > --container_disk_watch_interval="15secs" --containerizers="docker,mesos" > --default_role="*" --disk_watch_interval="1mins" --docker="docker" > --docker_kill_orphans="true" --docker_remove_delay="6hrs" > --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" > --enforce_container_disk_quota="false" > --executor_registration_timeout="1mins" > --executor_shutdown_grace_period="5secs" > --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" > --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" > --hadoop_home="" --help="false" --initialize_driver_logging="true" > --ip="192.168.33.11" --isolation="cgroups/cpu,cgroups/mem" > --launcher_dir="/usr/libexec/mesos" > --log_dir="/var/local/mesos/logs/agent" --logbufsecs="0" > --logging_level="INFO" --master="zk://192.168.33.1:2181/mesos/vagrant" > --oversubscribed_resources_interval="15secs" --perf_duration="10secs" > --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" > --quiet="false" --recover="reconnect" --recovery_timeout="15mins" > --registration_backoff_factor="1secs" > --resource_monitoring_interval="1secs" > --resources="ports:[9000-10000];ephemeral_ports:[32768-57344]" > --revocable_cpu_low_priority="true" > --sandbox_directory="/var/local/sandbox" --strict="true" > --switch_user="true" --version="false" --work_dir="/var/local/mesos/agent" > (this is run off the Vagrantfile at [0] in case you want to reproduce). > That agent is not run via the init command, though, I execute it manually > via the `run-agent.sh` in the same directory. > > I don't really think this matters, but I assume you also restarted the > agent after making the config changes? > (and, for your own sanity - you can double check the version by looking at > the very head of the logs). > > > [0] http://github.com/massenz/zk-mesos > > > > > -- > *Marco Massenzio* > Distributed Systems Engineer > http://codetrips.com > > On Mon, Oct 12, 2015 at 10:50 PM, Jay Taylor <[email protected]> wrote: > >> Hi Haosdent and Mesos friends, >> >> I've rebuilt the cluster from scratch and installed mesos 0.24.1 from the >> mesosphere apt repo: >> >> $ dpkg -l | grep mesos >> ii mesos 0.24.1-0.2.35.ubuntu1404 >> amd64 Cluster resource manager with efficient resource isolation >> >> Then added the `launcher_dir' flag to /etc/mesos-slave/launcher_dir on >> the slaves: >> >> mesos-worker1a:~$ cat /etc/mesos-slave/launcher_dir >> /usr/libexec/mesos >> >> And yet the task health-checks are still being launched from the sandbox >> directory like before! 
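The behavior Jay describes above is consistent with the resolution order in the docker executor code quoted further down in this thread: the executor only honors MESOS_LAUNCHER_DIR if that variable is present in its own environment, and otherwise falls back to the directory of argv[0]. The stand-alone sketch below (an illustration only, not Mesos source; it merely mimics that two-step fallback) shows why a bare argv[0] plus a missing environment variable resolves to the executor's current working directory, i.e. the sandbox:

```cpp
// Illustration of the launcher-path fallback discussed in this thread:
// prefer MESOS_LAUNCHER_DIR from the environment, otherwise use the
// directory of argv[0]. Not Mesos code.
#include <libgen.h>   // dirname()
#include <limits.h>   // PATH_MAX
#include <stdlib.h>   // getenv(), realpath()
#include <iostream>
#include <string>
#include <vector>

int main(int argc, char** argv) {
  const char* env = getenv("MESOS_LAUNCHER_DIR");

  // dirname() may modify its argument, so work on a writable copy.
  std::string arg0 = (argc > 0 && argv[0] != nullptr)
      ? argv[0] : "mesos-docker-executor";
  std::vector<char> buf(arg0.begin(), arg0.end());
  buf.push_back('\0');

  std::string fallback = dirname(buf.data());  // "." for a bare argv[0]

  char resolved[PATH_MAX];
  if (realpath(fallback.c_str(), resolved) != nullptr) {
    fallback = resolved;                       // realpath(".") == cwd
  }

  const std::string path = (env != nullptr) ? env : fallback;

  // With a bare argv[0] and no MESOS_LAUNCHER_DIR in the environment,
  // `path` ends up being the process's working directory -- which, for
  // the docker executor, matches the sandbox paths in the logs below.
  std::cout << "resolved launcher dir: " << path << std::endl;
  return 0;
}
```

Run with the variable set (e.g. `MESOS_LAUNCHER_DIR=/usr/libexec/mesos ./a.out`) it prints the configured directory; run without it, it prints the current working directory, matching the sandbox paths printed by the health-check launch messages quoted in this thread.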
>> >> I've also tested setting the MESOS_LAUNCHER_DIR env var and get the >> identical result (just as before on the cluster where many versions of >> mesos had been installed): >> >> STDOUT: >> >> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb" >>> --docker="docker" --help="false" --initialize_driver_logging="true" >>> --logbufsecs="0" --logging_level="INFO" >>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" >>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb" >>> --stop_timeout="0ns" >>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb" >>> --docker="docker" --help="false" --initialize_driver_logging="true" >>> --logbufsecs="0" --logging_level="INFO" >>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" >>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb" >>> --stop_timeout="0ns" >>> Registered docker executor on mesos-worker1a >>> Starting task hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91 >>> Launching health check process: >>> /tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb/mesos-health-check >>> --executor=(1)@192.168.225.58:48912 >>> --health_check_json={"command":{"shell":true,"value":"docker exec >>> mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb >>> sh -c \" curl --silent --show-error --fail --tcp-nodelay --head -X GET >>> --user-agent flux-capacitor-health-checker --max-time 1 http:\/\/ >>> 127.0.0.1:8000 >>> \""},"consecutive_failures":6,"delay_seconds":15,"grace_period_seconds":10,"interval_seconds":1,"timeout_seconds":1} >>> --task_id=hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91 >>> Health check process launched at pid: 11253 >> >> >> >> STDERR: >> >> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb" >>> --docker="docker" --help="false" --initialize_driver_logging="true" >>> --logbufsecs="0" --logging_level="INFO" >>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" >>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb" >>> --stop_timeout="0ns" >>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb" >>> --docker="docker" --help="false" --initialize_driver_logging="true" >>> --logbufsecs="0" --logging_level="INFO" >>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" >>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb" >>> --stop_timeout="0ns" >>> Registered docker executor on mesos-worker1a >>> Starting task hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91 >>> *Launching health check 
process: >>> /tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb/mesos-health-check* >>> --executor=(1)@192.168.225.58:48912 >>> --health_check_json={"command":{"shell":true,"value":"docker exec >>> mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb >>> sh -c \" curl --silent --show-error --fail --tcp-nodelay --head -X GET >>> --user-agent flux-capacitor-health-checker --max-time 1 http:\/\/ >>> 127.0.0.1:8000 >>> \""},"consecutive_failures":6,"delay_seconds":15,"grace_period_seconds":10,"interval_seconds":1,"timeout_seconds":1} >>> --task_id=hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91 >>> Health check process launched at pid: 11253 >> >> >> Any ideas on where to go from here? Is there any additional information >> I can provide? >> >> Thanks as always, >> Jay >> >> >> On Thu, Oct 8, 2015 at 9:23 PM, haosdent <[email protected]> wrote: >> >>> For flag sent to the executor from containerizer, the flag would >>> stringify and become a command line parameter when launch executor. >>> >>> You could see this in >>> https://github.com/apache/mesos/blob/master/src/slave/containerizer/docker.cpp#L279-L288 >>> >>> But for launcher_dir, the executor get it from `argv[0]`, as you >>> mentioned above. >>> ``` >>> string path = >>> envPath.isSome() ? envPath.get() >>> : os::realpath(Path(argv[0]).dirname()).get(); >>> >>> ``` >>> So I want to figure out why your argv[0] would become sandbox dir, not >>> "/usr/libexec/mesos". >>> >>> On Fri, Oct 9, 2015 at 12:03 PM, Jay Taylor <[email protected]> wrote: >>> >>>> I see. And then how are the flags sent to the executor? >>>> >>>> >>>> >>>> On Oct 8, 2015, at 8:56 PM, haosdent <[email protected]> wrote: >>>> >>>> Yes. The related code is located in >>>> https://github.com/apache/mesos/blob/master/src/slave/main.cpp#L123 >>>> >>>> In fact, environment variables starts with MESOS_ would load as flags >>>> variables. >>>> >>>> https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/flags/flags.hpp#L52 >>>> >>>> On Fri, Oct 9, 2015 at 11:33 AM, Jay Taylor <[email protected]> >>>> wrote: >>>> >>>>> One question for you haosdent- >>>>> >>>>> You mentioned that the flags.launcher_dir should propagate to the >>>>> docker executor all the way up the chain. Can you show me where this >>>>> logic >>>>> is in the codebase? I didn't see where that was happening and would like >>>>> to understand the mechanism. >>>>> >>>>> Thanks! >>>>> Jay >>>>> >>>>> >>>>> >>>>> On Oct 8, 2015, at 8:29 PM, Jay Taylor <[email protected]> wrote: >>>>> >>>>> Maybe tomorrow I will build a fresh cluster from scratch to see if the >>>>> broken behavior experienced today still persists. >>>>> >>>>> On Oct 8, 2015, at 7:52 PM, haosdent <[email protected]> wrote: >>>>> >>>>> As far as I know, MESOS_LAUNCHER_DIR is works by set >>>>> flags.launcher_dir which would find mesos-docker-executor >>>>> and mesos-health-check under this dir. Although the env is not propagated, >>>>> but MESOS_LAUNCHER_DIR still works because flags.launcher_dir is get >>>>> from it. >>>>> >>>>> For example, because I >>>>> ``` >>>>> export MESOS_LAUNCHER_DIR=/tmp >>>>> ``` >>>>> before start mesos-slave. 
So when I launch slave, I could find this >>>>> log in slave log >>>>> ``` >>>>> I1009 10:27:26.594599 1416 slave.cpp:203] Flags at startup: >>>>> xxxxx --launcher_dir="/tmp" >>>>> ``` >>>>> >>>>> And from your log, I not sure why your MESOS_LAUNCHER_DIR become >>>>> sandbox dir. Is it because MESOS_LAUNCHER_DIR is overrided in your other >>>>> scripts? >>>>> >>>>> >>>>> On Fri, Oct 9, 2015 at 1:56 AM, Jay Taylor <[email protected]> >>>>> wrote: >>>>> >>>>>> I haven't ever changed MESOS_LAUNCHER_DIR/--launcher_dir before. >>>>>> >>>>>> I just tried setting both the env var and flag on the slaves, and >>>>>> have determined that the env var is not present when it is being checked >>>>>> src/docker/executor.cpp @ line 573: >>>>>> >>>>>> const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR"); >>>>>>> string path = >>>>>>> envPath.isSome() ? envPath.get() >>>>>>> : os::realpath(Path(argv[0]).dirname()).get(); >>>>>>> cout << "MESOS_LAUNCHER_DIR: envpath.isSome()->" << >>>>>>> (envPath.isSome() ? "yes" : "no") << endl; >>>>>>> cout << "MESOS_LAUNCHER_DIR: path='" << path << "'" << endl; >>>>>> >>>>>> >>>>>> Exported MESOS_LAUNCHER_DIR env var (and verified it is correctly >>>>>> propagated along up to the point of mesos-slave launch): >>>>>> >>>>>> $ cat /etc/default/mesos-slave >>>>>>> export >>>>>>> MESOS_MASTER="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos" >>>>>>> export MESOS_CONTAINERIZERS="mesos,docker" >>>>>>> export MESOS_EXECUTOR_REGISTRATION_TIMEOUT="5mins" >>>>>>> export MESOS_PORT="5050" >>>>>>> export MESOS_LAUNCHER_DIR="/usr/libexec/mesos" >>>>>> >>>>>> >>>>>> TASK OUTPUT: >>>>>> >>>>>> >>>>>>> *MESOS_LAUNCHER_DIR: envpath.isSome()->no**MESOS_LAUNCHER_DIR: >>>>>>> path='/tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad'* >>>>>>> Registered docker executor on mesos-worker2a >>>>>>> Starting task hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253 >>>>>>> Launching health check process: >>>>>>> /tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad/mesos-health-check >>>>>>> --executor=(1)@192.168.225.59:44523 >>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec >>>>>>> mesos-61373c0e-7349-4173-ab8d-9d7b260e8a30-S1.41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad >>>>>>> sh -c \" \/bin\/bash >>>>>>> \""},"consecutive_failures":3,"delay_seconds":5.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0} >>>>>>> --task_id=hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253 >>>>>>> Health check process launched at pid: 2519 >>>>>> >>>>>> >>>>>> The env var is not propagated when the docker executor is launched >>>>>> in src/slave/containerizer/docker.cpp around line 903: >>>>>> >>>>>> vector<string> argv; >>>>>>> argv.push_back("mesos-docker-executor"); >>>>>>> // Construct the mesos-docker-executor using the "name" we gave the >>>>>>> // container (to distinguish it from Docker containers not created >>>>>>> // by Mesos). 
>>>>>>> Try<Subprocess> s = subprocess( >>>>>>> path::join(flags.launcher_dir, "mesos-docker-executor"), >>>>>>> argv, >>>>>>> Subprocess::PIPE(), >>>>>>> Subprocess::PATH(path::join(container->directory, "stdout")), >>>>>>> Subprocess::PATH(path::join(container->directory, "stderr")), >>>>>>> dockerFlags(flags, container->name(), container->directory), >>>>>>> environment, >>>>>>> lambda::bind(&setup, container->directory)); >>>>>> >>>>>> >>>>>> A little ways above we can see the environment is setup w/ the >>>>>> container tasks defined env vars. >>>>>> >>>>>> See src/slave/containerizer/docker.cpp around line 871: >>>>>> >>>>>> // Include any enviroment variables from ExecutorInfo. >>>>>>> foreach (const Environment::Variable& variable, >>>>>>> container->executor.command().environment().variables()) { >>>>>>> environment[variable.name()] = variable.value(); >>>>>>> } >>>>>> >>>>>> >>>>>> Should I file a JIRA for this? Have I overlooked anything? >>>>>> >>>>>> >>>>>> On Wed, Oct 7, 2015 at 8:11 PM, haosdent <[email protected]> wrote: >>>>>> >>>>>>> >Not sure what was going on with health-checks in 0.24.0. >>>>>>> 0.24.1 should be works. >>>>>>> >>>>>>> >Do any of you know which host the path >>>>>>> "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check" >>>>>>> should exist on? It definitely doesn't exist on the slave, hence >>>>>>> execution >>>>>>> failing. >>>>>>> >>>>>>> Does you set MESOS_LAUNCHER_DIR/--launcher_dir incorrectly before? >>>>>>> We got mesos-health-check from MESOS_LAUNCHER_DIR/--launcher_id or use >>>>>>> the >>>>>>> same dir of mesos-docker-executor. >>>>>>> >>>>>>> On Thu, Oct 8, 2015 at 10:46 AM, Jay Taylor <[email protected]> >>>>>>> wrote: >>>>>>> >>>>>>>> Maybe I spoke too soon. >>>>>>>> >>>>>>>> Now the checks are attempting to run, however the STDERR is not >>>>>>>> looking good. 
I've added some debugging to the error message output to >>>>>>>> show the path, argv, and envp variables: >>>>>>>> >>>>>>>> STDOUT: >>>>>>>> >>>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc" >>>>>>>>> --docker="docker" --docker_socket="/var/run/docker.sock" >>>>>>>>> --help="false" >>>>>>>>> --initialize_driver_logging="true" --logbufsecs="0" >>>>>>>>> --logging_level="INFO" >>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" >>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc" >>>>>>>>> --stop_timeout="0ns" >>>>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc" >>>>>>>>> --docker="docker" --docker_socket="/var/run/docker.sock" >>>>>>>>> --help="false" >>>>>>>>> --initialize_driver_logging="true" --logbufsecs="0" >>>>>>>>> --logging_level="INFO" >>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" >>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc" >>>>>>>>> --stop_timeout="0ns" >>>>>>>>> Registered docker executor on mesos-worker2a >>>>>>>>> Starting task >>>>>>>>> app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0 >>>>>>>>> Launching health check process: >>>>>>>>> /tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check >>>>>>>>> --executor=(1)@192.168.225.59:43917 >>>>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec >>>>>>>>> mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc >>>>>>>>> sh -c \" exit 1 >>>>>>>>> \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0} >>>>>>>>> --task_id=app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0 >>>>>>>>> Health check process launched at pid: 3012 >>>>>>>> >>>>>>>> >>>>>>>> STDERR: >>>>>>>> >>>>>>>> I1008 02:17:28.870434 2770 exec.cpp:134] Version: 0.26.0 >>>>>>>>> I1008 02:17:28.871860 2778 exec.cpp:208] Executor registered on >>>>>>>>> slave 16b49e90-6852-4c91-8e70-d89c54f25668-S1 >>>>>>>>> WARNING: Your kernel does not support swap limit capabilities, >>>>>>>>> memory limited without swap. 
>>>>>>>>> ABORT: (src/subprocess.cpp:180): Failed to os::execvpe in >>>>>>>>> childMain >>>>>>>>> (path.c_str()='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check', >>>>>>>>> argv='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check', >>>>>>>>> envp=''): No such file or directory*** Aborted at 1444270649 (unix >>>>>>>>> time) >>>>>>>>> try "date -d @1444270649" if you are using GNU date *** >>>>>>>>> PC: @ 0x7f4a37ec6cc9 (unknown) >>>>>>>>> *** SIGABRT (@0xbc4) received by PID 3012 (TID 0x7f4a2f9f6700) >>>>>>>>> from PID 3012; stack trace: *** >>>>>>>>> @ 0x7f4a38265340 (unknown) >>>>>>>>> @ 0x7f4a37ec6cc9 (unknown) >>>>>>>>> @ 0x7f4a37eca0d8 (unknown) >>>>>>>>> @ 0x4191e2 _Abort() >>>>>>>>> @ 0x41921c _Abort() >>>>>>>>> @ 0x7f4a39dc2768 process::childMain() >>>>>>>>> @ 0x7f4a39dc4f59 std::_Function_handler<>::_M_invoke() >>>>>>>>> @ 0x7f4a39dc24fc process::defaultClone() >>>>>>>>> @ 0x7f4a39dc34fb process::subprocess() >>>>>>>>> @ 0x43cc9c >>>>>>>>> mesos::internal::docker::DockerExecutorProcess::launchHealthCheck() >>>>>>>>> @ 0x7f4a39d924f4 process::ProcessManager::resume() >>>>>>>>> @ 0x7f4a39d92827 >>>>>>>>> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv >>>>>>>>> @ 0x7f4a38a47e40 (unknown) >>>>>>>>> @ 0x7f4a3825d182 start_thread >>>>>>>>> @ 0x7f4a37f8a47d (unknown) >>>>>>>> >>>>>>>> >>>>>>>> Do any of you know which host the path >>>>>>>> "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check" >>>>>>>> should exist on? It definitely doesn't exist on the slave, hence >>>>>>>> execution failing. >>>>>>>> >>>>>>>> This is with current master, git hash >>>>>>>> 5058fac1083dc91bca54d33c26c810c17ad95dd1. >>>>>>>> >>>>>>>> commit 5058fac1083dc91bca54d33c26c810c17ad95dd1 >>>>>>>>> Author: Anand Mazumdar <[email protected]> >>>>>>>>> Date: Tue Oct 6 17:37:41 2015 -0700 >>>>>>>> >>>>>>>> >>>>>>>> -Jay >>>>>>>> >>>>>>>> On Wed, Oct 7, 2015 at 5:23 PM, Jay Taylor <[email protected]> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Update: >>>>>>>>> >>>>>>>>> I used https://github.com/deric/mesos-deb-packaging to compile >>>>>>>>> and package the latest master (0.26.x) and deployed it to the >>>>>>>>> cluster, and >>>>>>>>> now health checks are working as advertised in both Marathon and my >>>>>>>>> own >>>>>>>>> framework! Not sure what was going on with health-checks in 0.24.0.. >>>>>>>>> >>>>>>>>> Anyways, thanks again for your help Haosdent! >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> Jay >>>>>>>>> >>>>>>>>> On Wed, Oct 7, 2015 at 12:53 PM, Jay Taylor <[email protected]> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Hi Haosdent, >>>>>>>>>> >>>>>>>>>> Can you share your Marathon POST request that results in Mesos >>>>>>>>>> executing the health checks? >>>>>>>>>> >>>>>>>>>> Since we can reference the Marathon framework, I've been doing >>>>>>>>>> some digging around. 
>>>>>>>>>> >>>>>>>>>> Here are the details of my setup and findings: >>>>>>>>>> >>>>>>>>>> I put a few small hacks in Marathon: >>>>>>>>>> >>>>>>>>>> (1) Added com.googlecode.protobuf.format to Marathon's >>>>>>>>>> dependencies >>>>>>>>>> >>>>>>>>>> (2) Edited the following files so TaskInfo is dumped as JSON to >>>>>>>>>> /tmp/X in both the TaskFactory as well an right before the task is >>>>>>>>>> sent to >>>>>>>>>> Mesos via driver.launchTasks: >>>>>>>>>> >>>>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala: >>>>>>>>>> >>>>>>>>>> $ git diff >>>>>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala >>>>>>>>>>> @@ -25,6 +25,12 @@ class DefaultTaskFactory @Inject() ( >>>>>>>>>>> >>>>>>>>>>> new TaskBuilder(app, taskIdUtil.newTaskId, >>>>>>>>>>> config).buildIfMatches(offer, runningTasks).map { >>>>>>>>>>> case (taskInfo, ports) => >>>>>>>>>>> + import com.googlecode.protobuf.format.JsonFormat >>>>>>>>>>> + import java.io._ >>>>>>>>>>> + val bw = new BufferedWriter(new FileWriter(new >>>>>>>>>>> File("/tmp/taskjson1-" + taskInfo.getTaskId.getValue))) >>>>>>>>>>> + bw.write(JsonFormat.printToString(taskInfo)) >>>>>>>>>>> + bw.write("\n") >>>>>>>>>>> + bw.close() >>>>>>>>>>> CreatedTask( >>>>>>>>>>> taskInfo, >>>>>>>>>>> MarathonTasks.makeTask( >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala: >>>>>>>>>> >>>>>>>>>> $ git diff >>>>>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala >>>>>>>>>>> @@ -24,6 +24,16 @@ private[launcher] class TaskLauncherImpl( >>>>>>>>>>> override def launchTasks(offerID: OfferID, taskInfos: >>>>>>>>>>> Seq[TaskInfo]): Boolean = { >>>>>>>>>>> val launched = withDriver(s"launchTasks($offerID)") { >>>>>>>>>>> driver => >>>>>>>>>>> import scala.collection.JavaConverters._ >>>>>>>>>>> + var i = 0 >>>>>>>>>>> + for (i <- 0 to taskInfos.length - 1) { >>>>>>>>>>> + import com.googlecode.protobuf.format.JsonFormat >>>>>>>>>>> + import java.io._ >>>>>>>>>>> + val file = new File("/tmp/taskJson2-" + i.toString() + >>>>>>>>>>> "-" + taskInfos(i).getTaskId.getValue) >>>>>>>>>>> + val bw = new BufferedWriter(new FileWriter(file)) >>>>>>>>>>> + bw.write(JsonFormat.printToString(taskInfos(i))) >>>>>>>>>>> + bw.write("\n") >>>>>>>>>>> + bw.close() >>>>>>>>>>> + } >>>>>>>>>>> driver.launchTasks(Collections.singleton(offerID), >>>>>>>>>>> taskInfos.asJava) >>>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Then I built and deployed the hacked Marathon and restarted the >>>>>>>>>> marathon service. 
>>>>>>>>>> >>>>>>>>>> Next I created the app via the Marathon API ("hello app" is a >>>>>>>>>> container with a simple hello-world ruby app running on >>>>>>>>>> 0.0.0.0:8000) >>>>>>>>>> >>>>>>>>>> curl http://mesos-primary1a:8080/v2/groups -XPOST >>>>>>>>>>> -H'Content-Type: application/json' -d' >>>>>>>>>>> { >>>>>>>>>>> "id": "/app-81-1-hello-app", >>>>>>>>>>> "apps": [ >>>>>>>>>>> { >>>>>>>>>>> "id": "/app-81-1-hello-app/web-v11", >>>>>>>>>>> "container": { >>>>>>>>>>> "type": "DOCKER", >>>>>>>>>>> "docker": { >>>>>>>>>>> "image": >>>>>>>>>>> "docker-services1a:5000/gig1/app-81-1-hello-app-1444240966", >>>>>>>>>>> "network": "BRIDGE", >>>>>>>>>>> "portMappings": [ >>>>>>>>>>> { >>>>>>>>>>> "containerPort": 8000, >>>>>>>>>>> "hostPort": 0, >>>>>>>>>>> "protocol": "tcp" >>>>>>>>>>> } >>>>>>>>>>> ] >>>>>>>>>>> } >>>>>>>>>>> }, >>>>>>>>>>> "env": { >>>>>>>>>>> >>>>>>>>>>> }, >>>>>>>>>>> "healthChecks": [ >>>>>>>>>>> { >>>>>>>>>>> "protocol": "COMMAND", >>>>>>>>>>> "command": {"value": "exit 1"}, >>>>>>>>>>> "gracePeriodSeconds": 10, >>>>>>>>>>> "intervalSeconds": 10, >>>>>>>>>>> "timeoutSeconds": 10, >>>>>>>>>>> "maxConsecutiveFailures": 3 >>>>>>>>>>> } >>>>>>>>>>> ], >>>>>>>>>>> "instances": 1, >>>>>>>>>>> "cpus": 1, >>>>>>>>>>> "mem": 512 >>>>>>>>>>> } >>>>>>>>>>> ] >>>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> $ ls /tmp/ >>>>>>>>>>> >>>>>>>>>>> taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0 >>>>>>>>>>> >>>>>>>>>>> taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0 >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Do they match? >>>>>>>>>> >>>>>>>>>> $ md5sum /tmp/task* >>>>>>>>>>> 1b5115997e78e2611654059249d99578 >>>>>>>>>>> >>>>>>>>>>> /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0 >>>>>>>>>>> 1b5115997e78e2611654059249d99578 >>>>>>>>>>> >>>>>>>>>>> /tmp/taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0 >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Yes, so I am confident this is the information being sent across >>>>>>>>>> the wire to Mesos. >>>>>>>>>> >>>>>>>>>> Do they contain any health-check information? 
>>>>>>>>>> >>>>>>>>>> $ cat >>>>>>>>>>> /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0 >>>>>>>>>>> { >>>>>>>>>>> "name":"web-v11.app-81-1-hello-app", >>>>>>>>>>> "task_id":{ >>>>>>>>>>> >>>>>>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0" >>>>>>>>>>> }, >>>>>>>>>>> "slave_id":{ >>>>>>>>>>> "value":"20150924-210922-1608624320-5050-1792-S1" >>>>>>>>>>> }, >>>>>>>>>>> "resources":[ >>>>>>>>>>> { >>>>>>>>>>> "name":"cpus", >>>>>>>>>>> "type":"SCALAR", >>>>>>>>>>> "scalar":{ >>>>>>>>>>> "value":1.0 >>>>>>>>>>> }, >>>>>>>>>>> "role":"*" >>>>>>>>>>> }, >>>>>>>>>>> { >>>>>>>>>>> "name":"mem", >>>>>>>>>>> "type":"SCALAR", >>>>>>>>>>> "scalar":{ >>>>>>>>>>> "value":512.0 >>>>>>>>>>> }, >>>>>>>>>>> "role":"*" >>>>>>>>>>> }, >>>>>>>>>>> { >>>>>>>>>>> "name":"ports", >>>>>>>>>>> "type":"RANGES", >>>>>>>>>>> "ranges":{ >>>>>>>>>>> "range":[ >>>>>>>>>>> { >>>>>>>>>>> "begin":31641, >>>>>>>>>>> "end":31641 >>>>>>>>>>> } >>>>>>>>>>> ] >>>>>>>>>>> }, >>>>>>>>>>> "role":"*" >>>>>>>>>>> } >>>>>>>>>>> ], >>>>>>>>>>> "command":{ >>>>>>>>>>> "environment":{ >>>>>>>>>>> "variables":[ >>>>>>>>>>> { >>>>>>>>>>> "name":"PORT_8000", >>>>>>>>>>> "value":"31641" >>>>>>>>>>> }, >>>>>>>>>>> { >>>>>>>>>>> "name":"MARATHON_APP_VERSION", >>>>>>>>>>> "value":"2015-10-07T19:35:08.386Z" >>>>>>>>>>> }, >>>>>>>>>>> { >>>>>>>>>>> "name":"HOST", >>>>>>>>>>> "value":"mesos-worker1a" >>>>>>>>>>> }, >>>>>>>>>>> { >>>>>>>>>>> "name":"MARATHON_APP_DOCKER_IMAGE", >>>>>>>>>>> >>>>>>>>>>> "value":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966" >>>>>>>>>>> }, >>>>>>>>>>> { >>>>>>>>>>> "name":"MESOS_TASK_ID", >>>>>>>>>>> >>>>>>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0" >>>>>>>>>>> }, >>>>>>>>>>> { >>>>>>>>>>> "name":"PORT", >>>>>>>>>>> "value":"31641" >>>>>>>>>>> }, >>>>>>>>>>> { >>>>>>>>>>> "name":"PORTS", >>>>>>>>>>> "value":"31641" >>>>>>>>>>> }, >>>>>>>>>>> { >>>>>>>>>>> "name":"MARATHON_APP_ID", >>>>>>>>>>> "value":"/app-81-1-hello-app/web-v11" >>>>>>>>>>> }, >>>>>>>>>>> { >>>>>>>>>>> "name":"PORT0", >>>>>>>>>>> "value":"31641" >>>>>>>>>>> } >>>>>>>>>>> ] >>>>>>>>>>> }, >>>>>>>>>>> "shell":false >>>>>>>>>>> }, >>>>>>>>>>> "container":{ >>>>>>>>>>> "type":"DOCKER", >>>>>>>>>>> "docker":{ >>>>>>>>>>> >>>>>>>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966", >>>>>>>>>>> "network":"BRIDGE", >>>>>>>>>>> "port_mappings":[ >>>>>>>>>>> { >>>>>>>>>>> "host_port":31641, >>>>>>>>>>> "container_port":8000, >>>>>>>>>>> "protocol":"tcp" >>>>>>>>>>> } >>>>>>>>>>> ], >>>>>>>>>>> "privileged":false, >>>>>>>>>>> "force_pull_image":false >>>>>>>>>>> } >>>>>>>>>>> } >>>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> No, I don't see anything about any health check. 
>>>>>>>>>> >>>>>>>>>> Mesos STDOUT for the launched task: >>>>>>>>>> >>>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da" >>>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true" >>>>>>>>>>> --logbufsecs="0" --logging_level="INFO" >>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" >>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da" >>>>>>>>>>> --stop_timeout="0ns" >>>>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da" >>>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true" >>>>>>>>>>> --logbufsecs="0" --logging_level="INFO" >>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" >>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da" >>>>>>>>>>> --stop_timeout="0ns" >>>>>>>>>>> Registered docker executor on mesos-worker1a >>>>>>>>>>> Starting task >>>>>>>>>>> app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0 >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> And STDERR: >>>>>>>>>> >>>>>>>>>> I1007 19:35:08.790743 4612 exec.cpp:134] Version: 0.24.0 >>>>>>>>>>> I1007 19:35:08.793416 4619 exec.cpp:208] Executor registered on >>>>>>>>>>> slave 20150924-210922-1608624320-5050-1792-S1 >>>>>>>>>>> WARNING: Your kernel does not support swap limit capabilities, >>>>>>>>>>> memory limited without swap. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Again, nothing about any health checks. >>>>>>>>>> >>>>>>>>>> Any ideas of other things to try or what I could be missing? >>>>>>>>>> Can't say either way about the Mesos health-check system working or >>>>>>>>>> not if >>>>>>>>>> Marathon won't put the health-check into the task it sends to Mesos. >>>>>>>>>> >>>>>>>>>> Thanks for all your help! >>>>>>>>>> >>>>>>>>>> Best, >>>>>>>>>> Jay >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> On Tue, Oct 6, 2015 at 11:24 PM, haosdent <[email protected]> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Maybe you could post your executor stdout/stderr so that we >>>>>>>>>>> could know whether health check running not. >>>>>>>>>>> >>>>>>>>>>> On Wed, Oct 7, 2015 at 2:15 PM, haosdent <[email protected]> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> marathon also use mesos health check. When I use health check, >>>>>>>>>>>> I could saw the log like this in executor stdout. >>>>>>>>>>>> >>>>>>>>>>>> ``` >>>>>>>>>>>> Registered docker executor on xxxxx >>>>>>>>>>>> Starting task >>>>>>>>>>>> test-health-check.822a5fd2-6cba-11e5-b5ce-0a0027000000 >>>>>>>>>>>> Launching health check process: >>>>>>>>>>>> /home/haosdent/mesos/build/src/.libs/mesos-health-check >>>>>>>>>>>> --executor=xxxx >>>>>>>>>>>> Health check process launched at pid: 9895 >>>>>>>>>>>> Received task health update, healthy: true >>>>>>>>>>>> ``` >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Oct 7, 2015 at 12:51 PM, Jay Taylor < >>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> I am using my own framework, and the full task info I'm using >>>>>>>>>>>>> is posted earlier in this thread. 
Do you happen to know if >>>>>>>>>>>>> Marathon uses >>>>>>>>>>>>> Mesos's health checks for its health check system? >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Oct 6, 2015, at 9:01 PM, haosdent <[email protected]> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Yes, launch the health task through its definition in >>>>>>>>>>>>> taskinfo. Do you launch your task through Marathon? I could test >>>>>>>>>>>>> it in my >>>>>>>>>>>>> side. >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:56 AM, Jay Taylor < >>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Precisely, and there are none of those statements. Are you >>>>>>>>>>>>>> or others confident health-checks are part of the code path when >>>>>>>>>>>>>> defined >>>>>>>>>>>>>> via task info for docker container tasks? Going through the >>>>>>>>>>>>>> code, I wasn't >>>>>>>>>>>>>> able to find the linkage for anything other than health-checks >>>>>>>>>>>>>> triggered >>>>>>>>>>>>>> through a custom executor. >>>>>>>>>>>>>> >>>>>>>>>>>>>> With that being said it is a pretty good sized code base and >>>>>>>>>>>>>> I'm not very familiar with it, so my analysis this far has by no >>>>>>>>>>>>>> means been >>>>>>>>>>>>>> exhaustive. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Oct 6, 2015, at 8:41 PM, haosdent <[email protected]> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> When health check launch, it would have a log like this in >>>>>>>>>>>>>> your executor stdout >>>>>>>>>>>>>> ``` >>>>>>>>>>>>>> Health check process launched at pid xxx >>>>>>>>>>>>>> ``` >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:37 AM, Jay Taylor < >>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> I'm happy to try this, however wouldn't there be output in >>>>>>>>>>>>>>> the logs with the string "health" or "Health" if the >>>>>>>>>>>>>>> health-check were >>>>>>>>>>>>>>> active? None of my master or slave logs contain the string.. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Oct 6, 2015, at 7:45 PM, haosdent <[email protected]> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Could you use "exit 1" instead of "sleep 5" to see whether >>>>>>>>>>>>>>> could see unhealthy status in your task stdout/stderr. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:38 AM, Jay Taylor < >>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> My current version is 0.24.1. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 7:30 PM, haosdent < >>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> yes, adam also help commit it to 0.23.1 and 0.24.1 >>>>>>>>>>>>>>>>> https://github.com/apache/mesos/commit/8c0ed92de3925d4312429bfba01b9b1ccbcbbef0 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> https://github.com/apache/mesos/commit/09e367cd69aa39c156c9326d44f4a7b829ba3db7 >>>>>>>>>>>>>>>>> Are you use one of this version? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:26 AM, haosdent < >>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I remember 0.23.1 and 0.24.1 contains this backport, let >>>>>>>>>>>>>>>>>> me double check. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:01 AM, Jay Taylor < >>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Oops- Now I see you already said it's in master. I'll >>>>>>>>>>>>>>>>>>> look there :) >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thanks again! 
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:59 PM, Jay Taylor < >>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Great, thanks for the quick reply Tim! >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Do you know if there is a branch I can checkout to test >>>>>>>>>>>>>>>>>>>> it out? >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:54 PM, Timothy Chen < >>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Hi Jay, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> We just added health check support for docker tasks >>>>>>>>>>>>>>>>>>>>> that's in master but not yet released. It will run docker >>>>>>>>>>>>>>>>>>>>> exec with the >>>>>>>>>>>>>>>>>>>>> command you provided as health checks. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> It should be in the next release. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Thanks! >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Tim >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 6:49 PM, Jay Taylor < >>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Does Mesos support health checks for docker image >>>>>>>>>>>>>>>>>>>>> tasks? Mesos seems to be ignoring the >>>>>>>>>>>>>>>>>>>>> TaskInfo.HealthCheck field for me. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Example TaskInfo JSON received back from Mesos: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> "name":"hello-app.web.v3", >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> "task_id":{ >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> "value":"hello-app_web-v3.fc05a1a5-1e06-4e61-9879-be0d97cd3eec" >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> }, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> "slave_id":{ >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> "value":"20150924-210922-1608624320-5050-1792-S1" >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> }, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> "resources":[ >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> "name":"cpus", >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> "type":0, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> "scalar":{ >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> "value":0.1 >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> }, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> "name":"mem", >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> "type":0, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> "scalar":{ >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> "value":256 >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> }, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> "name":"ports", >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> "type":1, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> "ranges":{ >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> "range":[ >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> "begin":31002, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> "end":31002 >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> ] >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> ], >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> 
"command":{ >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> "container":{ >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> "image":"docker-services1a:5000/test/app-81-1-hello-app-103" >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> }, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> "shell":false >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> }, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> "container":{ >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> "type":1, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> "docker":{ >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-103", >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> "network":2, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> "port_mappings":[ >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> "host_port":31002, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> "container_port":8000, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> "protocol":"tcp" >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> ], >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> "privileged":false, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> "parameters":[], >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> "force_pull_image":false >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> }, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> "health_check":{ >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> "delay_seconds":5, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> "interval_seconds":10, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> "timeout_seconds":10, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> "consecutive_failures":3, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> "grace_period_seconds":0, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> "command":{ >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> "shell":true, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> "value":"sleep 5", >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> "user":"root" >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I have searched all machines and containers to see if >>>>>>>>>>>>>>>>>>>>> they ever run the command (in this case `sleep 5`), but >>>>>>>>>>>>>>>>>>>>> have not found any >>>>>>>>>>>>>>>>>>>>> indication that it is being executed. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> In the mesos src code the health-checks are invoked >>>>>>>>>>>>>>>>>>>>> from src/launcher/executor.cpp >>>>>>>>>>>>>>>>>>>>> CommandExecutorProcess::launchTask. Does >>>>>>>>>>>>>>>>>>>>> this mean that health-checks are only supported for >>>>>>>>>>>>>>>>>>>>> custom executors and >>>>>>>>>>>>>>>>>>>>> not for docker tasks? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> What I am trying to accomplish is to have the >>>>>>>>>>>>>>>>>>>>> 0/non-zero exit-status of a health-check command >>>>>>>>>>>>>>>>>>>>> translate to task health. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Thanks! 
>>>>>>>>>>>>>>>>>>>>> Jay
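For completeness, here is a minimal sketch of how a framework scheduler can populate the TaskInfo.health_check field that appears in the JSON near the bottom of this thread, using the Mesos C++ protobuf API. It assumes the libmesos headers are available; the task's identity, resources, and docker ContainerInfo are elided, and the curl command is just a placeholder:

```cpp
// Sketch: attach a COMMAND health check to an otherwise-prepared TaskInfo.
#include <mesos/mesos.hpp>

mesos::TaskInfo buildTaskWithHealthCheck(const mesos::TaskInfo& base) {
  // `base` is assumed to already carry name, task_id, slave_id, resources,
  // and the docker ContainerInfo.
  mesos::TaskInfo task = base;

  mesos::HealthCheck* health = task.mutable_health_check();
  health->set_delay_seconds(5);
  health->set_interval_seconds(10);
  health->set_timeout_seconds(10);
  health->set_consecutive_failures(3);
  health->set_grace_period_seconds(10);

  // COMMAND-style check: run via a shell; a zero exit status means healthy.
  mesos::CommandInfo* command = health->mutable_command();
  command->set_shell(true);
  command->set_value("curl --silent --fail http://127.0.0.1:8000");

  return task;
}
```

As discussed above, for docker tasks the executor wraps such a command in `docker exec <container> sh -c "..."`, and its exit status drives the healthy/unhealthy updates reported for the task.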

