On Mon, Oct 12, 2015 at 11:26 PM, Marco Massenzio <[email protected]>
wrote:

> Are those the stdout logs of the Agent? I ask because I don't see
> --launcher_dir set there; however, if I look at one that is running off the
> same 0.24.1 package, this is what I see:
>
> I1012 14:56:36.933856  1704 slave.cpp:191] Flags at startup:
> --appc_store_dir="/tmp/mesos/store/appc"
> --attributes="rack:r2d2;pod:demo,dev" --authenticatee="crammd5"
> --cgroups_cpu_enable_pids_and_tids_count="false"
> --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup"
> --cgroups_limit_swap="false" --cgroups_root="mesos"
> --container_disk_watch_interval="15secs" --containerizers="docker,mesos"
> --default_role="*" --disk_watch_interval="1mins" --docker="docker"
> --docker_kill_orphans="true" --docker_remove_delay="6hrs"
> --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns"
> --enforce_container_disk_quota="false"
> --executor_registration_timeout="1mins"
> --executor_shutdown_grace_period="5secs"
> --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB"
> --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1"
> --hadoop_home="" --help="false" --initialize_driver_logging="true"
> --ip="192.168.33.11" --isolation="cgroups/cpu,cgroups/mem"
> --launcher_dir="/usr/libexec/mesos"
> --log_dir="/var/local/mesos/logs/agent" --logbufsecs="0"
> --logging_level="INFO" --master="zk://192.168.33.1:2181/mesos/vagrant"
> --oversubscribed_resources_interval="15secs" --perf_duration="10secs"
> --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns"
> --quiet="false" --recover="reconnect" --recovery_timeout="15mins"
> --registration_backoff_factor="1secs"
> --resource_monitoring_interval="1secs"
> --resources="ports:[9000-10000];ephemeral_ports:[32768-57344]"
> --revocable_cpu_low_priority="true"
> --sandbox_directory="/var/local/sandbox" --strict="true"
> --switch_user="true" --version="false" --work_dir="/var/local/mesos/agent"
> (This is run off the Vagrantfile at [0], in case you want to reproduce.)
> That agent is not run via the init command, though; I execute it manually
> via the `run-agent.sh` in the same directory.
>
> I don't really think this matters, but I assume you also restarted the
> agent after making the config changes?
> (And, for your own sanity, you can double-check the version by looking at
> the very head of the logs.)
>
>
> [0] http://github.com/massenz/zk-mesos

> --
> *Marco Massenzio*
> Distributed Systems Engineer
> http://codetrips.com
>
> On Mon, Oct 12, 2015 at 10:50 PM, Jay Taylor <[email protected]> wrote:
>
>> Hi Haosdent and Mesos friends,
>>
>> I've rebuilt the cluster from scratch and installed mesos 0.24.1 from the
>> mesosphere apt repo:
>>
>> $ dpkg -l | grep mesos
>> ii  mesos                               0.24.1-0.2.35.ubuntu1404
>>    amd64        Cluster resource manager with efficient resource isolation
>>
>> Then I added the `launcher_dir` flag via /etc/mesos-slave/launcher_dir on
>> the slaves:
>>
>> mesos-worker1a:~$ cat /etc/mesos-slave/launcher_dir
>> /usr/libexec/mesos
>>
>> And yet the task health-checks are still being launched from the sandbox
>> directory like before!
>>
>> I've also tested setting the MESOS_LAUNCHER_DIR env var and got the
>> identical result (just as before, on the cluster where many versions of
>> Mesos had been installed):
>>
>> STDOUT:
>>
>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>> --logbufsecs="0" --logging_level="INFO"
>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>> --stop_timeout="0ns"
>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>> --logbufsecs="0" --logging_level="INFO"
>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>> --stop_timeout="0ns"
>>> Registered docker executor on mesos-worker1a
>>> Starting task hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>> Launching health check process:
>>> /tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb/mesos-health-check
>>> --executor=(1)@192.168.225.58:48912
>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>> mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb
>>> sh -c \" curl --silent --show-error --fail --tcp-nodelay --head -X GET
>>> --user-agent flux-capacitor-health-checker --max-time 1 http:\/\/
>>> 127.0.0.1:8000
>>> \""},"consecutive_failures":6,"delay_seconds":15,"grace_period_seconds":10,"interval_seconds":1,"timeout_seconds":1}
>>> --task_id=hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>> Health check process launched at pid: 11253
>>
>>
>>
>> STDERR:
>>
>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>> --logbufsecs="0" --logging_level="INFO"
>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>> --stop_timeout="0ns"
>>> --container="mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>> --logbufsecs="0" --logging_level="INFO"
>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>> --sandbox_directory="/tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb"
>>> --stop_timeout="0ns"
>>> Registered docker executor on mesos-worker1a
>>> Starting task hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>> Launching health check process:
>>> /tmp/mesos/slaves/20151012-184440-1625401536-5050-23953-S0/frameworks/20151012-184440-1625401536-5050-23953-0000/executors/hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91/runs/62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb/mesos-health-check
>>> --executor=(1)@192.168.225.58:48912
>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>> mesos-20151012-184440-1625401536-5050-23953-S0.62d43b8f-6cd1-4c53-9ac8-84dbfc45bbcb
>>> sh -c \" curl --silent --show-error --fail --tcp-nodelay --head -X GET
>>> --user-agent flux-capacitor-health-checker --max-time 1 http:\/\/
>>> 127.0.0.1:8000
>>> \""},"consecutive_failures":6,"delay_seconds":15,"grace_period_seconds":10,"interval_seconds":1,"timeout_seconds":1}
>>> --task_id=hello-app_web-v3.33597b73-1943-41b4-a308-76132eebcc91
>>> Health check process launched at pid: 11253
>>
>>
>> Any ideas on where to go from here?  Is there any additional information
>> I can provide?
>>
>> Thanks as always,
>> Jay
>>
>>
>> On Thu, Oct 8, 2015 at 9:23 PM, haosdent <[email protected]> wrote:
>>
>>> The flags sent to the executor from the containerizer are stringified and
>>> become command-line parameters when the executor is launched.
>>>
>>> You could see this in
>>> https://github.com/apache/mesos/blob/master/src/slave/containerizer/docker.cpp#L279-L288
>>>
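>>> (For illustration only: a minimal, self-contained sketch, not the actual
>>> Mesos code, of what that stringification amounts to; every flag in a
>>> key/value map becomes one `--key="value"` argument, which is exactly the
>>> shape of the lines at the top of the executor stdout. The flag values here
>>> are made up.)
>>> ```
>>> #include <iostream>
>>> #include <map>
>>> #include <string>
>>> #include <vector>
>>>
>>> int main() {
>>>   // Hypothetical flag map; real values come from the agent's flags.
>>>   std::map<std::string, std::string> flags = {
>>>     {"docker", "docker"},
>>>     {"mapped_directory", "/mnt/mesos/sandbox"},
>>>     {"stop_timeout", "0ns"},
>>>   };
>>>
>>>   // Stringify each flag into a --key="value" command-line parameter.
>>>   std::vector<std::string> argv;
>>>   argv.push_back("mesos-docker-executor");
>>>   for (const auto& flag : flags) {
>>>     argv.push_back("--" + flag.first + "=\"" + flag.second + "\"");
>>>   }
>>>
>>>   for (const auto& arg : argv) {
>>>     std::cout << arg << std::endl;
>>>   }
>>>   return 0;
>>> }
>>> ```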
>>> But for launcher_dir, the executor gets it from `argv[0]`, as you
>>> mentioned above.
>>> ```
>>>   string path =
>>>     envPath.isSome() ? envPath.get()
>>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>>
>>> ```
>>> So I want to figure out why your argv[0] would resolve to the sandbox dir,
>>> not "/usr/libexec/mesos".
>>>
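>>> For illustration, here is a self-contained sketch of that fallback using
>>> plain POSIX calls (not the stout/Mesos helpers): when MESOS_LAUNCHER_DIR is
>>> unset, the directory is derived from argv[0], so an executor invoked via a
>>> path inside the sandbox ends up with the sandbox as its launcher dir.
>>> ```
>>> #include <limits.h>  // PATH_MAX
>>> #include <cstdlib>   // getenv
>>> #include <libgen.h>  // dirname
>>> #include <iostream>
>>> #include <string>
>>>
>>> int main(int argc, char** argv) {
>>>   (void)argc;
>>>   const char* envPath = std::getenv("MESOS_LAUNCHER_DIR");
>>>
>>>   std::string path;
>>>   if (envPath != nullptr) {
>>>     path = envPath;  // The env var wins when it is present.
>>>   } else {
>>>     // Otherwise resolve the binary's own location from argv[0]; if the
>>>     // executor was launched out of the sandbox, this IS the sandbox dir.
>>>     char resolved[PATH_MAX];
>>>     if (::realpath(argv[0], resolved) != nullptr) {
>>>       path = ::dirname(resolved);
>>>     }
>>>   }
>>>
>>>   std::cout << "launcher dir: " << path << std::endl;
>>>   return 0;
>>> }
>>> ```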
>>> On Fri, Oct 9, 2015 at 12:03 PM, Jay Taylor <[email protected]> wrote:
>>>
>>>> I see.  And then how are the flags sent to the executor?
>>>>
>>>>
>>>>
>>>> On Oct 8, 2015, at 8:56 PM, haosdent <[email protected]> wrote:
>>>>
>>>> Yes. The related code is located in
>>>> https://github.com/apache/mesos/blob/master/src/slave/main.cpp#L123
>>>>
>>>> In fact, environment variables starting with MESOS_ are loaded as flag
>>>> values.
>>>>
>>>> https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/flags/flags.hpp#L52
>>>>
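>>>> A rough standalone sketch of that prefix-based loading (not the actual
>>>> stout code, just the idea): walk the environment, keep entries that start
>>>> with MESOS_, and treat the lower-cased remainder as the flag name.
>>>> ```
>>>> #include <algorithm>
>>>> #include <cctype>
>>>> #include <iostream>
>>>> #include <map>
>>>> #include <string>
>>>>
>>>> extern char** environ;
>>>>
>>>> int main() {
>>>>   const std::string prefix = "MESOS_";
>>>>   std::map<std::string, std::string> flags;
>>>>
>>>>   for (char** env = environ; *env != nullptr; ++env) {
>>>>     std::string entry(*env);
>>>>     size_t eq = entry.find('=');
>>>>     if (eq == std::string::npos ||
>>>>         entry.compare(0, prefix.size(), prefix) != 0) {
>>>>       continue;  // Not a MESOS_-prefixed variable.
>>>>     }
>>>>     // MESOS_LAUNCHER_DIR=/tmp  ->  flag "launcher_dir" = "/tmp"
>>>>     std::string name = entry.substr(prefix.size(), eq - prefix.size());
>>>>     std::transform(name.begin(), name.end(), name.begin(), ::tolower);
>>>>     flags[name] = entry.substr(eq + 1);
>>>>   }
>>>>
>>>>   for (const auto& flag : flags) {
>>>>     std::cout << "--" << flag.first << "=\"" << flag.second << "\"\n";
>>>>   }
>>>>   return 0;
>>>> }
>>>> ```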
>>>> On Fri, Oct 9, 2015 at 11:33 AM, Jay Taylor <[email protected]>
>>>> wrote:
>>>>
>>>>> One question for you haosdent-
>>>>>
>>>>> You mentioned that the flags.launcher_dir should propagate to the
>>>>> docker executor all the way up the chain.  Can you show me where this 
>>>>> logic
>>>>> is in the codebase?  I didn't see where that was happening and would like
>>>>> to understand the mechanism.
>>>>>
>>>>> Thanks!
>>>>> Jay
>>>>>
>>>>>
>>>>>
>>>>> On Oct 8, 2015, at 8:29 PM, Jay Taylor <[email protected]> wrote:
>>>>>
>>>>> Maybe tomorrow I will build a fresh cluster from scratch to see if the
>>>>> broken behavior experienced today still persists.
>>>>>
>>>>> On Oct 8, 2015, at 7:52 PM, haosdent <[email protected]> wrote:
>>>>>
>>>>> As far as I know, MESOS_LAUNCHER_DIR works by setting flags.launcher_dir,
>>>>> which is the directory where mesos-docker-executor and mesos-health-check
>>>>> are looked up. Even though the env var itself is not propagated,
>>>>> MESOS_LAUNCHER_DIR still works because flags.launcher_dir is populated
>>>>> from it.
>>>>>
>>>>> For example, I ran
>>>>> ```
>>>>> export MESOS_LAUNCHER_DIR=/tmp
>>>>> ```
>>>>> before starting mesos-slave. So when I launch the slave, I can find this
>>>>> line in the slave log:
>>>>> ```
>>>>> I1009 10:27:26.594599  1416 slave.cpp:203] Flags at startup:
>>>>> xxxxx  --launcher_dir="/tmp"
>>>>> ```
>>>>>
>>>>> And from your log, I am not sure why your MESOS_LAUNCHER_DIR became the
>>>>> sandbox dir. Is it because MESOS_LAUNCHER_DIR is overridden in one of
>>>>> your other scripts?
>>>>>
>>>>>
>>>>> On Fri, Oct 9, 2015 at 1:56 AM, Jay Taylor <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> I haven't ever changed MESOS_LAUNCHER_DIR/--launcher_dir before.
>>>>>>
>>>>>> I just tried setting both the env var and the flag on the slaves, and
>>>>>> have determined that the env var is not present when it is checked in
>>>>>> src/docker/executor.cpp @ line 573:
>>>>>>
>>>>>>  const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR");
>>>>>>>   string path =
>>>>>>>     envPath.isSome() ? envPath.get()
>>>>>>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>>>>>>   cout << "MESOS_LAUNCHER_DIR: envpath.isSome()->" <<
>>>>>>> (envPath.isSome() ? "yes" : "no") << endl;
>>>>>>>   cout << "MESOS_LAUNCHER_DIR: path='" << path << "'" << endl;
>>>>>>
>>>>>>
>>>>>> Exported MESOS_LAUNCHER_DIR env var (and verified it is correctly
>>>>>> propagated along up to the point of mesos-slave launch):
>>>>>>
>>>>>> $ cat /etc/default/mesos-slave
>>>>>>> export
>>>>>>> MESOS_MASTER="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos"
>>>>>>> export MESOS_CONTAINERIZERS="mesos,docker"
>>>>>>> export MESOS_EXECUTOR_REGISTRATION_TIMEOUT="5mins"
>>>>>>> export MESOS_PORT="5050"
>>>>>>> export MESOS_LAUNCHER_DIR="/usr/libexec/mesos"
>>>>>>
>>>>>>
>>>>>> TASK OUTPUT:
>>>>>>
>>>>>>
>>>>>>> MESOS_LAUNCHER_DIR: envpath.isSome()->no
>>>>>>> MESOS_LAUNCHER_DIR: path='/tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad'
>>>>>>> Registered docker executor on mesos-worker2a
>>>>>>> Starting task hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>>>>> Launching health check process:
>>>>>>> /tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad/mesos-health-check
>>>>>>> --executor=(1)@192.168.225.59:44523
>>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>>> mesos-61373c0e-7349-4173-ab8d-9d7b260e8a30-S1.41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad
>>>>>>> sh -c \" \/bin\/bash
>>>>>>> \""},"consecutive_failures":3,"delay_seconds":5.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>>>>>>> --task_id=hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>>>>> Health check process launched at pid: 2519
>>>>>>
>>>>>>
>>>>>> The env var is not propagated when the docker executor is launched
>>>>>> in src/slave/containerizer/docker.cpp around line 903:
>>>>>>
>>>>>>   vector<string> argv;
>>>>>>>   argv.push_back("mesos-docker-executor");
>>>>>>>   // Construct the mesos-docker-executor using the "name" we gave the
>>>>>>>   // container (to distinguish it from Docker containers not created
>>>>>>>   // by Mesos).
>>>>>>>   Try<Subprocess> s = subprocess(
>>>>>>>       path::join(flags.launcher_dir, "mesos-docker-executor"),
>>>>>>>       argv,
>>>>>>>       Subprocess::PIPE(),
>>>>>>>       Subprocess::PATH(path::join(container->directory, "stdout")),
>>>>>>>       Subprocess::PATH(path::join(container->directory, "stderr")),
>>>>>>>       dockerFlags(flags, container->name(), container->directory),
>>>>>>>       environment,
>>>>>>>       lambda::bind(&setup, container->directory));
>>>>>>
>>>>>>
>>>>>> A little way above, we can see the environment is set up with the
>>>>>> container task's defined env vars.
>>>>>>
>>>>>> See src/slave/containerizer/docker.cpp around line 871:
>>>>>>
>>>>>>   // Include any enviroment variables from ExecutorInfo.
>>>>>>>   foreach (const Environment::Variable& variable,
>>>>>>>            container->executor.command().environment().variables()) {
>>>>>>>     environment[variable.name()] = variable.value();
>>>>>>>   }
>>>>>>
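>>>>>> For what it's worth, here is a tiny standalone demo (plain POSIX, not
>>>>>> Mesos code) of why that matters: a child exec'd with an explicit
>>>>>> environment sees only the variables in that list, so anything the slave
>>>>>> does not copy into `environment` (MESOS_LAUNCHER_DIR included) is simply
>>>>>> invisible to the executor process.
>>>>>> ```
>>>>>> #include <sys/wait.h>  // waitpid
>>>>>> #include <unistd.h>    // fork, execve
>>>>>> #include <cstdio>      // perror
>>>>>>
>>>>>> int main() {
>>>>>>   pid_t pid = fork();
>>>>>>   if (pid == 0) {
>>>>>>     // The child receives ONLY these variables, no matter what the
>>>>>>     // parent exported (the values here are made up for the demo).
>>>>>>     char* const argv[] = {const_cast<char*>("env"), nullptr};
>>>>>>     char* const envp[] = {
>>>>>>         const_cast<char*>("MESOS_SANDBOX=/mnt/mesos/sandbox"),
>>>>>>         nullptr};
>>>>>>     execve("/usr/bin/env", argv, envp);
>>>>>>     perror("execve");  // Only reached if execve fails.
>>>>>>     return 1;
>>>>>>   }
>>>>>>   int status = 0;
>>>>>>   waitpid(pid, &status, 0);
>>>>>>   return 0;
>>>>>> }
>>>>>> ```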
>>>>>>
>>>>>> Should I file a JIRA for this?  Have I overlooked anything?
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 7, 2015 at 8:11 PM, haosdent <[email protected]> wrote:
>>>>>>
>>>>>>> >Not sure what was going on with health-checks in 0.24.0.
>>>>>>> 0.24.1 should work.
>>>>>>>
>>>>>>> >Do any of you know which host the path
>>>>>>> "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>>>>>>> should exist on? It definitely doesn't exist on the slave, hence 
>>>>>>> execution
>>>>>>> failing.
>>>>>>>
>>>>>>> Did you set MESOS_LAUNCHER_DIR/--launcher_dir incorrectly before?
>>>>>>> We get mesos-health-check from MESOS_LAUNCHER_DIR/--launcher_dir, or we
>>>>>>> use the same dir as mesos-docker-executor.
>>>>>>>
>>>>>>> On Thu, Oct 8, 2015 at 10:46 AM, Jay Taylor <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Maybe I spoke too soon.
>>>>>>>>
>>>>>>>> Now the checks are attempting to run; however, the STDERR is not
>>>>>>>> looking good.  I've added some debugging to the error message output to
>>>>>>>> show the path, argv, and envp variables:
>>>>>>>>
>>>>>>>> STDOUT:
>>>>>>>>
>>>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>> --docker="docker" --docker_socket="/var/run/docker.sock" 
>>>>>>>>> --help="false"
>>>>>>>>> --initialize_driver_logging="true" --logbufsecs="0" 
>>>>>>>>> --logging_level="INFO"
>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>> --docker="docker" --docker_socket="/var/run/docker.sock" 
>>>>>>>>> --help="false"
>>>>>>>>> --initialize_driver_logging="true" --logbufsecs="0" 
>>>>>>>>> --logging_level="INFO"
>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>> Registered docker executor on mesos-worker2a
>>>>>>>>> Starting task
>>>>>>>>> app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>>>>> Launching health check process:
>>>>>>>>> /tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check
>>>>>>>>> --executor=(1)@192.168.225.59:43917
>>>>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec
>>>>>>>>> mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc
>>>>>>>>> sh -c \" exit 1
>>>>>>>>> \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>>>>>>>>> --task_id=app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>>>>> Health check process launched at pid: 3012
>>>>>>>>
>>>>>>>>
>>>>>>>> STDERR:
>>>>>>>>
>>>>>>>> I1008 02:17:28.870434 2770 exec.cpp:134] Version: 0.26.0
>>>>>>>>> I1008 02:17:28.871860 2778 exec.cpp:208] Executor registered on
>>>>>>>>> slave 16b49e90-6852-4c91-8e70-d89c54f25668-S1
>>>>>>>>> WARNING: Your kernel does not support swap limit capabilities,
>>>>>>>>> memory limited without swap.
>>>>>>>>> ABORT: (src/subprocess.cpp:180): Failed to os::execvpe in
>>>>>>>>> childMain
>>>>>>>>> (path.c_str()='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>>>>>>>> argv='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>>>>>>>> envp=''): No such file or directory*** Aborted at 1444270649 (unix 
>>>>>>>>> time)
>>>>>>>>> try "date -d @1444270649" if you are using GNU date ***
>>>>>>>>> PC: @ 0x7f4a37ec6cc9 (unknown)
>>>>>>>>> *** SIGABRT (@0xbc4) received by PID 3012 (TID 0x7f4a2f9f6700)
>>>>>>>>> from PID 3012; stack trace: ***
>>>>>>>>> @ 0x7f4a38265340 (unknown)
>>>>>>>>> @ 0x7f4a37ec6cc9 (unknown)
>>>>>>>>> @ 0x7f4a37eca0d8 (unknown)
>>>>>>>>> @ 0x4191e2 _Abort()
>>>>>>>>> @ 0x41921c _Abort()
>>>>>>>>> @ 0x7f4a39dc2768 process::childMain()
>>>>>>>>> @ 0x7f4a39dc4f59 std::_Function_handler<>::_M_invoke()
>>>>>>>>> @ 0x7f4a39dc24fc process::defaultClone()
>>>>>>>>> @ 0x7f4a39dc34fb process::subprocess()
>>>>>>>>> @ 0x43cc9c
>>>>>>>>> mesos::internal::docker::DockerExecutorProcess::launchHealthCheck()
>>>>>>>>> @ 0x7f4a39d924f4 process::ProcessManager::resume()
>>>>>>>>> @ 0x7f4a39d92827
>>>>>>>>> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
>>>>>>>>> @ 0x7f4a38a47e40 (unknown)
>>>>>>>>> @ 0x7f4a3825d182 start_thread
>>>>>>>>> @ 0x7f4a37f8a47d (unknown)
>>>>>>>>
>>>>>>>>
>>>>>>>> Do any of you know which host the path 
>>>>>>>> "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>>>>>>>> should exist on? It definitely doesn't exist on the slave, hence
>>>>>>>> execution failing.
>>>>>>>>
>>>>>>>> This is with current master, git hash
>>>>>>>> 5058fac1083dc91bca54d33c26c810c17ad95dd1.
>>>>>>>>
>>>>>>>> commit 5058fac1083dc91bca54d33c26c810c17ad95dd1
>>>>>>>>> Author: Anand Mazumdar <[email protected]>
>>>>>>>>> Date:   Tue Oct 6 17:37:41 2015 -0700
>>>>>>>>
>>>>>>>>
>>>>>>>> -Jay
>>>>>>>>
>>>>>>>> On Wed, Oct 7, 2015 at 5:23 PM, Jay Taylor <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Update:
>>>>>>>>>
>>>>>>>>> I used https://github.com/deric/mesos-deb-packaging to compile
>>>>>>>>> and package the latest master (0.26.x) and deployed it to the 
>>>>>>>>> cluster, and
>>>>>>>>> now health checks are working as advertised in both Marathon and my 
>>>>>>>>> own
>>>>>>>>> framework!  Not sure what was going on with health-checks in 0.24.0.
>>>>>>>>>
>>>>>>>>> Anyways, thanks again for your help Haosdent!
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Jay
>>>>>>>>>
>>>>>>>>> On Wed, Oct 7, 2015 at 12:53 PM, Jay Taylor <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Haosdent,
>>>>>>>>>>
>>>>>>>>>> Can you share your Marathon POST request that results in Mesos
>>>>>>>>>> executing the health checks?
>>>>>>>>>>
>>>>>>>>>> Since we can reference the Marathon framework, I've been doing
>>>>>>>>>> some digging around.
>>>>>>>>>>
>>>>>>>>>> Here are the details of my setup and findings:
>>>>>>>>>>
>>>>>>>>>> I put a few small hacks in Marathon:
>>>>>>>>>>
>>>>>>>>>> (1) Added com.googlecode.protobuf.format to Marathon's
>>>>>>>>>> dependencies
>>>>>>>>>>
>>>>>>>>>> (2) Edited the following files so TaskInfo is dumped as JSON to
>>>>>>>>>> /tmp/X in both the TaskFactory as well as right before the task is
>>>>>>>>>> sent to
>>>>>>>>>> Mesos via driver.launchTasks:
>>>>>>>>>>
>>>>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala:
>>>>>>>>>>
>>>>>>>>>> $ git diff
>>>>>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala
>>>>>>>>>>> @@ -25,6 +25,12 @@ class DefaultTaskFactory @Inject() (
>>>>>>>>>>>
>>>>>>>>>>>      new TaskBuilder(app, taskIdUtil.newTaskId,
>>>>>>>>>>> config).buildIfMatches(offer, runningTasks).map {
>>>>>>>>>>>        case (taskInfo, ports) =>
>>>>>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>>>>>> +        import java.io._
>>>>>>>>>>> +        val bw = new BufferedWriter(new FileWriter(new
>>>>>>>>>>> File("/tmp/taskjson1-" + taskInfo.getTaskId.getValue)))
>>>>>>>>>>> +        bw.write(JsonFormat.printToString(taskInfo))
>>>>>>>>>>> +        bw.write("\n")
>>>>>>>>>>> +        bw.close()
>>>>>>>>>>>          CreatedTask(
>>>>>>>>>>>            taskInfo,
>>>>>>>>>>>            MarathonTasks.makeTask(
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala:
>>>>>>>>>>
>>>>>>>>>> $ git diff
>>>>>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala
>>>>>>>>>>> @@ -24,6 +24,16 @@ private[launcher] class TaskLauncherImpl(
>>>>>>>>>>>    override def launchTasks(offerID: OfferID, taskInfos:
>>>>>>>>>>> Seq[TaskInfo]): Boolean = {
>>>>>>>>>>>      val launched = withDriver(s"launchTasks($offerID)") {
>>>>>>>>>>> driver =>
>>>>>>>>>>>        import scala.collection.JavaConverters._
>>>>>>>>>>> +      var i = 0
>>>>>>>>>>> +      for (i <- 0 to taskInfos.length - 1) {
>>>>>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>>>>>> +        import java.io._
>>>>>>>>>>> +        val file = new File("/tmp/taskJson2-" + i.toString() +
>>>>>>>>>>> "-" + taskInfos(i).getTaskId.getValue)
>>>>>>>>>>> +        val bw = new BufferedWriter(new FileWriter(file))
>>>>>>>>>>> +        bw.write(JsonFormat.printToString(taskInfos(i)))
>>>>>>>>>>> +        bw.write("\n")
>>>>>>>>>>> +        bw.close()
>>>>>>>>>>> +      }
>>>>>>>>>>>        driver.launchTasks(Collections.singleton(offerID),
>>>>>>>>>>> taskInfos.asJava)
>>>>>>>>>>>      }
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Then I built and deployed the hacked Marathon and restarted the
>>>>>>>>>> marathon service.
>>>>>>>>>>
>>>>>>>>>> Next I created the app via the Marathon API ("hello app" is a
>>>>>>>>>> container with a simple hello-world ruby app running on
>>>>>>>>>> 0.0.0.0:8000)
>>>>>>>>>>
>>>>>>>>>> curl http://mesos-primary1a:8080/v2/groups -XPOST
>>>>>>>>>>> -H'Content-Type: application/json' -d'
>>>>>>>>>>> {
>>>>>>>>>>>   "id": "/app-81-1-hello-app",
>>>>>>>>>>>   "apps": [
>>>>>>>>>>>     {
>>>>>>>>>>>       "id": "/app-81-1-hello-app/web-v11",
>>>>>>>>>>>       "container": {
>>>>>>>>>>>         "type": "DOCKER",
>>>>>>>>>>>         "docker": {
>>>>>>>>>>>           "image":
>>>>>>>>>>> "docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>>>>           "network": "BRIDGE",
>>>>>>>>>>>           "portMappings": [
>>>>>>>>>>>             {
>>>>>>>>>>>               "containerPort": 8000,
>>>>>>>>>>>               "hostPort": 0,
>>>>>>>>>>>               "protocol": "tcp"
>>>>>>>>>>>             }
>>>>>>>>>>>           ]
>>>>>>>>>>>         }
>>>>>>>>>>>       },
>>>>>>>>>>>       "env": {
>>>>>>>>>>>
>>>>>>>>>>>       },
>>>>>>>>>>>       "healthChecks": [
>>>>>>>>>>>         {
>>>>>>>>>>>           "protocol": "COMMAND",
>>>>>>>>>>>           "command": {"value": "exit 1"},
>>>>>>>>>>>           "gracePeriodSeconds": 10,
>>>>>>>>>>>           "intervalSeconds": 10,
>>>>>>>>>>>           "timeoutSeconds": 10,
>>>>>>>>>>>           "maxConsecutiveFailures": 3
>>>>>>>>>>>         }
>>>>>>>>>>>       ],
>>>>>>>>>>>       "instances": 1,
>>>>>>>>>>>       "cpus": 1,
>>>>>>>>>>>       "mem": 512
>>>>>>>>>>>     }
>>>>>>>>>>>   ]
>>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> $ ls /tmp/
>>>>>>>>>>> taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>> taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Do they match?
>>>>>>>>>>
>>>>>>>>>>> $ md5sum /tmp/task*
>>>>>>>>>>> 1b5115997e78e2611654059249d99578  /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>> 1b5115997e78e2611654059249d99578  /tmp/taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Yes, so I am confident this is the information being sent across
>>>>>>>>>> the wire to Mesos.
>>>>>>>>>>
>>>>>>>>>> Do they contain any health-check information?
>>>>>>>>>>
>>>>>>>>>> $ cat
>>>>>>>>>>> /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>> {
>>>>>>>>>>>   "name":"web-v11.app-81-1-hello-app",
>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>
>>>>>>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>>>>   },
>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>   },
>>>>>>>>>>>   "resources":[
>>>>>>>>>>>     {
>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>       "type":"SCALAR",
>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>         "value":1.0
>>>>>>>>>>>       },
>>>>>>>>>>>       "role":"*"
>>>>>>>>>>>     },
>>>>>>>>>>>     {
>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>       "type":"SCALAR",
>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>         "value":512.0
>>>>>>>>>>>       },
>>>>>>>>>>>       "role":"*"
>>>>>>>>>>>     },
>>>>>>>>>>>     {
>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>       "type":"RANGES",
>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>         "range":[
>>>>>>>>>>>           {
>>>>>>>>>>>             "begin":31641,
>>>>>>>>>>>             "end":31641
>>>>>>>>>>>           }
>>>>>>>>>>>         ]
>>>>>>>>>>>       },
>>>>>>>>>>>       "role":"*"
>>>>>>>>>>>     }
>>>>>>>>>>>   ],
>>>>>>>>>>>   "command":{
>>>>>>>>>>>     "environment":{
>>>>>>>>>>>       "variables":[
>>>>>>>>>>>         {
>>>>>>>>>>>           "name":"PORT_8000",
>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>         },
>>>>>>>>>>>         {
>>>>>>>>>>>           "name":"MARATHON_APP_VERSION",
>>>>>>>>>>>           "value":"2015-10-07T19:35:08.386Z"
>>>>>>>>>>>         },
>>>>>>>>>>>         {
>>>>>>>>>>>           "name":"HOST",
>>>>>>>>>>>           "value":"mesos-worker1a"
>>>>>>>>>>>         },
>>>>>>>>>>>         {
>>>>>>>>>>>           "name":"MARATHON_APP_DOCKER_IMAGE",
>>>>>>>>>>>
>>>>>>>>>>> "value":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966"
>>>>>>>>>>>         },
>>>>>>>>>>>         {
>>>>>>>>>>>           "name":"MESOS_TASK_ID",
>>>>>>>>>>>
>>>>>>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>>>>         },
>>>>>>>>>>>         {
>>>>>>>>>>>           "name":"PORT",
>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>         },
>>>>>>>>>>>         {
>>>>>>>>>>>           "name":"PORTS",
>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>         },
>>>>>>>>>>>         {
>>>>>>>>>>>           "name":"MARATHON_APP_ID",
>>>>>>>>>>>           "value":"/app-81-1-hello-app/web-v11"
>>>>>>>>>>>         },
>>>>>>>>>>>         {
>>>>>>>>>>>           "name":"PORT0",
>>>>>>>>>>>           "value":"31641"
>>>>>>>>>>>         }
>>>>>>>>>>>       ]
>>>>>>>>>>>     },
>>>>>>>>>>>     "shell":false
>>>>>>>>>>>   },
>>>>>>>>>>>   "container":{
>>>>>>>>>>>     "type":"DOCKER",
>>>>>>>>>>>     "docker":{
>>>>>>>>>>>
>>>>>>>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>>>>       "network":"BRIDGE",
>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>         {
>>>>>>>>>>>           "host_port":31641,
>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>         }
>>>>>>>>>>>       ],
>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>     }
>>>>>>>>>>>   }
>>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> No, I don't see anything about any health check.
>>>>>>>>>>
>>>>>>>>>> Mesos STDOUT for the launched task:
>>>>>>>>>>
>>>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true"
>>>>>>>>>>> --logbufsecs="0" --logging_level="INFO"
>>>>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>>>>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>>>> --stop_timeout="0ns"
>>>>>>>>>>> Registered docker executor on mesos-worker1a
>>>>>>>>>>> Starting task
>>>>>>>>>>> app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> And STDERR:
>>>>>>>>>>
>>>>>>>>>> I1007 19:35:08.790743  4612 exec.cpp:134] Version: 0.24.0
>>>>>>>>>>> I1007 19:35:08.793416  4619 exec.cpp:208] Executor registered on
>>>>>>>>>>> slave 20150924-210922-1608624320-5050-1792-S1
>>>>>>>>>>> WARNING: Your kernel does not support swap limit capabilities,
>>>>>>>>>>> memory limited without swap.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Again, nothing about any health checks.
>>>>>>>>>>
>>>>>>>>>> Any ideas of other things to try, or what I could be missing?  I
>>>>>>>>>> can't say either way whether the Mesos health-check system works if
>>>>>>>>>> Marathon won't put the health check into the task it sends to Mesos.
>>>>>>>>>>
>>>>>>>>>> Thanks for all your help!
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Jay
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 6, 2015 at 11:24 PM, haosdent <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Maybe you could post your executor stdout/stderr so that we
>>>>>>>>>>> can see whether the health check is running or not.
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 7, 2015 at 2:15 PM, haosdent <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Marathon also uses the Mesos health check. When I use a health
>>>>>>>>>>>> check, I can see logs like this in the executor stdout.
>>>>>>>>>>>>
>>>>>>>>>>>> ```
>>>>>>>>>>>> Registered docker executor on xxxxx
>>>>>>>>>>>> Starting task
>>>>>>>>>>>> test-health-check.822a5fd2-6cba-11e5-b5ce-0a0027000000
>>>>>>>>>>>> Launching health check process:
>>>>>>>>>>>> /home/haosdent/mesos/build/src/.libs/mesos-health-check 
>>>>>>>>>>>> --executor=xxxx
>>>>>>>>>>>> Health check process launched at pid: 9895
>>>>>>>>>>>> Received task health update, healthy: true
>>>>>>>>>>>> ```
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Oct 7, 2015 at 12:51 PM, Jay Taylor <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I am using my own framework, and the full task info I'm using
>>>>>>>>>>>>> is posted earlier in this thread.  Do you happen to know if 
>>>>>>>>>>>>> Marathon uses
>>>>>>>>>>>>> Mesos's health checks for its health check system?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Oct 6, 2015, at 9:01 PM, haosdent <[email protected]>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, launch the health check through its definition in
>>>>>>>>>>>>> taskinfo. Do you launch your task through Marathon? I could test 
>>>>>>>>>>>>> it in my
>>>>>>>>>>>>> side.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:56 AM, Jay Taylor <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Precisely, and there are none of those statements.  Are you
>>>>>>>>>>>>>> or others confident health-checks are part of the code path when 
>>>>>>>>>>>>>> defined
>>>>>>>>>>>>>> via task info for docker container tasks?  Going through the 
>>>>>>>>>>>>>> code, I wasn't
>>>>>>>>>>>>>> able to find the linkage for anything other than health-checks 
>>>>>>>>>>>>>> triggered
>>>>>>>>>>>>>> through a custom executor.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> With that being said it is a pretty good sized code base and
>>>>>>>>>>>>>> I'm not very familiar with it, so my analysis this far has by no 
>>>>>>>>>>>>>> means been
>>>>>>>>>>>>>> exhaustive.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Oct 6, 2015, at 8:41 PM, haosdent <[email protected]>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> When the health check launches, there will be a log like this
>>>>>>>>>>>>>> in your executor stdout:
>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>> Health check process launched at pid xxx
>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:37 AM, Jay Taylor <
>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm happy to try this; however, wouldn't there be output in
>>>>>>>>>>>>>>> the logs with the string "health" or "Health" if the
>>>>>>>>>>>>>>> health-check were active?  None of my master or slave logs
>>>>>>>>>>>>>>> contain the string.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Oct 6, 2015, at 7:45 PM, haosdent <[email protected]>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Could you use "exit 1" instead of "sleep 5" to see whether you
>>>>>>>>>>>>>>> can see an unhealthy status in your task stdout/stderr?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:38 AM, Jay Taylor <
>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> My current version is 0.24.1.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 7:30 PM, haosdent <
>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Yes, adam also helped commit it to 0.23.1 and 0.24.1:
>>>>>>>>>>>>>>>>> https://github.com/apache/mesos/commit/8c0ed92de3925d4312429bfba01b9b1ccbcbbef0
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> https://github.com/apache/mesos/commit/09e367cd69aa39c156c9326d44f4a7b829ba3db7
>>>>>>>>>>>>>>>>> Are you using one of these versions?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:26 AM, haosdent <
>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I remember 0.23.1 and 0.24.1 contain this backport; let
>>>>>>>>>>>>>>>>>> me double-check.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:01 AM, Jay Taylor <
>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Oops- Now I see you already said it's in master.  I'll
>>>>>>>>>>>>>>>>>>> look there :)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks again!
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:59 PM, Jay Taylor <
>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Great, thanks for the quick reply Tim!
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Do you know if there is a branch I can checkout to test
>>>>>>>>>>>>>>>>>>>> it out?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:54 PM, Timothy Chen <
>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hi Jay,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> We just added health check support for docker tasks; it's in
>>>>>>>>>>>>>>>>>>>>> master but not yet released. It will run docker exec with the
>>>>>>>>>>>>>>>>>>>>> command you provided as the health check.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> It should be in the next release.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Tim
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 6:49 PM, Jay Taylor <
>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Does Mesos support health checks for docker image
>>>>>>>>>>>>>>>>>>>>> tasks?  Mesos seems to be ignoring the 
>>>>>>>>>>>>>>>>>>>>> TaskInfo.HealthCheck field for me.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Example TaskInfo JSON received back from Mesos:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>>   "name":"hello-app.web.v3",
>>>>>>>>>>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>>>>>>>>>>     "value":"hello-app_web-v3.fc05a1a5-1e06-4e61-9879-be0d97cd3eec"
>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>>         "value":0.1
>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>>         "value":256
>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>>>>>>>>>>       "type":1,
>>>>>>>>>>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>>>>>>>>>>           {
>>>>>>>>>>>>>>>>>>>>>>             "begin":31002,
>>>>>>>>>>>>>>>>>>>>>>             "end":31002
>>>>>>>>>>>>>>>>>>>>>>           }
>>>>>>>>>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>>>>>>>>>>     "container":{
>>>>>>>>>>>>>>>>>>>>>>       "image":"docker-services1a:5000/test/app-81-1-hello-app-103"
>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>>>>>>>>>>     "type":1,
>>>>>>>>>>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>>>>>>>>>>       "image":"docker-services1a:5000/gig1/app-81-1-hello-app-103",
>>>>>>>>>>>>>>>>>>>>>>       "network":2,
>>>>>>>>>>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>>>           "host_port":31002,
>>>>>>>>>>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>>>>>>>>>>       "parameters":[],
>>>>>>>>>>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>   "health_check":{
>>>>>>>>>>>>>>>>>>>>>>     "delay_seconds":5,
>>>>>>>>>>>>>>>>>>>>>>     "interval_seconds":10,
>>>>>>>>>>>>>>>>>>>>>>     "timeout_seconds":10,
>>>>>>>>>>>>>>>>>>>>>>     "consecutive_failures":3,
>>>>>>>>>>>>>>>>>>>>>>     "grace_period_seconds":0,
>>>>>>>>>>>>>>>>>>>>>>     "command":{
>>>>>>>>>>>>>>>>>>>>>>       "shell":true,
>>>>>>>>>>>>>>>>>>>>>>       "value":"sleep 5",
>>>>>>>>>>>>>>>>>>>>>>       "user":"root"
>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I have searched all machines and containers to see if
>>>>>>>>>>>>>>>>>>>>> they ever run the command (in this case `sleep 5`), but 
>>>>>>>>>>>>>>>>>>>>> have not found any
>>>>>>>>>>>>>>>>>>>>> indication that it is being executed.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> In the mesos src code the health-checks are invoked
>>>>>>>>>>>>>>>>>>>>> from src/launcher/executor.cpp 
>>>>>>>>>>>>>>>>>>>>> CommandExecutorProcess::launchTask.  Does
>>>>>>>>>>>>>>>>>>>>> this mean that health-checks are only supported for 
>>>>>>>>>>>>>>>>>>>>> custom executors and
>>>>>>>>>>>>>>>>>>>>> not for docker tasks?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> What I am trying to accomplish is to have the
>>>>>>>>>>>>>>>>>>>>> 0/non-zero exit-status of a health-check command 
>>>>>>>>>>>>>>>>>>>>> translate to task health.
>>>>>>>>>>>>>>>>>>>>>
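>>>>>>>>>>>>>>>>>>>>> For what it's worth, the exit-status mapping itself is simple; a
>>>>>>>>>>>>>>>>>>>>> standalone sketch of the idea (not the actual mesos-health-check
>>>>>>>>>>>>>>>>>>>>> source) would be:
>>>>>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>>>>> #include <sys/wait.h>  // WIFEXITED, WEXITSTATUS
>>>>>>>>>>>>>>>>>>>>> #include <cstdlib>     // system
>>>>>>>>>>>>>>>>>>>>> #include <iostream>
>>>>>>>>>>>>>>>>>>>>> #include <string>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> int main() {
>>>>>>>>>>>>>>>>>>>>>   // The health-check command from the TaskInfo above.
>>>>>>>>>>>>>>>>>>>>>   const std::string command = "sleep 5";
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>   // Run it through the shell and map exit status 0 -> healthy,
>>>>>>>>>>>>>>>>>>>>>   // anything else -> unhealthy.
>>>>>>>>>>>>>>>>>>>>>   int status = std::system(command.c_str());
>>>>>>>>>>>>>>>>>>>>>   bool healthy =
>>>>>>>>>>>>>>>>>>>>>       status != -1 && WIFEXITED(status) && WEXITSTATUS(status) == 0;
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>   std::cout << "health check "
>>>>>>>>>>>>>>>>>>>>>             << (healthy ? "passed" : "failed") << std::endl;
>>>>>>>>>>>>>>>>>>>>>   return healthy ? 0 : 1;
>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>> ```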
>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>>> Haosdent Huang
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>> Haosdent Huang
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>> Haosdent Huang
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>> Haosdent Huang
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>> Haosdent Huang
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>> Haosdent Huang
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Best Regards,
>>>>>>>>>>> Haosdent Huang
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Best Regards,
>>>>>>> Haosdent Huang
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best Regards,
>>>>> Haosdent Huang
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards,
>>>> Haosdent Huang
>>>>
>>>>
>>>
>>>
>>> --
>>> Best Regards,
>>> Haosdent Huang
>>>
>>
>>
>
