One question for you haosdent-

You mentioned that the flags.launcher_dir should propagate to the docker 
executor all the way up the chain.  Can you show me where this logic is in the 
codebase?  I didn't see where that was happening and would like to understand 
the mechanism.

Thanks!
Jay



> On Oct 8, 2015, at 8:29 PM, Jay Taylor <[email protected]> wrote:
> 
> Maybe tomorrow I will build a fresh cluster from scratch to see if the broken 
> behavior experienced today still persists.
> 
>> On Oct 8, 2015, at 7:52 PM, haosdent <[email protected]> wrote:
>> 
>> As far as I know, MESOS_LAUNCHER_DIR works by setting flags.launcher_dir,
>> which is used to locate mesos-docker-executor and mesos-health-check under
>> that dir. Although the env var itself is not propagated, MESOS_LAUNCHER_DIR
>> still takes effect because flags.launcher_dir is read from it.
>> 
>> For example, I ran
>> ```
>> export MESOS_LAUNCHER_DIR=/tmp
>> ```
>> before starting mesos-slave. So when I launch the slave, I can find this
>> line in the slave log:
>> ```
>> I1009 10:27:26.594599  1416 slave.cpp:203] Flags at startup: xxxxx  
>> --launcher_dir="/tmp"
>> ```
>> 
>> And from your log, I'm not sure why your MESOS_LAUNCHER_DIR became the
>> sandbox dir. Is MESOS_LAUNCHER_DIR overridden in one of your other scripts?
>> 
>> 
>>> On Fri, Oct 9, 2015 at 1:56 AM, Jay Taylor <[email protected]> wrote:
>>> I haven't ever changed MESOS_LAUNCHER_DIR/--launcher_dir before.
>>> 
>>> I just tried setting both the env var and the flag on the slaves, and have
>>> determined that the env var is not present when it is checked in
>>> src/docker/executor.cpp @ line 573:
>>> 
>>>>  const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR");
>>>>   string path =
>>>>     envPath.isSome() ? envPath.get()
>>>>                      : os::realpath(Path(argv[0]).dirname()).get();
>>>>   cout << "MESOS_LAUNCHER_DIR: envpath.isSome()->" << (envPath.isSome() ? 
>>>> "yes" : "no") << endl;
>>>>   cout << "MESOS_LAUNCHER_DIR: path='" << path << "'" << endl;
>>> 
>>> 
>>> I exported the MESOS_LAUNCHER_DIR env var (and verified it is correctly
>>> propagated up to the point of the mesos-slave launch):
>>> 
>>>> $ cat /etc/default/mesos-slave
>>>> export 
>>>> MESOS_MASTER="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos"
>>>> export MESOS_CONTAINERIZERS="mesos,docker"
>>>> export MESOS_EXECUTOR_REGISTRATION_TIMEOUT="5mins"
>>>> export MESOS_PORT="5050"
>>>> export MESOS_LAUNCHER_DIR="/usr/libexec/mesos"
>>> 
>>> TASK OUTPUT:
>>> 
>>>> MESOS_LAUNCHER_DIR: envpath.isSome()->no
>>>> MESOS_LAUNCHER_DIR: 
>>>> path='/tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad'
>>>> Registered docker executor on mesos-worker2a
>>>> Starting task hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>> Launching health check process: 
>>>> /tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad/mesos-health-check
>>>>  --executor=(1)@192.168.225.59:44523 
>>>> --health_check_json={"command":{"shell":true,"value":"docker exec 
>>>> mesos-61373c0e-7349-4173-ab8d-9d7b260e8a30-S1.41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad
>>>>  sh -c \" \/bin\/bash 
>>>> \""},"consecutive_failures":3,"delay_seconds":5.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>>>>  --task_id=hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253
>>>> Health check process launched at pid: 2519
>>> 
>>> 
>>> The env var is not propagated when the docker executor is launched in 
>>> src/slave/containerizer/docker.cpp around line 903:
>>> 
>>>>   vector<string> argv;
>>>>   argv.push_back("mesos-docker-executor");
>>>>   // Construct the mesos-docker-executor using the "name" we gave the
>>>>   // container (to distinguish it from Docker containers not created
>>>>   // by Mesos).
>>>>   Try<Subprocess> s = subprocess(
>>>>       path::join(flags.launcher_dir, "mesos-docker-executor"),
>>>>       argv,
>>>>       Subprocess::PIPE(),
>>>>       Subprocess::PATH(path::join(container->directory, "stdout")),
>>>>       Subprocess::PATH(path::join(container->directory, "stderr")),
>>>>       dockerFlags(flags, container->name(), container->directory),
>>>>       environment,
>>>>       lambda::bind(&setup, container->directory));
>>> 
>>> 
>>> A little ways above, we can see the environment is set up with the
>>> container task's defined env vars.
>>> 
>>> See src/slave/containerizer/docker.cpp around line 871:
>>> 
>>>>   // Include any environment variables from ExecutorInfo.
>>>>   foreach (const Environment::Variable& variable,
>>>>            container->executor.command().environment().variables()) {
>>>>     environment[variable.name()] = variable.value();
>>>>   }
>>> 
>>> 
>>> Should I file a JIRA for this?  Have I overlooked anything?
>>> 
>>> 
>>>> On Wed, Oct 7, 2015 at 8:11 PM, haosdent <[email protected]> wrote:
>>>> >Not sure what was going on with health-checks in 0.24.0.
>>>> 0.24.1 should work.
>>>> 
>>>> >Do any of you know which host the path 
>>>> >"/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>>>> > should exist on? It definitely doesn't exist on the slave, hence 
>>>> >execution failing.
>>>> 
>>>> Did you set MESOS_LAUNCHER_DIR/--launcher_dir incorrectly before? We get
>>>> mesos-health-check from MESOS_LAUNCHER_DIR/--launcher_dir, or else use
>>>> the same dir as mesos-docker-executor.
>>>> 
>>>>> On Thu, Oct 8, 2015 at 10:46 AM, Jay Taylor <[email protected]> wrote:
>>>>> Maybe I spoke too soon.
>>>>> 
>>>>> Now the checks are attempting to run; however, the STDERR is not looking
>>>>> good.  I've added some debugging to the error message output to show the
>>>>> path, argv, and envp variables:
>>>>> 
>>>>> STDOUT:
>>>>> 
>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>  --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" 
>>>>>> --initialize_driver_logging="true" --logbufsecs="0" 
>>>>>> --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" 
>>>>>> --quiet="false" 
>>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>  --stop_timeout="0ns"
>>>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>  --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" 
>>>>>> --initialize_driver_logging="true" --logbufsecs="0" 
>>>>>> --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" 
>>>>>> --quiet="false" 
>>>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc"
>>>>>>  --stop_timeout="0ns"
>>>>>> Registered docker executor on mesos-worker2a
>>>>>> Starting task 
>>>>>> app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>> Launching health check process: 
>>>>>> /tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check
>>>>>>  --executor=(1)@192.168.225.59:43917 
>>>>>> --health_check_json={"command":{"shell":true,"value":"docker exec 
>>>>>> mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc
>>>>>>  sh -c \" exit 1 
>>>>>> \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0}
>>>>>>  
>>>>>> --task_id=app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0
>>>>>> Health check process launched at pid: 3012
>>>>> 
>>>>> 
>>>>> STDERR:
>>>>> 
>>>>>> I1008 02:17:28.870434 2770 exec.cpp:134] Version: 0.26.0
>>>>>> I1008 02:17:28.871860 2778 exec.cpp:208] Executor registered on slave 
>>>>>> 16b49e90-6852-4c91-8e70-d89c54f25668-S1
>>>>>> WARNING: Your kernel does not support swap limit capabilities, memory 
>>>>>> limited without swap.
>>>>>> ABORT: (src/subprocess.cpp:180): Failed to os::execvpe in childMain 
>>>>>> (path.c_str()='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>>>>>  
>>>>>> argv='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check',
>>>>>>  envp=''): No such file or directory*** Aborted at 1444270649 (unix 
>>>>>> time) try "date -d @1444270649" if you are using GNU date ***
>>>>>> PC: @ 0x7f4a37ec6cc9 (unknown)
>>>>>> *** SIGABRT (@0xbc4) received by PID 3012 (TID 0x7f4a2f9f6700) from PID 
>>>>>> 3012; stack trace: ***
>>>>>> @ 0x7f4a38265340 (unknown)
>>>>>> @ 0x7f4a37ec6cc9 (unknown)
>>>>>> @ 0x7f4a37eca0d8 (unknown)
>>>>>> @ 0x4191e2 _Abort()
>>>>>> @ 0x41921c _Abort()
>>>>>> @ 0x7f4a39dc2768 process::childMain()
>>>>>> @ 0x7f4a39dc4f59 std::_Function_handler<>::_M_invoke()
>>>>>> @ 0x7f4a39dc24fc process::defaultClone()
>>>>>> @ 0x7f4a39dc34fb process::subprocess()
>>>>>> @ 0x43cc9c 
>>>>>> mesos::internal::docker::DockerExecutorProcess::launchHealthCheck()
>>>>>> @ 0x7f4a39d924f4 process::ProcessManager::resume()
>>>>>> @ 0x7f4a39d92827 
>>>>>> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
>>>>>> @ 0x7f4a38a47e40 (unknown)
>>>>>> @ 0x7f4a3825d182 start_thread
>>>>>> @ 0x7f4a37f8a47d (unknown)
>>>>> 
>>>>> Do any of you know which host the path 
>>>>> "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check"
>>>>>  should exist on? It definitely doesn't exist on the slave, hence 
>>>>> execution failing.
>>>>> 
>>>>> This is with current master, git hash 
>>>>> 5058fac1083dc91bca54d33c26c810c17ad95dd1.
>>>>> 
>>>>>> commit 5058fac1083dc91bca54d33c26c810c17ad95dd1
>>>>>> Author: Anand Mazumdar <[email protected]>
>>>>>> Date:   Tue Oct 6 17:37:41 2015 -0700
>>>>> 
>>>>> 
>>>>> -Jay
>>>>> 
>>>>>> On Wed, Oct 7, 2015 at 5:23 PM, Jay Taylor <[email protected]> wrote:
>>>>>> Update:
>>>>>> 
>>>>>> I used https://github.com/deric/mesos-deb-packaging to compile and 
>>>>>> package the latest master (0.26.x) and deployed it to the cluster, and 
>>>>>> now health checks are working as advertised in both Marathon and my own 
>>>>>> framework!  Not sure what was going on with health-checks in 0.24.0.
>>>>>> 
>>>>>> Anyways, thanks again for your help Haosdent!
>>>>>> 
>>>>>> Cheers,
>>>>>> Jay
>>>>>> 
>>>>>>> On Wed, Oct 7, 2015 at 12:53 PM, Jay Taylor <[email protected]> wrote:
>>>>>>> Hi Haosdent,
>>>>>>> 
>>>>>>> Can you share your Marathon POST request that results in Mesos 
>>>>>>> executing the health checks?
>>>>>>> 
>>>>>>> Since we can reference the Marathon framework, I've been doing some 
>>>>>>> digging around.
>>>>>>> 
>>>>>>> Here are the details of my setup and findings:
>>>>>>> 
>>>>>>> I put a few small hacks in Marathon:
>>>>>>> 
>>>>>>> (1) Added com.googlecode.protobuf.format to Marathon's dependencies
>>>>>>> 
>>>>>>> (2) Edited the following files so TaskInfo is dumped as JSON to /tmp/X
>>>>>>> in both the TaskFactory as well as right before the task is sent to
>>>>>>> Mesos via driver.launchTasks:
>>>>>>> 
>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala:
>>>>>>> 
>>>>>>>> $ git diff 
>>>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala
>>>>>>>> @@ -25,6 +25,12 @@ class DefaultTaskFactory @Inject() (
>>>>>>>> 
>>>>>>>>      new TaskBuilder(app, taskIdUtil.newTaskId, 
>>>>>>>> config).buildIfMatches(offer, runningTasks).map {
>>>>>>>>        case (taskInfo, ports) =>
>>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>>> +        import java.io._
>>>>>>>> +        val bw = new BufferedWriter(new FileWriter(new 
>>>>>>>> File("/tmp/taskjson1-" + taskInfo.getTaskId.getValue)))
>>>>>>>> +        bw.write(JsonFormat.printToString(taskInfo))
>>>>>>>> +        bw.write("\n")
>>>>>>>> +        bw.close()
>>>>>>>>          CreatedTask(
>>>>>>>>            taskInfo,
>>>>>>>>            MarathonTasks.makeTask(
>>>>>>> 
>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala:
>>>>>>> 
>>>>>>>> $ git diff 
>>>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala
>>>>>>>> @@ -24,6 +24,16 @@ private[launcher] class TaskLauncherImpl(
>>>>>>>>    override def launchTasks(offerID: OfferID, taskInfos: 
>>>>>>>> Seq[TaskInfo]): Boolean = {
>>>>>>>>      val launched = withDriver(s"launchTasks($offerID)") { driver =>
>>>>>>>>        import scala.collection.JavaConverters._
>>>>>>>> +      var i = 0
>>>>>>>> +      for (i <- 0 to taskInfos.length - 1) {
>>>>>>>> +        import com.googlecode.protobuf.format.JsonFormat
>>>>>>>> +        import java.io._
>>>>>>>> +        val file = new File("/tmp/taskJson2-" + i.toString() + "-" + 
>>>>>>>> taskInfos(i).getTaskId.getValue)
>>>>>>>> +        val bw = new BufferedWriter(new FileWriter(file))
>>>>>>>> +        bw.write(JsonFormat.printToString(taskInfos(i)))
>>>>>>>> +        bw.write("\n")
>>>>>>>> +        bw.close()
>>>>>>>> +      }
>>>>>>>>        driver.launchTasks(Collections.singleton(offerID), 
>>>>>>>> taskInfos.asJava)
>>>>>>>>      }
>>>>>>> 
>>>>>>> 
>>>>>>> Then I built and deployed the hacked Marathon and restarted the 
>>>>>>> marathon service.
>>>>>>> 
>>>>>>> Next I created the app via the Marathon API ("hello app" is a container
>>>>>>> with a simple hello-world Ruby app running on 0.0.0.0:8000):
>>>>>>> 
>>>>>>>> curl http://mesos-primary1a:8080/v2/groups -XPOST -H'Content-Type: 
>>>>>>>> application/json' -d'
>>>>>>>> {
>>>>>>>>   "id": "/app-81-1-hello-app",
>>>>>>>>   "apps": [
>>>>>>>>     {
>>>>>>>>       "id": "/app-81-1-hello-app/web-v11",
>>>>>>>>       "container": {
>>>>>>>>         "type": "DOCKER",
>>>>>>>>         "docker": {
>>>>>>>>           "image": 
>>>>>>>> "docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>           "network": "BRIDGE",
>>>>>>>>           "portMappings": [
>>>>>>>>             {
>>>>>>>>               "containerPort": 8000,
>>>>>>>>               "hostPort": 0,
>>>>>>>>               "protocol": "tcp"
>>>>>>>>             }
>>>>>>>>           ]
>>>>>>>>         }
>>>>>>>>       },
>>>>>>>>       "env": {
>>>>>>>>         
>>>>>>>>       },
>>>>>>>>       "healthChecks": [
>>>>>>>>         {
>>>>>>>>           "protocol": "COMMAND",
>>>>>>>>           "command": {"value": "exit 1"},
>>>>>>>>           "gracePeriodSeconds": 10,
>>>>>>>>           "intervalSeconds": 10,
>>>>>>>>           "timeoutSeconds": 10,
>>>>>>>>           "maxConsecutiveFailures": 3
>>>>>>>>         }
>>>>>>>>       ],
>>>>>>>>       "instances": 1,
>>>>>>>>       "cpus": 1,
>>>>>>>>       "mem": 512
>>>>>>>>     }
>>>>>>>>   ]
>>>>>>>> }
>>>>>>> 
>>>>>>> 
>>>>>>>> $ ls /tmp/
>>>>>>>> taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>> taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>> 
>>>>>>> Do they match?
>>>>>>> 
>>>>>>>> $ md5sum /tmp/task*
>>>>>>>> 1b5115997e78e2611654059249d99578  
>>>>>>>> /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>> 1b5115997e78e2611654059249d99578  
>>>>>>>> /tmp/taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>> 
>>>>>>> Yes, so I am confident this is the information being sent across the 
>>>>>>> wire to Mesos.
>>>>>>> 
>>>>>>> Do they contain any health-check information?
>>>>>>> 
>>>>>>>> $ cat 
>>>>>>>> /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>>> {
>>>>>>>>   "name":"web-v11.app-81-1-hello-app",
>>>>>>>>   "task_id":{
>>>>>>>>     
>>>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>   },
>>>>>>>>   "slave_id":{
>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>   },
>>>>>>>>   "resources":[
>>>>>>>>     {
>>>>>>>>       "name":"cpus",
>>>>>>>>       "type":"SCALAR",
>>>>>>>>       "scalar":{
>>>>>>>>         "value":1.0
>>>>>>>>       },
>>>>>>>>       "role":"*"
>>>>>>>>     },
>>>>>>>>     {
>>>>>>>>       "name":"mem",
>>>>>>>>       "type":"SCALAR",
>>>>>>>>       "scalar":{
>>>>>>>>         "value":512.0
>>>>>>>>       },
>>>>>>>>       "role":"*"
>>>>>>>>     },
>>>>>>>>     {
>>>>>>>>       "name":"ports",
>>>>>>>>       "type":"RANGES",
>>>>>>>>       "ranges":{
>>>>>>>>         "range":[
>>>>>>>>           {
>>>>>>>>             "begin":31641,
>>>>>>>>             "end":31641
>>>>>>>>           }
>>>>>>>>         ]
>>>>>>>>       },
>>>>>>>>       "role":"*"
>>>>>>>>     }
>>>>>>>>   ],
>>>>>>>>   "command":{
>>>>>>>>     "environment":{
>>>>>>>>       "variables":[
>>>>>>>>         {
>>>>>>>>           "name":"PORT_8000",
>>>>>>>>           "value":"31641"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>           "name":"MARATHON_APP_VERSION",
>>>>>>>>           "value":"2015-10-07T19:35:08.386Z"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>           "name":"HOST",
>>>>>>>>           "value":"mesos-worker1a"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>           "name":"MARATHON_APP_DOCKER_IMAGE",
>>>>>>>>           
>>>>>>>> "value":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>           "name":"MESOS_TASK_ID",
>>>>>>>>           
>>>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>           "name":"PORT",
>>>>>>>>           "value":"31641"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>           "name":"PORTS",
>>>>>>>>           "value":"31641"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>           "name":"MARATHON_APP_ID",
>>>>>>>>           "value":"/app-81-1-hello-app/web-v11"
>>>>>>>>         },
>>>>>>>>         {
>>>>>>>>           "name":"PORT0",
>>>>>>>>           "value":"31641"
>>>>>>>>         }
>>>>>>>>       ]
>>>>>>>>     },
>>>>>>>>     "shell":false
>>>>>>>>   },
>>>>>>>>   "container":{
>>>>>>>>     "type":"DOCKER",
>>>>>>>>     "docker":{
>>>>>>>>       
>>>>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>>>>>>>       "network":"BRIDGE",
>>>>>>>>       "port_mappings":[
>>>>>>>>         {
>>>>>>>>           "host_port":31641,
>>>>>>>>           "container_port":8000,
>>>>>>>>           "protocol":"tcp"
>>>>>>>>         }
>>>>>>>>       ],
>>>>>>>>       "privileged":false,
>>>>>>>>       "force_pull_image":false
>>>>>>>>     }
>>>>>>>>   }
>>>>>>>> }
>>>>>>> 
>>>>>>> No, I don't see anything about any health check.
>>>>>>> 
>>>>>>> Mesos STDOUT for the launched task:
>>>>>>> 
>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>  --docker="docker" --help="false" --initialize_driver_logging="true" 
>>>>>>>> --logbufsecs="0" --logging_level="INFO" 
>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" 
>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>  --stop_timeout="0ns"
>>>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>  --docker="docker" --help="false" --initialize_driver_logging="true" 
>>>>>>>> --logbufsecs="0" --logging_level="INFO" 
>>>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" 
>>>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>>>>>>>>  --stop_timeout="0ns"
>>>>>>>> Registered docker executor on mesos-worker1a
>>>>>>>> Starting task 
>>>>>>>> app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>>>>>>> 
>>>>>>> 
>>>>>>> And STDERR:
>>>>>>> 
>>>>>>>> I1007 19:35:08.790743  4612 exec.cpp:134] Version: 0.24.0
>>>>>>>> I1007 19:35:08.793416  4619 exec.cpp:208] Executor registered on slave 
>>>>>>>> 20150924-210922-1608624320-5050-1792-S1
>>>>>>>> WARNING: Your kernel does not support swap limit capabilities, memory 
>>>>>>>> limited without swap.
>>>>>>> 
>>>>>>> 
>>>>>>> Again, nothing about any health checks.
>>>>>>> 
>>>>>>> Any ideas of other things to try, or what I could be missing?  I can't
>>>>>>> say either way whether the Mesos health-check system works if Marathon
>>>>>>> won't put the health check into the task it sends to Mesos.
>>>>>>> 
>>>>>>> Thanks for all your help!
>>>>>>> 
>>>>>>> Best,
>>>>>>> Jay
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On Tue, Oct 6, 2015 at 11:24 PM, haosdent <[email protected]> wrote:
>>>>>>>> Maybe you could post your executor stdout/stderr so that we can tell
>>>>>>>> whether the health check is running or not.
>>>>>>>> 
>>>>>>>>> On Wed, Oct 7, 2015 at 2:15 PM, haosdent <[email protected]> wrote:
>>>>>>>>> Marathon also uses Mesos health checks. When I use a health check, I
>>>>>>>>> can see log lines like this in the executor stdout.
>>>>>>>>> 
>>>>>>>>> ```
>>>>>>>>> Registered docker executor on xxxxx
>>>>>>>>> Starting task test-health-check.822a5fd2-6cba-11e5-b5ce-0a0027000000
>>>>>>>>> Launching health check process: 
>>>>>>>>> /home/haosdent/mesos/build/src/.libs/mesos-health-check 
>>>>>>>>> --executor=xxxx
>>>>>>>>> Health check process launched at pid: 9895
>>>>>>>>> Received task health update, healthy: true
>>>>>>>>> ```
>>>>>>>>> 
>>>>>>>>>> On Wed, Oct 7, 2015 at 12:51 PM, Jay Taylor <[email protected]> 
>>>>>>>>>> wrote:
>>>>>>>>>> I am using my own framework, and the full task info I'm using is 
>>>>>>>>>> posted earlier in this thread.  Do you happen to know if Marathon 
>>>>>>>>>> uses Mesos's health checks for its health check system?
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On Oct 6, 2015, at 9:01 PM, haosdent <[email protected]> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Yes, launch the health check task through its definition in
>>>>>>>>>>> TaskInfo. Do you launch your task through Marathon? I could test it
>>>>>>>>>>> on my side.
>>>>>>>>>>> 
>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:56 AM, Jay Taylor <[email protected]> 
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> Precisely, and there are none of those statements.  Are you or 
>>>>>>>>>>>> others confident health-checks are part of the code path when 
>>>>>>>>>>>> defined via task info for docker container tasks?  Going through 
>>>>>>>>>>>> the code, I wasn't able to find the linkage for anything other 
>>>>>>>>>>>> than health-checks triggered through a custom executor.
>>>>>>>>>>>> 
>>>>>>>>>>>> With that being said, it is a pretty good-sized code base and I'm
>>>>>>>>>>>> not very familiar with it, so my analysis thus far has by no means
>>>>>>>>>>>> been exhaustive.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Oct 6, 2015, at 8:41 PM, haosdent <[email protected]> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> When the health check launches, it logs a line like this in your
>>>>>>>>>>>>> executor stdout:
>>>>>>>>>>>>> ```
>>>>>>>>>>>>> Health check process launched at pid xxx
>>>>>>>>>>>>> ```
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 11:37 AM, Jay Taylor 
>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>> I'm happy to try this; however, wouldn't there be output in the
>>>>>>>>>>>>>> logs with the string "health" or "Health" if the health check
>>>>>>>>>>>>>> were active?  None of my master or slave logs contain the
>>>>>>>>>>>>>> string.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Oct 6, 2015, at 7:45 PM, haosdent <[email protected]> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Could you use "exit 1" instead of "sleep 5" to see whether you
>>>>>>>>>>>>>>> see an unhealthy status in your task stdout/stderr?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:38 AM, Jay Taylor 
>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>> My current version is 0.24.1.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 7:30 PM, haosdent <[email protected]> 
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> Yes, Adam also helped commit it to 0.23.1 and 0.24.1:
>>>>>>>>>>>>>>>>> https://github.com/apache/mesos/commit/8c0ed92de3925d4312429bfba01b9b1ccbcbbef0
>>>>>>>>>>>>>>>>> https://github.com/apache/mesos/commit/09e367cd69aa39c156c9326d44f4a7b829ba3db7
>>>>>>>>>>>>>>>>> Are you using one of these versions?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:26 AM, haosdent 
>>>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>>> I remember 0.23.1 and 0.24.1 contain this backport; let me
>>>>>>>>>>>>>>>>>> double-check.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:01 AM, Jay Taylor 
>>>>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>> Oops- Now I see you already said it's in master.  I'll look 
>>>>>>>>>>>>>>>>>>> there :)
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Thanks again!
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:59 PM, Jay Taylor 
>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>> Great, thanks for the quick reply Tim!
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Do you know if there is a branch I can checkout to test it 
>>>>>>>>>>>>>>>>>>>> out?
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:54 PM, Timothy Chen 
>>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>> Hi Jay, 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> We just added health check support for docker tasks;
>>>>>>>>>>>>>>>>>>>>> it's in master but not yet released. It will run docker
>>>>>>>>>>>>>>>>>>>>> exec with the command you provided as the health check.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> It should be in the next release.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Tim
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> On Oct 6, 2015, at 6:49 PM, Jay Taylor 
>>>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Does Mesos support health checks for docker image tasks? 
>>>>>>>>>>>>>>>>>>>>>>  Mesos seems to be ignoring the TaskInfo.HealthCheck 
>>>>>>>>>>>>>>>>>>>>>> field for me.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Example TaskInfo JSON received back from Mesos:
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>>>>   "name":"hello-app.web.v3",
>>>>>>>>>>>>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>>>>>>>>>>>>     
>>>>>>>>>>>>>>>>>>>>>>>> "value":"hello-app_web-v3.fc05a1a5-1e06-4e61-9879-be0d97cd3eec"
>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>>>>         "value":0.1
>>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>>>>>>>>>>>>         "value":256
>>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>>     {
>>>>>>>>>>>>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>>>>>>>>>>>>       "type":1,
>>>>>>>>>>>>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>>>>>>>>>>>>           {
>>>>>>>>>>>>>>>>>>>>>>>>             "begin":31002,
>>>>>>>>>>>>>>>>>>>>>>>>             "end":31002
>>>>>>>>>>>>>>>>>>>>>>>>           }
>>>>>>>>>>>>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>>>>>>>>>>>>     "container":{
>>>>>>>>>>>>>>>>>>>>>>>>       
>>>>>>>>>>>>>>>>>>>>>>>> "image":"docker-services1a:5000/test/app-81-1-hello-app-103"
>>>>>>>>>>>>>>>>>>>>>>>>     },
>>>>>>>>>>>>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>>>>>>>>>>>>     "type":1,
>>>>>>>>>>>>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>>>>>>>>>>>>       
>>>>>>>>>>>>>>>>>>>>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-103",
>>>>>>>>>>>>>>>>>>>>>>>>       "network":2,
>>>>>>>>>>>>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>>>>>>>>>>>>         {
>>>>>>>>>>>>>>>>>>>>>>>>           "host_port":31002,
>>>>>>>>>>>>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>>>>>>>>>>>>       "parameters":[],
>>>>>>>>>>>>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>>>>>>>>>>>   "health_check":{
>>>>>>>>>>>>>>>>>>>>>>>>     "delay_seconds":5,
>>>>>>>>>>>>>>>>>>>>>>>>     "interval_seconds":10,
>>>>>>>>>>>>>>>>>>>>>>>>     "timeout_seconds":10,
>>>>>>>>>>>>>>>>>>>>>>>>     "consecutive_failures":3,
>>>>>>>>>>>>>>>>>>>>>>>>     "grace_period_seconds":0,
>>>>>>>>>>>>>>>>>>>>>>>>     "command":{
>>>>>>>>>>>>>>>>>>>>>>>>       "shell":true,
>>>>>>>>>>>>>>>>>>>>>>>>       "value":"sleep 5",
>>>>>>>>>>>>>>>>>>>>>>>>       "user":"root"
>>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> I have searched all machines and containers to see if 
>>>>>>>>>>>>>>>>>>>>>> they ever run the command (in this case `sleep 5`), but 
>>>>>>>>>>>>>>>>>>>>>> have not found any indication that it is being executed.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> In the Mesos source code, the health checks are invoked
>>>>>>>>>>>>>>>>>>>>>> from src/launcher/executor.cpp
>>>>>>>>>>>>>>>>>>>>>> CommandExecutorProcess::launchTask.  Does this mean that
>>>>>>>>>>>>>>>>>>>>>> health checks are only supported for custom executors
>>>>>>>>>>>>>>>>>>>>>> and not for docker tasks?
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> What I am trying to accomplish is to have the 0/non-zero 
>>>>>>>>>>>>>>>>>>>>>> exit-status of a health-check command translate to task 
>>>>>>>>>>>>>>>>>>>>>> health.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> -- 
>>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>>> Haosdent Huang
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>> 
>>>> 
>>>> 
>> 
>> 
>> 
