Re: mesos container cluster came across health check coredump log
My apologies: I am in China and have not been able to connect to the VPN for the past few days. I will check as soon as possible once I am back.

On Fri, Mar 31, 2017 at 4:20 PM, Alex Rukletsov wrote:

> Cool, looking forward to it!
Re: mesos container cluster came across health check coredump log
Cool, looking forward to it!

On Fri, Mar 31, 2017 at 4:30 AM, tommy xiao wrote:

> Alex, yes, let me give it a try.
Re: mesos container cluster came across health check coredump log
Alex, yes, let me give it a try.

2017-03-31 3:16 GMT+08:00 Alex Rukletsov:

> This is https://issues.apache.org/jira/browse/MESOS-7210. Deshi, do you
> want to send the patch? I or Haosdent can shepherd.
>
> A.
Re: mesos container cluster came across health check coredump log
This is https://issues.apache.org/jira/browse/MESOS-7210. Deshi, do you want to send the patch? Either Haosdent or I can shepherd.

A.

On Thu, Mar 30, 2017 at 12:27 PM, tommy xiao wrote:

> Interesting for this specific case.
Re: mesos container cluster came across health check coredump log
Interesting for this specific case.

2017-03-30 7:52 GMT+08:00 Jie Yu:

> + AlexR, haosdent
>
> For posterity, the root cause of this problem is that when the agent is
> running inside a Docker container and the `--docker_mesos_image` flag is
> specified, the pid namespace of the executor container (which initiates the
> health check) is different from the root pid namespace. Therefore, getting
> the network namespace handle using `/proc/<pid>/ns/net` does not work,
> because the 'pid' here is in the root pid namespace (reported by the Docker
> daemon).
>
> Alex and haosdent, I think we should fix this issue. As suggested above,
> we can launch the executor container with --pid=host if
> `--docker_mesos_image` is specified.
>
> - Jie
Re: mesos container cluster came across health check coredump log
+ AlexR, haosdent

For posterity, the root cause of this problem is that when the agent is running inside a Docker container and the `--docker_mesos_image` flag is specified, the pid namespace of the executor container (which initiates the health check) is different from the root pid namespace. Therefore, getting the network namespace handle using `/proc/<pid>/ns/net` does not work, because the 'pid' here is in the root pid namespace (reported by the Docker daemon).

Alex and haosdent, I think we should fix this issue. As suggested above, we can launch the executor container with --pid=host if `--docker_mesos_image` is specified.

- Jie

On Wed, Mar 29, 2017 at 3:56 AM, tommy xiao wrote:

> It was resolved by adding --pid=host. Thanks to the community for the
> support. Thanks a lot.
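The failure mode described in this message (a pid reported in the root pid namespace has no `/proc` entry inside the executor container's pid namespace) can be illustrated with a small check. This is only a sketch of the idea, not the code Mesos runs; the helper names `net_ns_path`, `pid_visible`, and `explain` are hypothetical, and the `proc_root` parameter exists only so the check can be pointed at a different procfs mount.

```python
import os
from pathlib import Path


def net_ns_path(pid: int, proc_root: str = "/proc") -> Path:
    """Path of the network-namespace handle for `pid` under `proc_root`."""
    return Path(proc_root) / str(pid) / "ns" / "net"


def pid_visible(pid: int, proc_root: str = "/proc") -> bool:
    """True if `pid` has an entry in this procfs, i.e. in this pid namespace."""
    return (Path(proc_root) / str(pid)).is_dir()


def explain(pid: int, proc_root: str = "/proc") -> str:
    """Describe whether the namespace handle for `pid` can be reached from here."""
    if pid_visible(pid, proc_root):
        return f"pid {pid} is visible; handle: {net_ns_path(pid, proc_root)}"
    return (f"pid {pid} does not exist in this pid namespace; "
            "it may belong to another (for example the root) pid namespace")


if __name__ == "__main__":
    # On Linux the current process always has its own /proc entry.
    print(explain(os.getpid()))
    # 18596 is the pid from the log in this thread; it is probably absent here.
    print(explain(18596))
```

Run inside a container that has its own pid namespace, a pid taken from the Docker daemon on the host would normally land in the second branch, which matches the "Pid 18596 does not exist" message in the log.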
Re: mesos container cluster came across health check coredump log
It was resolved by adding --pid=host. Thanks to the community for the support. Thanks a lot.

2017-03-29 9:52 GMT+08:00 tommy xiao:

> My environment is as follows:
>
> Mesos 1.2, running containerized in Docker.
>
> I launch a sample nginx Docker container with a Mesos native health check,
> and then get a core dump in the sandbox.
>
> I dug up some more information for your reference:
>
> Inside the Mesos slave container I can see only the task container's pid;
> I cannot find the pid of the nginx process.
>
> On the host console, however, I can find the nginx pid. So how can I get
> the pid inside the container?
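As a rough illustration of the workaround, the sketch below adds `--pid=host` (a standard `docker run` option that shares the host's pid namespace) to an existing `docker run` argument list. The thread does not say exactly which command was changed, so the command line in the usage example, including the image name `example/mesos-agent:1.2.0`, is hypothetical, as is the helper `with_host_pid`.

```python
from typing import List


def with_host_pid(docker_run_argv: List[str]) -> List[str]:
    """Return a copy of a `docker run ...` argv with `--pid=host` added once.

    Only the `--pid=host` spelling is recognised; the two-token form
    `--pid host` is not handled in this sketch.
    """
    if "run" not in docker_run_argv:
        raise ValueError("expected a 'docker run' command line")
    if "--pid=host" in docker_run_argv:
        return list(docker_run_argv)
    i = docker_run_argv.index("run")
    # Options must come after `run` and before the image name.
    return docker_run_argv[: i + 1] + ["--pid=host"] + docker_run_argv[i + 1:]


if __name__ == "__main__":
    # Hypothetical command line for a containerized agent.
    argv = ["docker", "run", "--net", "host", "example/mesos-agent:1.2.0"]
    print(" ".join(with_host_pid(argv)))
```

Sharing the host pid namespace makes the pids reported by the Docker daemon resolvable under `/proc` in the container, at the cost of weaker process isolation.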
Re: mesos container cluster came across health check coredump log
My environment is as follows:

Mesos 1.2, running containerized in Docker.

I launch a sample nginx Docker container with a Mesos native health check, and then get a core dump in the sandbox.

I dug up some more information for your reference:

Inside the Mesos slave container I can see only the task container's pid; I cannot find the pid of the nginx process.

On the host console, however, I can find the nginx pid. So how can I get the pid inside the container?

2017-03-28 13:49 GMT+08:00 tommy xiao:

> https://issues.apache.org/jira/browse/MESOS-6184
>
> Can anyone give some hints?
>
> ```
> I0328 11:48:12.922181 48 exec.cpp:162] Version: 1.2.0
> I0328 11:48:12.929252 54 exec.cpp:237] Executor registered on agent a29dc3a5-3e3f-4058-8ab4-dd7de2ae58d1-S4
> I0328 11:48:12.931640 54 docker.cpp:850] Running docker -H unix:///var/run/docker.sock run --cpu-shares 10 --memory 33554432 --env-file /tmp/gvqGyb -v /data/mesos/slaves/a29dc3a5-3e3f-4058-8ab4-dd7de2ae58d1-S4/frameworks/d7ef5d2b-f924-42d9-a274-c020afba6bce-/executors/0-hc-xychu-datamanmesos-2f3b47f9ffc048539c7b22baa6c32d8f/runs/458189b8-2ff4-4337-ad3a-67321e96f5cb:/mnt/mesos/sandbox --net bridge --label=USER_NAME=xychu --label=GROUP_NAME=groupautotest --label=APP_ID=hc --label=VCLUSTER=clusterautotest --label=USER=xychu --label=CLUSTER=datamanmesos --label=SLOT=0 --label=APP=hc -p 31000:80/tcp --name mesos-a29dc3a5-3e3f-4058-8ab4-dd7de2ae58d1-S4.458189b8-2ff4-4337-ad3a-67321e96f5cb nginx
> I0328 11:48:16.145714 53 health_checker.cpp:196] Ignoring failure as health check still in grace period
> W0328 11:48:26.289958 49 health_checker.cpp:202] Health check failed 1 times consecutively: HTTP health check failed: curl returned terminated with signal Aborted (core dumped): ABORT: (../../../3rdparty/libprocess/include/process/posix/subprocess.hpp:190): Failed to execute Subprocess::ChildHook: Failed to enter the net namespace of pid 18596: Pid 18596 does not exist
> *** Aborted at 1490672906 (unix time) try "date -d @1490672906" if you are using GNU date ***
> PC: @ 0x7f26bfb485f7 __GI_raise
> *** SIGABRT (@0x4a) received by PID 74 (TID 0x7f26ba152700) from PID 74; stack trace: ***
>     @ 0x7f26c0703100 (unknown)
>     @ 0x7f26bfb485f7 __GI_raise
>     @ 0x7f26bfb49ce8 __GI_abort
>     @ 0x7f26c315778e _Abort()
>     @ 0x7f26c31577cc _Abort()
>     @ 0x7f26c237a4b6 process::internal::childMain()
>     @ 0x7f26c2379e9c std::_Function_handler<>::_M_invoke()
>     @ 0x7f26c2379e53 process::internal::defaultClone()
>     @ 0x7f26c237b951 process::internal::cloneChild()
>     @ 0x7f26c237954f process::subprocess()
>     @ 0x7f26c15a9fb1 mesos::internal::checks::HealthCheckerProcess::httpHealthCheck()
>     @ 0x7f26c15ababd mesos::internal::checks::HealthCheckerProcess::performSingleCheck()
>     @ 0x7f26c2331389 process::ProcessManager::resume()
>     @ 0x7f26c233a3f7 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
>     @ 0x7f26c04a1220 (unknown)
>     @ 0x7f26c06fbdc5 start_thread
>     @ 0x7f26bfc0928d __clone
> W0328 11:48:36.340055 55 health_checker.cpp:202] Health check failed 2 times consecutively: HTTP health check failed: curl returned terminated with signal Aborted (core dumped): ABORT: (../../../3rdparty/libprocess/include/process/posix/subprocess.hpp:190): Failed to execute Subprocess::ChildHook: Failed to enter the net namespace of pid 18596: Pid 18596 does not exist
> *** Aborted at 1490672916 (unix time) try "date -d @1490672916" if you are using GNU date ***
> PC: @ 0x7f26bfb485f7 __GI_raise
> *** SIGABRT (@0x4b) received by PID 75 (TID 0x7f26b9951700) from PID 75; stack trace: ***
>     @ 0x7f26c0703100 (unknown)
>     @ 0x7f26bfb485f7 __GI_raise
>     @ 0x7f26bfb49ce8 __GI_abort
>     @ 0x7f26c315778e _Abort()
>     @ 0x7f26c31577cc _Abort()
>     @ 0x7f26c237a4b6 process::internal::childMain()
>     @ 0x7f26c2379e9c std::_Function_handler<>::_M_invoke()
>     @ 0x7f26c2379e53 process::internal::defaultClone()
>     @ 0x7f26c237b951 process::internal::cloneChild()
>     @ 0x7f26c237954f process::subprocess()
>     @ 0x7f26c15a9fb1 mesos::internal::checks::HealthCheckerProcess::httpHealthCheck()
>     @ 0x7f26c15ababd mesos::internal::checks::HealthCheckerProcess::performSingleCheck()
>     @ 0x7f26c2331389 process::ProcessManager::resume()
>     @ 0x7f26c233a3f7 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
>     @ 0x7f26c04a1220 (unknown)
>     @ 0x7f26c06fbdc5 start_thread
>     @ 0x7f26bfc0928d __clone
> W0328 1
> ```
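The `Aborted at ... (unix time)` lines in the log suggest `date -d @<seconds>` for decoding the abort time. For readers without GNU date, a minimal Python equivalent is sketched below. It assumes the agent's clock was at UTC+8, which is a guess based on the GMT+08:00 mail headers in this thread; `LOCAL_TZ` and `abort_time` are names chosen for this example.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the log timestamps are local time at UTC+8.
LOCAL_TZ = timezone(timedelta(hours=8))


def abort_time(unix_seconds: int, tz: timezone = LOCAL_TZ) -> datetime:
    """Convert the unix time from an 'Aborted at' line to an aware datetime."""
    return datetime.fromtimestamp(unix_seconds, tz=tz)


if __name__ == "__main__":
    # The two abort times reported in the log.
    for ts in (1490672906, 1490672916):
        print(ts, abort_time(ts).isoformat())
```

Under that timezone assumption the two values decode to 11:48:26 and 11:48:36 on 2017-03-28, which line up with the `W0328 11:48:26` and `W0328 11:48:36` health check warnings, so each abort corresponds to one failed health check attempt.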