Cool, looking forward to it!

On Fri, Mar 31, 2017 at 4:30 AM, tommy xiao <[email protected]> wrote:
> Alex, yes, let me have a try.
>
> 2017-03-31 3:16 GMT+08:00 Alex Rukletsov <[email protected]>:
>
>> This is https://issues.apache.org/jira/browse/MESOS-7210. Deshi, do you
>> want to send the patch? I or Haosdent can shepherd.
>>
>> A.
>>
>> On Thu, Mar 30, 2017 at 12:27 PM, tommy xiao <[email protected]> wrote:
>>
>>> Interesting, for this specific case.
>>>
>>> 2017-03-30 7:52 GMT+08:00 Jie Yu <[email protected]>:
>>>
>>>> + AlexR, haosdent
>>>>
>>>> For posterity, the root cause of this problem is that when the agent is
>>>> running inside a Docker container and the `--docker_mesos_image` flag is
>>>> specified, the pid namespace of the executor container (which initiates
>>>> the health check) is different from the root pid namespace. Therefore,
>>>> getting the network namespace handle using `/proc/<pid>/ns/net` does not
>>>> work because the 'pid' here is in the root pid namespace (reported by
>>>> the docker daemon).
>>>>
>>>> Alex and haosdent, I think we should fix this issue. As suggested
>>>> above, we can launch the executor container with --pid=host if
>>>> `--docker_mesos_image` is specified.
>>>>
>>>> - Jie
>>>>
>>>> On Wed, Mar 29, 2017 at 3:56 AM, tommy xiao <[email protected]> wrote:
>>>>
>>>>> It was resolved by adding --pid=host. Thanks to the community for the
>>>>> support. Thanks a lot.
>>>>>
>>>>> 2017-03-29 9:52 GMT+08:00 tommy xiao <[email protected]>:
>>>>>
>>>>>> My environment is as follows:
>>>>>>
>>>>>> Mesos 1.2, running containerized in Docker.
>>>>>>
>>>>>> I launch a sample nginx Docker container with a Mesos native health
>>>>>> check,
>>>>>>
>>>>>> then get a core dump in the sandbox.
>>>>>>
>>>>>> I have dug up some more information for your reference:
>>>>>>
>>>>>> In the Mesos slave container, I can only see the task container's pid,
>>>>>> but I can't find the nginx process's pid.
>>>>>>
>>>>>> But in the host console, I can find the nginx pid. So how can I get the
>>>>>> pid in the container?
>>>>>>
>>>>>> 2017-03-28 13:49 GMT+08:00 tommy xiao <[email protected]>:
>>>>>>
>>>>>>> https://issues.apache.org/jira/browse/MESOS-6184
>>>>>>>
>>>>>>> Can anyone give some hints?
>>>>>>>
>>>>>>> ```
>>>>>>> I0328 11:48:12.922181 48 exec.cpp:162] Version: 1.2.0
>>>>>>> I0328 11:48:12.929252 54 exec.cpp:237] Executor registered on agent a29dc3a5-3e3f-4058-8ab4-dd7de2ae58d1-S4
>>>>>>> I0328 11:48:12.931640 54 docker.cpp:850] Running docker -H unix:///var/run/docker.sock run --cpu-shares 10 --memory 33554432 --env-file /tmp/gvqGyb -v /data/mesos/slaves/a29dc3a5-3e3f-4058-8ab4-dd7de2ae58d1-S4/frameworks/d7ef5d2b-f924-42d9-a274-c020afba6bce-0000/executors/0-hc-xychu-datamanmesos-2f3b47f9ffc048539c7b22baa6c32d8f/runs/458189b8-2ff4-4337-ad3a-67321e96f5cb:/mnt/mesos/sandbox --net bridge --label=USER_NAME=xychu --label=GROUP_NAME=groupautotest --label=APP_ID=hc --label=VCLUSTER=clusterautotest --label=USER=xychu --label=CLUSTER=datamanmesos --label=SLOT=0 --label=APP=hc -p 31000:80/tcp --name mesos-a29dc3a5-3e3f-4058-8ab4-dd7de2ae58d1-S4.458189b8-2ff4-4337-ad3a-67321e96f5cb nginx
>>>>>>> I0328 11:48:16.145714 53 health_checker.cpp:196] Ignoring failure as health check still in grace period
>>>>>>> W0328 11:48:26.289958 49 health_checker.cpp:202] Health check failed 1 times consecutively: HTTP health check failed: curl returned terminated with signal Aborted (core dumped): ABORT: (../../../3rdparty/libprocess/include/process/posix/subprocess.hpp:190): Failed to execute Subprocess::ChildHook: Failed to enter the net namespace of pid 18596: Pid 18596 does not exist
>>>>>>> *** Aborted at 1490672906 (unix time) try "date -d @1490672906" if you are using GNU date ***
>>>>>>> PC: @ 0x7f26bfb485f7 __GI_raise
>>>>>>> *** SIGABRT (@0x4a) received by PID 74 (TID 0x7f26ba152700) from PID 74; stack trace: ***
>>>>>>>     @ 0x7f26c0703100 (unknown)
>>>>>>>     @ 0x7f26bfb485f7 __GI_raise
>>>>>>>     @ 0x7f26bfb49ce8 __GI_abort
>>>>>>>     @ 0x7f26c315778e _Abort()
>>>>>>>     @ 0x7f26c31577cc _Abort()
>>>>>>>     @ 0x7f26c237a4b6 process::internal::childMain()
>>>>>>>     @ 0x7f26c2379e9c std::_Function_handler<>::_M_invoke()
>>>>>>>     @ 0x7f26c2379e53 process::internal::defaultClone()
>>>>>>>     @ 0x7f26c237b951 process::internal::cloneChild()
>>>>>>>     @ 0x7f26c237954f process::subprocess()
>>>>>>>     @ 0x7f26c15a9fb1 mesos::internal::checks::HealthCheckerProcess::httpHealthCheck()
>>>>>>>     @ 0x7f26c15ababd mesos::internal::checks::HealthCheckerProcess::performSingleCheck()
>>>>>>>     @ 0x7f26c2331389 process::ProcessManager::resume()
>>>>>>>     @ 0x7f26c233a3f7 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
>>>>>>>     @ 0x7f26c04a1220 (unknown)
>>>>>>>     @ 0x7f26c06fbdc5 start_thread
>>>>>>>     @ 0x7f26bfc0928d __clone
>>>>>>> W0328 11:48:36.340055 55 health_checker.cpp:202] Health check failed 2 times consecutively: HTTP health check failed: curl returned terminated with signal Aborted (core dumped): ABORT: (../../../3rdparty/libprocess/include/process/posix/subprocess.hpp:190): Failed to execute Subprocess::ChildHook: Failed to enter the net namespace of pid 18596: Pid 18596 does not exist
>>>>>>> *** Aborted at 1490672916 (unix time) try "date -d @1490672916" if you are using GNU date ***
>>>>>>> PC: @ 0x7f26bfb485f7 __GI_raise
>>>>>>> *** SIGABRT (@0x4b) received by PID 75 (TID 0x7f26b9951700) from PID 75; stack trace: ***
>>>>>>>     @ 0x7f26c0703100 (unknown)
>>>>>>>     @ 0x7f26bfb485f7 __GI_raise
>>>>>>>     @ 0x7f26bfb49ce8 __GI_abort
>>>>>>>     @ 0x7f26c315778e _Abort()
>>>>>>>     @ 0x7f26c31577cc _Abort()
>>>>>>>     @ 0x7f26c237a4b6 process::internal::childMain()
>>>>>>>     @ 0x7f26c2379e9c std::_Function_handler<>::_M_invoke()
>>>>>>>     @ 0x7f26c2379e53 process::internal::defaultClone()
>>>>>>>     @ 0x7f26c237b951 process::internal::cloneChild()
>>>>>>>     @ 0x7f26c237954f process::subprocess()
>>>>>>>     @ 0x7f26c15a9fb1 mesos::internal::checks::HealthCheckerProcess::httpHealthCheck()
>>>>>>>     @ 0x7f26c15ababd mesos::internal::checks::HealthCheckerProcess::performSingleCheck()
>>>>>>>     @ 0x7f26c2331389 process::ProcessManager::resume()
>>>>>>>     @ 0x7f26c233a3f7 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
>>>>>>>     @ 0x7f26c04a1220 (unknown)
>>>>>>>     @ 0x7f26c06fbdc5 start_thread
>>>>>>>     @ 0x7f26bfc0928d __clone
>>>>>>> W0328 11:48:46.386533 49 health_checker.cpp:202] Health check failed 3 times consecutively: HTTP health check failed: curl returned terminated with signal Aborted (core dumped): ABORT: (../../../3rdparty/libprocess/include/process/posix/subprocess.hpp:190): Failed to execute Subprocess::ChildHook: Failed to enter the net namespace of pid 18596: Pid 18596 does not exist
>>>>>>> *** Aborted at 1490672926 (unix time) try "date -d @1490672926" if you are using GNU date ***
>>>>>>> PC: @ 0x7f26bfb485f7 __GI_raise
>>>>>>> *** SIGABRT (@0x4c) received by PID 76 (TID 0x7f26ba152700) from PID 76; stack trace: ***
>>>>>>>     @ 0x7f26c0703100 (unknown)
>>>>>>>     @ 0x7f26bfb485f7 __GI_raise
>>>>>>>     @ 0x7f26bfb49ce8 __GI_abort
>>>>>>>     @ 0x7f26c315778e _Abort()
>>>>>>>     @ 0x7f26c31577cc _Abort()
>>>>>>>     @ 0x7f26c237a4b6 process::internal::childMain()
>>>>>>>     @ 0x7f26c2379e9c std::_Function_handler<>::_M_invoke()
>>>>>>>     @ 0x7f26c2379e53 process::internal::defaultClone()
>>>>>>>     @ 0x7f26c237b951 process::internal::cloneChild()
>>>>>>>     @ 0x7f26c237954f process::subprocess()
>>>>>>>     @ 0x7f26c15a9fb1 mesos::internal::checks::HealthCheckerProcess::httpHealthCheck()
>>>>>>>     @ 0x7f26c15ababd mesos::internal::checks::HealthCheckerProcess::performSingleCheck()
>>>>>>>     @ 0x7f26c2331389 process::ProcessManager::resume()
>>>>>>>     @ 0x7f26c233a3f7 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
>>>>>>>     @ 0x7f26c04a1220 (unknown)
>>>>>>>     @ 0x7f26c06fbdc5 start_thread
>>>>>>>     @ 0x7f26bfc0928d __clone
>>>>>>> W0328 11:48:56.531623 53 health_checker.cpp:202] Health check failed 4 times consecutively: HTTP health check failed: curl returned terminated with signal Aborted (core dumped): ABORT: (../../../3rdparty/libprocess/include/process/posix/subprocess.hpp:190): Failed to execute Subprocess::ChildHook: Failed to enter the net namespace of pid 18596: Pid 18596 does not exist
>>>>>>> *** Aborted at 1490672936 (unix time) try "date -d @1490672936" if you are using GNU date ***
>>>>>>> PC: @ 0x7f26bfb485f7 __GI_raise
>>>>>>> *** SIGABRT (@0x4d) received by PID 77 (TID 0x7f26b814e700) from PID 77; stack trace: ***
>>>>>>>     @ 0x7f26c0703100 (unknown)
>>>>>>>     @ 0x7f26bfb485f7 __GI_raise
>>>>>>>     @ 0x7f26bfb49ce8 __GI_abort
>>>>>>>     @ 0x7f26c315778e _Abort()
>>>>>>>     @ 0x7f26c31577cc _Abort()
>>>>>>>     @ 0x7f26c237a4b6 process::internal::childMain()
>>>>>>>     @ 0x7f26c2379e9c std::_Function_handler<>::_M_invoke()
>>>>>>>     @ 0x7f26c2379e53 process::internal::defaultClone()
>>>>>>>     @ 0x7f26c237b951 process::internal::cloneChild()
>>>>>>>     @ 0x7f26c237954f process::subprocess()
>>>>>>>     @ 0x7f26c15a9fb1 mesos::internal::checks::HealthCheckerProcess::httpHealthCheck()
>>>>>>>     @ 0x7f26c15ababd mesos::internal::checks::HealthCheckerProcess::performSingleCheck()
>>>>>>>     @ 0x7f26c2331389 process::ProcessManager::resume()
>>>>>>>     @ 0x7f26c233a3f7 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
>>>>>>>     @ 0x7f26c04a1220 (unknown)
>>>>>>>     @ 0x7f26c06fbdc5 start_thread
>>>>>>>     @ 0x7f26bfc0928d __clone
>>>>>>> W0328 11:49:06.678515 50 health_checker.cpp:202] Health check failed 5 times consecutively: HTTP health check failed: curl returned terminated with signal Aborted (core dumped): ABORT: (../../../3rdparty/libprocess/include/process/posix/subprocess.hpp:190): Failed to execute Subprocess::ChildHook: Failed to enter the net namespace of pid 18596: Pid 18596 does not exist
>>>>>>> *** Aborted at 1490672946 (unix time) try "date -d @1490672946" if you are using GNU date ***
>>>>>>> PC: @ 0x7f26bfb485f7 __GI_raise
>>>>>>> *** SIGABRT (@0x4e) received by PID 78 (TID 0x7f26b9951700) from PID 78; stack trace: ***
>>>>>>>     @ 0x7f26c0703100 (unknown)
>>>>>>>     @ 0x7f26bfb485f7 __GI_raise
>>>>>>>     @ 0x7f26bfb49ce8 __GI_abort
>>>>>>>     @ 0x7f26c315778e _Abort()
>>>>>>>     @ 0x7f26c31577cc _Abort()
>>>>>>>     @ 0x7f26c237a4b6 process::internal::childMain()
>>>>>>>     @ 0x7f26c2379e9c std::_Function_handler<>::_M_invoke()
>>>>>>>     @ 0x7f26c2379e53 process::internal::defaultClone()
>>>>>>>     @ 0x7f26c237b951 process::internal::cloneChild()
>>>>>>>     @ 0x7f26c237954f process::subprocess()
>>>>>>>     @ 0x7f26c15a9fb1 mesos::internal::checks::HealthCheckerProcess::httpHealthCheck()
>>>>>>>     @ 0x7f26c15ababd mesos::internal::checks::HealthCheckerProcess::performSingleCheck()
>>>>>>>     @ 0x7f26c2331389 process::ProcessManager::resume()
>>>>>>>     @ 0x7f26c233a3f7 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
>>>>>>>     @ 0x7f26c04a1220 (unknown)
>>>>>>>     @ 0x7f26c06fbdc5 start_thread
>>>>>>>     @ 0x7f26bfc0928d __clone
>>>>>>> I0328 11:49:06.678840 50 health_checker.cpp:130] Health checking stopped
>>>>>>> I0328 11:49:06.880620 49 health_checker.cpp:130] Health checking stopped
>>>>>>> ```
>>>>>>>
>>>>>>> --
>>>>>>> Deshi Xiao
>>>>>>> Twitter: xds2000
>>>>>>> E-mail: xiaods(AT)gmail.com
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Deshi Xiao
>>>>>> Twitter: xds2000
>>>>>> E-mail: xiaods(AT)gmail.com
>>>>>>
>>>>>
>>>>> --
>>>>> Deshi Xiao
>>>>> Twitter: xds2000
>>>>> E-mail: xiaods(AT)gmail.com
>>>>>
>>>>
>>>
>>> --
>>> Deshi Xiao
>>> Twitter: xds2000
>>> E-mail: xiaods(AT)gmail.com
>>>
>>
>
> --
> Deshi Xiao
> Twitter: xds2000
> E-mail: xiaods(AT)gmail.com
>
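
For anyone who finds this thread later, the root cause Jie describes can be illustrated with a small sketch. It only shows the lookup problem and is not how Mesos implements the health check: a pid reported from the root pid namespace usually has no entry in the procfs of a container with its own pid namespace, so `/proc/<pid>/ns/net` cannot be opened there. The helper names (`net_ns_path`, `can_see_pid`, `describe`) and the `proc_root` parameter are hypothetical, introduced only for this example.

```python
import os


def net_ns_path(pid, proc_root="/proc"):
    """Return the path of the network-namespace handle for `pid`."""
    return os.path.join(proc_root, str(pid), "ns", "net")


def can_see_pid(pid, proc_root="/proc"):
    """True if `pid` has an entry in the procfs mounted at `proc_root`.

    Assumption for this sketch: inside a container with its own pid
    namespace, a pid taken from the root namespace has no such entry.
    """
    return os.path.isdir(os.path.join(proc_root, str(pid)))


def describe(pid, proc_root="/proc"):
    """Explain whether the net namespace handle of `pid` is reachable."""
    if not can_see_pid(pid, proc_root):
        return "pid %d does not exist in this pid namespace" % pid
    return "net namespace handle: %s" % net_ns_path(pid, proc_root)


if __name__ == "__main__":
    # Our own pid is always visible in our own procfs (on Linux).
    print(describe(os.getpid()))
    # 18596 is the pid from the log above; whether it is visible depends
    # on which pid namespace this script runs in.
    print(describe(18596))
```

Running the executor container with `--pid=host`, as suggested in the thread, makes the container share the host's pid namespace, so a lookup like this would see the pid reported by the docker daemon.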

