mesos-slave Failed to initialize: Failed to bind on 0.0.0.0:0: Address already in use: Address already in use [98]
Hi all:

When the mesos slave runs a task, the stderr file shows:

I0503 04:01:20.488590  9110 logging.cpp:188] INFO level logging started!
I0503 04:01:20.489073  9110 fetcher.cpp:424] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/2bcc032f-950b-4c36-bff4-b5552c193dc9-S1\/root","items":[{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"file:\/\/\/etc\/.dockercfg"}}],"sandbox_directory":"\/tmp\/mesos\/slaves\/2bcc032f-950b-4c36-bff4-b5552c193dc9-S1\/docker\/links\/b4eabcbb-5769-49f0-9324-b25c3cda8b8c","user":"root"}
I0503 04:01:20.491297  9110 fetcher.cpp:379] Fetching URI 'file:///etc/.dockercfg'
I0503 04:01:20.491325  9110 fetcher.cpp:250] Fetching directly into the sandbox directory
I0503 04:01:20.491348  9110 fetcher.cpp:187] Fetching URI 'file:///etc/.dockercfg'
I0503 04:01:20.491367  9110 fetcher.cpp:167] Copying resource with command: cp '/etc/.dockercfg' '/tmp/mesos/slaves/2bcc032f-950b-4c36-bff4-b5552c193dc9-S1/docker/links/b4eabcbb-5769-49f0-9324-b25c3cda8b8c/.dockercfg'
W0503 04:01:20.495400  9110 fetcher.cpp:272] Copying instead of extracting resource from URI with 'extract' flag, because it does not seem to be an archive: file:///etc/.dockercfg
I0503 04:01:20.495728  9110 fetcher.cpp:456] Fetched 'file:///etc/.dockercfg' to '/tmp/mesos/slaves/2bcc032f-950b-4c36-bff4-b5552c193dc9-S1/docker/links/b4eabcbb-5769-49f0-9324-b25c3cda8b8c/.dockercfg'
F0503 04:01:21.990416  9202 process.cpp:889] Failed to initialize: Failed to bind on 0.0.0.0:0: Address already in use: Address already in use [98]
*** Check failure stack trace: ***
    @     0x7f95fc6ef86d  google::LogMessage::Fail()
    @     0x7f95fc6f169d  google::LogMessage::SendToLog()
    @     0x7f95fc6ef45c  google::LogMessage::Flush()
    @     0x7f95fc6ef669  google::LogMessage::~LogMessage()
    @     0x7f95fc6f05d2  google::ErrnoLogMessage::~ErrnoLogMessage()
    @     0x7f95fc6955d9  process::initialize()
    @     0x7f95fc696be2  process::ProcessBase::ProcessBase()
    @           0x430e9a  mesos::internal::docker::DockerExecutorProcess::DockerExecutorProcess()
    @           0x41916b  main
    @     0x7f95fa60ff45  (unknown)
    @           0x419c77  (unknown)

When the executor initializes, it runs into "Failed to bind on 0.0.0.0:0: Address already in use". I ran `netstat -nlp`, but no process is listening on port 0. The full output is:

root@10:~# netstat -nlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address   State    PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*         LISTEN   1153/sshd
tcp        0      0 0.0.0.0:37786           0.0.0.0:*         LISTEN   20042/mesos-docker-
tcp        0      0 0.0.0.0:5051            0.0.0.0:*         LISTEN   12701/mesos-slave
tcp        0      0 0.0.0.0:37084           0.0.0.0:*         LISTEN   19765/mesos-docker-
tcp        0      0 0.0.0.0:24220           0.0.0.0:*         LISTEN   28584/ruby
tcp        0      0 0.0.0.0:8765            0.0.0.0:*         LISTEN   28353/nginx
tcp        0      0 0.0.0.0:24224           0.0.0.0:*         LISTEN   28584/ruby
tcp        0      0 127.0.0.1:24225         0.0.0.0:*         LISTEN   28584/ruby
tcp        0      0 0.0.0.0:46690           0.0.0.0:*         LISTEN   28932/mesos-docker-
tcp        0      0 0.0.0.0:42437           0.0.0.0:*         LISTEN   32184/mesos-docker-
tcp        0      0 0.0.0.0:34695           0.0.0.0:*         LISTEN   25862/mesos-docker-
tcp        0      0 0.0.0.0:37039           0.0.0.0:*         LISTEN   21273/mesos-docker-
tcp        0      0 0.0.0.0:46001           0.0.0.0:*         LISTEN   710/mesos-docker-ex
tcp6       0      0 :::31765                :::*              LISTEN   20160/docker-proxy
tcp6       0      0 :::31605                :::*              LISTEN   20149/docker-proxy
tcp6       0      0 :::31327                :::*              LISTEN   820/docker-proxy
tcp6       0      0 :::31008                :::*              LISTEN   32291/docker-proxy
tcp6       0      0 :::2375                 :::*              LISTEN   28305/node
tcp6       0      0 :::31690                :::*              LISTEN   25966/docker-proxy
tcp6       0      0 :::31211                :::*              LISTEN   21379/docker-proxy
tcp6       0      0 :::31245                :::*              LISTEN   19988/docker-proxy
tcp6       0      0 :::31121                :::*              LISTEN   29037/docker-proxy
udp        0      0 0.0.0.0:24224           0.0.0.0:*                  28584/ruby
udp        0      0 192.168.0.1:123         0.0.0.0:*                  1348/ntpd
udp        0      0 59.110.24.56:123        0.0.0.0:*                  1348/ntpd
udp        0      0 10.25.141.251:123       0.0.0.0:*                  1348/ntpd
udp        0      0 127.0.0.1:123           0.0.0.0:*                  1348/ntpd
udp        0      0
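A note on the error itself: binding to 0.0.0.0:0 asks the kernel to pick any free ephemeral port for the executor's libprocess socket, so EADDRINUSE (errno 98) here usually means no free port could be found rather than a clash on a literal port 0. Also, `netstat -nlp` only lists listening sockets, so ports held by established or TIME_WAIT connections do not appear in the output above. A minimal sketch of checks one might run on the agent host (the ephemeral-port-exhaustion explanation is an assumption, not a confirmed diagnosis for this machine):

    # Sketch: count sockets in every state, not just LISTEN, to see whether
    # the local port range is being exhausted by short-lived connections.
    ss -s

    # The range bind() can draw an ephemeral port from.
    cat /proc/sys/net/ipv4/ip_local_port_range

    # Breakdown of TCP connections by state (TIME_WAIT, ESTABLISHED, ...).
    netstat -ant | awk '{print $6}' | sort | uniq -c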
Re: how to debug when a task is killed
My app does not provide a health check mechanism. I have found the reason: *not enough memory*. I should give my app more memory in Marathon. Thanks all!

2016-12-20 15:17 GMT+08:00 haosdent <haosd...@gmail.com>:

> Did you configure a health check? If you configured a health check and it
> could not pass, the task would be killed.
>
> On Tue, Dec 20, 2016 at 2:23 PM, Luke Adolph <kenan3...@gmail.com> wrote:
>
>> Hi all:
>>
>> I have set up a mesos cluster with one mesos master and five mesos agents.
>> I use Marathon to deploy an app across the mesos agents; the app reads
>> process info from /proc.
>> About every 40 minutes, my apps are killed and Marathon restarts them.
>> The stderr info in the sandbox is:
>>
>> I1220 05:05:12.014192 28736 exec.cpp:143] Version: 0.28.1
>> I1220 05:05:12.017397 28740 exec.cpp:217] Executor registered on slave
>> 83e33a06-5794-4baa-a654-dd2ecfcd426d-S5
>> 2016/12/20 05:05:12 status read fail.
>> 2016/12/20 05:05:12 process id is: 8208
>> 2016/12/20 05:05:12 open /proc/8208/status: no such file or directory
>> 2016/12/20 05:06:16 status read fail.
>> 2016/12/20 05:06:16 process id is: 8742
>> 2016/12/20 05:06:16 open /proc/8742/status: no such file or directory
>> 2016/12/20 05:07:16 status read fail.
>> 2016/12/20 05:07:16 process id is: 9005
>> 2016/12/20 05:07:16 open /proc/9005/status: no such file or directory
>> 2016/12/20 05:25:50 status read fail.
>> 2016/12/20 05:25:50 open /proc/17284/stat: no such file or directory
>> Killed
>>
>> Apart from the stderr output above, I have no other meaningful information
>> to provide or debug with.
>> Could you share your experience with solving similar situations?
>>
>> Thanks very much!
>>
>> --
>> Thanks & Best Regards
>> 卢文泉 | Adolph Lu
>> TEL: +86 15651006559
>> Linker Networks (http://www.linkernetworks.com/)
>
> --
> Best Regards,
> Haosdent Huang

--
Thanks & Best Regards
卢文泉 | Adolph Lu
TEL: +86 15651006559
Linker Networks (http://www.linkernetworks.com/)
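For reference, a minimal sketch of raising the memory limit through the Marathon REST API; the host, port, and app id below are placeholders, and "mem" is in MB. Updating the app this way triggers a new deployment that restarts the tasks with the larger limit.

    # Sketch (assumed Marathon host and app id): raise the app's memory limit.
    curl -X PUT http://marathon-host:8080/v2/apps/my-app \
      -H 'Content-Type: application/json' \
      -d '{"mem": 1024}'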
how to debug when a task is killed
Hi all:

I have set up a mesos cluster with one mesos master and five mesos agents. I use Marathon to deploy an app across the mesos agents; the app reads process info from /proc. About every 40 minutes, my apps are killed and Marathon restarts them. The stderr info in the sandbox is:

I1220 05:05:12.014192 28736 exec.cpp:143] Version: 0.28.1
I1220 05:05:12.017397 28740 exec.cpp:217] Executor registered on slave 83e33a06-5794-4baa-a654-dd2ecfcd426d-S5
2016/12/20 05:05:12 status read fail.
2016/12/20 05:05:12 process id is: 8208
2016/12/20 05:05:12 open /proc/8208/status: no such file or directory
2016/12/20 05:06:16 status read fail.
2016/12/20 05:06:16 process id is: 8742
2016/12/20 05:06:16 open /proc/8742/status: no such file or directory
2016/12/20 05:07:16 status read fail.
2016/12/20 05:07:16 process id is: 9005
2016/12/20 05:07:16 open /proc/9005/status: no such file or directory
2016/12/20 05:25:50 status read fail.
2016/12/20 05:25:50 open /proc/17284/stat: no such file or directory
Killed

Apart from the stderr output above, I have no other meaningful information to provide or debug with. Could you share your experience with solving similar situations?

Thanks very much!

--
Thanks & Best Regards
卢文泉 | Adolph Lu
TEL: +86 15651006559
Linker Networks (http://www.linkernetworks.com/)
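When a task ends with a bare "Killed" in stderr, one common culprit (and the one that turned out to apply in the reply above) is the task exceeding its memory limit and being killed by the kernel OOM killer or the agent's memory isolation. A sketch of checks one could run on the agent host; the log path is an assumption and depends on how the agent was started:

    # Sketch: look for OOM-killer activity around the time the task died.
    dmesg | grep -i -E 'killed process|out of memory'

    # If the agent logs under /var/log/mesos (an assumption), search for
    # memory-limit messages mentioning the task.
    grep -i -E 'oom|memory limit' /var/log/mesos/mesos-slave.INFO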
kill mesos task
Hi all:

I found a *zombie task* on mesos. I ssh to the agent and use "sudo kill pid" to kill the task; however, Mesos relaunches it on another agent. So I want to know how to kill the task completely.

Thanks!

--
Thanks & Best Regards
卢文泉 | Adolph Lu
TEL: +86 15651006559
Linker Networks (http://www.linkernetworks.com/)
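Killing the process on the agent only makes the scheduler see the task as failed, and a framework like Marathon will relaunch it to keep the app at its target instance count, so the task has to be removed through the framework rather than on the host. A sketch, assuming the task belongs to a Marathon app (host, port, and app id are placeholders):

    # Sketch: scale the app to zero instances, or delete it entirely,
    # so the scheduler stops relaunching the task.
    curl -X PUT http://marathon-host:8080/v2/apps/my-app \
      -H 'Content-Type: application/json' -d '{"instances": 0}'
    # or
    curl -X DELETE http://marathon-host:8080/v2/apps/my-app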