mesos-slave Failed to initialize: Failed to bind on 0.0.0.0:0: Address already in use: Address already in use [98]

2018-05-02 Thread Luke Adolph
Hi all:
When the mesos slave runs a task, the stderr file shows:
I0503 04:01:20.488590  9110 logging.cpp:188] INFO level logging started!
I0503 04:01:20.489073  9110 fetcher.cpp:424] Fetcher Info:
{"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/2bcc032f-950b-4c36-bff4-b5552c193dc9-S1\/root","items":[{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"file:\/\/\/etc\/.dockercfg"}}],"sandbox_directory":"\/tmp\/mesos\/slaves\/2bcc032f-950b-4c36-bff4-b5552c193dc9-S1\/docker\/links\/b4eabcbb-5769-49f0-9324-b25c3cda8b8c","user":"root"}
I0503 04:01:20.491297  9110 fetcher.cpp:379] Fetching URI
'file:///etc/.dockercfg'
I0503 04:01:20.491325  9110 fetcher.cpp:250] Fetching directly into the
sandbox directory
I0503 04:01:20.491348  9110 fetcher.cpp:187] Fetching URI
'file:///etc/.dockercfg'
I0503 04:01:20.491367  9110 fetcher.cpp:167] Copying resource with
command:cp '/etc/.dockercfg'
'/tmp/mesos/slaves/2bcc032f-950b-4c36-bff4-b5552c193dc9-S1/docker/links/b4eabcbb-5769-49f0-9324-b25c3cda8b8c/.dockercfg'
W0503 04:01:20.495400  9110 fetcher.cpp:272] Copying instead of extracting
resource from URI with 'extract' flag, because it does not seem to be an
archive: file:///etc/.dockercfg
I0503 04:01:20.495728  9110 fetcher.cpp:456] Fetched
'file:///etc/.dockercfg' to
'/tmp/mesos/slaves/2bcc032f-950b-4c36-bff4-b5552c193dc9-S1/docker/links/b4eabcbb-5769-49f0-9324-b25c3cda8b8c/.dockercfg'
F0503 04:01:21.990416  9202 process.cpp:889] Failed to initialize: Failed
to bind on 0.0.0.0:0: Address already in use: Address already in use [98]
*** Check failure stack trace: ***
@ 0x7f95fc6ef86d  google::LogMessage::Fail()
@ 0x7f95fc6f169d  google::LogMessage::SendToLog()
@ 0x7f95fc6ef45c  google::LogMessage::Flush()
@ 0x7f95fc6ef669  google::LogMessage::~LogMessage()
@ 0x7f95fc6f05d2  google::ErrnoLogMessage::~ErrnoLogMessage()
@ 0x7f95fc6955d9  process::initialize()
@ 0x7f95fc696be2  process::ProcessBase::ProcessBase()
@   0x430e9a
mesos::internal::docker::DockerExecutorProcess::DockerExecutorProcess()
@   0x41916b  main
@ 0x7f95fa60ff45  (unknown)
@   0x419c77  (unknown)

When the mesos slave initializes, it runs into "Failed to bind on 0.0.0.0:0:
Address already in use". I ran `netstat -nlp`, but no process is listening on
port "0". The full output is:
root@10:~# netstat -nlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1153/sshd
tcp        0      0 0.0.0.0:37786           0.0.0.0:*               LISTEN      20042/mesos-docker-
tcp        0      0 0.0.0.0:5051            0.0.0.0:*               LISTEN      12701/mesos-slave
tcp        0      0 0.0.0.0:37084           0.0.0.0:*               LISTEN      19765/mesos-docker-
tcp        0      0 0.0.0.0:24220           0.0.0.0:*               LISTEN      28584/ruby
tcp        0      0 0.0.0.0:8765            0.0.0.0:*               LISTEN      28353/nginx
tcp        0      0 0.0.0.0:24224           0.0.0.0:*               LISTEN      28584/ruby
tcp        0      0 127.0.0.1:24225         0.0.0.0:*               LISTEN      28584/ruby
tcp        0      0 0.0.0.0:46690           0.0.0.0:*               LISTEN      28932/mesos-docker-
tcp        0      0 0.0.0.0:42437           0.0.0.0:*               LISTEN      32184/mesos-docker-
tcp        0      0 0.0.0.0:34695           0.0.0.0:*               LISTEN      25862/mesos-docker-
tcp        0      0 0.0.0.0:37039           0.0.0.0:*               LISTEN      21273/mesos-docker-
tcp        0      0 0.0.0.0:46001           0.0.0.0:*               LISTEN      710/mesos-docker-ex
tcp6       0      0 :::31765                :::*                    LISTEN      20160/docker-proxy
tcp6       0      0 :::31605                :::*                    LISTEN      20149/docker-proxy
tcp6       0      0 :::31327                :::*                    LISTEN      820/docker-proxy
tcp6       0      0 :::31008                :::*                    LISTEN      32291/docker-proxy
tcp6       0      0 :::2375                 :::*                    LISTEN      28305/node
tcp6       0      0 :::31690                :::*                    LISTEN      25966/docker-proxy
tcp6       0      0 :::31211                :::*                    LISTEN      21379/docker-proxy
tcp6       0      0 :::31245                :::*                    LISTEN      19988/docker-proxy
tcp6       0      0 :::31121                :::*                    LISTEN      29037/docker-proxy
udp        0      0 0.0.0.0:24224           0.0.0.0:*                           28584/ruby
udp        0      0 192.168.0.1:123         0.0.0.0:*                           1348/ntpd
udp        0      0 59.110.24.56:123        0.0.0.0:*                           1348/ntpd
udp        0      0 10.25.141.251:123       0.0.0.0:*                           1348/ntpd
udp        0      0 127.0.0.1:123           0.0.0.0:*                           1348/ntpd
(output truncated)
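Note that netstat will never show a listener on port 0: binding to 0.0.0.0:0 asks the kernel to pick any free ephemeral port, so "Address already in use" here does not mean some process stole port 0. One plausible explanation is that the agent host has exhausted or badly narrowed its ephemeral port range. Below is a minimal Go sketch, independent of Mesos, that probes the same kind of bind the executor's libprocess initialization attempts; if it also fails on the agent, compare /proc/sys/net/ipv4/ip_local_port_range with the number of sockets currently open:

package main

import (
	"fmt"
	"net"
)

func main() {
	// Port 0 asks the kernel for any free ephemeral port, just like the
	// failing bind in process.cpp. An "address already in use" error here
	// would point at the ephemeral range itself, not at a specific port.
	ln, err := net.Listen("tcp", "0.0.0.0:0")
	if err != nil {
		fmt.Println("bind on 0.0.0.0:0 failed:", err)
		return
	}
	defer ln.Close()
	fmt.Println("kernel assigned ephemeral port:", ln.Addr())
}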

Re: how to debug when a task is killed

2016-12-20 Thread Luke Adolph
My app does not provide a health check mechanism.
And I have found the reason: *not enough memory*.
I need to give the app more memory in Marathon.
Thanks all!
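For reference, the memory a Marathon app gets is the "mem" field of its app definition (in MiB); raising it and letting Marathon redeploy is what stops the kills. A minimal sketch, assuming Marathon's REST API on port 8080 and a hypothetical app id, that sends a partial update with a larger limit (the same change can be made through the Marathon UI or the app JSON):

package main

import (
	"bytes"
	"fmt"
	"net/http"
)

func main() {
	// Hypothetical values: point these at your Marathon master and app id.
	url := "http://marathon.example.com:8080/v2/apps/my-proc-reader"

	// Partial app update: raise the memory limit (MiB) so the task is not
	// killed for exceeding it; Marathon redeploys with the new limit.
	body := []byte(`{"mem": 512}`)

	req, err := http.NewRequest(http.MethodPut, url, bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("Marathon responded:", resp.Status)
}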

2016-12-20 15:17 GMT+08:00 haosdent <haosd...@gmail.com>:

> Do you have a health check configured? If a health check is configured and
> it cannot pass, the task will be killed.
>
> On Tue, Dec 20, 2016 at 2:23 PM, Luke Adolph <kenan3...@gmail.com> wrote:
>
>> Hi all:
>>
>> I have set up a mesos cluster with one mesos master and five mesos agents.
>> I use Marathon to deploy an app across the mesos agents; it reads process
>> info from /proc.
>> About every 40 minutes, my apps are killed and Marathon restarts them.
>> The stderr info in the sandbox is:
>>
>>
>> I1220 05:05:12.014192 28736 exec.cpp:143] Version: 0.28.1
>> I1220 05:05:12.017397 28740 exec.cpp:217] Executor registered on slave 
>> 83e33a06-5794-4baa-a654-dd2ecfcd426d-S5
>> 2016/12/20 05:05:12 status read fail.
>> 2016/12/20 05:05:12 process id is: 8208
>> 2016/12/20 05:05:12 open /proc/8208/status: no such file or directory
>> 2016/12/20 05:06:16 status read fail.
>> 2016/12/20 05:06:16 process id is: 8742
>> 2016/12/20 05:06:16 open /proc/8742/status: no such file or directory
>> 2016/12/20 05:07:16 status read fail.
>> 2016/12/20 05:07:16 process id is: 9005
>> 2016/12/20 05:07:16 open /proc/9005/status: no such file or directory
>> 2016/12/20 05:25:50 status read fail.
>> 2016/12/20 05:25:50 open /proc/17284/stat: no such file or directory
>> Killed
>>
>>
>>
>> Beyond the stderr info above, I have no other meaningful information to
>> provide or debug with.
>> Could you share your experience with solving similar situations?
>>
>> Thanks very much!
>>
>> --
>> Thanks & Best Regards
>> 卢文泉 | Adolph Lu
>> TEL:+86 15651006559 <+86%20156%205100%206559>
>> Linker Networks(http://www.linkernetworks.com/)
>>
>
>
>
> --
> Best Regards,
> Haosdent Huang
>



-- 
Thanks & Best Regards
卢文泉 | Adolph Lu
TEL:+86 15651006559
Linker Networks(http://www.linkernetworks.com/)


how to debug when a task is killed

2016-12-19 Thread Luke Adolph
Hi all:

I have set up a mesos cluster with one mesos master and five mesos agents.
I use Marathon to deploy an app across the mesos agents; it reads process
info from /proc.
About every 40 minutes, my apps are killed and Marathon restarts them.
The stderr info in the sandbox is:

I1220 05:05:12.014192 28736 exec.cpp:143] Version: 0.28.1
I1220 05:05:12.017397 28740 exec.cpp:217] Executor registered on slave
83e33a06-5794-4baa-a654-dd2ecfcd426d-S5
2016/12/20 05:05:12 status read fail.
2016/12/20 05:05:12 process id is: 8208
2016/12/20 05:05:12 open /proc/8208/status: no such file or directory
2016/12/20 05:06:16 status read fail.
2016/12/20 05:06:16 process id is: 8742
2016/12/20 05:06:16 open /proc/8742/status: no such file or directory
2016/12/20 05:07:16 status read fail.
2016/12/20 05:07:16 process id is: 9005
2016/12/20 05:07:16 open /proc/9005/status: no such file or directory
2016/12/20 05:25:50 status read fail.
2016/12/20 05:25:50 open /proc/17284/stat: no such file or directory
Killed
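
The timestamps above match Go's standard log package, and the "open /proc/<pid>/status: no such file or directory" lines simply mean the target pid exited between being discovered and being read, which is routine on a busy agent and not fatal by itself; the final "Killed" comes from outside the app, consistent with the out-of-memory explanation in the follow-up. A minimal Go sketch, using a hypothetical helper rather than the real app code, of reading /proc/<pid>/status while tolerating a vanished process:

package main

import (
	"fmt"
	"log"
	"os"
)

// readStatus reads /proc/<pid>/status. The pid can disappear at any moment
// (the process may exit between discovery and the read), so a missing file
// is treated as "process gone" rather than a fatal error.
func readStatus(pid int) (status string, alive bool, err error) {
	data, err := os.ReadFile(fmt.Sprintf("/proc/%d/status", pid))
	if os.IsNotExist(err) {
		return "", false, nil // process already exited
	}
	if err != nil {
		return "", false, err
	}
	return string(data), true, nil
}

func main() {
	status, alive, err := readStatus(8208) // pid taken from the log above
	if err != nil {
		log.Fatal(err)
	}
	if !alive {
		log.Println("status read fail: process is gone, skipping")
		return
	}
	fmt.Println(status)
}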


Beyond the stderr info above, I have no other meaningful information to
provide or debug with.
Could you share your experience with solving similar situations?

Thanks very much!

-- 
Thanks & Best Regards
卢文泉 | Adolph Lu
TEL:+86 15651006559
Linker Networks(http://www.linkernetworks.com/)


kill mesos task

2016-08-15 Thread Luke Adolph
Hi all:
I found a *zombie task* on mesos. I ssh to the agent and use "sudo kill
pid" to kill the task; however, mesos relaunches it on another agent.
So I want to know how to kill the task for good.
Thanks!
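
Killing the pid on the agent only removes that one running instance; the framework that launched the task (Marathon, in the threads above) still wants it running, so it gets rescheduled on another agent. One possible route, assuming the task really is a Marathon app, is to ask Marathon itself to kill the task and scale the app down. A hedged Go sketch against Marathon's REST API; the host, app id and task id below are placeholders:

package main

import (
	"fmt"
	"net/http"
)

func main() {
	// Placeholders: point these at your Marathon master, app and task.
	marathon := "http://marathon.example.com:8080"
	appID := "my-app"
	taskID := "my-app.placeholder-task-id"

	// DELETE /v2/apps/{appId}/tasks/{taskId}?scale=true kills the task and
	// decrements the instance count, so Marathon does not simply restart it.
	url := fmt.Sprintf("%s/v2/apps/%s/tasks/%s?scale=true", marathon, appID, taskID)

	req, err := http.NewRequest(http.MethodDelete, url, nil)
	if err != nil {
		panic(err)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("Marathon responded:", resp.Status)
}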
-- 
Thanks & Best Regards
卢文泉 | Adolph Lu
TEL:+86 15651006559
Linker Networks(http://www.linkernetworks.com/)