I think it is fixed now.
Because I was upgrading from Deimos to native Docker support, our cluster had
already launched **and terminated** thousands of containers through
Jenkins -> Marathon -> Mesos and Deimos.
With the native Docker support in Mesos 0.20.0, the slave was trying to
recover all of those terminated containers.
Fix: list all containers and use docker rm to remove them.
docker ps -a -q | xargs docker rm
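
If you would rather leave running containers alone, here is a rough sketch
of a more selective cleanup. It assumes the STATUS column of `docker ps -a`
shows "Exited (...)" for terminated containers, as it does on my Docker
1.2.0; the function name exited_ids is just something I made up for
illustration.

```shell
# Print the IDs of containers whose STATUS column reports "Exited".
# Reads `docker ps -a` output on stdin, so it can be tried on saved
# output before touching a live host.
exited_ids() {
  # NR > 1 skips the header row; $1 is the CONTAINER ID column.
  awk 'NR > 1 && /Exited \(/ { print $1 }'
}

# Usage (assumes the docker CLI is on PATH and GNU xargs for -r):
#   docker ps -a | exited_ids | xargs -r docker rm
```

Running `docker ps -a | exited_ids` on its own first shows what would be
removed before anything is deleted.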
2014-08-28 12:48 GMT+02:00 Javier Ruiz Jiménez <[email protected]>:
> Thanks Jie
>
> Yes, I think that is the reason, but why does the slave get stuck? What is
> done during the process of recovering Docker containers?
>
> The log line "slave.cpp:3195] Finished recovery" never appears when
> containerizers=docker
>
> With containerizers=docker, the log goes from "Recovering containerizer"
> straight to "Current usage":
>
> I0828 12:29:19.971142 2210 slave.cpp:307] Slave checkpoint: true
> I0828 12:29:19.975369 2208 state.cpp:33] Recovering state from
> '/tmp/mesos/meta'
> I0828 12:29:19.975666 2208 state.cpp:62] Failed to find the latest slave
> from '/tmp/mesos/meta'
> I0828 12:29:19.979894 2205 status_update_manager.cpp:193] Recovering
> status update manager
> I0828 12:29:19.980206 2209 docker.cpp:649] Recovering Docker containers
> I0828 12:29:19.980765 2209 containerizer.cpp:252] Recovering containerizer
> I0828 12:30:19.972507 2206 slave.cpp:3050] Current usage 44.11%. Max
> allowed age: 3.212379842874491days
>
> Javier
>
>
> 2014-08-27 23:30 GMT+02:00 Jie Yu <[email protected]>:
>
>> I0827 22:58:18.789118 3383 docker.cpp:649] Recovering Docker containers
>>
>> It seems that the slave is stuck at recovering Docker containers.
>>
>> - Jie
>>
>>
>> On Wed, Aug 27, 2014 at 2:00 PM, Javier Ruiz Jiménez <[email protected]>
>> wrote:
>>
>>> Hi all,
>>>
>>> My first post in the user list. Writing from Spain.
>>>
>>> I am setting up a Mesos cluster, and after upgrading to 0.20.0 and
>>> reconfiguring it to use the native Docker functionality (removing Deimos),
>>> I am not able to use containerizers=docker.
>>>
>>> Releases:
>>> - Mesos-slave: Version: 0.20.0 Build: 2014-08-22 05:05:59 from
>>> repository http://repos.mesosphere.io/
>>> - Mesos-master: Version: 0.20.0 Build: 2014-08-22 05:05:59 from
>>> repository http://repos.mesosphere.io/
>>> - O.S.: Ubuntu 14.04.1 LTS
>>> - Marathon: Marathon 0.7.0-SNAPSHOT
>>> - Docker version 1.2.0, build fa7b24f
>>>
>>> Installation:
>>> - Single host running mesos-master, mesos-slave, zookeeper and marathon
>>>
>>> If I run mesos-slave with --containerizers=docker,mesos , the mesos UI
>>> doesn't show any slaves or resources. No detection of master.
>>>
>>> If I run mesos-slave with --containerizers=mesos , the mesos UI shows
>>> the slaves and the resources.
>>>
>>> Where can I look for more information?
>>>
>>> I plan to build from repository to see if with the latest code it works.
>>>
>>> I don't see any relevant info in the log:
>>>
>>> Log for --containerizers=docker,mesos:
>>> Command: mesos-slave --master=192.168.0.57:5050
>>> --containerizers=mesos,docker --hostname=192.168.0.57 --IP=192.168.0.57
>>> --log_dir=/var/log/mesos
>>> ________________________________
>>> I0827 22:58:17.774178 3379 logging.cpp:142] INFO level logging started!
>>> I0827 22:58:17.775691 3379 main.cpp:126] Build: 2014-08-22 05:05:59 by
>>> root
>>> I0827 22:58:17.775956 3379 main.cpp:128] Version: 0.20.0
>>> I0827 22:58:17.776199 3379 main.cpp:131] Git tag: 0.20.0
>>> I0827 22:58:17.776407 3379 main.cpp:135] Git SHA:
>>> f421ffdf8d32a8834b3a6ee483b5b59f65956497
>>> I0827 22:58:17.776654 3379 containerizer.cpp:89] Using isolation:
>>> posix/cpu,posix/mem
>>> I0827 22:58:18.783756 3379 main.cpp:149] Starting Mesos slave
>>> I0827 22:58:18.784618 3387 slave.cpp:167] Slave started on 1)@
>>> 192.168.0.57:5051
>>> I0827 22:58:18.784915 3387 slave.cpp:278] Slave resources: cpus(*):2;
>>> mem(*):15025; disk(*):26760; ports(*):[31000-32000]
>>> I0827 22:58:18.785161 3387 slave.cpp:306] Slave hostname: 192.168.0.57
>>> I0827 22:58:18.785348 3387 slave.cpp:307] Slave checkpoint: true
>>> I0827 22:58:18.787921 3381 state.cpp:33] Recovering state from
>>> '/tmp/mesos/meta'
>>> I0827 22:58:18.788781 3381 status_update_manager.cpp:193] Recovering
>>> status update manager
>>> I0827 22:58:18.789072 3381 containerizer.cpp:252] Recovering
>>> containerizer
>>> I0827 22:58:18.789118 3383 docker.cpp:649] Recovering Docker containers
>>>
>>> Log for --containerizers=mesos:
>>> Command: mesos-slave --master=192.168.0.57:5050 --containerizers=mesos
>>> --hostname=192.168.0.57 --IP=192.168.0.57 --log_dir=/var/log/mesos
>>> __________________________
>>> I0827 22:53:09.960764 3369 logging.cpp:142] INFO level logging started!
>>> I0827 22:53:09.962329 3369 main.cpp:126] Build: 2014-08-22 05:05:59 by
>>> root
>>> I0827 22:53:09.962749 3369 main.cpp:128] Version: 0.20.0
>>> I0827 22:53:09.963065 3369 main.cpp:131] Git tag: 0.20.0
>>> I0827 22:53:09.963369 3369 main.cpp:135] Git SHA:
>>> f421ffdf8d32a8834b3a6ee483b5b59f65956497
>>> I0827 22:53:09.963709 3369 containerizer.cpp:89] Using isolation:
>>> posix/cpu,posix/mem
>>> I0827 22:53:09.968011 3369 main.cpp:149] Starting Mesos slave
>>> I0827 22:53:09.968703 3377 slave.cpp:167] Slave started on 1)@
>>> 192.168.0.57:5051
>>> I0827 22:53:09.969319 3377 slave.cpp:278] Slave resources: cpus(*):2;
>>> mem(*):15025; disk(*):26760; ports(*):[31000-32000]
>>> I0827 22:53:09.969703 3377 slave.cpp:306] Slave hostname: 192.168.0.57
>>> I0827 22:53:09.969912 3377 slave.cpp:307] Slave checkpoint: true
>>> I0827 22:53:09.972486 3377 state.cpp:33] Recovering state from
>>> '/tmp/mesos/meta'
>>> I0827 22:53:09.972882 3377 state.cpp:62] Failed to find the latest
>>> slave from '/tmp/mesos/meta'
>>> I0827 22:53:09.973140 3377 status_update_manager.cpp:193] Recovering
>>> status update manager
>>> I0827 22:53:09.973368 3377 containerizer.cpp:252] Recovering
>>> containerizer
>>> I0827 22:53:09.973819 3377 slave.cpp:3195] Finished recovery
>>> I0827 22:53:09.974473 3377 slave.cpp:589] New master detected at
>>> [email protected]:5050
>>> I0827 22:53:09.974822 3370 status_update_manager.cpp:167] New master
>>> detected at [email protected]:5050
>>> I0827 22:53:09.974983 3377 slave.cpp:625] No credentials provided.
>>> Attempting to register without authentication
>>> I0827 22:53:09.975147 3377 slave.cpp:636] Detecting new master
>>> I0827 22:53:10.758889 3374 slave.cpp:754] Registered with master
>>> [email protected]:5050; given slave ID
>>> 20140827-224307-956344512-5050-3182-0
>>> I0827 22:53:10.759145 3374 slave.cpp:767] Checkpointing SlaveInfo to
>>> '/tmp/mesos/meta/slaves/20140827-224307-956344512-5050-3182-0/slave.info
>>> '
>>>
>>> Thanks.
>>> --
>>> Javier
>>>
>>
>>
>
>
> --
> Javier Ruiz Jiménez
> Tecsisa
> E: [email protected]
> Tel: +34 91.182.04.71 (directo)
> Tel: +34 91.445.21.15 (centralita)
> Fax: +34 91.447.05.11
> http://www.tecsisa.com
>
--
Javier Ruiz Jiménez
Tecsisa
E: [email protected]
Tel: +34 91.182.04.71 (directo)
Tel: +34 91.445.21.15 (centralita)
Fax: +34 91.447.05.11
http://www.tecsisa.com