Mesos-slave start error

Sivaram Kannan Thu, 05 Feb 2015 04:57:05 -0800

Hi,

In our deployments of mesos-slave, we are getting the following error
during startup. I understand the slave is failing because too many fds
are open. I have increased the fd ulimit from 1024 to 4096, but the
behavior is the same. What can I do to solve this problem, and what
should I do to prevent it?


Thanks,
./Siva.


Initiating client connection, host=11.0.190.1:2181 sessionTimeout=10000
watcher=0x7f6de4
Feb 05 12:33:58 node-d4856455ad5c sh[32162]: I0205 12:33:58.076289    15
slave.cpp:169] Slave started on 1)@11.1.6.1:5051
Feb 05 12:33:58 node-d4856455ad5c sh[32162]: I0205 12:33:58.076544    15
slave.cpp:289] Slave resources: cpus(*):24; mem(*):47336; disk(*):469416;
ports(*):[31000-32000]
Feb 05 12:33:58 node-d4856455ad5c sh[32162]: I0205 12:33:58.076575    15
slave.cpp:318] Slave hostname: 11.1.6.1
Feb 05 12:33:58 node-d4856455ad5c sh[32162]: I0205 12:33:58.076582    15
slave.cpp:319] Slave checkpoint: true
Feb 05 12:33:58 node-d4856455ad5c sh[32162]: I0205 12:33:58.078135    25
state.cpp:33] Recovering state from '/var/lib/mesos/slave/meta'
Feb 05 12:33:58 node-d4856455ad5c sh[32162]: I0205 12:33:58.078233    20
status_update_manager.cpp:197] Recovering status update manager
Feb 05 12:33:58 node-d4856455ad5c sh[32162]: I0205 12:33:58.078333    20
docker.cpp:767] Recovering Docker containers
Feb 05 12:33:58 node-d4856455ad5c sh[32162]: 2015-02-05
12:33:58,102:6(0x7f6dc3fff700):ZOO_INFO@check_events@1703: initiated
connection to server [11.0.190.1:2181]
Feb 05 12:33:58 node-d4856455ad5c sh[32162]: 2015-02-05
12:33:58,104:6(0x7f6dc3fff700):ZOO_INFO@check_events@1750: session
establishment complete on server [11.0.190.1:2181],
sessionId=0x14b3c82555299c7,
Feb 05 12:33:58 node-d4856455ad5c sh[32162]: I0205 12:33:58.104671    30
group.cpp:313] Group process (group(1)@11.1.6.1:5051) connected to ZooKeeper
Feb 05 12:33:58 node-d4856455ad5c sh[32162]: I0205 12:33:58.104708    30
group.cpp:790] Syncing group operations: queue size (joins, cancels, datas)
= (0, 0, 0)
Feb 05 12:33:58 node-d4856455ad5c sh[32162]: I0205 12:33:58.104725    30
group.cpp:385] Trying to create path '/mesos' in ZooKeeper
Feb 05 12:33:58 node-d4856455ad5c sh[32162]: I0205 12:33:58.106376    22
detector.cpp:138] Detected a new leader: (id='3')
Feb 05 12:33:58 node-d4856455ad5c sh[32162]: I0205 12:33:58.106477    25
group.cpp:659] Trying to get '/mesos/info_0000000003' in ZooKeeper
Feb 05 12:33:58 node-d4856455ad5c sh[32162]: I0205 12:33:58.107293    30
detector.cpp:433] A new leading master ([email protected]:5050) is
detected
Feb 05 12:33:58 node-d4856455ad5c sh[32162]: Failed to perform recovery:
Collect failed: Failed to create pipe: Too many open files
Feb 05 12:33:58 node-d4856455ad5c sh[32162]: To remedy this do as follows:
Feb 05 12:33:58 node-d4856455ad5c sh[32162]: Step 1: rm -f
/var/lib/mesos/slave/meta/slaves/latest
Feb 05 12:33:58 node-d4856455ad5c sh[32162]: This ensures slave doesn't
recover old live executors.
Feb 05 12:33:58 node-d4856455ad5c sh[32162]: Step 2: Restart the slave.
Feb 05 12:33:58 node-d4856455ad5c systemd[1]: mesos-slave.service: main
process exited, code=exited, status=1/FAILURE
Feb 05 12:33:58 node-d4856455ad5c docker[3351]: mesos_slave
Feb 05 12:33:58 node-d4856455ad5c systemd[1]: Unit mesos-slave.service
entered failed state.
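
To check whether the raised fd limit actually applies to the slave process, it may help to read the limit the kernel reports for that process rather than the one shown by ulimit in a shell. The following is a minimal sketch, assuming a Linux host with /proc mounted; the helper names (parse_nofile_limit, count_open_fds, report) are hypothetical and not part of Mesos.

```python
import os


def parse_nofile_limit(limits_text):
    """Return (soft, hard) for 'Max open files' from /proc/<pid>/limits text.

    Values are returned as strings because the kernel may report 'unlimited'.
    Returns None if the row is not present.
    """
    prefix = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(prefix):
            fields = line[len(prefix):].split()
            return fields[0], fields[1]
    return None


def count_open_fds(pid):
    """Count the file descriptors currently open in the given process."""
    return len(os.listdir("/proc/{}/fd".format(pid)))


def report(pid):
    """Print the effective fd limits and current fd usage for a process."""
    with open("/proc/{}/limits".format(pid)) as f:
        limits = parse_nofile_limit(f.read())
    print("pid {}: limits (soft, hard) = {}, open fds = {}".format(
        pid, limits, count_open_fds(pid)))


# Example (assumption: 32162 is the PID of a running slave process):
# report(32162)
```

If the soft limit reported here is still 1024, the ulimit change probably did not reach the process. A limit raised in an interactive shell is generally not inherited by a service started by systemd or inside a container, so it would need to be set where the slave is actually launched.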
