Hi, in our deployments of mesos-slave we are getting the following error during startup. I understand the slave is failing because too many file descriptors are open. I have raised the fd ulimit from 1024 to 4096, but the behavior is still the same. What can I do to solve this problem, and what should I do to prevent it?
Thanks, ./Siva.

Initiating client connection, host=11.0.190.1:2181 sessionTimeout=10000 watcher=0x7f6de4
Feb 05 12:33:58 node-d4856455ad5c sh[32162]: I0205 12:33:58.076289 15 slave.cpp:169] Slave started on 1)@11.1.6.1:5051
Feb 05 12:33:58 node-d4856455ad5c sh[32162]: I0205 12:33:58.076544 15 slave.cpp:289] Slave resources: cpus(*):24; mem(*):47336; disk(*):469416; ports(*):[31000-32000]
Feb 05 12:33:58 node-d4856455ad5c sh[32162]: I0205 12:33:58.076575 15 slave.cpp:318] Slave hostname: 11.1.6.1
Feb 05 12:33:58 node-d4856455ad5c sh[32162]: I0205 12:33:58.076582 15 slave.cpp:319] Slave checkpoint: true
Feb 05 12:33:58 node-d4856455ad5c sh[32162]: I0205 12:33:58.078135 25 state.cpp:33] Recovering state from '/var/lib/mesos/slave/meta'
Feb 05 12:33:58 node-d4856455ad5c sh[32162]: I0205 12:33:58.078233 20 status_update_manager.cpp:197] Recovering status update manager
Feb 05 12:33:58 node-d4856455ad5c sh[32162]: I0205 12:33:58.078333 20 docker.cpp:767] Recovering Docker containers
Feb 05 12:33:58 node-d4856455ad5c sh[32162]: 2015-02-05 12:33:58,102:6(0x7f6dc3fff700):ZOO_INFO@check_events@1703: initiated connection to server [11.0.190.1:2181]
Feb 05 12:33:58 node-d4856455ad5c sh[32162]: 2015-02-05 12:33:58,104:6(0x7f6dc3fff700):ZOO_INFO@check_events@1750: session establishment complete on server [11.0.190.1:2181], sessionId=0x14b3c82555299c7,
Feb 05 12:33:58 node-d4856455ad5c sh[32162]: I0205 12:33:58.104671 30 group.cpp:313] Group process (group(1)@11.1.6.1:5051) connected to ZooKeeper
Feb 05 12:33:58 node-d4856455ad5c sh[32162]: I0205 12:33:58.104708 30 group.cpp:790] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
Feb 05 12:33:58 node-d4856455ad5c sh[32162]: I0205 12:33:58.104725 30 group.cpp:385] Trying to create path '/mesos' in ZooKeeper
Feb 05 12:33:58 node-d4856455ad5c sh[32162]: I0205 12:33:58.106376 22 detector.cpp:138] Detected a new leader: (id='3')
Feb 05 12:33:58 node-d4856455ad5c sh[32162]: I0205 12:33:58.106477 25 group.cpp:659] Trying to get '/mesos/info_0000000003' in ZooKeeper
Feb 05 12:33:58 node-d4856455ad5c sh[32162]: I0205 12:33:58.107293 30 detector.cpp:433] A new leading master ([email protected]:5050) is detected
Feb 05 12:33:58 node-d4856455ad5c sh[32162]: Failed to perform recovery: Collect failed: Failed to create pipe: Too many open files
Feb 05 12:33:58 node-d4856455ad5c sh[32162]: To remedy this do as follows:
Feb 05 12:33:58 node-d4856455ad5c sh[32162]: Step 1: rm -f /var/lib/mesos/slave/meta/slaves/latest
Feb 05 12:33:58 node-d4856455ad5c sh[32162]: This ensures slave doesn't recover old live executors.
Feb 05 12:33:58 node-d4856455ad5c sh[32162]: Step 2: Restart the slave.
Feb 05 12:33:58 node-d4856455ad5c systemd[1]: mesos-slave.service: main process exited, code=exited, status=1/FAILURE
Feb 05 12:33:58 node-d4856455ad5c docker[3351]: mesos_slave
Feb 05 12:33:58 node-d4856455ad5c systemd[1]: Unit mesos-slave.service entered failed state.
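
One thing that may be worth checking is whether the raised limit actually reaches the slave process. A limit set with ulimit applies to the shell it was run in and that shell's children, and the log suggests the slave is started by systemd and docker, so it may still be running with the old value. The following is only a rough sketch for Linux, not part of Mesos: the helper names `parse_nofile_limits` and `fd_usage` are made up for illustration. It reads `/proc/<pid>/limits` and `/proc/<pid>/fd` to compare the limit a process really has with the number of descriptors it currently holds.

```python
import os


def parse_nofile_limits(limits_text):
    """Extract (soft, hard) 'Max open files' values from /proc/<pid>/limits text.

    Values are returned as ints, or as the string 'unlimited' when the
    kernel reports no limit. Returns (None, None) if the row is missing.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Row layout: "Max open files  <soft>  <hard>  files"
            fields = line.split()
            soft, hard = fields[3], fields[4]
            return tuple(int(v) if v.isdigit() else v for v in (soft, hard))
    return (None, None)


def fd_usage(pid="self"):
    """Return (open_fd_count, soft_limit, hard_limit) for a Linux process.

    Reading another user's process normally requires root. For pid="self"
    the count includes the descriptor opened to list the directory.
    """
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as f:
        soft, hard = parse_nofile_limits(f.read())
    return open_fds, soft, hard


if __name__ == "__main__":
    import sys

    # Pass the PID of the running slave, or nothing to inspect this script.
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    count, soft, hard = fd_usage(pid)
    print(f"pid={pid} open_fds={count} soft_limit={soft} hard_limit={hard}")
```

Run as root with the PID of a running slave as the only argument. If the reported soft limit is still 1024, the new ulimit is not being applied to the service. If it is 4096 and the open count sits close to it, the limit itself is being exhausted during recovery.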

