Hi Siva,

It looks like you bumped into
https://issues.apache.org/jira/browse/MESOS-2276. Feel free to upvote!
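
In the meantime, it may be worth confirming which fd limit the running
slave process actually has. A ulimit raised in a shell does not necessarily
reach a process started by systemd or inside a Docker container. Below is a
rough Python sketch, not taken from the Mesos code base; the helper names
are made up for illustration, and the /proc lookup assumes Linux.

```python
import os
import resource


def current_nofile_limits():
    """Return the (soft, hard) open-file limits of this Python process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def parse_max_open_files(limits_text):
    """Extract (soft, hard) as strings from the text of /proc/<pid>/limits.

    Returns None if no 'Max open files' row is present.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Row layout: Max open files <soft> <hard> files
            fields = line.split()
            return fields[3], fields[4]
    return None


def limits_of_pid(pid):
    """Read the open-file limits of another process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_max_open_files(f.read())


if __name__ == "__main__":
    print("this process:", current_nofile_limits())
    # Replace os.getpid() with the PID of the running mesos-slave process.
    if os.path.exists(f"/proc/{os.getpid()}/limits"):
        print("from /proc:", limits_of_pid(os.getpid()))
```

If the value reported for the slave's PID is still 1024, the new limit has
not reached the process, whatever the shell reports.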

On Thu, Feb 5, 2015 at 1:56 PM, Sivaram Kannan <sivara...@gmail.com> wrote:

>
> Hi,
>
> In our deployments of mesos-slave, we are getting the following error
> during startup. I understand the slave is failing due to the large number of
> fds being opened. I have increased the fd ulimit from 1024 to 4096,
> but the behavior is still the same. What can I do to solve this problem, and
> what should I do to prevent it?
>
> Thanks,
> ./Siva.
>
>
> Initiating client connection, host=11.0.190.1:2181 sessionTimeout=10000
> watcher=0x7f6de4
> Feb 05 12:33:58 node-d4856455ad5c sh[32162]: I0205 12:33:58.076289    15
> slave.cpp:169] Slave started on 1)@11.1.6.1:5051
> Feb 05 12:33:58 node-d4856455ad5c sh[32162]: I0205 12:33:58.076544    15
> slave.cpp:289] Slave resources: cpus(*):24; mem(*):47336; disk(*):469416;
> ports(*):[31000-32000]
> Feb 05 12:33:58 node-d4856455ad5c sh[32162]: I0205 12:33:58.076575    15
> slave.cpp:318] Slave hostname: 11.1.6.1
> Feb 05 12:33:58 node-d4856455ad5c sh[32162]: I0205 12:33:58.076582    15
> slave.cpp:319] Slave checkpoint: true
> Feb 05 12:33:58 node-d4856455ad5c sh[32162]: I0205 12:33:58.078135    25
> state.cpp:33] Recovering state from '/var/lib/mesos/slave/meta'
> Feb 05 12:33:58 node-d4856455ad5c sh[32162]: I0205 12:33:58.078233    20
> status_update_manager.cpp:197] Recovering status update manager
> Feb 05 12:33:58 node-d4856455ad5c sh[32162]: I0205 12:33:58.078333    20
> docker.cpp:767] Recovering Docker containers
> Feb 05 12:33:58 node-d4856455ad5c sh[32162]: 2015-02-05
> 12:33:58,102:6(0x7f6dc3fff700):ZOO_INFO@check_events@1703: initiated
> connection to server [11.0.190.1:2181]
> Feb 05 12:33:58 node-d4856455ad5c sh[32162]: 2015-02-05
> 12:33:58,104:6(0x7f6dc3fff700):ZOO_INFO@check_events@1750: session
> establishment complete on server [11.0.190.1:2181],
> sessionId=0x14b3c82555299c7,
> Feb 05 12:33:58 node-d4856455ad5c sh[32162]: I0205 12:33:58.104671    30
> group.cpp:313] Group process (group(1)@11.1.6.1:5051) connected to
> ZooKeeper
> Feb 05 12:33:58 node-d4856455ad5c sh[32162]: I0205 12:33:58.104708    30
> group.cpp:790] Syncing group operations: queue size (joins, cancels, datas)
> = (0, 0, 0)
> Feb 05 12:33:58 node-d4856455ad5c sh[32162]: I0205 12:33:58.104725    30
> group.cpp:385] Trying to create path '/mesos' in ZooKeeper
> Feb 05 12:33:58 node-d4856455ad5c sh[32162]: I0205 12:33:58.106376    22
> detector.cpp:138] Detected a new leader: (id='3')
> Feb 05 12:33:58 node-d4856455ad5c sh[32162]: I0205 12:33:58.106477    25
> group.cpp:659] Trying to get '/mesos/info_0000000003' in ZooKeeper
> Feb 05 12:33:58 node-d4856455ad5c sh[32162]: I0205 12:33:58.107293    30
> detector.cpp:433] A new leading master (UPID=master@11.1.4.1:5050) is
> detected
> Feb 05 12:33:58 node-d4856455ad5c sh[32162]: Failed to perform recovery:
> Collect failed: Failed to create pipe: Too many open files
> Feb 05 12:33:58 node-d4856455ad5c sh[32162]: To remedy this do as follows:
> Feb 05 12:33:58 node-d4856455ad5c sh[32162]: Step 1: rm -f
> /var/lib/mesos/slave/meta/slaves/latest
> Feb 05 12:33:58 node-d4856455ad5c sh[32162]: This ensures slave doesn't
> recover old live executors.
> Feb 05 12:33:58 node-d4856455ad5c sh[32162]: Step 2: Restart the slave.
> Feb 05 12:33:58 node-d4856455ad5c systemd[1]: mesos-slave.service: main
> process exited, code=exited, status=1/FAILURE
> Feb 05 12:33:58 node-d4856455ad5c docker[3351]: mesos_slave
> Feb 05 12:33:58 node-d4856455ad5c systemd[1]: Unit mesos-slave.service
> entered failed state.
>
>
>
