Re: [zeromq-dev] need advice! hitting assertion in epoll.cpp:131 (zmq 4.2.1) when going from RHEL6 to RHEL7?!

2017-02-16 Thread zmqdev

On 16.02.2017 16:26, Luca Boccassi wrote:
> What's the file limit on the 2 systems? (With the user that runs the
> program)
>
> ulimit -n



On both RHEL 6.8 and 7.3:

development environment: ulimit -n = 1024
  installed environment: ulimit -n = 4096

With a basic sampling of file descriptors

while true; do
	lsof -P -M -l -n -d '^cwd,^err,^ltx,^mem,^mmap,^pd,^rtd,^txt' -p -u $USER -a | wc -l
	sleep 2
done

the total number of file descriptors for $USER increases only by about 600.
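Note that `ulimit -n` is a per-process limit, while the lsof loop above counts descriptors across all of the user's processes. A minimal sketch for checking a single process against its own limit, assuming Linux with /proc mounted; the function names are illustrative and not part of any tool mentioned in this thread:

```python
import os


def open_fd_count(pid="self"):
    """Count the entries in /proc/<pid>/fd (Linux only).

    For pid="self" the result may include the descriptor that is
    temporarily opened to read the directory itself.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def nofile_limits(pid="self"):
    """Return the (soft, hard) 'Max open files' values from /proc/<pid>/limits.

    The values are returned as strings because the kernel may report
    'unlimited' instead of a number.
    """
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith("Max open files"):
                fields = line.split()
                # fields: ['Max', 'open', 'files', soft, hard, 'files']
                return fields[3], fields[4]
    raise LookupError("no 'Max open files' line found")
```

Calling `open_fd_count(pid)` and `nofile_limits(pid)` with the PID of the running program (reading another user's process needs the matching privileges) shows how close that one process is to its soft limit.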


___
zeromq-dev mailing list
zeromq-dev@lists.zeromq.org
https://lists.zeromq.org/mailman/listinfo/zeromq-dev

Re: [zeromq-dev] need advice! hitting assertion in epoll.cpp:131 (zmq 4.2.1) when going from RHEL6 to RHEL7?!

2017-02-16 Thread Luca Boccassi
On Thu, 2017-02-16 at 15:54 +0100, zmqdev wrote:
> >
> > Are you building your own binaries in both cases?
> >
> 
> yes
> 
> > What polling mechanism was RHEL 6 using? You can see it in
> > the ./configure output: "Using 'epoll' polling system"
> 
> from config.log:
> 
>   Using 'epoll' polling system with CLOEXEC

What's the file limit on the 2 systems? (With the user that runs the
program)

ulimit -n

Kind regards,
Luca Boccassi



Re: [zeromq-dev] need advice! hitting assertion in epoll.cpp:131 (zmq 4.2.1) when going from RHEL6 to RHEL7?!

2017-02-16 Thread zmqdev


> Are you building your own binaries in both cases?

yes

> What polling mechanism was RHEL 6 using? You can see it in
> the ./configure output: "Using 'epoll' polling system"


from config.log:

Using 'epoll' polling system with CLOEXEC




Re: [zeromq-dev] need advice! hitting assertion in epoll.cpp:131 (zmq 4.2.1) when going from RHEL6 to RHEL7?!

2017-02-16 Thread Luca Boccassi
On Thu, 2017-02-16 at 14:59 +0100, zmqdev wrote:
> Hello,
> 
> I could use some advice to diagnose the following issue.
> 
> I have a program that has been running without problems for a couple of 
> years on Red Hat Enterprise Linux 6 at various sites.
> 
> On RHEL7, the program triggers the assertion
> 
>   Bad file descriptor (src/epoll.cpp:131)
> 
> in about 1/3 of executions, during startup (sometimes during shutdown).
> 
> Less often, I see
> 
>   Bad file descriptor (src/epoll.cpp:100)
> 
> The problem persists after upgrading to ZeroMQ 4.2.1 from 4.1.6.
> 
> I don't get it!
> 
> Programming errors aside, I do check all return codes and log errors as 
> they occur in the main thread, and there is nothing until libzmq commits 
> suicide from one of its threads.
> 
> Any idea/advice on how I could track down this problem?
> 
> What makes RHEL7 different enough from RHEL6 to cause this kind of error?
> 
> Cheers :-(
> 
> 
> GDB BACKTRACE FROM CORE FILE:
> 
> Thread 3 (Thread 0xf736b900 (LWP 5039)):
> #0  0xf7751430 in __kernel_vsyscall ()
> #1  0xf745694b in poll () from /lib/libc.so.6
> #2  0xf6ff5457 in zmq::socket_poller_t::wait(zmq::socket_poller_t::event_t*, int, long) () from $TOP/lib/platform/libzmq.so.5
> #3  0xf6ff325f in zmq_poller_wait_all(void*, zmq_poller_event_t*, int, long) () from $TOP/lib/platform/libzmq.so.5
> #4  0xf6ff3aa5 in zmq_poller_poll(zmq_pollitem_t*, int, long) () from $TOP/lib/platform/libzmq.so.5
> #5  0xf6ff2bb1 in zmq_poll () from $TOP/lib/platform/libzmq.so.5
> #6  0xf702cec1 in zt_reactor_loop (r=) at $TOP/src/reactor.c:268
> (...)
> #17 0x080487da in main ()
> 
> Thread 2 (Thread 0xf6e6db40 (LWP 5066)):
> #0  0xf7751430 in __kernel_vsyscall ()
> #1  0xf7463a16 in epoll_wait () from /lib/libc.so.6
> #2  0xf6fa17d0 in zmq::epoll_t::loop() () from $TOP/lib/platform/libzmq.so.5
> #3  0xf6fa1a35 in zmq::epoll_t::worker_routine(void*) () from $TOP/lib/platform/libzmq.so.5
> #4  0xf6fe36f2 in thread_routine () from $TOP/lib/platform/libzmq.so.5
> #5  0xf7574b2c in start_thread () from /lib/libpthread.so.0
> #6  0xf746308e in clone () from /lib/libc.so.6
> 
> Thread 1 (Thread 0xf666cb40 (LWP 5067)):
> #0  0xf7751430 in __kernel_vsyscall ()
> #1  0xf739a1f7 in raise () from /lib/libc.so.6
> #2  0xf739ba33 in abort () from /lib/libc.so.6
> #3  0xf6fa2726 in zmq::zmq_abort(char const*) () from $TOP/lib/platform/libzmq.so.5
> #4  0xf6fa164b in zmq::epoll_t::set_pollout(void*) () from $TOP/lib/platform/libzmq.so.5
> #5  0xf6fa3951 in zmq::io_object_t::set_pollout(void*) () from $TOP/lib/platform/libzmq.so.5
> #6  0xf6fdafe1 in zmq::stream_engine_t::restart_output() () from $TOP/lib/platform/libzmq.so.5
> #7  0xf6fcae20 in zmq::session_base_t::read_activated(zmq::pipe_t*) () from $TOP/lib/platform/libzmq.so.5
> #8  0xf6fb9dd3 in zmq::pipe_t::process_activate_read() () from $TOP/lib/platform/libzmq.so.5
> #9  0xf6fb2a9e in zmq::object_t::process_command(zmq::command_t&) () from $TOP/lib/platform/libzmq.so.5
> #10 0xf6fa3f77 in zmq::io_thread_t::in_event() () from $TOP/lib/platform/libzmq.so.5
> #11 0xf6fa1948 in zmq::epoll_t::loop() () from $TOP/lib/platform/libzmq.so.5
> #12 0xf6fa1a35 in zmq::epoll_t::worker_routine(void*) () from $TOP/lib/platform/libzmq.so.5
> #13 0xf6fe36f2 in thread_routine () from $TOP/lib/platform/libzmq.so.5
> #14 0xf7574b2c in start_thread () from /lib/libpthread.so.0
> #15 0xf746308e in clone () from /lib/libc.so.6

Are you building your own binaries in both cases?

What polling mechanism was RHEL 6 using? You can see it in
the ./configure output: "Using 'epoll' polling system"

Kind regards,
Luca Boccassi



[zeromq-dev] need advice! hitting assertion in epoll.cpp:131 (zmq 4.2.1) when going from RHEL6 to RHEL7?!

2017-02-16 Thread zmqdev

Hello,

I could use some advice to diagnose the following issue.

I have a program that has been running without problems for a couple of 
years on Red Hat Enterprise Linux 6 at various sites.


On RHEL7, the program triggers the assertion

Bad file descriptor (src/epoll.cpp:131)

in about 1/3 of executions, during startup (sometimes during shutdown).

Less often, I see

Bad file descriptor (src/epoll.cpp:100)

The problem persists after upgrading to ZeroMQ 4.2.1 from 4.1.6.

I don't get it!

Programming errors aside, I do check all return codes and log errors as 
they occur in the main thread, and there is nothing until libzmq commits 
suicide from one of its threads.


Any idea/advice on how I could track down this problem?

What makes RHEL7 different enough from RHEL6 to cause this kind of error?

Cheers :-(


GDB BACKTRACE FROM CORE FILE:

Thread 3 (Thread 0xf736b900 (LWP 5039)):
#0  0xf7751430 in __kernel_vsyscall ()
#1  0xf745694b in poll () from /lib/libc.so.6
#2  0xf6ff5457 in zmq::socket_poller_t::wait(zmq::socket_poller_t::event_t*, int, long) () from $TOP/lib/platform/libzmq.so.5
#3  0xf6ff325f in zmq_poller_wait_all(void*, zmq_poller_event_t*, int, long) () from $TOP/lib/platform/libzmq.so.5
#4  0xf6ff3aa5 in zmq_poller_poll(zmq_pollitem_t*, int, long) () from $TOP/lib/platform/libzmq.so.5
#5  0xf6ff2bb1 in zmq_poll () from $TOP/lib/platform/libzmq.so.5
#6  0xf702cec1 in zt_reactor_loop (r=) at $TOP/src/reactor.c:268
(...)
#17 0x080487da in main ()

Thread 2 (Thread 0xf6e6db40 (LWP 5066)):
#0  0xf7751430 in __kernel_vsyscall ()
#1  0xf7463a16 in epoll_wait () from /lib/libc.so.6
#2  0xf6fa17d0 in zmq::epoll_t::loop() () from $TOP/lib/platform/libzmq.so.5
#3  0xf6fa1a35 in zmq::epoll_t::worker_routine(void*) () from $TOP/lib/platform/libzmq.so.5
#4  0xf6fe36f2 in thread_routine () from $TOP/lib/platform/libzmq.so.5
#5  0xf7574b2c in start_thread () from /lib/libpthread.so.0
#6  0xf746308e in clone () from /lib/libc.so.6

Thread 1 (Thread 0xf666cb40 (LWP 5067)):
#0  0xf7751430 in __kernel_vsyscall ()
#1  0xf739a1f7 in raise () from /lib/libc.so.6
#2  0xf739ba33 in abort () from /lib/libc.so.6
#3  0xf6fa2726 in zmq::zmq_abort(char const*) () from $TOP/lib/platform/libzmq.so.5
#4  0xf6fa164b in zmq::epoll_t::set_pollout(void*) () from $TOP/lib/platform/libzmq.so.5
#5  0xf6fa3951 in zmq::io_object_t::set_pollout(void*) () from $TOP/lib/platform/libzmq.so.5
#6  0xf6fdafe1 in zmq::stream_engine_t::restart_output() () from $TOP/lib/platform/libzmq.so.5
#7  0xf6fcae20 in zmq::session_base_t::read_activated(zmq::pipe_t*) () from $TOP/lib/platform/libzmq.so.5
#8  0xf6fb9dd3 in zmq::pipe_t::process_activate_read() () from $TOP/lib/platform/libzmq.so.5
#9  0xf6fb2a9e in zmq::object_t::process_command(zmq::command_t&) () from $TOP/lib/platform/libzmq.so.5
#10 0xf6fa3f77 in zmq::io_thread_t::in_event() () from $TOP/lib/platform/libzmq.so.5
#11 0xf6fa1948 in zmq::epoll_t::loop() () from $TOP/lib/platform/libzmq.so.5
#12 0xf6fa1a35 in zmq::epoll_t::worker_routine(void*) () from $TOP/lib/platform/libzmq.so.5
#13 0xf6fe36f2 in thread_routine () from $TOP/lib/platform/libzmq.so.5
#14 0xf7574b2c in start_thread () from /lib/libpthread.so.0
#15 0xf746308e in clone () from /lib/libc.so.6
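
In Thread 1 the abort comes from zmq::epoll_t::set_pollout, and "Bad file descriptor" is the text for EBADF, so the failing call is presumably the epoll_ctl() made there (an assumption; the source line itself is not shown in this thread). The sketch below is not libzmq code. It only reproduces the same errno with Python's select.epoll wrapper by modifying a descriptor that has already been closed; the function name is illustrative:

```python
import errno
import select
import socket


def modify_after_close():
    """Return the errno raised when modifying a closed fd in an epoll set."""
    ep = select.epoll()
    a, b = socket.socketpair()
    fd = a.fileno()
    ep.register(fd, select.EPOLLIN)
    # Closing the only reference to the socket frees the fd number,
    # and the kernel drops it from the epoll interest list.
    a.close()
    try:
        # Equivalent to epoll_ctl(epfd, EPOLL_CTL_MOD, fd, ...) on a stale fd.
        ep.modify(fd, select.EPOLLIN | select.EPOLLOUT)
    except OSError as exc:
        return exc.errno
    finally:
        b.close()
        ep.close()
    return None
```

Here `modify_after_close()` returns `errno.EBADF`. If something similar happens in the failing program, a descriptor that libzmq still tracks would have been closed elsewhere in the process, which is one thing to look for with strace or by auditing close() calls.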

