[jira] [Commented] (DISPATCH-2014) Router TCP Adapter crash with high thread count and load
[ https://issues.apache.org/jira/browse/DISPATCH-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17312627#comment-17312627 ]

michael goulish commented on DISPATCH-2014:
-------------------------------------------

I just ran Proton's ctest suite with the THREADERCISER turned on, on my box with 32 physical cores / 64 hardware threads. Test number 6, "c-threaderciser", timed out after 1500 seconds.

> Router TCP Adapter crash with high thread count and load
> --------------------------------------------------------
>
>          Key: DISPATCH-2014
>          URL: https://issues.apache.org/jira/browse/DISPATCH-2014
>      Project: Qpid Dispatch
>   Issue Type: Bug
>   Components: Protocol Adaptors
>     Reporter: michael goulish
>     Priority: Major
>
> Using the latest proton and dispatch master code as of 3 hours ago.
> Testing the router TCP adapter on a machine with 32 cores / 64 threads.
> I gave the router 64 worker threads, then used the 'hey' load generator to send
> it HTTP requests to a TCP listener, which the router forwarded to Nginx on the
> same machine.
> Multiple tests with an increasing number of parallel senders: 10, 20, 30, ...
> Each sender was throttled to 10 messages per second.
> It survived many tests, but crashed around the test with 200 senders.
> I believe this is easily repeatable; I will go check that now.
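For reference, the load described above maps onto a 'hey' invocation roughly like this (the listener URL is a placeholder; -c is the number of parallel senders, -q throttles each sender's request rate, -z is the test duration):

```shell
# 200 parallel senders, each throttled to 10 requests/sec, for 30 seconds.
# http://127.0.0.1:8000/ is a hypothetical address for the router's TCP listener.
hey -z 30s -c 200 -q 10 http://127.0.0.1:8000/
```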
>
> Here is the thread that crashed:
>
> #0  0x7f33186a0684 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #1  0x7f33186e2848 in lock (m=<optimized out>)
>     at /home/mick/latest/qpid-proton/c/src/proactor/epoll-internal.h:326
> #2  process (tsk=<optimized out>)
>     at /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2248
> #3  next_event_batch (p=0x10ed970, can_block=true)
>     at /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2423
> #4  0x7f33187c192f in thread_run (arg=0x10f6e40)
>     at /home/mick/latest/qpid-dispatch/src/server.c:1107
> #5  0x7f331869e3f9 in start_thread () from /lib64/libpthread.so.0
> #6  0x7f33181b2b53 in clone () from /lib64/libc.so.6
>
> And here are all the threads:
>
> (gdb) thread apply all bt
>
> Thread 65 (Thread 0x7f3244ff9640 (LWP 36500)):
> #0  0x7f33186a7ea0 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x7f33186a08f5 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #2  0x7f33186dfc5f in lock (m=0x10edc90)
>     at /home/mick/latest/qpid-proton/c/src/proactor/epoll-internal.h:326
> #3  pni_raw_connection_done (rc=0x10ed3b8)
>     at /home/mick/latest/qpid-proton/c/src/proactor/epoll_raw_connection.c:423
> #4  pn_proactor_done (batch=0x10ed970, p=0x10ed970)
>     at /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2696
> #5  pn_proactor_done (p=0x10ed970, batch=batch@entry=0x7f326811a578)
>     at /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2676
> #6  0x7f33187c1a11 in thread_run (arg=0x10f6e40)
>     at /home/mick/latest/qpid-dispatch/src/server.c:1140
> #7  0x7f331869e3f9 in start_thread () from /lib64/libpthread.so.0
> #8  0x7f33181b2b53 in clone () from /lib64/libc.so.6
>
> Thread 64 (Thread 0x7f327640 (LWP 36481)):
> #0  0x7f33186a7ea0 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x7f33186a08f5 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #2  0x7f33186e2b7e in lock (m=0x10edc90)
>     at /home/mick/latest/qpid-proton/c/src/proactor/epoll-internal.h:326
> #3  process (tsk=<optimized out>)
>     at /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2248
> #4  next_event_batch (p=<optimized out>, can_block=true)
>     at /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2423
> #5  0x7f33187c192f in thread_run (arg=0x10f6e40)
>     at /home/mick/latest/qpid-dispatch/src/server.c:1107
> #6  0x7f331869e3f9 in start_thread () from /lib64/libpthread.so.0
> #7  0x7f33181b2b53 in clone () from /lib64/libc.so.6
>
> Thread 63 (Thread 0x7f322f7fe640 (LWP 36502)):
> #0  0x7f33186a7ea0 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x7f33186a08f5 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #2  0x7f33186dfc5f in lock (m=0x10edc90) at
[jira] [Commented] (DISPATCH-2014) Router TCP Adapter crash with high thread count and load
[ https://issues.apache.org/jira/browse/DISPATCH-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17306496#comment-17306496 ]

michael goulish commented on DISPATCH-2014:
-------------------------------------------

When I used 64 dispatch worker threads and hit it with 200 'hey' senders (each test 30 seconds long), it died 3 out of 4 times (SEGV). When I went down to 32 dispatch worker threads, it survived 3 out of 3 tests with 200 senders, then 3 out of 3 tests with 500 senders, and then 3 out of 3 tests with 1000 senders.
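The worker-thread count being varied here is set in qdrouterd.conf. A minimal sketch of the kind of configuration under test (the id, ports, and address are hypothetical placeholders, not taken from the report):

```
router {
    mode: standalone
    id: Router.A
    workerThreads: 64    # 64 crashed under load; 32 survived
}

tcpListener {            # where 'hey' sends its HTTP requests
    host: 0.0.0.0
    port: 8000
    address: nginx-service
}

tcpConnector {           # where the router forwards them (Nginx)
    host: 127.0.0.1
    port: 8080
    address: nginx-service
}
```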