[jira] [Commented] (DISPATCH-2014) Router TCP Adapter crash with high thread count and load

2021-03-31 Thread michael goulish (Jira)


[ 
https://issues.apache.org/jira/browse/DISPATCH-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312627#comment-17312627
 ] 

michael goulish commented on DISPATCH-2014:
---

I just ran Proton's ctest suite with the THREADERCISER turned on, on my box 
with 32 physical cores (64 hardware threads).

Test number 6 – "c-threaderciser" – timed out after 1500 seconds.

 

> Router TCP Adapter crash with high thread count and load
> 
>
> Key: DISPATCH-2014
> URL: https://issues.apache.org/jira/browse/DISPATCH-2014
> Project: Qpid Dispatch
> Issue Type: Bug
> Components: Protocol Adaptors
> Reporter: michael goulish
> Priority: Major
>
> Using latest proton and dispatch master code as of 3 hours ago.
> Testing router TCP adapter on a machine with 32 cores / 64 threads.
> I gave the router 64 worker threads, then used 'hey' load generator to send 
> it HTTP requests to a TCP listener which router forwarded to Nginx on same 
> machine. 
> I ran multiple tests with an increasing number of parallel senders: 10, 20, 
> 30, ... Each sender was throttled to 10 messages per second.
> It survived many tests, but crashed around the test with 200 senders.
> I believe this is easily repeatable – I will go check that now.
>  
> Here is the thread that crashed:
> {color:#de350b} #0 0x7f33186a0684 in pthread_mutex_lock () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b} #1 0x7f33186e2848 in lock (m=){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll-internal.h:326{color}
> {color:#de350b} #2 process (tsk=){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2248{color}
> {color:#de350b} #3 next_event_batch (p=0x10ed970, can_block=true){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2423{color}
> {color:#de350b} #4 0x7f33187c192f in thread_run (arg=0x10f6e40){color}
> {color:#de350b} at /home/mick/latest/qpid-dispatch/src/server.c:1107{color}
> {color:#de350b} #5 0x7f331869e3f9 in start_thread () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b} #6 0x7f33181b2b53 in clone () from /lib64/libc.so.6{color}
>  
> {color:#172b4d}And here are all the threads:{color}
> {color:#de350b}(gdb) thread apply all bt{color}
> {color:#de350b}Thread 65 (Thread 0x7f3244ff9640 (LWP 36500)):{color}
> {color:#de350b}#0 0x7f33186a7ea0 in __lll_lock_wait () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b}#1 0x7f33186a08f5 in pthread_mutex_lock () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b}#2 0x7f33186dfc5f in lock (m=0x10edc90) at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll-internal.h:326{color}
> {color:#de350b}#3 pni_raw_connection_done (rc=0x10ed3b8) at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll_raw_connection.c:423{color}
> {color:#de350b}#4 pn_proactor_done (batch=0x10ed970, p=0x10ed970) at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2696{color}
> {color:#de350b}#5 pn_proactor_done (p=0x10ed970, 
> batch=batch@entry=0x7f326811a578) at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2676{color}
> {color:#de350b}#6 0x7f33187c1a11 in thread_run (arg=0x10f6e40) at 
> /home/mick/latest/qpid-dispatch/src/server.c:1140{color}
> {color:#de350b}#7 0x7f331869e3f9 in start_thread () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b}#8 0x7f33181b2b53 in clone () from /lib64/libc.so.6{color}
> {color:#de350b}Thread 64 (Thread 0x7f327640 (LWP 36481)):{color}
> {color:#de350b}#0 0x7f33186a7ea0 in __lll_lock_wait () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b}#1 0x7f33186a08f5 in pthread_mutex_lock () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b}#2 0x7f33186e2b7e in lock (m=0x10edc90) at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll-internal.h:326{color}
> {color:#de350b}#3 process (tsk=) at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2248{color}
> {color:#de350b}#4 next_event_batch (p=, can_block=true) at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2423{color}
> {color:#de350b}#5 0x7f33187c192f in thread_run (arg=0x10f6e40) at 
> /home/mick/latest/qpid-dispatch/src/server.c:1107{color}
> {color:#de350b}#6 0x7f331869e3f9 in start_thread () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b}#7 0x7f33181b2b53 in clone () from /lib64/libc.so.6{color}
> {color:#de350b}Thread 63 (Thread 0x7f322f7fe640 (LWP 36502)):{color}
> {color:#de350b}#0 0x7f33186a7ea0 in __lll_lock_wait () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b}#1 0x7f33186a08f5 in pthread_mutex_lock () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b}#2 0x7f33186dfc5f in lock (m=0x10edc90) at 
> 

[jira] [Commented] (DISPATCH-2014) Router TCP Adapter crash with high thread count and load

2021-03-22 Thread michael goulish (Jira)


[ 
https://issues.apache.org/jira/browse/DISPATCH-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306496#comment-17306496
 ] 

michael goulish commented on DISPATCH-2014:
---

When I used 64 dispatch worker threads and hit it with 200 'hey' senders (each 
test 30 seconds long), it died with a SEGV in 3 out of 4 runs.

 

When I went down to 32 dispatch worker threads, it survived 3 out of 3 tests 
with 200 senders, and then 3 out of 3 tests with 500 senders, and then 3 out of 
3 tests with 1000 senders.
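
For anyone trying to repeat the sweep, here is a rough sketch of how the 'hey' command lines could be generated. It is not the script that was actually used. The listener URL and port are hypothetical, the step of 10 between sender counts is only inferred from "10, 20, 30, ...", and the mapping onto hey's -c (concurrent workers), -q (per-worker rate limit) and -z (duration) options is an assumption about how the test was driven.

```python
from typing import List


def hey_command(url: str, senders: int,
                rate_per_sender: int = 10, duration_s: int = 30) -> List[str]:
    """Build one 'hey' invocation as an argv list.

    -c : number of concurrent workers (the "senders" in this thread)
    -q : rate limit in requests per second, per worker
    -z : how long to keep sending
    """
    return ["hey",
            "-c", str(senders),
            "-q", str(rate_per_sender),
            "-z", f"{duration_s}s",
            url]


def sweep(url: str, start: int = 10, stop: int = 200, step: int = 10) -> List[List[str]]:
    """One command per test, with the sender count rising from start to stop."""
    return [hey_command(url, n) for n in range(start, stop + 1, step)]


if __name__ == "__main__":
    # Hypothetical address of the router's TCP listener; adjust to the real setup.
    for cmd in sweep("http://127.0.0.1:8080/"):
        print(" ".join(cmd))
```

Each argv list can be handed to subprocess.run() once the router and Nginx are up. Raising stop to 500 or 1000 would give the larger sender counts used in the 32-thread runs.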

 
