[jira] [Created] (DISPATCH-2255) Investigate enable_mask for removal of malloc
michael goulish created DISPATCH-2255:
------------------------------------------

             Summary: Investigate enable_mask for removal of malloc
                 Key: DISPATCH-2255
                 URL: https://issues.apache.org/jira/browse/DISPATCH-2255
             Project: Qpid Dispatch
          Issue Type: Improvement
            Reporter: michael goulish
            Assignee: michael goulish


Find out how often enable_mask() is called in log.c. See if it would be practical to remove the malloc() and free() in it.
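A minimal sketch of the kind of change under investigation, assuming enable_mask() only needs a short-lived scratch copy of its input. The function shape and the parse_level_string() helper are illustrative stand-ins, not the actual log.c code:

{noformat}
#include <stdlib.h>
#include <string.h>

/* Stand-in for the real level-string parsing in log.c. */
static int parse_level_string(const char *s) { return s && s[0] ? 1 : 0; }

/* Sketch: serve the common case from a stack buffer, keeping the
 * malloc()/free() pair only as a fallback for oversized input. */
int enable_mask_sketch(const char *level_name)
{
    char   scratch[64];
    size_t len  = strlen(level_name) + 1;
    char  *copy = (len <= sizeof(scratch)) ? scratch : (char *) malloc(len);

    if (!copy)
        return 0;                   /* allocation failed on the rare path */
    memcpy(copy, level_name, len);

    int mask = parse_level_string(copy);

    if (copy != scratch)
        free(copy);                 /* only the fallback path frees */
    return mask;
}
{noformat}

Whether this is worthwhile depends on how hot the call site is, which is exactly what the issue asks to measure first.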
[jira] [Updated] (DISPATCH-1956) log.c rewrite to reduce locking scope
[ https://issues.apache.org/jira/browse/DISPATCH-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

michael goulish updated DISPATCH-1956:
--------------------------------------
    Summary: log.c rewrite to reduce locking scope  (was: Potential deadlock: logging lock vs entity cache lock)

> log.c rewrite to reduce locking scope
> --------------------------------------
>
>                 Key: DISPATCH-1956
>                 URL: https://issues.apache.org/jira/browse/DISPATCH-1956
>             Project: Qpid Dispatch
>          Issue Type: Bug
>          Components: Router Node
>    Affects Versions: 1.15.0
>            Reporter: Ken Giusti
>            Assignee: michael goulish
>            Priority: Major
>              Labels: deadlock, tsan
>             Fix For: 1.18.0
>
>         Attachments: tsan.supp
>
>
> {noformat}
> WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=1474955)
>   Cycle in lock order graph: M11 (0x7b1002c0) => M9 (0x7b100240) => M11
>
>   Mutex M9 acquired here while holding mutex M11 in main thread:
>     #0 pthread_mutex_lock (libtsan.so.0+0x528ac)
>     #1 sys_mutex_lock /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 (libqpid-dispatch.so+0x8cb7d)
>     #2 push_event /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:63 (libqpid-dispatch.so+0x6fa13)
>     #3 qd_entity_cache_add /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:69 (libqpid-dispatch.so+0x6fc26)
>     #4 qd_alloc_init /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:302 (libqpid-dispatch.so+0x5878b)
>     #5 qd_alloc /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:318 (libqpid-dispatch.so+0x5878b)
>     #6 new_qd_log_entry_t /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:61 (libqpid-dispatch.so+0x75891)
>     #7 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:426 (libqpid-dispatch.so+0x76205)
>     #8 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 (libqpid-dispatch.so+0x76580)
>     #9 qd_python_log /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 (libqpid-dispatch.so+0x8d1cb)
>     #10 (libpython3.8.so.1.0+0x12a23b)
>     #11 main_process /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 (qdrouterd+0x40281c)
>     #12 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 (qdrouterd+0x4024fc)
>
>   Hint: use TSAN_OPTIONS=second_deadlock_stack=1 to get more informative warning message
>
>   Mutex M11 acquired here while holding mutex M9 in main thread:
>     #0 pthread_mutex_lock (libtsan.so.0+0x528ac)
>     #1 sys_mutex_lock /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 (libqpid-dispatch.so+0x8cb7d)
>     #2 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:425 (libqpid-dispatch.so+0x76200)
>     #3 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 (libqpid-dispatch.so+0x76580)
>     #4 qd_python_log /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 (libqpid-dispatch.so+0x8d1cb)
>     #5 (libpython3.8.so.1.0+0x12a23b)
>     #6 main_process /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 (qdrouterd+0x40281c)
>     #7 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 (qdrouterd+0x4024fc)
>
> SUMMARY: ThreadSanitizer: lock-order-inversion (potential deadlock) (/lib64/libtsan.so.0+0x528ac) in __interceptor_pthread_mutex_lock
> {noformat}
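Reading the two stacks together: the first holds the log lock while new_qd_log_entry_t() allocates (and, on the first allocation of a pooled type, qd_alloc_init() registers with the entity cache, taking the entity-cache lock); the second takes the log lock while the entity-cache lock is already held. A hedged sketch of the "reduce locking scope" idea, with stand-in types rather than the actual log.c structures; the allocation moves outside the log lock, which removes one edge of the cycle:

{noformat}
#include <pthread.h>
#include <stdlib.h>

typedef struct log_entry { struct log_entry *next; } log_entry_t;

static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;
static log_entry_t    *log_head = NULL;

void log_sketch(void)
{
    /* Allocate BEFORE taking the log lock: if the allocator needs the
     * entity-cache lock (first allocation of a pooled type), it now
     * acquires it with no log lock held. */
    log_entry_t *entry = (log_entry_t *) malloc(sizeof(log_entry_t));
    if (!entry)
        return;

    /* The log lock now covers only the shared-list update. */
    pthread_mutex_lock(&log_lock);
    entry->next = log_head;
    log_head    = entry;
    pthread_mutex_unlock(&log_lock);
}
{noformat}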
[jira] [Assigned] (DISPATCH-1956) Potential deadlock: logging lock vs entity cache lock
[ https://issues.apache.org/jira/browse/DISPATCH-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

michael goulish reassigned DISPATCH-1956:
-----------------------------------------
    Assignee: michael goulish  (was: Michael Goulish)

> Potential deadlock: logging lock vs entity cache lock
> ------------------------------------------------------
>
>                 Key: DISPATCH-1956
>                 URL: https://issues.apache.org/jira/browse/DISPATCH-1956
>             Project: Qpid Dispatch
>          Issue Type: Bug
>          Components: Router Node
>    Affects Versions: 1.15.0
>            Reporter: Ken Giusti
>            Assignee: michael goulish
>            Priority: Major
>              Labels: deadlock, tsan
>             Fix For: 1.18.0
>
>         Attachments: tsan.supp
[jira] [Closed] (DISPATCH-2173) 30-Mesh Behaving Badly
[ https://issues.apache.org/jira/browse/DISPATCH-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

michael goulish closed DISPATCH-2173.
-------------------------------------
    Resolution: Won't Fix

It has been pointed out to me that a 30-mesh is not very realistic. I was forced to admit that this was probably true.

> 30-Mesh Behaving Badly
> ----------------------
>
>                 Key: DISPATCH-2173
>                 URL: https://issues.apache.org/jira/browse/DISPATCH-2173
>             Project: Qpid Dispatch
>          Issue Type: Bug
>          Components: Router Node
>            Reporter: michael goulish
>            Assignee: michael goulish
>            Priority: Major
[jira] [Commented] (DISPATCH-2252) Document router shutdown process
[ https://issues.apache.org/jira/browse/DISPATCH-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17419066#comment-17419066 ]

michael goulish commented on DISPATCH-2252:
-------------------------------------------

...And if, along the way, I see anything that clearly needs improvement or investigation, I will Jira that too.

> Document router shutdown process
> --------------------------------
>
>                 Key: DISPATCH-2252
>                 URL: https://issues.apache.org/jira/browse/DISPATCH-2252
>             Project: Qpid Dispatch
>          Issue Type: Improvement
>            Reporter: michael goulish
>            Assignee: michael goulish
>            Priority: Minor
>
> Investigate the router shutdown process in detail, and produce a document in the docs directory.
[jira] [Created] (DISPATCH-2252) Document router shutdown process
michael goulish created DISPATCH-2252:
------------------------------------------

             Summary: Document router shutdown process
                 Key: DISPATCH-2252
                 URL: https://issues.apache.org/jira/browse/DISPATCH-2252
             Project: Qpid Dispatch
          Issue Type: Improvement
            Reporter: michael goulish
            Assignee: michael goulish


Investigate the router shutdown process in detail, and produce a document in the docs directory.
[jira] [Created] (DISPATCH-2173) 30-Mesh Behaving Badly
michael goulish created DISPATCH-2173:
------------------------------------------

             Summary: 30-Mesh Behaving Badly
                 Key: DISPATCH-2173
                 URL: https://issues.apache.org/jira/browse/DISPATCH-2173
             Project: Qpid Dispatch
          Issue Type: Bug
          Components: Router Node
            Reporter: michael goulish
            Assignee: michael goulish


While testing scale-up of full-mesh networks I encountered some Bad Behavior at 30 nodes. (435 connections.)

On my first try, 15 of the routers died.

On my second try, no nodes died – but the network never converged. It consumed all available CPU (32 cores) for three minutes, and the 30 routers printed a combined total of more than 1000 radius calculations to their logs by the time I became wrathful and cast them all into the Bitbucket of Woe.

For reference, those radius calculations are how I decide that the network has converged – everybody has settled down and agreed on the topology and stopped talking about it. The last thing each router prints to its log is a radius calculation, and then it's done. This may happen multiple times for each router, but when the total number of such prints stops changing – the network has converged.

For 15 or 20 routers, the number of such prints was 20 or 40 or so. When this test exceeded that by 25x, I decided it was never going to quit.

...Now looking at the logs to see if I can figure out what was happening...
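The convergence heuristic described above is easy to mechanize. A sketch, assuming a caller-supplied count_radius_prints() that scans the router logs (the name and shape are illustrative, not part of the test harness described here):

{noformat}
#include <stdio.h>
#include <unistd.h>

/* Poll the combined radius-print count across all router logs and
 * declare convergence once it holds steady for `needed` consecutive
 * polls. */
int wait_for_convergence(int (*count_radius_prints)(void),
                         int needed, unsigned poll_seconds)
{
    int last = -1, stable = 0;
    while (stable < needed) {
        int now = count_radius_prints();
        stable  = (now == last) ? stable + 1 : 0;
        last    = now;
        sleep(poll_seconds);
    }
    printf("converged: %d radius prints\n", last);
    return last;
}
{noformat}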
[jira] [Assigned] (DISPATCH-2122) Data race on alloc pool descriptor initialization
[ https://issues.apache.org/jira/browse/DISPATCH-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

michael goulish reassigned DISPATCH-2122:
-----------------------------------------
    Assignee: michael goulish  (was: Ken Giusti)

> Data race on alloc pool descriptor initialization
> --------------------------------------------------
>
>                 Key: DISPATCH-2122
>                 URL: https://issues.apache.org/jira/browse/DISPATCH-2122
>             Project: Qpid Dispatch
>          Issue Type: Bug
>          Components: Router Node
>    Affects Versions: 1.16.0
>            Reporter: Ken Giusti
>            Assignee: michael goulish
>            Priority: Major
>              Labels: race-condition, tsan
>             Fix For: 1.17.0
>
>
> 65: WARNING: ThreadSanitizer: data race (pid=566240)
> 65: Read of size 4 at 0x7f67599ae2c0 by thread T4:
> 65: #0 qd_alloc /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:324 (libqpid-dispatch.so+0x6a1f2)
> 65: #1 new_qd_link_ref_t /home/kgiusti/work/dispatch/qpid-dispatch/src/container.c:76 (libqpid-dispatch.so+0x79ae5)
> 65: #2 qdr_node_connect_deliveries /home/kgiusti/work/dispatch/qpid-dispatch/src/router_node.c:67 (libqpid-dispatch.so+0x121a78)
> 65: #3 CORE_link_deliver /home/kgiusti/work/dispatch/qpid-dispatch/src/router_node.c:1971 (libqpid-dispatch.so+0x127f1c)
> 65: #4 qdr_link_process_deliveries /home/kgiusti/work/dispatch/qpid-dispatch/src/router_core/transfer.c:178 (libqpid-dispatch.so+0x1045c6)
> 65: #5 CORE_link_push /home/kgiusti/work/dispatch/qpid-dispatch/src/router_node.c:1920 (libqpid-dispatch.so+0x127d00)
> 65: #6 qdr_connection_process /home/kgiusti/work/dispatch/qpid-dispatch/src/router_core/connections.c:414 (libqpid-dispatch.so+0xc4bec)
> 65: #7 AMQP_writable_conn_handler /home/kgiusti/work/dispatch/qpid-dispatch/src/router_node.c:299 (libqpid-dispatch.so+0x122d42)
> 65: #8 writable_handler /home/kgiusti/work/dispatch/qpid-dispatch/src/container.c:395 (libqpid-dispatch.so+0x7b2e2)
> 65: #9 qd_container_handle_event /home/kgiusti/work/dispatch/qpid-dispatch/src/container.c:747 (libqpid-dispatch.so+0x7cfd5)
> 65: #10 handle /home/kgiusti/work/dispatch/qpid-dispatch/src/server.c:1096 (libqpid-dispatch.so+0x130537)
> 65: #11 thread_run /home/kgiusti/work/dispatch/qpid-dispatch/src/server.c:1121 (libqpid-dispatch.so+0x13063a)
> 65: #12 _thread_init /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:172 (libqpid-dispatch.so+0xad37a)
> 65: #13 (libtsan.so.0+0x2d33f)
> 65:
> 65: Previous write of size 4 at 0x7f67599ae2c0 by thread T2 (mutexes: write M10):
> 65: #0 qd_alloc_init /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:307 (libqpid-dispatch.so+0x6a14b)
> 65: #1 qd_alloc /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:325 (libqpid-dispatch.so+0x6a20b)
> 65: #2 new_qd_link_ref_t /home/kgiusti/work/dispatch/qpid-dispatch/src/container.c:76 (libqpid-dispatch.so+0x79ae5)
> 65: #3 qdr_node_connect_deliveries /home/kgiusti/work/dispatch/qpid-dispatch/src/router_node.c:67 (libqpid-dispatch.so+0x121a78)
> 65: #4 CORE_link_deliver /home/kgiusti/work/dispatch/qpid-dispatch/src/router_node.c:1971 (libqpid-dispatch.so+0x127f1c)
> 65: #5 qdr_link_process_deliveries /home/kgiusti/work/dispatch/qpid-dispatch/src/router_core/transfer.c:178 (libqpid-dispatch.so+0x1045c6)
> 65: #6 CORE_link_push /home/kgiusti/work/dispatch/qpid-dispatch/src/router_node.c:1920 (libqpid-dispatch.so+0x127d00)
> 65: #7 qdr_connection_process /home/kgiusti/work/dispatch/qpid-dispatch/src/router_core/connections.c:414 (libqpid-dispatch.so+0xc4bec)
> 65: #8 AMQP_writable_conn_handler /home/kgiusti/work/dispatch/qpid-dispatch/src/router_node.c:299 (libqpid-dispatch.so+0x122d42)
> 65: #9 writable_handler /home/kgiusti/work/dispatch/qpid-dispatch/src/container.c:395 (libqpid-dispatch.so+0x7b2e2)
> 65: #10 qd_container_handle_event /home/kgiusti/work/dispatch/qpid-dispatch/src/container.c:747 (libqpid-dispatch.so+0x7cfd5)
> 65: #11 handle /home/kgiusti/work/dispatch/qpid-dispatch/src/server.c:1096 (libqpid-dispatch.so+0x130537)
> 65: #12 thread_run /home/kgiusti/work/dispatch/qpid-dispatch/src/server.c:1121 (libqpid-dispatch.so+0x13063a)
> 65: #13 _thread_init /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:172 (libqpid-dispatch.so+0xad37a)
> 65: #14 (libtsan.so.0+0x2d33f)
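The report pairs an unlocked read in qd_alloc() (alloc_pool.c:324) with the initializing write made under a lock in qd_alloc_init() (alloc_pool.c:307). A hedged sketch of one conventional repair, an atomic flag whose unlocked fast-path read is an acquire load paired with a release store, using stand-in types rather than the actual alloc_pool.c code:

{noformat}
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

typedef struct {
    atomic_int      initialized;   /* the field both threads touch */
    pthread_mutex_t init_lock;
    size_t          total_size;
} alloc_desc_t;

void *alloc_sketch(alloc_desc_t *d, size_t size)
{
    /* Acquire-load pairs with the release-store below: a thread that
     * sees initialized == 1 also sees the one-time descriptor setup. */
    if (atomic_load_explicit(&d->initialized, memory_order_acquire) == 0) {
        pthread_mutex_lock(&d->init_lock);
        if (atomic_load_explicit(&d->initialized, memory_order_relaxed) == 0) {
            d->total_size = size;                        /* one-time setup */
            atomic_store_explicit(&d->initialized, 1, memory_order_release);
        }
        pthread_mutex_unlock(&d->init_lock);
    }
    return malloc(d->total_size);
}
{noformat}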
[jira] [Commented] (DISPATCH-1956) Potential deadlock: logging lock vs entity cache lock
[ https://issues.apache.org/jira/browse/DISPATCH-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17357425#comment-17357425 ]

michael goulish commented on DISPATCH-1956:
-------------------------------------------

I meant to close my *PR*. Cripes.

> Potential deadlock: logging lock vs entity cache lock
> ------------------------------------------------------
>
>                 Key: DISPATCH-1956
>                 URL: https://issues.apache.org/jira/browse/DISPATCH-1956
>             Project: Qpid Dispatch
>          Issue Type: Bug
>          Components: Router Node
>    Affects Versions: 1.15.0
>            Reporter: Ken Giusti
>            Assignee: Michael Goulish
>            Priority: Major
>              Labels: deadlock, tsan
>             Fix For: 1.17.0
>
>         Attachments: tsan.supp
[jira] [Closed] (DISPATCH-1956) Potential deadlock: logging lock vs entity cache lock
[ https://issues.apache.org/jira/browse/DISPATCH-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

michael goulish closed DISPATCH-1956.
-------------------------------------
    Resolution: Fixed

Closing this one in favor of a better one coming shortly.

> Potential deadlock: logging lock vs entity cache lock
> ------------------------------------------------------
>
>                 Key: DISPATCH-1956
>                 URL: https://issues.apache.org/jira/browse/DISPATCH-1956
>             Project: Qpid Dispatch
>          Issue Type: Bug
>          Components: Router Node
>    Affects Versions: 1.15.0
>            Reporter: Ken Giusti
>            Assignee: Michael Goulish
>            Priority: Major
>              Labels: deadlock, tsan
>             Fix For: 1.17.0
>
>         Attachments: tsan.supp
[jira] [Reopened] (DISPATCH-1956) Potential deadlock: logging lock vs entity cache lock
[ https://issues.apache.org/jira/browse/DISPATCH-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

michael goulish reopened DISPATCH-1956:
---------------------------------------

No, wait. I didn't mean it to say 'fixed'. Dang.

> Potential deadlock: logging lock vs entity cache lock
> ------------------------------------------------------
>
>                 Key: DISPATCH-1956
>                 URL: https://issues.apache.org/jira/browse/DISPATCH-1956
>             Project: Qpid Dispatch
>          Issue Type: Bug
>          Components: Router Node
>    Affects Versions: 1.15.0
>            Reporter: Ken Giusti
>            Assignee: Michael Goulish
>            Priority: Major
>              Labels: deadlock, tsan
>             Fix For: 1.17.0
>
>         Attachments: tsan.supp
[jira] [Commented] (DISPATCH-1956) Potential deadlock: logging lock vs entity cache lock
[ https://issues.apache.org/jira/browse/DISPATCH-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17357401#comment-17357401 ]

michael goulish commented on DISPATCH-1956:
-------------------------------------------

Hold on – I think I have a much better solution to this. Need another hour or two...

> Potential deadlock: logging lock vs entity cache lock
> ------------------------------------------------------
>
>                 Key: DISPATCH-1956
>                 URL: https://issues.apache.org/jira/browse/DISPATCH-1956
>             Project: Qpid Dispatch
>          Issue Type: Bug
>          Components: Router Node
>    Affects Versions: 1.15.0
>            Reporter: Ken Giusti
>            Assignee: Michael Goulish
>            Priority: Major
>              Labels: deadlock, tsan
>             Fix For: 1.17.0
>
>         Attachments: tsan.supp
[jira] [Commented] (DISPATCH-1956) Potential deadlock: logging lock vs entity cache lock
[ https://issues.apache.org/jira/browse/DISPATCH-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356529#comment-17356529 ]

michael goulish commented on DISPATCH-1956:
-------------------------------------------

This might be an improvement in code logic, but it will introduce changes in behavior that are not relevant to this PR. Indeed – when I tried it, I got a test failure.

Any code clean-up like this suggestion should be pursued as part of a separate PR just for that purpose. And then we can fix whatever issues it may introduce as part of that PR.

> Potential deadlock: logging lock vs entity cache lock
> ------------------------------------------------------
>
>                 Key: DISPATCH-1956
>                 URL: https://issues.apache.org/jira/browse/DISPATCH-1956
>             Project: Qpid Dispatch
>          Issue Type: Bug
>          Components: Router Node
>    Affects Versions: 1.15.0
>            Reporter: Ken Giusti
>            Assignee: Michael Goulish
>            Priority: Major
>              Labels: deadlock, tsan
>             Fix For: 1.17.0
>
>         Attachments: tsan.supp
[jira] [Commented] (DISPATCH-1956) Potential deadlock: logging lock vs entity cache lock
[ https://issues.apache.org/jira/browse/DISPATCH-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17350564#comment-17350564 ]

michael goulish commented on DISPATCH-1956:
-------------------------------------------

Using Ken's reproducer, I cannot see exactly the same BT from latest master. But I see many reports of a similar cycle, so I will pick one of those and proceed. Here it is:

65: WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock)
65: Cycle in lock order graph: M11
65:
65: Mutex M9 acquired here while holding mutex M11 in main thread:
65: #0 pthread_mutex_lock
65: #1 sys_mutex_lock src/posix/threading.c:57
65: #2 push_event src/entity_cache.c:61
65: #3 qd_entity_cache_add src/entity_cache.c:67
65: #4 qd_log_source_lh src/log.c:373
65: #5 qd_log_source_lh src/log.c:362
65: #6 qd_log_source src/log.c:381
65: #7 qd_log_initialize src/log.c:516
65: #8 qd_dispatch src/dispatch.c:90
65: #9 main_process router/src/main.c:92
65: #10 main router/src/main.c:369
65:
65: Mutex M11 previously acquired by the same thread here:
65: #0 pthread_mutex_lock
65: #1 sys_mutex_lock src/posix/threading.c:57
65: #2 qd_log_source src/log.c:380
65: #3 qd_log_initialize src/log.c:516
65: #4 qd_dispatch src/dispatch.c:90
65: #5 main_process router/src/main.c:92
65: #6 main router/src/main.c:369
65:
65: Mutex M11 acquired here while holding mutex M9 in main thread:
65: #0 pthread_mutex_lock
65: #1 sys_mutex_lock src/posix/threading.c:57
65: #2 qd_vlog_impl src/log.c:436
65: #3 qd_log_impl src/log.c:462
65: #4 qd_python_log src/python_embedded.c:545
65: #5
65: #6 main_process router/src/main.c:97
65: #7 main router/src/main.c:369
65:
65: Mutex M9 previously acquired by the same thread here:
65: #0 pthread_mutex_lock
65: #1 sys_mutex_lock src/posix/threading.c:57
65: #2 qd_entity_refresh_begin src/entity_cache.c:78
65: #3 ffi_call_unix64
65: #4 main_process router/src/main.c:97
65: #5 main router/src/main.c:369
65:

> Potential deadlock: logging lock vs entity cache lock
> ------------------------------------------------------
>
>                 Key: DISPATCH-1956
>                 URL: https://issues.apache.org/jira/browse/DISPATCH-1956
>             Project: Qpid Dispatch
>          Issue Type: Bug
>          Components: Router Node
>    Affects Versions: 1.15.0
>            Reporter: Ken Giusti
>            Assignee: Michael Goulish
>            Priority: Major
>              Labels: deadlock, tsan
>             Fix For: 1.17.0
>
>         Attachments: tsan.supp
[jira] [Commented] (DISPATCH-1956) Potential deadlock: logging lock vs entity cache lock
[ https://issues.apache.org/jira/browse/DISPATCH-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347716#comment-17347716 ]

michael goulish commented on DISPATCH-1956:
-------------------------------------------

Thanks, Ken, that works!

I was commenting out this:

  #deadlock:qd_vlog_impl

I can see it now. Tally ho!

> Potential deadlock: logging lock vs entity cache lock
> ------------------------------------------------------
>
>                 Key: DISPATCH-1956
>                 URL: https://issues.apache.org/jira/browse/DISPATCH-1956
>             Project: Qpid Dispatch
>          Issue Type: Bug
>          Components: Router Node
>    Affects Versions: 1.15.0
>            Reporter: Ken Giusti
>            Assignee: Michael Goulish
>            Priority: Major
>              Labels: deadlock, tsan
>             Fix For: 1.17.0
>
>         Attachments: tsan.supp
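For reference, the attached tsan.supp uses standard ThreadSanitizer suppression syntax: one kind:pattern rule per line, with a leading '#' making a line a comment, so "commenting out" a rule re-enables that report. A sketch of the fragment being discussed (only these rule names appear in the comments here; the rest of the file is assumed), passed to the tests via something like TSAN_OPTIONS="suppressions=/path/to/tsan.supp":

{noformat}
# tsan.supp -- ThreadSanitizer suppressions; '#' lines are inactive
race:qd_vlog_impl
#deadlock:qd_vlog_impl
race:qd_log_entity
{noformat}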
[jira] [Commented] (DISPATCH-1956) Potential deadlock: logging lock vs entity cache lock
[ https://issues.apache.org/jira/browse/DISPATCH-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347377#comment-17347377 ]

michael goulish commented on DISPATCH-1956:
-------------------------------------------

I will try the QE technique, and something I haven't tried before ... running multiple ctests at once! Yow!

Except we're never going to establish the original frequency. It is unknowable. Imponderable. Ineffable. So I will run the test enough times to support a proof-by-vigorous-handwaving!

> Potential deadlock: logging lock vs entity cache lock
> ------------------------------------------------------
>
>                 Key: DISPATCH-1956
>                 URL: https://issues.apache.org/jira/browse/DISPATCH-1956
>             Project: Qpid Dispatch
>          Issue Type: Bug
>          Components: Router Node
>    Affects Versions: 1.15.0
>            Reporter: Ken Giusti
>            Assignee: Michael Goulish
>            Priority: Major
>              Labels: deadlock, tsan
>             Fix For: 1.17.0
[jira] [Commented] (DISPATCH-1956) Potential deadlock: logging lock vs entity cache lock
[ https://issues.apache.org/jira/browse/DISPATCH-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347365#comment-17347365 ]

michael goulish commented on DISPATCH-1956:
-------------------------------------------

I am trying to restore my 'mgoulish' RH account, but I need help from someone with magical powers.

I assumed that if I ran ctest, that would be sufficient. But now that you inform me that TSan issues do not reliably manifest, I will run ctest more times and see if I can get it to show itself.

But if we don't know how it was observed, nor with what frequency, how will we know when it is fixed?

> Potential deadlock: logging lock vs entity cache lock
> ------------------------------------------------------
>
>                 Key: DISPATCH-1956
>                 URL: https://issues.apache.org/jira/browse/DISPATCH-1956
>             Project: Qpid Dispatch
>          Issue Type: Bug
>          Components: Router Node
>    Affects Versions: 1.15.0
>            Reporter: Ken Giusti
>            Assignee: Michael Goulish
>            Priority: Major
>              Labels: deadlock, tsan
>             Fix For: 1.17.0
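One way to put a number on the handwaving, assuming independent test runs: if the deadlock reproduces with probability p on a single ctest run, the chance of N consecutive clean runs is (1-p)^N, so:

{noformat}
P(no report in N runs) = (1 - p)^N

To claim p < p0 with 95% confidence:  N >= ln(0.05) / ln(1 - p0)
e.g. p0 = 0.05 (a one-in-twenty bug)  =>  N >= 59 clean runs
{noformat}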
[jira] [Commented] (DISPATCH-1956) Potential deadlock: logging lock vs entity cache lock
[ https://issues.apache.org/jira/browse/DISPATCH-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347330#comment-17347330 ]

michael goulish commented on DISPATCH-1956:
-------------------------------------------

As of recent code on master, this is gone.

If I unsuppress the following issues:

  #race:qd_vlog_impl
  #deadlock:qd_vlog_impl
  #race:qd_log_entity

...and then run ctest -VV, I get 2676 mentions of the qd_vlog_impl race (yikes!), 6 mentions of the qd_log_entity race, and 0 mentions of this qd_vlog_impl deadlock.

I guess this should be closed, but I do not seem to have permission to close it. I will try to get my account fixed.

> Potential deadlock: logging lock vs entity cache lock
> ------------------------------------------------------
>
>                 Key: DISPATCH-1956
>                 URL: https://issues.apache.org/jira/browse/DISPATCH-1956
>             Project: Qpid Dispatch
>          Issue Type: Bug
>          Components: Router Node
>    Affects Versions: 1.15.0
>            Reporter: Ken Giusti
>            Assignee: Michael Goulish
>            Priority: Major
>              Labels: deadlock, tsan
>             Fix For: 1.17.0
[jira] [Closed] (DISPATCH-2088) SEGV in qd_buffer_dec_fanout
[ https://issues.apache.org/jira/browse/DISPATCH-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

michael goulish closed DISPATCH-2088.
-------------------------------------
    Resolution: Fixed

Fixed by Chuck's PR: https://github.com/apache/qpid-dispatch/pull/1174

> SEGV in qd_buffer_dec_fanout
> ----------------------------
>
>                 Key: DISPATCH-2088
>                 URL: https://issues.apache.org/jira/browse/DISPATCH-2088
>             Project: Qpid Dispatch
>          Issue Type: Bug
>          Components: Protocol Adaptors
>            Reporter: michael goulish
>            Assignee: Charles E. Rolke
>            Priority: Blocker
>             Fix For: 1.16.0
>
>
> *code from 2021-04-26-afternoon*
> {
>   dispatch: (main) 22689e4f95ae1945e61eec814d3ab3e2d4259f04
>   proton:   (main) 08b301a97c834e002d41ee852bba1288fe83b936
> }
>
> *Test*
> * Doing 1-router TCP throughput testing across high-bandwidth link.
> * Router has 32 worker threads.
> * iperf client is using "-P 10" flag, i.e. doing 10 parallel streams.
> * Router is sustaining 10+ Gbit/sec during test.
> * SEGV happens at end of test.
>
> Here's the backtrace:
>
> #0 sys_atomic_sub (value=1, ref=0x14)
>    at /home/mick/latest/qpid-dispatch/include/qpid/dispatch/atomic.h:48
> #1 sys_atomic_dec (ref=0x14)
>    at /home/mick/latest/qpid-dispatch/include/qpid/dispatch/atomic.h:212
> #2 qd_buffer_dec_fanout (buf=0x0)
>    at /home/mick/latest/qpid-dispatch/include/qpid/dispatch/buffer.h:177
> #3 qd_message_stream_data_release (stream_data=0x7f01b80038c8)
>    at /home/mick/latest/qpid-dispatch/src/message.c:2627
> #4 0x7f0237035895 in flush_outgoing_buffs (conn=conn@entry=0x7f0218012a88)
>    at /home/mick/latest/qpid-dispatch/src/adaptors/tcp_adaptor.c:431
> #5 0x7f023703905e in free_qdr_tcp_connection (tc=0x7f0218012a88)
>    at /home/mick/latest/qpid-dispatch/src/adaptors/tcp_adaptor.c:455
> #6 0x7f023707491d in router_core_thread (arg=0x1e6ccb0)
>    at /home/mick/latest/qpid-dispatch/src/router_core/router_core_thread.c:239
> #7 0x7f0236f663f9 in start_thread () from /lib64/libpthread.so.0
> #8 0x7f0236be5b53 in clone () from /lib64/libc.so.6
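The trace itself localizes the fault: buf == 0x0 in frame #2 plus ref == 0x14 in frames #0/#1 means the fanout counter sits at offset 0x14 of a NULL buffer pointer. A hedged sketch of a defensive guard in the release path (stand-in types; not necessarily what PR 1174 actually does):

{noformat}
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Stand-in for the real qd_buffer_t: all that matters here is that
 * the fanout counter lives at a small offset inside the struct, so a
 * decrement through a NULL buffer faults at that offset. */
typedef struct {
    uint32_t         size;
    _Atomic uint32_t fanout;
} buffer_sketch_t;

static uint32_t dec_fanout_checked(buffer_sketch_t *buf)
{
    /* Fail loudly here instead of SEGV-ing inside an atomic helper. */
    assert(buf && "fanout decrement on a NULL buffer");
    return atomic_fetch_sub_explicit(&buf->fanout, 1, memory_order_relaxed);
}
{noformat}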
[jira] [Commented] (DISPATCH-2088) SEGV in qd_buffer_dec_fanout
[ https://issues.apache.org/jira/browse/DISPATCH-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17334743#comment-17334743 ] michael goulish commented on DISPATCH-2088: --- Here you go! (gdb) thread apply all bt {color:#172b4d}Thread 33{color} (Thread 0x7fa320ff9640 (LWP 53393)): #0 0x7fa343d7f6c2 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x7fa343dbdcbb in suspend (ts=0x7fa2fb60, p=0xd46d30) at /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:393 #2 next_event_batch (p=0xd46d30, can_block=true) at /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2455 #3 0x7fa343e9cf9f in thread_run (arg=0xb52c00) at /home/mick/latest/qpid-dispatch/src/server.c:1105 #4 0x7fa343d793f9 in start_thread () from /lib64/libpthread.so.0 #5 0x7fa3439f8b53 in clone () from /lib64/libc.so.6 {color:#172b4d}Thread 32{color} (Thread 0x7fa2e8ff9640 (LWP 53408)): #0 0x7fa343d7f6c2 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x7fa343dbdcbb in suspend (ts=0x7fa2a8000b60, p=0xd46d30) at /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:393 #2 next_event_batch (p=0xd46d30, can_block=true) at /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2455 #3 0x7fa343e9cf9f in thread_run (arg=0xb52c00) at /home/mick/latest/qpid-dispatch/src/server.c:1105 #4 0x7fa343d793f9 in start_thread () from /lib64/libpthread.so.0 #5 0x7fa3439f8b53 in clone () from /lib64/libc.so.6 {color:#172b4d}Thread 31{color} (Thread 0x7fa2e37fe640 (LWP 53409)): #0 0x7fa343d7f6c2 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x7fa343dbdcbb in suspend (ts=0x7fa2bc000b60, p=0xd46d30) at /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:393 #2 next_event_batch (p=0xd46d30, can_block=true) at /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2455 #3 0x7fa343e9cf9f in thread_run (arg=0xb52c00) at /home/mick/latest/qpid-dispatch/src/server.c:1105 #4 0x7fa343d793f9 in start_thread () from /lib64/libpthread.so.0 #5 0x7fa3439f8b53 in clone () from /lib64/libc.so.6 Thread 30 (Thread 0x7fa30effd640 (LWP 53396)): #0 0x7fa343d7f6c2 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x7fa343dbdcbb in suspend (ts=0x7fa30b60, p=0xd46d30) at /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:393 #2 next_event_batch (p=0xd46d30, can_block=true) at /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2455 #3 0x7fa343e9cf9f in thread_run (arg=0xb52c00) at /home/mick/latest/qpid-dispatch/src/server.c:1105 #4 0x7fa343d793f9 in start_thread () from /lib64/libpthread.so.0 #5 0x7fa3439f8b53 in clone () from /lib64/libc.so.6 Thread 29 (Thread 0x7fa30f7fe640 (LWP 53395)): #0 0x7fa343d7f6c2 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x7fa343dbdcbb in suspend (ts=0x7fa2fc000b60, p=0xd46d30) at /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:393 #2 next_event_batch (p=0xd46d30, can_block=true) at /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2455 #3 0x7fa343e9cf9f in thread_run (arg=0xb52c00) at /home/mick/latest/qpid-dispatch/src/server.c:1105 #4 0x7fa343d793f9 in start_thread () from /lib64/libpthread.so.0 #5 0x7fa3439f8b53 in clone () from /lib64/libc.so.6 Thread 28 (Thread 0x7fa30cff9640 (LWP 53400)): #0 0x7fa343d7f6c2 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x7fa343dbdcbb in suspend (ts=0x7fa2e4000b60, p=0xd46d30) at
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:393 #2 next_event_batch (p=0xd46d30, can_block=true) at /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2455 #3 0x7fa343e9cf9f in thread_run (arg=0xb52c00) at /home/mick/latest/qpid-dispatch/src/server.c:1105 #4 0x7fa343d793f9 in start_thread () from /lib64/libpthread.so.0 #5 0x7fa3439f8b53 in clone () from /lib64/libc.so.6 *{color:#de350b}Thread 27{color}* (Thread 0x7fa2eb7fe640 (LWP 53403)): #0 0x7fa343d8350c in send () from /lib64/libpthread.so.0 #1 0x7fa343dbe718 in snd (s=512, b=, fd=25) at /home/mick/latest/qpid-proton/c/src/proactor/epoll_raw_connection.c:333 #2 pni_raw_write (send=, set_error=, sock=, conn=) at /home/mick/latest/qpid-proton/c/src/proactor/raw_connection.c:566 #3 pni_raw_write (send=, set_error=, sock=25, conn=0x7fa2dc129cf0) at /home/mick/latest/qpid-proton/c/src/proactor/raw_connection.c:554 #4 pni_raw_connection_process (sched_ready=, t=0x7fa2dc129c30) at /home/mick/latest/qpid-proton/c/src/proactor/epoll_raw_connection.c:388 #5 process (tsk=0x7fa2dc129c30) at /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2230 #6 next_event_batch (p=, can_block=true) at
[jira] [Commented] (DISPATCH-2088) SEGV in qd_buffer_dec_fanout
[ https://issues.apache.org/jira/browse/DISPATCH-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17334707#comment-17334707 ] michael goulish commented on DISPATCH-2088: --- I cannot repro with Debug build. 400 iterations with no failure. > SEGV in qd_buffer_dec_fanout > > > Key: DISPATCH-2088 > URL: https://issues.apache.org/jira/browse/DISPATCH-2088 > Project: Qpid Dispatch > Issue Type: Bug > Components: Protocol Adaptors >Reporter: michael goulish >Priority: Major > > *code from 2021-04-26-afternoon* > { > dispatch: (main) 22689e4f95ae1945e61eec814d3ab3e2d4259f04 > proton: (main) 08b301a97c834e002d41ee852bba1288fe83b936 > } > > *Test* > * Doing 1-router TCP throughput testing across high-bandwidth link. > * Router has 32 worker threads. > * iperf client is using "-P 10" flag, i.e. doing 10 parallel streams. > * Router is sustaining 10+ Gbit/sec during test. > * SEGV happens at end of test. > > Here's the backtrace: > > {color:#de350b}#0 sys_atomic_sub (value=1, ref=0x14){color} > {color:#de350b} at > /home/mick/latest/qpid-dispatch/include/qpid/dispatch/atomic.h:48{color} > {color:#de350b}#1 sys_atomic_dec (ref=0x14){color} > {color:#de350b} at > /home/mick/latest/qpid-dispatch/include/qpid/dispatch/atomic.h:212{color} > {color:#de350b}#2 qd_buffer_dec_fanout (buf=0x0){color} > {color:#de350b} at > /home/mick/latest/qpid-dispatch/include/qpid/dispatch/buffer.h:177{color} > {color:#de350b}#3 qd_message_stream_data_release > (stream_data=0x7f01b80038c8){color} > {color:#de350b} at /home/mick/latest/qpid-dispatch/src/message.c:2627{color} > {color:#de350b}#4 0x7f0237035895 in flush_outgoing_buffs > (conn=conn@entry=0x7f0218012a88){color} > {color:#de350b} at > /home/mick/latest/qpid-dispatch/src/adaptors/tcp_adaptor.c:431{color} > {color:#de350b}#5 0x7f023703905e in free_qdr_tcp_connection > (tc=0x7f0218012a88){color} > {color:#de350b} at > /home/mick/latest/qpid-dispatch/src/adaptors/tcp_adaptor.c:455{color} > {color:#de350b}#6 0x7f023707491d in router_core_thread > (arg=0x1e6ccb0){color} > {color:#de350b} at > /home/mick/latest/qpid-dispatch/src/router_core/router_core_thread.c:239{color} > {color:#de350b}#7 0x7f0236f663f9 in start_thread () from > /lib64/libpthread.so.0{color} > {color:#de350b}#8 0x7f0236be5b53 in clone () from /lib64/libc.so.6{color} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
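The Debug-vs-optimized distinction above matters because races of this kind often hide at -O0, where the altered timing closes the window. The build-type switch is plain CMake, nothing project-specific; which build type produced the original crash is not stated, so RelWithDebInfo below is only a guess consistent with the symbolized backtrace:
{noformat}
cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo ..   # optimized, with symbols (assumed crashing config)
cmake -DCMAKE_BUILD_TYPE=Debug ..            # -O0; the 400 clean iterations above were this type
{noformat}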
[jira] [Commented] (DISPATCH-2088) SEGV in qd_buffer_dec_fanout
[ https://issues.apache.org/jira/browse/DISPATCH-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17334451#comment-17334451 ] michael goulish commented on DISPATCH-2088: --- I'm afraid only the last few lines have anything in them. 2021-04-28 01:04:03.818860 -0400 ROUTER_CORE (info) [C190][L379] Stuck delivery: At least one delivery on this link has been undelivered/unsettled for more than 10 seconds 2021-04-28 01:04:03.818868 -0400 ROUTER_CORE (info) [C191][L380] Stuck delivery: At least one delivery on this link has been undelivered/unsettled for more than 10 seconds 2021-04-28 01:04:03.818877 -0400 ROUTER_CORE (info) [C191][L381] Stuck delivery: At least one delivery on this link has been undelivered/unsettled for more than 10 seconds 2021-04-28 01:04:03.818882 -0400 ROUTER_CORE (info) [C192][L382] Stuck delivery: At least one delivery on this link has been undelivered/unsettled for more than 10 seconds 2021-04-28 01:04:03.818893 -0400 ROUTER_CORE (info) [C192][L383] Stuck delivery: At least one delivery on this link has been undelivered/unsettled for more than 10 seconds 2021-04-28 01:04:03.818905 -0400 ROUTER_CORE (info) [C193][L384] Stuck delivery: At least one delivery on this link has been undelivered/unsettled for more than 10 seconds 2021-04-28 01:04:03.818913 -0400 ROUTER_CORE (info) [C193][L385] Stuck delivery: At least one delivery on this link has been undelivered/unsettled for more than 10 seconds 2021-04-28 01:04:03.818926 -0400 ROUTER_CORE (info) [C194][L386] Stuck delivery: At least one delivery on this link has been undelivered/unsettled for more than 10 seconds 2021-04-28 01:04:03.818931 -0400 ROUTER_CORE (info) [C194][L387] Stuck delivery: At least one delivery on this link has been undelivered/unsettled for more than 10 seconds 2021-04-28 01:04:03.818944 -0400 ROUTER_CORE (info) [C195][L388] Stuck delivery: At least one delivery on this link has been undelivered/unsettled for more than 10 seconds 2021-04-28 01:04:03.818949 -0400 ROUTER_CORE (info) [C195][L389] Stuck delivery: At least one delivery on this link has been undelivered/unsettled for more than 10 seconds 2021-04-28 01:04:03.818957 -0400 ROUTER_CORE (info) [C196][L390] Stuck delivery: At least one delivery on this link has been undelivered/unsettled for more than 10 seconds 2021-04-28 01:04:03.819046 -0400 ROUTER_CORE (info) [C196][L391] Stuck delivery: At least one delivery on this link has been undelivered/unsettled for more than 10 seconds 2021-04-28 01:04:03.819052 -0400 ROUTER_CORE (info) [C197][L392] Stuck delivery: At least one delivery on this link has been undelivered/unsettled for more than 10 seconds 2021-04-28 01:04:03.819059 -0400 ROUTER_CORE (info) [C197][L393] Stuck delivery: At least one delivery on this link has been undelivered/unsettled for more than 10 seconds 2021-04-28 01:04:03.819074 -0400 ROUTER_CORE (info) [C198][L394] Stuck delivery: At least one delivery on this link has been undelivered/unsettled for more than 10 seconds 2021-04-28 01:04:03.819081 -0400 ROUTER_CORE (info) [C198][L395] Stuck delivery: At least one delivery on this link has been undelivered/unsettled for more than 10 seconds 2021-04-28 01:04:03.819087 -0400 ROUTER_CORE (info) [C199][L396] Stuck delivery: At least one delivery on this link has been undelivered/unsettled for more than 10 seconds 2021-04-28 01:04:03.819096 -0400 ROUTER_CORE (info) [C199][L397] Stuck delivery: At least one delivery on this link has been undelivered/unsettled for more than 10 seconds 2021-04-28 01:04:34.431844 
-0400 TCP_ADAPTOR (info) [C181] PN_RAW_CONNECTION_DISCONNECTED connector 2021-04-28 01:04:34.431903 -0400 TCP_ADAPTOR (info) [C180] EOS 2021-04-28 01:04:34.431956 -0400 ROUTER_CORE (info) [C181][L361] Link lost: del=1 presett=1 psdrop=0 acc=0 rej=0 rel=0 mod=0 delay1=0 delay10=1 blocked=no 2021-04-28 01:04:34.432011 -0400 ROUTER_CORE (info) [C181][L360] Link lost: del=0 presett=0 psdrop=0 acc=0 rej=0 rel=0 mod=0 delay1=0 delay10=1 blocked=no 2021-04-28 01:04:34.432026 -0400 ROUTER_CORE (info) [C181] Connection Closed 2021-04-28 01:04:34.432479 -0400 TCP_ADAPTOR (info) [C183] PN_RAW_CONNECTION_DISCONNECTED connector ./r_one_router_Br: line 7: 27584 Segmentation fault (core dumped) qdrouterd --config ./Br_1.conf > SEGV in qd_buffer_dec_fanout > > > Key: DISPATCH-2088 > URL: https://issues.apache.org/jira/browse/DISPATCH-2088 > Project: Qpid Dispatch > Issue Type: Bug > Components: Protocol Adaptors >Reporter: michael goulish >Priority: Major > > *code from 2021-04-26-afternoon* > { > dispatch: (main) 22689e4f95ae1945e61eec814d3ab3e2d4259f04 > proton: (main) 08b301a97c834e002d41ee852bba1288fe83b936 > } > > *Test* > * Doing 1-router TCP throughput testing across high-bandwidth link. > * Router has 32
[jira] [Commented] (DISPATCH-2088) SEGV in qd_buffer_dec_fanout
[ https://issues.apache.org/jira/browse/DISPATCH-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1780#comment-1780 ] michael goulish commented on DISPATCH-2088: --- Apparently it helps if you let the code cool down a while. I tried it again after a break and it crashed immediately – same backtrace. ( And with "-P 10" on the iperf client. ) So that is 2 crashes in 42 attempts. Here is my router config file:
router {
    mode: interior
    id: Br
    workerThreads: 32
}
tcpListener {
    host: 10.10.10.1
    port: 9090
    address: throughput
    siteId: my-site
}
tcpConnector {
    host: 10.10.10.1
    port: 8080
    address: throughput
    siteId: my-site
}
> SEGV in qd_buffer_dec_fanout > > > Key: DISPATCH-2088 > URL: https://issues.apache.org/jira/browse/DISPATCH-2088 > Project: Qpid Dispatch > Issue Type: Bug > Components: Protocol Adaptors >Reporter: michael goulish >Priority: Major > > *code from 2021-04-26-afternoon* > { > dispatch: (main) 22689e4f95ae1945e61eec814d3ab3e2d4259f04 > proton: (main) 08b301a97c834e002d41ee852bba1288fe83b936 > } > > *Test* > * Doing 1-router TCP throughput testing across high-bandwidth link. > * Router has 32 worker threads. > * iperf client is using "-P 10" flag, i.e. doing 10 parallel streams. > * Router is sustaining 10+ Gbit/sec during test. > * SEGV happens at end of test. > > Here's the backtrace: > > {color:#de350b}#0 sys_atomic_sub (value=1, ref=0x14){color} > {color:#de350b} at > /home/mick/latest/qpid-dispatch/include/qpid/dispatch/atomic.h:48{color} > {color:#de350b}#1 sys_atomic_dec (ref=0x14){color} > {color:#de350b} at > /home/mick/latest/qpid-dispatch/include/qpid/dispatch/atomic.h:212{color} > {color:#de350b}#2 qd_buffer_dec_fanout (buf=0x0){color} > {color:#de350b} at > /home/mick/latest/qpid-dispatch/include/qpid/dispatch/buffer.h:177{color} > {color:#de350b}#3 qd_message_stream_data_release > (stream_data=0x7f01b80038c8){color} > {color:#de350b} at /home/mick/latest/qpid-dispatch/src/message.c:2627{color} > {color:#de350b}#4 0x7f0237035895 in flush_outgoing_buffs > (conn=conn@entry=0x7f0218012a88){color} > {color:#de350b} at > /home/mick/latest/qpid-dispatch/src/adaptors/tcp_adaptor.c:431{color} > {color:#de350b}#5 0x7f023703905e in free_qdr_tcp_connection > (tc=0x7f0218012a88){color} > {color:#de350b} at > /home/mick/latest/qpid-dispatch/src/adaptors/tcp_adaptor.c:455{color} > {color:#de350b}#6 0x7f023707491d in router_core_thread > (arg=0x1e6ccb0){color} > {color:#de350b} at > /home/mick/latest/qpid-dispatch/src/router_core/router_core_thread.c:239{color} > {color:#de350b}#7 0x7f0236f663f9 in start_thread () from > /lib64/libpthread.so.0{color} > {color:#de350b}#8 0x7f0236be5b53 in clone () from /lib64/libc.so.6{color} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Commented] (DISPATCH-2088) SEGV in qd_buffer_dec_fanout
[ https://issues.apache.org/jira/browse/DISPATCH-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17333284#comment-17333284 ] michael goulish commented on DISPATCH-2088: --- *The iperf commands I used in the test:* iperf3 -s -p 8080 # server iperf3 -c 10.10.10.1 -p 9090 -t 60 -P 10 # client ( The router's TCP listener was on port 9090, while its TCP connector was on 8080. ) *Reproducibility:* Not trivial. I reduced test time to 10 seconds and tried 40 more times – without success. 10 of those trials were with 100 parallel threads on the iperf sender, and 10 of them were with 200 parallel threads. > SEGV in qd_buffer_dec_fanout > > > Key: DISPATCH-2088 > URL: https://issues.apache.org/jira/browse/DISPATCH-2088 > Project: Qpid Dispatch > Issue Type: Bug > Components: Protocol Adaptors >Reporter: michael goulish >Priority: Major > > *code from 2021-04-26-afternoon* > { > dispatch: (main) 22689e4f95ae1945e61eec814d3ab3e2d4259f04 > proton: (main) 08b301a97c834e002d41ee852bba1288fe83b936 > } > > *Test* > * Doing 1-router TCP throughput testing across high-bandwidth link. > * Router has 32 worker threads. > * iperf client is using "-P 10" flag, i.e. doing 10 parallel streams. > * Router is sustaining 10+ Gbit/sec during test. > * SEGV happens at end of test. > > Here's the backtrace: > > {color:#de350b}#0 sys_atomic_sub (value=1, ref=0x14){color} > {color:#de350b} at > /home/mick/latest/qpid-dispatch/include/qpid/dispatch/atomic.h:48{color} > {color:#de350b}#1 sys_atomic_dec (ref=0x14){color} > {color:#de350b} at > /home/mick/latest/qpid-dispatch/include/qpid/dispatch/atomic.h:212{color} > {color:#de350b}#2 qd_buffer_dec_fanout (buf=0x0){color} > {color:#de350b} at > /home/mick/latest/qpid-dispatch/include/qpid/dispatch/buffer.h:177{color} > {color:#de350b}#3 qd_message_stream_data_release > (stream_data=0x7f01b80038c8){color} > {color:#de350b} at /home/mick/latest/qpid-dispatch/src/message.c:2627{color} > {color:#de350b}#4 0x7f0237035895 in flush_outgoing_buffs > (conn=conn@entry=0x7f0218012a88){color} > {color:#de350b} at > /home/mick/latest/qpid-dispatch/src/adaptors/tcp_adaptor.c:431{color} > {color:#de350b}#5 0x7f023703905e in free_qdr_tcp_connection > (tc=0x7f0218012a88){color} > {color:#de350b} at > /home/mick/latest/qpid-dispatch/src/adaptors/tcp_adaptor.c:455{color} > {color:#de350b}#6 0x7f023707491d in router_core_thread > (arg=0x1e6ccb0){color} > {color:#de350b} at > /home/mick/latest/qpid-dispatch/src/router_core/router_core_thread.c:239{color} > {color:#de350b}#7 0x7f0236f663f9 in start_thread () from > /lib64/libpthread.so.0{color} > {color:#de350b}#8 0x7f0236be5b53 in clone () from /lib64/libc.so.6{color} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
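Reconstructing the data path from the ports above and the router config shown earlier (a reading of the stated setup, not an authoritative diagram):
{noformat}
iperf3 client ---> 10.10.10.1:9090 (tcpListener)
                         |  qdrouterd, address "throughput"
                         v
                   10.10.10.1:8080 (tcpConnector) ---> iperf3 server
{noformat}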
[jira] [Created] (DISPATCH-2088) SEGV in qd_buffer_dec_fanout
michael goulish created DISPATCH-2088: - Summary: SEGV in qd_buffer_dec_fanout Key: DISPATCH-2088 URL: https://issues.apache.org/jira/browse/DISPATCH-2088 Project: Qpid Dispatch Issue Type: Bug Components: Protocol Adaptors Reporter: michael goulish *code from 2021-04-26-afternoon* { dispatch: (main) 22689e4f95ae1945e61eec814d3ab3e2d4259f04 proton: (main) 08b301a97c834e002d41ee852bba1288fe83b936 } *Test* * Doing 1-router TCP throughput testing across high-bandwidth link. * Router has 32 worker threads. * iperf client is using "-P 10" flag, i.e. doing 10 parallel streams. * Router is sustaining 10+ Gbit/sec during test. * SEGV happens at end of test. Here's the backtrace: {color:#de350b}#0 sys_atomic_sub (value=1, ref=0x14){color} {color:#de350b} at /home/mick/latest/qpid-dispatch/include/qpid/dispatch/atomic.h:48{color} {color:#de350b}#1 sys_atomic_dec (ref=0x14){color} {color:#de350b} at /home/mick/latest/qpid-dispatch/include/qpid/dispatch/atomic.h:212{color} {color:#de350b}#2 qd_buffer_dec_fanout (buf=0x0){color} {color:#de350b} at /home/mick/latest/qpid-dispatch/include/qpid/dispatch/buffer.h:177{color} {color:#de350b}#3 qd_message_stream_data_release (stream_data=0x7f01b80038c8){color} {color:#de350b} at /home/mick/latest/qpid-dispatch/src/message.c:2627{color} {color:#de350b}#4 0x7f0237035895 in flush_outgoing_buffs (conn=conn@entry=0x7f0218012a88){color} {color:#de350b} at /home/mick/latest/qpid-dispatch/src/adaptors/tcp_adaptor.c:431{color} {color:#de350b}#5 0x7f023703905e in free_qdr_tcp_connection (tc=0x7f0218012a88){color} {color:#de350b} at /home/mick/latest/qpid-dispatch/src/adaptors/tcp_adaptor.c:455{color} {color:#de350b}#6 0x7f023707491d in router_core_thread (arg=0x1e6ccb0){color} {color:#de350b} at /home/mick/latest/qpid-dispatch/src/router_core/router_core_thread.c:239{color} {color:#de350b}#7 0x7f0236f663f9 in start_thread () from /lib64/libpthread.so.0{color} {color:#de350b}#8 0x7f0236be5b53 in clone () from /lib64/libc.so.6{color} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
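The telling detail in that backtrace is ref=0x14 combined with buf=0x0: taking the address of a struct member through a NULL pointer does not itself fault, it just yields the member's byte offset, so the atomic decrement then touches address 0x14 (decimal 20). The sketch below illustrates the mechanism only; the struct layout and field name are assumptions, not the real qd_buffer_t from buffer.h.
{noformat}
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for qd_buffer_t: assume the fanout refcount
 * sits at byte offset 20 (0x14).  The real layout may differ. */
typedef struct fake_buffer_t {
    char     leading_fields[20];
    uint32_t fanout;                  /* offset 0x14 under this assumption */
} fake_buffer_t;

int main(void) {
    fake_buffer_t *buf = NULL;        /* the buf=0x0 from the backtrace */
    /* Computing the member address does not dereference buf ... */
    printf("ref = %p\n", (void *) &buf->fanout);   /* prints 0x14 */
    /* ... the SEGV happens only when the decrement touches it:
     *     --buf->fanout;    // faults at address 0x14               */
    return 0;
}
{noformat}
So the crash indicates the release path reached the fanout decrement with a NULL buffer pointer, not a corrupted counter.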
[jira] [Comment Edited] (PROTON-2362) c-threaderciser timed out on 32-core machine.
[ https://issues.apache.org/jira/browse/PROTON-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313307#comment-17313307 ] michael goulish edited comment on PROTON-2362 at 4/1/21, 4:49 PM: -- OK, here's the whole list. 64 threads, 30 seconds per run, 50 runs for each feature.
With all actions enabled crash 10
-no-close-connect crash 12
-no-listen crash 0 hang 2
{color:#de350b}-no-close-listen NO PROBLEMS{color}
{color:#de350b}-no-connect NO PROBLEMS{color}
-no-close-connect crash 10 hang 2
-no-wake crash 11
-no-timeout crash 11
no-cancel-timeout crash 12
was (Author: michaelgoulish): OK, here's the whole list. 64 threads, 30 seconds per run, 50 runs for each feature. {{With all actions enabled crash 10}} {{-no-close-connect crash 12}} -no-listen crash 0 hang 2 {color:#de350b}{{-no-close-listen NO PROBLEMS}}{color} {color:#de350b}{{-no-connect NO PROBLEMS}}{color} {{-no-close-connect crash 10 hang 2}} {{-no-wake crash 11}} {{-no-timeout crash 11}} {{no-cancel-timeout crash 12}} > c-threaderciser timed out on 32-core machine. > - > > Key: PROTON-2362 > URL: https://issues.apache.org/jira/browse/PROTON-2362 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Reporter: michael goulish >Priority: Major > > Using recent master – maybe 3 days old or so – I just ran Proton's ctest, > after turning on THREADERCISER. I ran it on a box with 32 physical cores, 64 > threads. > > Test number 6 – c-threaderciser – failed with timeout after 1500 seconds. > ( 1.5e18 femtoseconds. ) > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Comment Edited] (PROTON-2362) c-threaderciser timed out on 32-core machine.
[ https://issues.apache.org/jira/browse/PROTON-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313307#comment-17313307 ] michael goulish edited comment on PROTON-2362 at 4/1/21, 4:48 PM: -- OK, here's the whole list. 64 threads, 30 seconds per run, 50 runs for each feature. {{With all actions enabled crash 10}} {{-no-close-connect crash 12}} -no-listen crash 0 hang 2 {color:#de350b}{{-no-close-listen NO PROBLEMS}}{color} {color:#de350b}{{-no-connect NO PROBLEMS}}{color} {{-no-close-connect crash 10 hang 2}} {{-no-wake crash 11}} {{-no-timeout crash 11}} {{no-cancel-timeout crash 12}} was (Author: michaelgoulish): OK, here's the whole list. 64 threads, 30 seconds per run, 50 runs for each feature. {{With all actions enabled crash 10}} {{-no-close-connect crash 12}} {{ -no-listen crash 0 hang 2}} {color:#de350b}{{-no-close-listen NO PROBLEMS}}{color} {color:#de350b}{{-no-connect NO PROBLEMS}}{color} {{-no-close-connect crash 10 hang 2}} {{-no-wake crash 11}} {{-no-timeout crash 11}} {{no-cancel-timeout crash 12}} > c-threaderciser timed out on 32-core machine. > - > > Key: PROTON-2362 > URL: https://issues.apache.org/jira/browse/PROTON-2362 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Reporter: michael goulish >Priority: Major > > Using recent master – maybe 3 days old or so – I just ran Proton's ctest, > after turning on THREADERCISER. I ran it on a box with 32 physical cores, 64 > threads. > > Test number 6 – c-threaderciser – failed with timeout after 1500 seconds. > ( 1.5e18 femtoseconds. ) > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Commented] (PROTON-2362) c-threaderciser timed out on 32-core machine.
[ https://issues.apache.org/jira/browse/PROTON-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313307#comment-17313307 ] michael goulish commented on PROTON-2362: - OK, here's the whole list. 64 threads, 30 seconds per run, 50 runs for each feature. {{With all actions enabled crash 10}} {{-no-close-connect crash 12}} {{ -no-listen crash 0 hang 2}} {color:#de350b}{{-no-close-listen NO PROBLEMS}}{color} {color:#de350b}{{-no-connect NO PROBLEMS}}{color} {{-no-close-connect crash 10 hang 2}} {{-no-wake crash 11}} {{-no-timeout crash 11}} {{no-cancel-timeout crash 12}} > c-threaderciser timed out on 32-core machine. > - > > Key: PROTON-2362 > URL: https://issues.apache.org/jira/browse/PROTON-2362 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Reporter: michael goulish >Priority: Major > > Using recent master – maybe 3 days old or so – I just ran Proton's ctest, > after turning on THREADERCISER. I ran it on a box with 32 physical cores, 64 > threads. > > Test number 6 – c-threaderciser – failed with timeout after 1500 seconds. > ( 1.5e18 femtoseconds. ) > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
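The counts above come from batches of 50 runs per disabled feature. A driver for one such batch might look like the loop below; only the -no-* flag names are taken from these comments, and the binary path, timeout value, and failure detection are assumptions:
{noformat}
fails=0
for i in $(seq 50); do
    timeout 60 ./c-threaderciser -no-wake || fails=$((fails + 1))
done
echo "-no-wake: $fails failures out of 50"
{noformat}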
[jira] [Commented] (PROTON-2362) c-threaderciser timed out on 32-core machine.
[ https://issues.apache.org/jira/browse/PROTON-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313151#comment-17313151 ] michael goulish commented on PROTON-2362: --- I am running batches of 50 threaderciser tests, 64 threads each, turning off one feature at a time, and counting failures. See if you can spot the case, below, that I feel may be interesting.
All Features On crash: 10
-no-close-connect crash: 12
-no-listen crash: 0 hang: 2
-no-close-listen (y) :) (*)(*r)(*g) {color:#de350b}*NO PROBLEMS (*g)(*r)(*) :) (y)* {color}
{color:#de350b}{color:#172b4d}~{color:#c1c7d0} (sorry, I can't figure out how to make the above text blink){color}~{color}{color}
p.s. _"Brontosaurus"_ means _"Thunder Lizard"_, a kind of dinosaur. I do not have a dinosaur. _"Brontonomicon",_ on the other hand, means _"What the Thunder Said"_ or _"Words of the Thunder"_ or possibly _"The Book of Thunder"_. That's what I've got. And when the thunder speaks, the software had better listen. > c-threaderciser timed out on 32-core machine. > - > > Key: PROTON-2362 > URL: https://issues.apache.org/jira/browse/PROTON-2362 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Reporter: michael goulish >Priority: Major > > Using recent master – maybe 3 days old or so – I just ran Proton's ctest, > after turning on THREADERCISER. I ran it on a box with 32 physical cores, 64 > threads. > > Test number 6 – c-threaderciser – failed with timeout after 1500 seconds. > ( 1.5e18 femtoseconds. ) > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Created] (PROTON-2362) c-threaderciser timed out on 32-core machine.
michael goulish created PROTON-2362: --- Summary: c-threaderciser timed out on 32-core machine. Key: PROTON-2362 URL: https://issues.apache.org/jira/browse/PROTON-2362 Project: Qpid Proton Issue Type: Bug Reporter: michael goulish Using recent master – maybe 3 days old or so – I just ran Proton's ctest, after turning on THREADERCISER. I ran it on a box with 32 physical cores, 64 threads. Test number 6 – c-threaderciser – failed with timeout after 1500 seconds. ( 1.5e18 femtoseconds. ) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Commented] (DISPATCH-2014) Router TCP Adapter crash with high thread count and load
[ https://issues.apache.org/jira/browse/DISPATCH-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312627#comment-17312627 ] michael goulish commented on DISPATCH-2014: --- I just ran Proton's ctest suite with the THREADERCISER turned on – on my box with 32 physical cores, 64 'threads'. Test number 6 – "c-threaderciser" – timed out after 1500 seconds. > Router TCP Adapter crash with high thread count and load > > > Key: DISPATCH-2014 > URL: https://issues.apache.org/jira/browse/DISPATCH-2014 > Project: Qpid Dispatch > Issue Type: Bug > Components: Protocol Adaptors >Reporter: michael goulish >Priority: Major > > Using latest proton and dispatch master code as of 3 hours ago. > Testing router TCP adapter on a machine with 32 cores / 64 threads. > I gave the router 64 worker threads, then used 'hey' load generator to send > it HTTP requests to a TCP listener which router forwarded to Nginx on same > machine. > Multiple tests with increasing number of parallel senders: 10, 20, 30,...Each > sender throttled to 10 messages per second. > It survived many tests, but crashed around test with 200 senders. > I believe this is easily repeatable – I will go check that now. > > Here is the thread that crashed: > {color:#de350b} #0 0x7f33186a0684 in pthread_mutex_lock () from > /lib64/libpthread.so.0{color} > {color:#de350b} #1 0x7f33186e2848 in lock (m=){color} > {color:#de350b} at > /home/mick/latest/qpid-proton/c/src/proactor/epoll-internal.h:326{color} > {color:#de350b} #2 process (tsk=){color} > {color:#de350b} at > /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2248{color} > {color:#de350b} #3 next_event_batch (p=0x10ed970, can_block=true){color} > {color:#de350b} at > /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2423{color} > {color:#de350b} #4 0x7f33187c192f in thread_run (arg=0x10f6e40){color} > {color:#de350b} at /home/mick/latest/qpid-dispatch/src/server.c:1107{color} > {color:#de350b} #5 0x7f331869e3f9 in start_thread () from > /lib64/libpthread.so.0{color} > {color:#de350b} #6 0x7f33181b2b53 in clone () from /lib64/libc.so.6{color} > > {color:#172b4d}And here are all the threads:{color} > {color:#de350b}(gdb) thread apply all bt{color} > {color:#de350b}Thread 65 (Thread 0x7f3244ff9640 (LWP 36500)):{color} > {color:#de350b}#0 0x7f33186a7ea0 in __lll_lock_wait () from > /lib64/libpthread.so.0{color} > {color:#de350b}#1 0x7f33186a08f5 in pthread_mutex_lock () from > /lib64/libpthread.so.0{color} > {color:#de350b}#2 0x7f33186dfc5f in lock (m=0x10edc90) at > /home/mick/latest/qpid-proton/c/src/proactor/epoll-internal.h:326{color} > {color:#de350b}#3 pni_raw_connection_done (rc=0x10ed3b8) at > /home/mick/latest/qpid-proton/c/src/proactor/epoll_raw_connection.c:423{color} > {color:#de350b}#4 pn_proactor_done (batch=0x10ed970, p=0x10ed970) at > /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2696{color} > {color:#de350b}#5 pn_proactor_done (p=0x10ed970, > batch=batch@entry=0x7f326811a578) at > /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2676{color} > {color:#de350b}#6 0x7f33187c1a11 in thread_run (arg=0x10f6e40) at > /home/mick/latest/qpid-dispatch/src/server.c:1140{color} > {color:#de350b}#7 0x7f331869e3f9 in start_thread () from > /lib64/libpthread.so.0{color} > {color:#de350b}#8 0x7f33181b2b53 in clone () from /lib64/libc.so.6{color} > {color:#de350b}Thread 64 (Thread 0x7f327640 (LWP 36481)):{color} > {color:#de350b}#0 0x7f33186a7ea0 in __lll_lock_wait () from > /lib64/libpthread.so.0{color} > {color:#de350b}#1 0x7f33186a08f5 in 
pthread_mutex_lock () from > /lib64/libpthread.so.0{color} > {color:#de350b}#2 0x7f33186e2b7e in lock (m=0x10edc90) at > /home/mick/latest/qpid-proton/c/src/proactor/epoll-internal.h:326{color} > {color:#de350b}#3 process (tsk=) at > /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2248{color} > {color:#de350b}#4 next_event_batch (p=, can_block=true) at > /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2423{color} > {color:#de350b}#5 0x7f33187c192f in thread_run (arg=0x10f6e40) at > /home/mick/latest/qpid-dispatch/src/server.c:1107{color} > {color:#de350b}#6 0x7f331869e3f9 in start_thread () from > /lib64/libpthread.so.0{color} > {color:#de350b}#7 0x7f33181b2b53 in clone () from /lib64/libc.so.6{color} > {color:#de350b}Thread 63 (Thread 0x7f322f7fe640 (LWP 36502)):{color} > {color:#de350b}#0 0x7f33186a7ea0 in __lll_lock_wait () from > /lib64/libpthread.so.0{color} > {color:#de350b}#1 0x7f33186a08f5 in pthread_mutex_lock () from > /lib64/libpthread.so.0{color} > {color:#de350b}#2 0x7f33186dfc5f in lock (m=0x10edc90) at >
[jira] [Commented] (DISPATCH-2014) Router TCP Adapter crash with high thread count and load
[ https://issues.apache.org/jira/browse/DISPATCH-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306496#comment-17306496 ] michael goulish commented on DISPATCH-2014: --- When I used 64 dispatch worker threads and hit it with 200 'hey' senders – each test 30 seconds long – it died 3 out of 4 times. (SEGV) When I went down to 32 dispatch worker threads, it survived 3 out of 3 tests with 200 senders, and then 3 out of 3 tests with 500 senders, and then 3 out of 3 tests with 1000 senders. > Router TCP Adapter crash with high thread count and load > > > Key: DISPATCH-2014 > URL: https://issues.apache.org/jira/browse/DISPATCH-2014 > Project: Qpid Dispatch > Issue Type: Bug > Components: Protocol Adaptors >Reporter: michael goulish >Priority: Major > > Using latest proton and dispatch master code as of 3 hours ago. > Testing router TCP adapter on a machine with 32 cores / 64 threads. > I gave the router 64 worker threads, then used 'hey' load generator to send > it HTTP requests to a TCP listener which router forwarded to Nginx on same > machine. > Multiple tests with increasing number of parallel senders: 10, 20, 30,...Each > sender throttled to 10 messages per second. > It survived many tests, but crashed around test with 200 senders. > I believe this is easily repeatable – I will go check that now. > > Here is the thread that crashed: > {color:#de350b} #0 0x7f33186a0684 in pthread_mutex_lock () from > /lib64/libpthread.so.0{color} > {color:#de350b} #1 0x7f33186e2848 in lock (m=){color} > {color:#de350b} at > /home/mick/latest/qpid-proton/c/src/proactor/epoll-internal.h:326{color} > {color:#de350b} #2 process (tsk=){color} > {color:#de350b} at > /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2248{color} > {color:#de350b} #3 next_event_batch (p=0x10ed970, can_block=true){color} > {color:#de350b} at > /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2423{color} > {color:#de350b} #4 0x7f33187c192f in thread_run (arg=0x10f6e40){color} > {color:#de350b} at /home/mick/latest/qpid-dispatch/src/server.c:1107{color} > {color:#de350b} #5 0x7f331869e3f9 in start_thread () from > /lib64/libpthread.so.0{color} > {color:#de350b} #6 0x7f33181b2b53 in clone () from /lib64/libc.so.6{color} > > {color:#172b4d}And here are all the threads:{color} > {color:#de350b}(gdb) thread apply all bt{color} > {color:#de350b}Thread 65 (Thread 0x7f3244ff9640 (LWP 36500)):{color} > {color:#de350b}#0 0x7f33186a7ea0 in __lll_lock_wait () from > /lib64/libpthread.so.0{color} > {color:#de350b}#1 0x7f33186a08f5 in pthread_mutex_lock () from > /lib64/libpthread.so.0{color} > {color:#de350b}#2 0x7f33186dfc5f in lock (m=0x10edc90) at > /home/mick/latest/qpid-proton/c/src/proactor/epoll-internal.h:326{color} > {color:#de350b}#3 pni_raw_connection_done (rc=0x10ed3b8) at > /home/mick/latest/qpid-proton/c/src/proactor/epoll_raw_connection.c:423{color} > {color:#de350b}#4 pn_proactor_done (batch=0x10ed970, p=0x10ed970) at > /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2696{color} > {color:#de350b}#5 pn_proactor_done (p=0x10ed970, > batch=batch@entry=0x7f326811a578) at > /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2676{color} > {color:#de350b}#6 0x7f33187c1a11 in thread_run (arg=0x10f6e40) at > /home/mick/latest/qpid-dispatch/src/server.c:1140{color} > {color:#de350b}#7 0x7f331869e3f9 in start_thread () from > /lib64/libpthread.so.0{color} > {color:#de350b}#8 0x7f33181b2b53 in clone () from /lib64/libc.so.6{color} > {color:#de350b}Thread 64 (Thread 0x7f327640 (LWP 36481)):{color} > 
{color:#de350b}#0 0x7f33186a7ea0 in __lll_lock_wait () from > /lib64/libpthread.so.0{color} > {color:#de350b}#1 0x7f33186a08f5 in pthread_mutex_lock () from > /lib64/libpthread.so.0{color} > {color:#de350b}#2 0x7f33186e2b7e in lock (m=0x10edc90) at > /home/mick/latest/qpid-proton/c/src/proactor/epoll-internal.h:326{color} > {color:#de350b}#3 process (tsk=) at > /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2248{color} > {color:#de350b}#4 next_event_batch (p=, can_block=true) at > /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2423{color} > {color:#de350b}#5 0x7f33187c192f in thread_run (arg=0x10f6e40) at > /home/mick/latest/qpid-dispatch/src/server.c:1107{color} > {color:#de350b}#6 0x7f331869e3f9 in start_thread () from > /lib64/libpthread.so.0{color} > {color:#de350b}#7 0x7f33181b2b53 in clone () from /lib64/libc.so.6{color} > {color:#de350b}Thread 63 (Thread 0x7f322f7fe640 (LWP 36502)):{color} > {color:#de350b}#0 0x7f33186a7ea0 in __lll_lock_wait () from > /lib64/libpthread.so.0{color} > {color:#de350b}#1
[jira] [Created] (DISPATCH-2014) Router TCP Adapter crash with high thread count and load
michael goulish created DISPATCH-2014: - Summary: Router TCP Adapter crash with high thread count and load Key: DISPATCH-2014 URL: https://issues.apache.org/jira/browse/DISPATCH-2014 Project: Qpid Dispatch Issue Type: Bug Components: Protocol Adaptors Reporter: michael goulish Using latest proton and dispatch master code as of 3 hours ago. Testing router TCP adapter on a machine with 32 cores / 64 threads. I gave the router 64 worker threads, then used 'hey' load generator to send it HTTP requests to a TCP listener which router forwarded to Nginx on same machine. Multiple tests with increasing number of parallel senders: 10, 20, 30, ... Each sender throttled to 10 messages per second. It survived many tests, but crashed around test with 200 senders. I believe this is easily repeatable – I will go check that now. Here is the thread that crashed: {color:#de350b} #0 0x7f33186a0684 in pthread_mutex_lock () from /lib64/libpthread.so.0{color} {color:#de350b} #1 0x7f33186e2848 in lock (m=){color} {color:#de350b} at /home/mick/latest/qpid-proton/c/src/proactor/epoll-internal.h:326{color} {color:#de350b} #2 process (tsk=){color} {color:#de350b} at /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2248{color} {color:#de350b} #3 next_event_batch (p=0x10ed970, can_block=true){color} {color:#de350b} at /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2423{color} {color:#de350b} #4 0x7f33187c192f in thread_run (arg=0x10f6e40){color} {color:#de350b} at /home/mick/latest/qpid-dispatch/src/server.c:1107{color} {color:#de350b} #5 0x7f331869e3f9 in start_thread () from /lib64/libpthread.so.0{color} {color:#de350b} #6 0x7f33181b2b53 in clone () from /lib64/libc.so.6{color} {color:#172b4d}And here are all the threads:{color} {color:#de350b}(gdb) thread apply all bt{color} {color:#de350b}Thread 65 (Thread 0x7f3244ff9640 (LWP 36500)):{color} {color:#de350b}#0 0x7f33186a7ea0 in __lll_lock_wait () from /lib64/libpthread.so.0{color} {color:#de350b}#1 0x7f33186a08f5 in pthread_mutex_lock () from /lib64/libpthread.so.0{color} {color:#de350b}#2 0x7f33186dfc5f in lock (m=0x10edc90) at /home/mick/latest/qpid-proton/c/src/proactor/epoll-internal.h:326{color} {color:#de350b}#3 pni_raw_connection_done (rc=0x10ed3b8) at /home/mick/latest/qpid-proton/c/src/proactor/epoll_raw_connection.c:423{color} {color:#de350b}#4 pn_proactor_done (batch=0x10ed970, p=0x10ed970) at /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2696{color} {color:#de350b}#5 pn_proactor_done (p=0x10ed970, batch=batch@entry=0x7f326811a578) at /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2676{color} {color:#de350b}#6 0x7f33187c1a11 in thread_run (arg=0x10f6e40) at /home/mick/latest/qpid-dispatch/src/server.c:1140{color} {color:#de350b}#7 0x7f331869e3f9 in start_thread () from /lib64/libpthread.so.0{color} {color:#de350b}#8 0x7f33181b2b53 in clone () from /lib64/libc.so.6{color} {color:#de350b}Thread 64 (Thread 0x7f327640 (LWP 36481)):{color} {color:#de350b}#0 0x7f33186a7ea0 in __lll_lock_wait () from /lib64/libpthread.so.0{color} {color:#de350b}#1 0x7f33186a08f5 in pthread_mutex_lock () from /lib64/libpthread.so.0{color} {color:#de350b}#2 0x7f33186e2b7e in lock (m=0x10edc90) at /home/mick/latest/qpid-proton/c/src/proactor/epoll-internal.h:326{color} {color:#de350b}#3 process (tsk=) at /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2248{color} {color:#de350b}#4 next_event_batch (p=, can_block=true) at
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2423{color} {color:#de350b}#5 0x7f33187c192f in thread_run (arg=0x10f6e40) at /home/mick/latest/qpid-dispatch/src/server.c:1107{color} {color:#de350b}#6 0x7f331869e3f9 in start_thread () from /lib64/libpthread.so.0{color} {color:#de350b}#7 0x7f33181b2b53 in clone () from /lib64/libc.so.6{color} {color:#de350b}Thread 63 (Thread 0x7f322f7fe640 (LWP 36502)):{color} {color:#de350b}#0 0x7f33186a7ea0 in __lll_lock_wait () from /lib64/libpthread.so.0{color} {color:#de350b}#1 0x7f33186a08f5 in pthread_mutex_lock () from /lib64/libpthread.so.0{color} {color:#de350b}#2 0x7f33186dfc5f in lock (m=0x10edc90) at /home/mick/latest/qpid-proton/c/src/proactor/epoll-internal.h:326{color} {color:#de350b}#3 pni_raw_connection_done (rc=0x10ed3b8) at /home/mick/latest/qpid-proton/c/src/proactor/epoll_raw_connection.c:423{color} {color:#de350b}#4 pn_proactor_done (batch=0x10ed970, p=0x10ed970) at /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2696{color} {color:#de350b}#5 pn_proactor_done (p=0x10ed970, batch=batch@entry=0x7f32c8063af8) at /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2676{color} {color:#de350b}#6 0x7f33187c1a11 in thread_run (arg=0x10f6e40) at
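A 'hey' invocation consistent with the parameters described in this issue (200 concurrent senders, 30-second tests, each sender throttled to 10 requests per second) might look like the line below; the reporter's exact command is not given, so the flags and the URL are assumptions:
{noformat}
# -c workers, -q rate limit per worker (QPS), -z test duration; URL is a placeholder
hey -z 30s -c 200 -q 10 http://127.0.0.1:9090/
{noformat}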
[jira] [Assigned] (DISPATCH-1368) Link (address) priority is ignored by the second hop router
[ https://issues.apache.org/jira/browse/DISPATCH-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] michael goulish reassigned DISPATCH-1368: - Assignee: michael goulish > Link (address) priority is ignored by the second hop router > --- > > Key: DISPATCH-1368 > URL: https://issues.apache.org/jira/browse/DISPATCH-1368 > Project: Qpid Dispatch > Issue Type: Bug > Components: Router Node >Affects Versions: 1.8.0 >Reporter: Ken Giusti >Assignee: michael goulish >Priority: Major > Fix For: 1.9.0 > > > Address-based priority is only enforced on the egress of the first hop router. > In a 3 router linear network: > Sender --> Router A --> Router B --> Router C --> Receiver > Message delivery is properly sent via the inter-router links between Router A > and Router B. > However, those messages are all forwarded on the default priority (4) between > router B and C. > [C --> Receiver is fine - priority doesn't apply to egress endpoint links] > The expectation is that the message priority is honored across all > inter-router links. > [Reproducer|https://github.com/kgiusti/dispatch/tree/DISPATCH-1368-reproducer] > Build the router, then run the priority test (ctest -VV -R priority). > Then grep for "DELIVERIES" in the log files: > grep "DELIVERIES" > tests/system_test.dir/system_tests_priority/CongestionTests/setUpClass/*.log > tests/system_test.dir/system_tests_priority/CongestionTests/setUpClass/A.log:2019-06-14 > 11:10:00.324389 -0400 ROUTER (error) DELIVERIES PER PRIORITY: 9=20 8=0 7=28 > 6=0 5=0 4(default)=21 3=0 2=12 1=0 0=343 > (/home/kgiusti/work/dispatch/qpid-dispatch/src/router_core/router_core_thread.c:188) > tests/system_test.dir/system_tests_priority/CongestionTests/setUpClass/B.log:2019-06-14 > 11:10:00.302570 -0400 ROUTER (error) DELIVERIES PER PRIORITY: 9=0 8=0 7=0 > 6=0 5=0 4(default)=172 3=0 2=0 1=0 0=286 > (/home/kgiusti/work/dispatch/qpid-dispatch/src/router_core/router_core_thread.c:188) > ... > Notice the counts on A (tx to B) - these are correct. > On B all msgs are sent priority 4 (default) to C - this is wrong. > > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Created] (PROTON-2046) pn_connection_set_container should check for null or empty string
michael goulish created PROTON-2046: --- Summary: pn_connection_set_container should check for null or empty string Key: PROTON-2046 URL: https://issues.apache.org/jira/browse/PROTON-2046 Project: Qpid Proton Issue Type: Bug Components: proton-c Reporter: michael goulish pn_connection_set_container() makes no checks of the ID string that gets passed in. This value is expected to be unique, so it should probably check for NULL and empty-string. I was passing in empty strings and it was cheerfully accepting them. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
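The requested validation could live in the library or at the call site; the wrapper below sketches the call-site form. pn_connection_set_container() is the real Proton API, but the wrapper itself and its checks are illustrative only:
{noformat}
#include <proton/connection.h>
#include <assert.h>

/* Illustrative wrapper, not proton code: reject the NULL and
 * empty-string container IDs the issue says are silently accepted. */
static void set_container_checked(pn_connection_t *conn, const char *id)
{
    assert(id != NULL && id[0] != '\0');  /* container ID should be unique and non-empty */
    pn_connection_set_container(conn, id);
}
{noformat}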
[jira] [Commented] (DISPATCH-1309) Various crashes in 1.6 release
[ https://issues.apache.org/jira/browse/DISPATCH-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809913#comment-16809913 ] michael goulish commented on DISPATCH-1309: --- Chuck – Are you sure you mean "5672"? More normal for the console would be "5673". I could not get mine to crash, with 50 repetitions of { connect + disconnect }, with 5673 – with one router or my whole Death Star network. When I tried it with 5672, I could not get it to connect at all. > Various crashes in 1.6 release > -- > > Key: DISPATCH-1309 > URL: https://issues.apache.org/jira/browse/DISPATCH-1309 > Project: Qpid Dispatch > Issue Type: Bug >Affects Versions: 1.6.0 > Environment: System 'unused':( > Fedora 5.0.3-200.fc29.x86_64, > Python 2.7.15, > Proton master @ eab1f. > System 'taj':( > Fedora 4.18.16-200.fc28.x86_64, > Python 3.6.6, > Proton master @ 68b38 >Reporter: Chuck Rolke >Priority: Major > Attachments: DISPATCH-1309-backtraces.txt, > DISPATCH-1309-gen_configs_linear.py > > > qpid-dispatch master @ 51244, which is very close to the 1.6 release, has > various crashes. > The test network is 12 routers spread over two systems. (Configuration > generator to be attached.) Four interior routers are in linear arrangement > with A and C on one system ('unused'), and B and D on the other system > ('taj'). Each system then attaches four edge routers, one to each interior > router. > Running lightweight tests, like proton cpp simple_send and simple_recv to > ports on INTA and INTB interior routers leads to a crash on INTC. The crashes > typically look like reuse of structures after they have been freed (addresses > are 0x). Other crashes hint of general memory corruption > (crashes in malloc.c). > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Commented] (DISPATCH-1309) Various crashes in 1.6 release
[ https://issues.apache.org/jira/browse/DISPATCH-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809806#comment-16809806 ] michael goulish commented on DISPATCH-1309: --- Yee hah! Chuck's comment reminded me – I believe I have also seen crashes *only* when the console was attached. Furthermore, I think I have seen crashes maybe not *only* but *more often* when I was *shutting down* a console *while* the network was still running. I tried that just now – with 1.6 code. I had to start, stop, and restart the console 11 times, but then it happened. Boom. With this core: #0 pn_collector_put (collector=0x4242424242424242, clazz=0x7f0e99c38520 , context=0x0, type=type@entry=PN_CONNECTION_WAKE) at /home/mick/latest/qpid-proton-0.26.0/c/src/core/event.c:134 #1 0x7f0e99ca6258 in http_thread_run (v=0x2036850) at /home/mick/latest/qpid-dispatch-1.6.0/src/http-libwebsockets.c:731 #2 0x7f0e995df50b in start_thread () from /lib64/libpthread.so.0 #3 0x7f0e988a338f in clone () from /lib64/libc.so.6 Which is one I have seen before. Now I have *some hope* of getting some kind of baseline, based on number of crashes per console stop-and-restart, so that I can do some kind of vivisection of the code. > Various crashes in 1.6 release > -- > > Key: DISPATCH-1309 > URL: https://issues.apache.org/jira/browse/DISPATCH-1309 > Project: Qpid Dispatch > Issue Type: Bug >Affects Versions: 1.6.0 > Environment: System 'unused':( > Fedora 5.0.3-200.fc29.x86_64, > Python 2.7.15, > Proton master @ eab1f. > System 'taj':( > Fedora 4.18.16-200.fc28.x86_64, > Python 3.6.6, > Proton master @ 68b38 >Reporter: Chuck Rolke >Priority: Major > Attachments: DISPATCH-1309-backtraces.txt, > DISPATCH-1309-gen_configs_linear.py > > > qpid-dispatch master @ 51244, which is very close to the 1.6 release, has > various crashes. > The test network is 12 routers spread over two systems. (Configuration > generator to be attached.) Four interior routers are in linear arrangement > with A and C on one system ('unused'), and B and D on the other system > ('taj'). Each system then attaches four edge routers, one to each interior > router. > Running lightweight tests, like proton cpp simple_send and simple_recv to > ports on INTA and INTB interior routers leads to a crash on INTC. The crashes > typically look like reuse of structures after they have been freed (addresses > are 0x). Other crashes hint of general memory corruption > (crashes in malloc.c). > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Commented] (DISPATCH-1309) Various crashes in 1.6 release
[ https://issues.apache.org/jira/browse/DISPATCH-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808055#comment-16808055 ] michael goulish commented on DISPATCH-1309: --- And since the above comment I have not been able to get another crash :( > Various crashes in 1.6 release > -- > > Key: DISPATCH-1309 > URL: https://issues.apache.org/jira/browse/DISPATCH-1309 > Project: Qpid Dispatch > Issue Type: Bug >Affects Versions: 1.6.0 > Environment: System 'unused':( > Fedora 5.0.3-200.fc29.x86_64, > Python 2.7.15, > Proton master @ eab1f. > System 'taj':( > Fedora 4.18.16-200.fc28.x86_64, > Python 3.6.6, > Proton master @ 68b38 >Reporter: Chuck Rolke >Priority: Major > Attachments: DISPATCH-1309-backtraces.txt, > DISPATCH-1309-gen_configs_linear.py > > > qpid-dispatch master @ 51244, which is very close to the 1.6 release, has > various crashes. > The test network is 12 routers spread over two systems. (Configuration > generator to be attached.) Four interior routers are in linear arrangement > with A and C on one system ('unused'), and B and D on the other system > ('taj'). Each system then attaches four edge routers, one to each interior > router. > Running lightweight tests, like proton cpp simple_send and simple_recv to > ports on INTA and INTB interior routers leads to a crash on INTC. The crashes > typically look like reuse of structures after they have been freed (addresses > are 0x). Other crashes hint of general memory corruption > (crashes in malloc.c). > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Commented] (DISPATCH-1309) Various crashes in 1.6 release
[ https://issues.apache.org/jira/browse/DISPATCH-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808007#comment-16808007 ] michael goulish commented on DISPATCH-1309: --- OK! I thought Mercury might help reproduce this more easily, and ... it did. I made a 13-router star-shaped network ( the Death Star ) – 12 routers in a circle and one at the center. There was 1 receiver at every router on the circle all hoping for 1 million messages. 1 sender at the center router, trying to make all the receivers happy. It ran for a good amount of time – I could see the traffic turning all the links green using the console – and then 7 routers crashed all at once, generating 5 different types of core files. Which follow. ## # Type 1 ## #0 0x7f230750 in raise () from /lib64/libc.so.6 #1 0x7f231d31 in abort () from /lib64/libc.so.6 #2 0x7f23bbba905a in __assert_fail_base () from /lib64/libc.so.6 #3 0x7f23bbba90d2 in __assert_fail () from /lib64/libc.so.6 #4 0x7f23bc9b8e6f in __pthread_tpp_change_priority () from /lib64/libpthread.so.0 #5 0x7f23bc9af8fb in __pthread_mutex_lock_full () from /lib64/libpthread.so.0 #6 0x7f23bd044309 in qdra_config_address_create_CT (core=0x7f23a805e0d8, name=, query=0x7f23a00307d8, in_body=) at /home/mick/latest/qpid-dispatch-1.6.0/src/router_core/agent_config_address.c:446 #7 0x in ?? () in qdra_config_address_create_CT (gdb) list 441 addr->priority = priority; 442 pattern = 0; 443 444 qd_iterator_reset_view(iter, ITER_VIEW_ALL); 445 qd_parse_tree_add_pattern(core->addr_parse_tree, iter, addr); 446 DEQ_INSERT_TAIL(core->addr_config, addr); 447 448 // 449 // Compose the result map for the response. 450 // ## # Type 2 ## #0 connection_wake (conn=) at /home/mick/latest/qpid-dispatch-1.6.0/src/remote_sasl.c:241 #1 0x7f7cef4884cb in pni_sasl_impl_free (transport=0x7f7cd4015180) at /home/mick/latest/qpid-proton-0.26.0/c/src/sasl/sasl.c:181 #2 pn_sasl_free (transport=0x7f7cd4015180) at /home/mick/latest/qpid-proton-0.26.0/c/src/sasl/sasl.c:764 #3 0x7f7cef480b90 in pn_transport_finalize (object=0x7f7cd4015180) at /home/mick/latest/qpid-proton-0.26.0/c/src/core/transport.c:665 #4 0x7f7cef472a99 in pn_class_decref (clazz=0x7f7cef69aca0 , clazz@entry=0x7f7cef69a520 , object=0x7f7cd4015180) at /home/mick/latest/qpid-proton-0.26.0/c/src/core/object/object.c:95 #5 0x7f7cef472cbf in pn_decref (object=) at /home/mick/latest/qpid-proton-0.26.0/c/src/core/object/object.c:253 #6 0x7f7cef480851 in pn_transport_free (transport=) at /home/mick/latest/qpid-proton-0.26.0/c/src/core/transport.c:644 #7 0x7f7cef47b994 in pn_connection_driver_destroy (d=d@entry=0x7f7cd4014d98) at /home/mick/latest/qpid-proton-0.26.0/c/src/core/connection_driver.c:94 #8 0x7f7cef25b604 in pconnection_final_free (pc=0x7f7cd40147f0) at /home/mick/latest/qpid-proton-0.26.0/c/src/proactor/epoll.c:889 #9 0x7f7cef25c4fc in pconnection_cleanup (pc=) at /home/mick/latest/qpid-proton-0.26.0/c/src/proactor/epoll.c:905 #10 0x7f7cef25d295 in pconnection_process (pc=0x7f7cd40147f0, events=, timeout=timeout@entry=false, topup=false, is_io_2=) at /home/mick/latest/qpid-proton-0.26.0/c/src/proactor/epoll.c:1273 #11 0x7f7cef25dd03 in proactor_do_epoll (p=0x1ee9600, can_block=can_block@entry=true) at /home/mick/latest/qpid-proton-0.26.0/c/src/proactor/epoll.c:2139 #12 0x7f7cef25ef2a in pn_proactor_wait (p=) at /home/mick/latest/qpid-proton-0.26.0/c/src/proactor/epoll.c:2157 #13 0x7f7cef7057af in thread_run (arg=0x1db7960) at /home/mick/latest/qpid-dispatch-1.6.0/src/server.c:994 #14 0x7f7cef04150b in 
start_thread () from /lib64/libpthread.so.0 #15 0x7f7cee30538f in clone () from /lib64/libc.so.6 ## # Type 3 ## #0 qd_hash_internal_retrieve_with_hash (hash=, key=key@entry=0x7f140c097ad8, h=, h=) at /home/mick/latest/qpid-dispatch-1.6.0/src/hash.c:204 #1 0x7f1432401a15 in qd_hash_internal_retrieve (key=0x7f140c097ad8, h=0x7f141c000bc0) at /home/mick/latest/qpid-dispatch-1.6.0/src/hash.c:219 #2 qd_hash_retrieve (h=0x7f141c000bc0, key=key@entry=0x7f140c097ad8, val=val@entry=0x7ffe6c6ac638) at /home/mick/latest/qpid-dispatch-1.6.0/src/hash.c:270 #3 0x7f14324312e6 in qdr_lookup_terminus_address_CT (core=0xb656c0, dir=, conn=conn@entry=0x7f140c076798, terminus=0x7f140c086258, link_route=link_route@entry=0x7ffe6c6ac77d, unavailable=unavailable@entry=0x7ffe6c6ac77e, core_endpoint=0x7ffe6c6ac77f, accept_dynamic=true,
[jira] [Closed] (DISPATCH-1280) http against https enabled listener causes segfault
[ https://issues.apache.org/jira/browse/DISPATCH-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] michael goulish closed DISPATCH-1280. - Resolution: Fixed > http against https enabled listener causes segfault > --- > > Key: DISPATCH-1280 > URL: https://issues.apache.org/jira/browse/DISPATCH-1280 > Project: Qpid Dispatch > Issue Type: Bug >Reporter: Gordon Sim >Assignee: michael goulish >Priority: Major > > If you have a listener with http enabled, an ssl profile referenced, but > requireSsl set to false, and then try to access it over plain http, you get a > segfault in libwebsockets if using version 3.0.1-2. Downgrading to 2.4.2 of > libwebsockets fixes this. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
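For reference, the listener shape this issue describes (http enabled, an sslProfile referenced, requireSsl false) looks roughly like the fragment below in qdrouterd.conf. Every value here is a placeholder reconstruction, not the reporter's actual configuration:
{noformat}
sslProfile {
    name: example-profile            # placeholder name
    certFile: /path/to/cert.pem      # placeholder paths
    privateKeyFile: /path/to/key.pem
}

listener {
    host: 0.0.0.0
    port: 5673
    http: true
    sslProfile: example-profile
    requireSsl: false    # plain-http access to this port triggered the LWS segfault
}
{noformat}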
[jira] [Commented] (DISPATCH-1280) http against https enabled listener causes segfault
[ https://issues.apache.org/jira/browse/DISPATCH-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798961#comment-16798961 ] michael goulish commented on DISPATCH-1280: --- LWS developer pushed patch. I got through 100 iterations of my reproducer on master with no crash. (I could not do enough iterations before to get a real baseline, but I did get one crash in first 20 tries.) I think it's a dead bug. > http against https enabled listener causes segfault > --- > > Key: DISPATCH-1280 > URL: https://issues.apache.org/jira/browse/DISPATCH-1280 > Project: Qpid Dispatch > Issue Type: Bug >Reporter: Gordon Sim >Assignee: michael goulish >Priority: Major > > If you have a listener with http enabled, an ssl profile referenced, but > requireSsl set to false, and then try to access it over plain http, you get a > segfault in libwebsockets if using version 3.0.1-2. Downgrading to 2.4.2 of > libwebsockets fixes this. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Comment Edited] (DISPATCH-1280) http against https enabled listener causes segfault
[ https://issues.apache.org/jira/browse/DISPATCH-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798302#comment-16798302 ] michael goulish edited comment on DISPATCH-1280 at 3/21/19 5:53 PM: Well, kinda. I saw one crash using LWS latest master, and then I tried 20 more times and all I got was this error message: NOTICE: lws_server_socket_service_ssl: client did not send a valid tls hello (default vhost default) ( On LWS version 3.0.1 the crash happens every time. ) But! The one crash I did see had basically identical backtrace as in version 3.0.1. (See previous comment.) I raised an issue with LWS: [https://github.com/warmcat/libwebsockets/issues/1527] was (Author: mgoulish): Well, kinda. I saw one crash using LWS latest master, and then I tried 20 more times and all I got was this error message: NOTICE: lws_server_socket_service_ssl: client did not send a valid tls hello (default vhost default) But! The one crash I did see had basically identical backtrace as in version 3.0.1. (See previous comment.) I raised an issue with LWS: https://github.com/warmcat/libwebsockets/issues/1527 > http against https enabled listener causes segfault > --- > > Key: DISPATCH-1280 > URL: https://issues.apache.org/jira/browse/DISPATCH-1280 > Project: Qpid Dispatch > Issue Type: Bug >Reporter: Gordon Sim >Assignee: michael goulish >Priority: Major > > If you have a listener with http enabled, an ssl profile referenced, but > requireSsl set to false, and then try to access it over plain http, you get a > segfault in libwebsockets if using version 3.0.1-2. Downgrading to 2.4.2 of > libwebsockets fixes this. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Commented] (DISPATCH-1280) http against https enabled listener causes segfault
[ https://issues.apache.org/jira/browse/DISPATCH-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798302#comment-16798302 ] michael goulish commented on DISPATCH-1280: --- Well, kinda. I saw one crash using LWS latest master, and then I tried 20 more times and all I got was this error message: NOTICE: lws_server_socket_service_ssl: client did not send a valid tls hello (default vhost default) But! The one crash I did see had basically identical backtrace as in version 3.0.1. (See previous comment.) I raised an issue with LWS: https://github.com/warmcat/libwebsockets/issues/1527 > http against https enabled listener causes segfault > --- > > Key: DISPATCH-1280 > URL: https://issues.apache.org/jira/browse/DISPATCH-1280 > Project: Qpid Dispatch > Issue Type: Bug >Reporter: Gordon Sim >Assignee: michael goulish >Priority: Major > > If you have a listener with http enabled, an ssl profile referenced, but > requireSsl set to false, and then try to access it over plain http, you get a > segfault in libwebsockets if using version 3.0.1-2. Downgrading to 2.4.2 of > libwebsockets fixes this. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Commented] (DISPATCH-1280) http against https enabled listener causes segfault
[ https://issues.apache.org/jira/browse/DISPATCH-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797619#comment-16797619 ] michael goulish commented on DISPATCH-1280: --- reproduced with simple example.

What I did:

1. from the lws code tree for 3.0.1 (7 Sep 2018, fb31602ff9aeb88267fb8132d48df31195782ae5) use the example minimal-examples/http-server/minimal-http-server-tls.

2. Alter the .c file this way:

   info.options = LWS_SERVER_OPTION_DO_SSL_GLOBAL_INIT | LWS_SERVER_OPTION_ALLOW_NON_SSL_ON_SSL_PORT ;

3. build and run it. It listens on https://localhost:7681

4. In browser, do this request: http://localhost:7681/index.html

big bada boom.

#0 0x7f63281fff60 in SSL_get0_alpn_selected () from /lib64/libssl.so.1.1
#1 0x7f632880ea17 in lws_tls_server_conn_alpn () from /usr/local/lib/libwebsockets.so.13
#2 0x7f632880ee98 in lws_server_socket_service_ssl () from /usr/local/lib/libwebsockets.so.13
#3 0x7f632880d1ad in rops_handle_POLLIN_listen () from /usr/local/lib/libwebsockets.so.13
#4 0x7f6328800389 in lws_service_fd_tsi () from /usr/local/lib/libwebsockets.so.13
#5 0x7f6328816ce7 in _lws_plat_service_tsi.part.1 () from /usr/local/lib/libwebsockets.so.13
#6 0x7f6328800455 in lws_service () from /usr/local/lib/libwebsockets.so.13
#7 0x00400965 in main (argc=1, argv=0x7fff71638b68) at minimal-http-server-tls.c:87

Next I will see if this still happens with latest code.

> http against https enabled listener causes segfault > --- > > Key: DISPATCH-1280 > URL: https://issues.apache.org/jira/browse/DISPATCH-1280 > Project: Qpid Dispatch > Issue Type: Bug >Reporter: Gordon Sim >Assignee: michael goulish >Priority: Major > > If you have a listener with http enabled, an ssl profile referenced, but > requireSsl set to false, and then try to access it over plain http, you get a > segfault in libwebsockets if using version 3.0.1-2. Downgrading to 2.4.2 of > libwebsockets fixes this. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
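For context, here is roughly what step 2 looks like in place. This is a condensed sketch based on lws 3.0.1's minimal-http-server-tls example (the mount/vhost plumbing is omitted, and the cert/key paths are the example's self-signed pair); the only change from the stock example is the added option flag.

{noformat}
#include <libwebsockets.h>
#include <string.h>

/* Condensed sketch of the reproducer; only info.options differs
 * from the stock minimal-http-server-tls example. */
int main(void)
{
    struct lws_context_creation_info info;
    struct lws_context *context;

    memset(&info, 0, sizeof info);
    info.port = 7681;
    info.ssl_cert_filepath = "localhost-100y.cert";   /* example's self-signed cert */
    info.ssl_private_key_filepath = "localhost-100y.key";
    info.options = LWS_SERVER_OPTION_DO_SSL_GLOBAL_INIT |
                   LWS_SERVER_OPTION_ALLOW_NON_SSL_ON_SSL_PORT;  /* step 2's change */

    context = lws_create_context(&info);
    if (!context)
        return 1;

    /* Serve until interrupted; then hit the port with plain http. */
    while (lws_service(context, 1000) >= 0)
        ;

    lws_context_destroy(context);
    return 0;
}
{noformat}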
[jira] [Commented] (DISPATCH-1280) http against https enabled listener causes segfault
[ https://issues.apache.org/jira/browse/DISPATCH-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797575#comment-16797575 ] michael goulish commented on DISPATCH-1280: --- Looked at closed issues back to release date of v2.4.2 (8 March 2018). Nothing looks like the issue we are seeing. Closed issues are here: https://github.com/warmcat/libwebsockets/issues?page=11=is%3Aissue+is%3Aclosed > http against https enabled listener causes segfault > --- > > Key: DISPATCH-1280 > URL: https://issues.apache.org/jira/browse/DISPATCH-1280 > Project: Qpid Dispatch > Issue Type: Bug >Reporter: Gordon Sim >Assignee: michael goulish >Priority: Major > > If you have a listener with http enabled, an ssl profile referenced, but > requireSsl set to false, and then try to access it over plain http, you get a > segfault in libwebsockets if using version 3.0.1-2. Downgrading to 2.4.2 of > libwebsockets fixes this. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Commented] (DISPATCH-1280) http against https enabled listener causes segfault
[ https://issues.apache.org/jira/browse/DISPATCH-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797391#comment-16797391 ] michael goulish commented on DISPATCH-1280: --- It sounds like this happens all the time. Is that true? Not a rare occurrence? > http against https enabled listener causes segfault > --- > > Key: DISPATCH-1280 > URL: https://issues.apache.org/jira/browse/DISPATCH-1280 > Project: Qpid Dispatch > Issue Type: Bug >Reporter: Gordon Sim >Assignee: michael goulish >Priority: Major > > If you have a listener with http enabled, an ssl profile referenced, but > requireSsl set to false, and then try to access it over plain http, you get a > segfault in libwebsockets if using version 3.0.1-2. Downgrading to 2.4.2 of > libwebsockets fixes this. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Assigned] (DISPATCH-1280) http against https enabled listener causes segfault
[ https://issues.apache.org/jira/browse/DISPATCH-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] michael goulish reassigned DISPATCH-1280: - Assignee: michael goulish > http against https enabled listener causes segfault > --- > > Key: DISPATCH-1280 > URL: https://issues.apache.org/jira/browse/DISPATCH-1280 > Project: Qpid Dispatch > Issue Type: Bug >Reporter: Gordon Sim >Assignee: michael goulish >Priority: Major > > If you have a listener with http enabled, an ssl profile referenced, but > requireSsl set to false, and then try to access it over plain http, you get a > segfault in libwebsockets if using version 3.0.1-2. Downgrading to 2.4.2 of > libwebsockets fixes this. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Created] (DISPATCH-1215) several memory leaks in edge-router soak test
michael goulish created DISPATCH-1215: - Summary: several memory leaks in edge-router soak test Key: DISPATCH-1215 URL: https://issues.apache.org/jira/browse/DISPATCH-1215 Project: Qpid Dispatch Issue Type: Bug Reporter: michael goulish

Using recent master code trees (dispatch and proton)...

The test sets up a simple 3-linear router network, A-B-C, and attaches 100 edge routers to A. It then kills one edge router, replaces it, and repeats that kill-and-replace operation 50 times. (At which point I manually killed router A.)

Router A was running under valgrind, and produced the following output:

[mick@colossus ~]$ /usr/bin/valgrind --leak-check=full --show-leak-kinds=definite --trace-children=yes --suppressions=/home/mick/latest/qpid-dispatch/tests/valgrind.supp /home/mick/latest/install/dispatch/sbin/qdrouterd --config /home/mick/mercury/results/test_03/2018_12_06/config/A.conf -I /home/mick/latest/install/dispatch/lib/qpid-dispatch/python
==9409== Memcheck, a memory error detector
==9409== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==9409== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==9409== Command: /home/mick/latest/install/dispatch/sbin/qdrouterd --config /home/mick/mercury/results/test_03/2018_12_06/config/A.conf -I /home/mick/latest/install/dispatch/lib/qpid-dispatch/python
==9409==
^C==9409==
==9409== Process terminating with default action of signal 2 (SIGINT)
==9409==    at 0x61C0A37: kill (in /usr/lib64/libc-2.26.so)
==9409==    by 0x401636: main (main.c:367)
==9409==
==9409== HEAP SUMMARY:
==9409==     in use at exit: 6,933,690 bytes in 41,903 blocks
==9409==   total heap usage: 669,024 allocs, 627,121 frees, 92,449,020 bytes allocated
==9409==
==9409== *8,640 (480 direct, 8,160 indirect) bytes in 20 blocks are definitely lost in loss record 4,229 of 4,323*
==9409==    at 0x4C2CB6B: malloc (vg_replace_malloc.c:299)
==9409==    by 0x4E7D336: qdr_error_from_pn (error.c:37)
==9409==    by 0x4E905D7: AMQP_link_detach_handler (router_node.c:822)
==9409==    by 0x4E60A6C: close_links (container.c:298)
==9409==    by 0x4E6109F: close_handler (container.c:311)
==9409==    by 0x4E6109F: qd_container_handle_event (container.c:639)
==9409==    by 0x4E93971: handle (server.c:985)
==9409==    by 0x4E944C8: thread_run (server.c:1010)
==9409==    by 0x4E947CF: qd_server_run (server.c:1284)
==9409==    by 0x40186E: main_process (main.c:112)
==9409==    by 0x401636: main (main.c:367)
==9409==
==9409== *14,256 (792 direct, 13,464 indirect) bytes in 33 blocks are definitely lost in loss record 4,261 of 4,323*
==9409==    at 0x4C2CB6B: malloc (vg_replace_malloc.c:299)
==9409==    by 0x4E7D336: qdr_error_from_pn (error.c:37)
==9409==    by 0x4E905D7: AMQP_link_detach_handler (router_node.c:822)
==9409==    by 0x4E60A6C: close_links (container.c:298)
==9409==    by 0x4E6109F: close_handler (container.c:311)
==9409==    by 0x4E6109F: qd_container_handle_event (container.c:639)
==9409==    by 0x4E93971: handle (server.c:985)
==9409==    by 0x4E944C8: thread_run (server.c:1010)
==9409==    by 0x550150A: start_thread (in /usr/lib64/libpthread-2.26.so)
==9409==    by 0x628138E: clone (in /usr/lib64/libc-2.26.so)
==9409==
==9409== *575,713 (24 direct, 575,689 indirect) bytes in 1 blocks are definitely lost in loss record 4,321 of 4,323*
==9409==    at 0x4C2CB6B: malloc (vg_replace_malloc.c:299)
==9409==    by 0x4E83FCA: qdr_add_link_ref (router_core.c:518)
==9409==    by 0x4E7A3BF: qdr_link_inbound_first_attach_CT (connections.c:1517)
==9409==    by 0x4E8484B: router_core_thread (router_core_thread.c:116)
==9409==    by 0x550150A: start_thread (in /usr/lib64/libpthread-2.26.so)
==9409==    by 0x628138E: clone (in /usr/lib64/libc-2.26.so)
==9409==
==9409== LEAK SUMMARY:
==9409==    definitely lost: 1,296 bytes in 54 blocks
==9409==    indirectly lost: 597,313 bytes in 3,096 blocks
==9409==      possibly lost: 1,473,248 bytes in 6,538 blocks
==9409==    still reachable: 4,861,833 bytes in 32,215 blocks
==9409==         suppressed: 0 bytes in 0 blocks
==9409== Reachable blocks (those to which a pointer was found) are not shown.
==9409== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==9409==
==9409== For counts of detected and suppressed errors, rerun with: -v
==9409== ERROR SUMMARY: 1040 errors from 1040 contexts (suppressed: 0 from 0)
[mick@colossus ~]$

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
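The first two records point at the same shape of leak: an error object built by qdr_error_from_pn() on the link-detach path that is never released. A minimal sketch of that shape, assuming qdr_error_free() is the matching destructor (the handler body and consumer_takes_ownership() here are hypothetical, not the actual router_node.c code):

{noformat}
/* Shape-sketch of the leak valgrind reports above, not real router code.
 * qdr_error_from_pn() (error.c:37) malloc()s the qdr_error_t. */
typedef struct qdr_error_t qdr_error_t;         /* opaque, as in dispatch */
typedef struct pn_condition_t pn_condition_t;

extern qdr_error_t *qdr_error_from_pn(pn_condition_t *cond);
extern void         qdr_error_free(qdr_error_t *error);
extern int          consumer_takes_ownership(qdr_error_t *error);  /* hypothetical */

void on_link_detach_sketch(pn_condition_t *cond)
{
    qdr_error_t *error = qdr_error_from_pn(cond);   /* allocated here */

    if (!consumer_takes_ownership(error)) {
        /* Every branch that does not hand the error off must free it;
         * otherwise each detach during edge-router churn leaks one error. */
        qdr_error_free(error);
    }
}
{noformat}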
[jira] [Created] (DISPATCH-1155) dueling httpRootDirs
michael goulish created DISPATCH-1155: - Summary: dueling httpRootDirs Key: DISPATCH-1155 URL: https://issues.apache.org/jira/browse/DISPATCH-1155 Project: Qpid Dispatch Issue Type: Bug Reporter: michael goulish Assignee: michael goulish New version of qpid-dispatch-router uses "/usr/share/qpid-dispatch/console/stand-alone" as the default httpRootDir. But when installing new qpid-dispatch-console package, the pages are available at "/usr/share/qpid-dispatch/console". This forces the user to define httpRootDir on the listener to bypass this issue. Ted suggests this fix: Remove the default behavior for httpRootDir. If it is not specified in the configuration for a listener, then HTTP requests shall be rejected on connections to that listener. Such a listener would only be usable for AMQP over websockets. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
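For illustration, the workaround this bug currently forces on users looks like this in qdrouterd.conf (a sketch; the port is an example, while httpRootDir is the real listener attribute under discussion):

{noformat}
listener {
    host: 0.0.0.0
    port: 8672
    http: true
    # Point the listener at wherever the console package actually
    # installed its pages -- here, the qpid-dispatch-console location:
    httpRootDir: /usr/share/qpid-dispatch/console
}
{noformat}

Under Ted's proposed fix, omitting httpRootDir entirely would make this listener reject HTTP requests and serve only AMQP over websockets.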
[jira] [Commented] (DISPATCH-959) Rate limiting policy
[ https://issues.apache.org/jira/browse/DISPATCH-959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655668#comment-16655668 ] michael goulish commented on DISPATCH-959: -- This is not a bug, it's a new feature. > Rate limiting policy > > > Key: DISPATCH-959 > URL: https://issues.apache.org/jira/browse/DISPATCH-959 > Project: Qpid Dispatch > Issue Type: Bug > Components: Policy Engine, Routing Engine >Affects Versions: 1.0.1 >Reporter: Chuck Rolke >Priority: Major > Fix For: Backlog > > > Router administrators would like rate-limiting policies to allow different > classes of users. A network-rate limit similar to how home cable networks are > provisioned for bandwidth is a classic model and is being considered as the > first choice. > A message-per-second limit might be easier to enforce. But a single user > message may have a large data section, or have a small data section but have > huge message annotations. Thus a user might consume a lot of network > bandwidth with only a few messages. > It is still unclear at what level the rate limiting should be applied. > Choices are: > * Per vhost > * Per vhost connection > * Per vhost user -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Closed] (DISPATCH-1139) support prioritized addresses
[ https://issues.apache.org/jira/browse/DISPATCH-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] michael goulish closed DISPATCH-1139. - Resolution: Implemented > support prioritized addresses > - > > Key: DISPATCH-1139 > URL: https://issues.apache.org/jira/browse/DISPATCH-1139 > Project: Qpid Dispatch > Issue Type: New Feature > Components: Router Node, Routing Engine, Tests >Reporter: michael goulish >Assignee: michael goulish >Priority: Major > > Support a new field in the address descriptor in router configuration files > that will assign a priority to the address. > Any message that does not have an intrinsic priority already assigned will > inherit the priority of the address to which it is sent. If no priority is > explicitly assigned to an address, then it will be assigned the default > priority. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Resolved] (DISPATCH-1140) tests for message priority
[ https://issues.apache.org/jira/browse/DISPATCH-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] michael goulish resolved DISPATCH-1140. --- Resolution: Duplicate Sorry – I should have just included this with DISPATCH-1139. When I PR that one, it will have a test that looks at both message and address priority. > tests for message priority > -- > > Key: DISPATCH-1140 > URL: https://issues.apache.org/jira/browse/DISPATCH-1140 > Project: Qpid Dispatch > Issue Type: New Feature >Reporter: michael goulish >Assignee: michael goulish >Priority: Major > > The message priority code recently checked in ( in DISPATCH-1096 ) should > have at least the following two tests: > > # Make a two-router network, A and B. Send messages from A to B, confirm > that they arrive, then kill and restart B and send and confirm more messages. > Do this test once with B connecting to A, and once with A connecting to B. > # Two-router network again. Send some messages from A to B (i.e. sender > attached to A, rcvr to B) – sending at least one message of each priority. > ( 0 - 9, inclusive ). Send management commands to A to see how many outgoing > inter-router links had message traffic go over them. The number should be 10. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Created] (DISPATCH-1140) tests for message priority
michael goulish created DISPATCH-1140: - Summary: tests for message priority Key: DISPATCH-1140 URL: https://issues.apache.org/jira/browse/DISPATCH-1140 Project: Qpid Dispatch Issue Type: New Feature Reporter: michael goulish Assignee: michael goulish The message priority code recently checked in ( in DISPATCH-1096 ) should have at least the following two tests: # Make a two-router network, A and B. Send messages from A to B, confirm that they arrive, then kill and restart B and send and confirm more messages. Do this test once with B connecting to A, and once with A connecting to B. # Two-router network again. Send some messages from A to B (i.e. sender attached to A, rcvr to B) – sending at least one message of each priority. ( 0 - 9, inclusive ). Send management commands to A to see how many outgoing inter-router links had message traffic go over them. The number should be 10. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
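A sketch of the message-construction side of test 2, using the proton-c message API (only the per-priority setup is shown; the senders, router configuration, and the management query of inter-router links are omitted, and the address is a placeholder):

{noformat}
#include <proton/message.h>
#include <stdint.h>

/* Build one message per AMQP priority, 0-9 inclusive, as test 2 requires.
 * Sending these through router A and counting the outgoing inter-router
 * links with traffic should yield 10. */
void build_priority_messages(pn_message_t *msgs[10])
{
    for (uint8_t p = 0; p <= 9; p++) {
        pn_message_t *m = pn_message();
        pn_message_set_priority(m, p);      /* one message at each priority */
        pn_message_set_address(m, "prio-test");
        msgs[p] = m;
    }
}
{noformat}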
[jira] [Resolved] (DISPATCH-1096) support AMQP prioritized messages
[ https://issues.apache.org/jira/browse/DISPATCH-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] michael goulish resolved DISPATCH-1096. --- Resolution: Implemented I will open a separate Jira for tests that this code needs. > support AMQP prioritized messages > - > > Key: DISPATCH-1096 > URL: https://issues.apache.org/jira/browse/DISPATCH-1096 > Project: Qpid Dispatch > Issue Type: New Feature >Reporter: michael goulish >Assignee: michael goulish >Priority: Major > Fix For: 1.4.0 > > > Detect priority info from message header in the router code. > Create separate inter-router links for the various priorities. > Per connection (i.e. not globally across the router) service high-priority > inter-router links before low priority links. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Closed] (PROTON-1949) no message header if priority == default
[ https://issues.apache.org/jira/browse/PROTON-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] michael goulish closed PROTON-1949. --- Resolution: Not A Problem We have found a nice workaround for this–probably better, actually--and do not need proton to change anything. > no message header if priority == default > > > Key: PROTON-1949 > URL: https://issues.apache.org/jira/browse/PROTON-1949 > Project: Qpid Proton > Issue Type: Bug >Reporter: michael goulish >Priority: Major > > Proton does not send a message header if there would be nothing in it but the > priority field, and if the priority was set to the default value (4). > At the router level, we are allowing the user to set priorities on addresses. > Those priorities will be given to any message sent to that address if the > message otherwise had no priority set. > So - we need to be able to distinguish between messages that were assigned > the default priority, and messages in which the priority was left undefined. > We would like proton to send the priority field in the message header if the > user sets any priority. Then we will be able to interpret no header, or no > priority field in the header as "no priority was assigned". > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Commented] (PROTON-1949) no message header if priority == default
[ https://issues.apache.org/jira/browse/PROTON-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640115#comment-16640115 ] michael goulish commented on PROTON-1949: - Nolo contendere. We have decided that it is better to give precedence to the address's priority, which means that we do not need an ability in the message to express _no value_. I will close this as not-a-bug. > no message header if priority == default > > > Key: PROTON-1949 > URL: https://issues.apache.org/jira/browse/PROTON-1949 > Project: Qpid Proton > Issue Type: Bug >Reporter: michael goulish >Priority: Major > > Proton does not send a message header if there would be nothing in it but the > priority field, and if the priority was set to the default value (4). > At the router level, we are allowing the user to set priorities on addresses. > Those priorities will be given to any message sent to that address if the > message otherwise had no priority set. > So - we need to be able to distinguish between messages that were assigned > the default priority, and messages in which the priority was left undefined. > We would like proton to send the priority field in the message header if the > user sets any priority. Then we will be able to interpret no header, or no > priority field in the header as "no priority was assigned". > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Created] (DISPATCH-1139) support prioritized addresses
michael goulish created DISPATCH-1139: - Summary: support prioritized addresses Key: DISPATCH-1139 URL: https://issues.apache.org/jira/browse/DISPATCH-1139 Project: Qpid Dispatch Issue Type: New Feature Components: Router Node, Routing Engine, Tests Reporter: michael goulish Assignee: michael goulish Support a new field in the address descriptor in router configuration files that will assign a priority to the address. Any message that does not have an intrinsic priority already assigned will inherit the priority of the address to which it is sent. If no priority is explicitly assigned to an address, then it will be assigned the default priority. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Commented] (DISPATCH-1126) ERROR Attempt to attach too many inter-router links for priority sheaf.
[ https://issues.apache.org/jira/browse/DISPATCH-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16638896#comment-16638896 ] michael goulish commented on DISPATCH-1126: --- pending fix for this in PR 384 > ERROR Attempt to attach too many inter-router links for priority sheaf. > --- > > Key: DISPATCH-1126 > URL: https://issues.apache.org/jira/browse/DISPATCH-1126 > Project: Qpid Dispatch > Issue Type: Bug > Components: Router Node >Affects Versions: 1.3.0 > Environment: Fedora 28 > * Three router network in linear arrangement A - B - C. > * B has a listener; A and C connect to it > >Reporter: Chuck Rolke >Assignee: michael goulish >Priority: Major > Attachments: taj-GRN.log > > > Some state probably not cleaned up when router connections are lost. 10 > messages > (error) Attempt to attach too many inter-router links for priority sheaf. > appear when routers reconnect. > Start the network. Then kill routers A and C and restart them. Router B > prints the messages. > Log file attached -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Created] (PROTON-1949) no message header if priority == default
michael goulish created PROTON-1949: --- Summary: no message header if priority == default Key: PROTON-1949 URL: https://issues.apache.org/jira/browse/PROTON-1949 Project: Qpid Proton Issue Type: Bug Reporter: michael goulish Proton does not send a message header if there would be nothing in it but the priority field, and if the priority was set to the default value (4). At the router level, we are allowing the user to set priorities on addresses. Those priorities will be given to any message sent to that address if the message otherwise had no priority set. So - we need to be able to distinguish between messages that were assigned the default priority, and messages in which the priority was left undefined. We would like proton to send the priority field in the message header if the user sets any priority. Then we will be able to interpret no header, or no priority field in the header as "no priority was assigned". -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
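The ambiguity is easy to state in code. A small proton-c sketch (the message API calls are real; the wire-level claim is the one made in this issue):

{noformat}
#include <proton/message.h>
#include <stdio.h>

int main(void)
{
    pn_message_t *m1 = pn_message();     /* priority never set */
    pn_message_t *m2 = pn_message();
    pn_message_set_priority(m2, 4);      /* explicitly set to the default */

    /* Both report 4 locally -- and, per this issue, neither produces a
     * priority field in the encoded header, so a receiver cannot tell
     * "explicitly 4" apart from "no priority was assigned". */
    printf("%d %d\n", pn_message_get_priority(m1), pn_message_get_priority(m2));

    pn_message_free(m1);
    pn_message_free(m2);
    return 0;
}
{noformat}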
[jira] [Created] (DISPATCH-1135) Router A leaks memory when router B killed and restarted.
michael goulish created DISPATCH-1135: - Summary: Router A leaks memory when router B killed and restarted. Key: DISPATCH-1135 URL: https://issues.apache.org/jira/browse/DISPATCH-1135 Project: Qpid Dispatch Issue Type: Bug Reporter: michael goulish

I set up a 2-node router network, with B connecting to A. No clients. Repeatedly killing and restarting B – giving 3 seconds after each kill and after each restart for the network to settle down. Repeated 100 times. The same router A ran for the duration of the test.

The 'ps' program, run repeatedly on router A, indicated that it was leaking about 82 KB per kill-and-restart. Using 'qdstat -m' on A after each kill-and-restart showed the following difference between iteration 1 and iteration 100. ( Note, this shows growth of only 44 KB per iteration. ) As far back as I looked into the past (about 1 year), I saw similar behavior.

In the chart below, the first column "size" is the number of bytes in a single struct of that type. "In-threads" means how many of each struct are currently being used. Note that, although there are no clients, the routers will be sending some messages to each other.

type                    size   in-threads  in-threads   item     byte
                                 test 1     test 100   growth   growth
======================================================================
qd_buffer_t              536       256        2944       2688   1440768
qd_message_content_t    1056       128        1216       1088   1148928
qd_iterator_t            160       448        7488       7040   1126400
qd_parsed_field_t         88       256        2880       2624    230912
qdr_delivery_t           248       256        1152        896    222208
qd_message_t             160       256        1088        832    133120
qd_connection_t         2320        32          64         32     74240
qdr_general_work_t        64        64         448        384     24576
qdr_link_t               360       192         256         64     23040
qd_bitmask_t              24       192        1088        896     21504
qdr_connection_work_t     48        64         384        320     15360
qdr_link_work_t           48        64         384        320     15360
qd_link_t                 96       128         256        128     12288
qdr_link_ref_t            24        64         448        384      9216
qd_parsed_turbo_t         64       128         256        128      8192
qd_link_ref_t             24        64         256        192      4608
qdr_error_t               24        64         256        192      4608
qd_deferred_call_t        32        64         192        128      4096
qdr_terminus_t            64       192         256         64      4096
qdr_delivery_ref_t        24        64         128         64      1536

( All other structs have zero growth. (Or, in one case, less.) )

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Assigned] (DISPATCH-1126) ERROR Attempt to attach too many inter-router links for priority sheaf.
[ https://issues.apache.org/jira/browse/DISPATCH-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] michael goulish reassigned DISPATCH-1126: - Assignee: michael goulish > ERROR Attempt to attach too many inter-router links for priority sheaf. > --- > > Key: DISPATCH-1126 > URL: https://issues.apache.org/jira/browse/DISPATCH-1126 > Project: Qpid Dispatch > Issue Type: Bug > Components: Router Node >Affects Versions: 1.3.0 > Environment: Fedora 28 > * Three router network in linear arrangement A - B - C. > * B has a listener; A and C connect to it > >Reporter: Chuck Rolke >Assignee: michael goulish >Priority: Major > Attachments: taj-GRN.log > > > Some state probably not cleaned up when router connections are lost. 10 > messages > (error) Attempt to attach too many inter-router links for priority sheaf. > appear when routers reconnect. > Start the network. Then kill routers A and C and restart them. Router B > prints the messages. > Log file attached -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Commented] (DISPATCH-1096) support AMQP prioritized messages
[ https://issues.apache.org/jira/browse/DISPATCH-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16621117#comment-16621117 ] michael goulish commented on DISPATCH-1096: --- The priority code should make messages default to priority 4 when there is no priority in the header, or no header at all in the message. The proton library leaves out the message header (well, makes it an empty list) if there would otherwise be nothing but a default priority value in there. > support AMQP prioritized messages > - > > Key: DISPATCH-1096 > URL: https://issues.apache.org/jira/browse/DISPATCH-1096 > Project: Qpid Dispatch > Issue Type: New Feature >Reporter: michael goulish >Assignee: michael goulish >Priority: Major > Fix For: 1.4.0 > > > Detect priority info from message header in the router code. > Create separate inter-router links for the various priorities. > Per connection (i.e. not globally across the router) service high-priority > inter-router links before low priority links. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Created] (DISPATCH-1096) support AMQP prioritized messages
michael goulish created DISPATCH-1096: - Summary: support AMQP prioritized messages Key: DISPATCH-1096 URL: https://issues.apache.org/jira/browse/DISPATCH-1096 Project: Qpid Dispatch Issue Type: New Feature Reporter: michael goulish Assignee: michael goulish Detect priority info from message header in the router code. Create separate inter-router links for the various priorities. Per connection (i.e. not globally across the router) service high-priority inter-router links before low priority links. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Created] (DISPATCH-873) new routes calculated wrongly after connector deletion
michael goulish created DISPATCH-873: Summary: new routes calculated wrongly after connector deletion Key: DISPATCH-873 URL: https://issues.apache.org/jira/browse/DISPATCH-873 Project: Qpid Dispatch Issue Type: Bug Components: Routing Engine Affects Versions: 1.0.0 Reporter: michael goulish Priority: Blocker Fix For: 1.0.0

I have a 3-mesh network with nodes A, B, C.

B-->A cost is 10
C-->A cost is 10
B-->C cost is 100.

Initial route from B to C is calculated correctly as B,A,C : cost == 20.

But after I used qdmanage to delete the connector from B to A, I get no further messages delivered from B to C. Using qdstat to look at routing table, it looks wrong: Both B and C think they can only get to each other by going through A. But there is now no route that way, because B-->A has been deleted. They should be using the direct connection B-->C. Yet they both calculate the cost correctly as 100.

=== A ===
Routers in the Network
router-id  next-hop  link  ver  cost  neighbors    valid-origins
A          (self)    -     1          ['C']        []
B          C         -     1    110   ['A', 'C']   []
C          -         1     1    10    ['A', 'B']   ['B']

=== B ===
Routers in the Network
router-id  next-hop  link  ver  cost  neighbors    valid-origins
B          (self)    -     1          ['C']        []
C          A         -     1    100   []           []

=== C ===
Routers in the Network
router-id  next-hop  link  ver  cost  neighbors    valid-origins
A          -         0     1    10    ['C']        []
B          A         -     1    100   ['A', 'C']   ['A']
C          (self)    -     1          ['A', 'B']   []

-- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
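To make the expected answer concrete, here is a small standalone check (plain C, not router code) that recomputes the all-pairs costs for this topology with Floyd-Warshall, before and after the B--A connector is removed:

{noformat}
#include <stdio.h>

#define N   3            /* 0 = A, 1 = B, 2 = C */
#define INF 1000000      /* "no link"; large but safe from overflow here */

/* Floyd-Warshall: relax every path through every intermediate node k. */
static void all_pairs(int cost[N][N])
{
    for (int k = 0; k < N; k++)
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                if (cost[i][k] + cost[k][j] < cost[i][j])
                    cost[i][j] = cost[i][k] + cost[k][j];
}

int main(void)
{
    /* Connector costs from the report, treated as bidirectional links. */
    int before[N][N] = {
        { 0,   10,  10 },
        { 10,  0,  100 },
        { 10, 100,  0  },
    };
    all_pairs(before);
    printf("B->C before deletion: %d\n", before[1][2]);  /* 20, via A */

    int after[N][N] = {
        { 0,  INF,  10 },   /* B--A link removed */
        { INF, 0,  100 },
        { 10, 100,  0  },
    };
    all_pairs(after);
    printf("B->C after deletion:  %d\n", after[1][2]);   /* 100, direct */
    return 0;
}
{noformat}

Both routers do report cost 100, matching this recomputation; the bug is that the next-hop still says A instead of the direct B--C link.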
[jira] [Created] (DISPATCH-870) connection improperly reopened from closed connector
michael goulish created DISPATCH-870: Summary: connection improperly reopened from closed connector Key: DISPATCH-870 URL: https://issues.apache.org/jira/browse/DISPATCH-870 Project: Qpid Dispatch Issue Type: Bug Components: Routing Engine Affects Versions: 1.0.0 Reporter: michael goulish Priority: Major I have a 3-mesh router network, ABC, and I am sending messages from B to C. The route being used is B,A,C -- because I have configured it to be cheaper than B,C . I use the management interface to kill the connector from C to A. For the next two seconds my messages are released. I use another management call to confirm that the connector has really been removed. ( I also see it happening in the C code, at fn qd_connection_manager_delete_connector() . ) What We Expect: the network should re-route to start sending these messages on the route B,C -- because that is now the only route available. What We Observe: after 2 seconds, the function try_open_lh() is called. It reopens the connection from C to A even though the connector has been removed. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
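A minimal sketch of the kind of guard the observed behavior suggests is missing. The function name try_open_lh comes from the report, but the struct and field names here are hypothetical, not dispatch's actual code:

{noformat}
#include <stdbool.h>

/* Hypothetical sketch: the reconnect timer handler must observe that the
 * connector was administratively deleted before reopening anything. */
typedef struct qd_connector_sketch_t {
    bool deleted;   /* would be set by qd_connection_manager_delete_connector() */
    /* ... host, port, reconnect timer, etc. ... */
} qd_connector_sketch_t;

static void try_open_lh(qd_connector_sketch_t *ct)
{
    if (ct->deleted)
        return;     /* a deleted connector must never reopen its connection */

    /* ... existing logic that re-establishes the connection ... */
}
{noformat}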
[jira] [Closed] (PROTON-1408) long-lived connections suffer large performance hit after many messages
[ https://issues.apache.org/jira/browse/PROTON-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] michael goulish closed PROTON-1408. --- Resolution: Fixed Fix Version/s: 0.18.0 Fixed with checkin d22f124b0534983f6557850e48f13317ec6df0e5 > long-lived connections suffer large performance hit after many messages > --- > > Key: PROTON-1408 > URL: https://issues.apache.org/jira/browse/PROTON-1408 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Reporter: michael goulish >Assignee: michael goulish > Fix For: 0.18.0 > > Attachments: jira_proton_1408_reproducer.tar.gz > > > In long-running soak tests, in which connections are never taken down, I am > seeing a sudden & severe performance degradation when the number of messages > over the connection reaches about 6.4 billion. > This is happening in tests with two senders, two receivers & one router > intermediating. > I have tried C libUV clients as well as CPP clients. Behavior is not > identical, but I see sudden performance drop, ie. 8x throughput decrease or > worse, in both cases. > Alan / Ted / Ken see an issue in use of improper comparison logic in > pn_do_disposition(), in transport.c . I am trying to prove this now. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Updated] (PROTON-1408) long-lived connections suffer large performance hit after many messages
[ https://issues.apache.org/jira/browse/PROTON-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] michael goulish updated PROTON-1408: Attachment: jira_proton_1408_reproducer.tar.gz Everything you need in a tidy little package. I have 10 out of 10 reproductions with this. > long-lived connections suffer large performance hit after many messages > --- > > Key: PROTON-1408 > URL: https://issues.apache.org/jira/browse/PROTON-1408 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Reporter: michael goulish >Assignee: Alan Conway > Attachments: jira_proton_1408_reproducer.tar.gz > > > In long-running soak tests, in which connections are never taken down, I am > seeing a sudden & severe performance degradation when the number of messages > over the connection reaches about 6.4 billion. > This is happening in tests with two senders, two receivers & one router > intermediating. > I have tried C libUV clients as well as CPP clients. Behavior is not > identical, but I see sudden performance drop, ie. 8x throughput decrease or > worse, in both cases. > Alan / Ted / Ken see an issue in use of improper comparison logic in > pn_do_disposition(), in transport.c . I am trying to prove this now. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Commented] (PROTON-1408) long-lived connections suffer large performance hit after many messages
[ https://issues.apache.org/jira/browse/PROTON-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926821#comment-15926821 ] michael goulish commented on PROTON-1408: - I can now reproduce the problem 100%, and after just a couple minutes instead of 9 hours or 27 hours as it was initially. This is done by: 1. storing deliveries in the receiver and only acking when I get 100,000 2. Altering proton code so that the first outgoing ID it uses is already close to 2^31 - 1 I am now packaging up all my stuff for the reproducer. > long-lived connections suffer large performance hit after many messages > --- > > Key: PROTON-1408 > URL: https://issues.apache.org/jira/browse/PROTON-1408 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Reporter: michael goulish >Assignee: Alan Conway > > In long-running soak tests, in which connections are never taken down, I am > seeing a sudden & severe performance degradation when the number of messages > over the connection reaches about 6.4 billion. > This is happening in tests with two senders, two receivers & one router > intermediating. > I have tried C libUV clients as well as CPP clients. Behavior is not > identical, but I see sudden performance drop, ie. 8x throughput decrease or > worse, in both cases. > Alan / Ted / Ken see an issue in use of improper comparison logic in > pn_do_disposition(), in transport.c . I am trying to prove this now. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Commented] (PROTON-1408) long-lived connections suffer large performance hit after many messages
[ https://issues.apache.org/jira/browse/PROTON-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890860#comment-15890860 ] michael goulish commented on PROTON-1408: - Using proton and dispatch code from 17 Feb 2017, I am running 5 simultaneous tests on a large machine, each with 1 router, 2 senders, 2 receivers. So far I have no reproduction of the slow-down. All the senders have gone beyond 8 billion messages with no slowdown at all. OS is RHEL 7.2 . > long-lived connections suffer large performance hit after many messages > --- > > Key: PROTON-1408 > URL: https://issues.apache.org/jira/browse/PROTON-1408 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Reporter: michael goulish >Assignee: Alan Conway > > In long-running soak tests, in which connections are never taken down, I am > seeing a sudden & severe performance degradation when the number of messages > over the connection reaches about 6.4 billion. > This is happening in tests with two senders, two receivers & one router > intermediating. > I have tried C libUV clients as well as CPP clients. Behavior is not > identical, but I see sudden performance drop, ie. 8x throughput decrease or > worse, in both cases. > Alan / Ted / Ken see an issue in use of improper comparison logic in > pn_do_disposition(), in transport.c . I am trying to prove this now. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Created] (PROTON-1408) long-lived connections suffer large performance hit after many messages
michael goulish created PROTON-1408: --- Summary: long-lived connections suffer large performance hit after many messages Key: PROTON-1408 URL: https://issues.apache.org/jira/browse/PROTON-1408 Project: Qpid Proton Issue Type: Bug Components: proton-c Reporter: michael goulish In long-running soak tests, in which connections are never taken down, I am seeing a sudden & severe performance degradation when the number of messages over the connection reaches about 6.4 billion. This is happening in tests with two senders, two receivers & one router intermediating. I have tried C libUV clients as well as CPP clients. Behavior is not identical, but I see sudden performance drop, ie. 8x throughput decrease or worse, in both cases. Alan / Ted / Ken see an issue in use of improper comparison logic in pn_do_disposition(), in transport.c . I am trying to prove this now. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
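The suspected bug class is easy to demonstrate. The sketch below is a generic illustration of 32-bit sequence-ID comparison, not proton's actual pn_do_disposition() code: treating IDs as signed breaks ordering once they cross 2^31 (which is why the reproducer above starts its outgoing IDs near 2^31 - 1), while serial-number arithmetic keeps working across both that boundary and the 2^32 wrap:

{noformat}
#include <stdint.h>
#include <stdio.h>
#include <stdbool.h>

/* Broken: treats 32-bit IDs as signed, so ordering inverts at 2^31. */
static bool id_lt_signed(uint32_t a, uint32_t b) {
    return (int32_t)a < (int32_t)b;
}

/* Serial-number comparison: correct for any pair of IDs less than
 * 2^31 apart, regardless of where they sit in the 32-bit space. */
static bool id_lt_serial(uint32_t a, uint32_t b) {
    return (int32_t)(a - b) < 0;
}

int main(void) {
    uint32_t a = 0x7FFFFFFFu;   /* 2^31 - 1, where the reproducer starts */
    uint32_t b = a + 1;         /* crosses 2^31 */
    printf("signed: a < b ? %d\n", id_lt_signed(a, b));  /* 0 -- wrong  */
    printf("serial: a < b ? %d\n", id_lt_serial(a, b));  /* 1 -- right  */
    return 0;
}
{noformat}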
[jira] [Created] (DISPATCH-372) qdstat should have a timeout command line argument
michael goulish created DISPATCH-372: Summary: qdstat should have a timeout command line argument Key: DISPATCH-372 URL: https://issues.apache.org/jira/browse/DISPATCH-372 Project: Qpid Dispatch Issue Type: Improvement Reporter: michael goulish qdstat should have a timeout command line argument, but it doesn't. Sometimes, when the router is busy, it is helpful to allow a longer timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Commented] (DISPATCH-369) investigate excursions in memory usage
[ https://issues.apache.org/jira/browse/DISPATCH-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321466#comment-15321466 ] michael goulish commented on DISPATCH-369: -- ...and without anything interesting showing up in the output from 'qdstat -m'. > investigate excursions in memory usage > -- > > Key: DISPATCH-369 > URL: https://issues.apache.org/jira/browse/DISPATCH-369 > Project: Qpid Dispatch > Issue Type: Bug > Components: Router Node >Affects Versions: 0.6.0 >Reporter: michael goulish >Assignee: michael goulish > Attachments: n_senders_vs_MEM_three_trials.jpg > > > I don't know if this is a bug or not. I'm Jirifying it as a way of > remembering an interesting behavior that my testing has shown, so that I can > continue developing the testing and come back to this later. > ... > While measuring router memory usage under varying message rate and number of > senders -- when I run the same test multiple times, I am occasionally (about > 1 in 4 times or so) seeing a test in which memory usage is much higher than > the others. > For example: > In this test: > { > straight-through topology ( 1 sender --> 1 address --> 1 receiver ) > 200 senders > 200 messages per second > 100 bytes per message > } > I record router memory usage at the point when all receivers are just hitting > 10,000 messages. (This is because it grows -- see previous JIRA.) > In three iterations I get the following memory usage: >66 MB >63 MB > 181 MB > Something similar, but less drastic, happened occasionally at lower levels in > the test. > In this case, this is a tripling of memory usage for the same scenario. I > doubt that this is the result of slightly different timing in a block > allocation of data structures. What just happened? > Start by investigating with "qdstat -m" and see if that shows some or all of > the difference. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Commented] (DISPATCH-369) investigate excursions in memory usage
[ https://issues.apache.org/jira/browse/DISPATCH-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321448#comment-15321448 ] michael goulish commented on DISPATCH-369: -- I rebuilt dispatch without the memory pooling feature, expecting that this would make the memory blow-ups go away. It did not! On the 7th run of my test, I saw memory go from 60 MB (Resident Set Size) to 480 MB between one printout of 'top' and the next. (3 seconds) -- same behavior I was seeing with memory pooling enabled. > investigate excursions in memory usage > -- > > Key: DISPATCH-369 > URL: https://issues.apache.org/jira/browse/DISPATCH-369 > Project: Qpid Dispatch > Issue Type: Bug > Components: Router Node >Affects Versions: 0.6.0 >Reporter: michael goulish >Assignee: michael goulish > Attachments: n_senders_vs_MEM_three_trials.jpg > > > I don't know if this is a bug or not. I'm Jirifying it as a way of > remembering an interesting behavior that my testing has shown, so that I can > continue developing the testing and come back to this later. > ... > While measuring router memory usage under varying message rate and number of > senders -- when I run the same test multiple times, I am occasionally (about > 1 in 4 times or so) seeing a test in which memory usage is much higher than > the others. > For example: > In this test: > { > straight-through topology ( 1 sender --> 1 address --> 1 receiver ) > 200 senders > 200 messages per second > 100 bytes per message > } > I record router memory usage at the point when all receivers are just hitting > 10,000 messages. (This is because it grows -- see previous JIRA.) > In three iterations I get the following memory usage: >66 MB >63 MB > 181 MB > Something similar, but less drastic, happened occasionally at lower levels in > the test. > In this case, this is a tripling of memory usage for the same scenario. I > doubt that this is the result of slightly different timing in a block > allocation of data structures. What just happened? > Start by investigating with "qdstat -m" and see if that shows some or all of > the difference. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Updated] (DISPATCH-369) investigate excursions in memory usage
[ https://issues.apache.org/jira/browse/DISPATCH-369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] michael goulish updated DISPATCH-369: - Attachment: n_senders_vs_MEM_three_trials.jpg Results of repeating each test three times, showing occasional excursions in memory usage. > investigate excursions in memory usage > -- > > Key: DISPATCH-369 > URL: https://issues.apache.org/jira/browse/DISPATCH-369 > Project: Qpid Dispatch > Issue Type: Bug > Components: Router Node >Affects Versions: 0.6.0 >Reporter: michael goulish >Assignee: michael goulish > Attachments: n_senders_vs_MEM_three_trials.jpg > > > I don't know if this is a bug or not. I'm Jirifying it as a way of > remembering an interesting behavior that my testing has shown, so that I can > continue developing the testing and come back to this later. > ... > While measuring router memory usage under varying message rate and number of > senders -- when I run the same test multiple times, I am occasionally (about > 1 in 4 times or so) seeing a test in which memory usage is much higher than > the others. > For example: > In this test: > { > straight-through topology ( 1 sender --> 1 address --> 1 receiver ) > 200 senders > 200 messages per second > 100 bytes per message > } > I record router memory usage at the point when all receivers are just hitting > 10,000 messages. (This is because it grows -- see previous JIRA.) > In three iterations I get the following memory usage: >66 MB >63 MB > 181 MB > Something similar, but less drastic, happened occasionally at lower levels in > the test. > In this case, this is a tripling of memory usage for the same scenario. I > doubt that this is the result of slightly different timing in a block > allocation of data structures. What just happened? > Start by investigating with "qdstat -m" and see if that shows some or all of > the difference. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Created] (DISPATCH-369) investigate excursions in memory usage
michael goulish created DISPATCH-369: Summary: investigate excursions in memory usage Key: DISPATCH-369 URL: https://issues.apache.org/jira/browse/DISPATCH-369 Project: Qpid Dispatch Issue Type: Bug Components: Router Node Affects Versions: 0.6.0 Reporter: michael goulish Assignee: michael goulish I don't know if this is a bug or not. I'm Jirifying it as a way of remembering an interesting behavior that my testing has shown, so that I can continue developing the testing and come back to this later. ... While measuring router memory usage under varying message rate and number of senders -- when I run the same test multiple times, I am occasionally (about 1 in 4 times or so) seeing a test in which memory usage is much higher than the others. For example: In this test: { straight-through topology ( 1 sender --> 1 address --> 1 receiver ) 200 senders 200 messages per second 100 bytes per message } I record router memory usage at the point when all receivers are just hitting 10,000 messages. (This is because it grows -- see previous JIRA.) In three iterations I get the following memory usage: 66 MB 63 MB 181 MB Something similar, but less drastic, happened occasionally at lower levels in the test. In this case, this is a tripling of memory usage for the same scenario. I doubt that this is the result of slightly different timing in a block allocation of data structures. What just happened? Start by investigating with "qdstat -m" and see if that shows some or all of the difference. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Created] (DISPATCH-344) memory growth after repeated calls from qdstat -m
michael goulish created DISPATCH-344: Summary: memory growth after repeated calls from qdstat -m Key: DISPATCH-344 URL: https://issues.apache.org/jira/browse/DISPATCH-344 Project: Qpid Dispatch Issue Type: Bug Components: Routing Engine Affects Versions: 0.6.0 Reporter: michael goulish

0. version of dispatch code is 0.6.0 RC3
1. bring up a router
2. do not attach any clients, except...
3. ...repeatedly invoke qdstat -m on the router

result: After 1000 calls from "qdstat -m", top shows that router memory has grown by 4947968 bytes. The output from "qdstat -m" accounts for about 63% of that, or 3144448 bytes.

Here are the data types that increased, according to qdstat, ordered from largest to smallest. Um. This table looked really nice when it was in a fixed-width font.

type                    size   total    total   increase   increase
                               before   after   structs    bytes
====================================================================
qd_log_entry_t          2104     112     1040       928     1952512
qd_buffer_t              536      80     1120      1040      557440
qd_field_iterator_t      128     192     1280      1088      139264
qdr_delivery_t           136      64      512       448       60928
qdr_connection_t         216      64      320       256       55296
qdr_field_t               40     192     1280      1088       43520
qd_connection_t          224      64      256       192       43008
qd_message_content_t     640      16       80        64       40960
qd_message_t             128     192      512       320       40960
qdpn_connector_t         600      16       64        48       28800
qdr_general_work_t        64      64      512       448       28672
qdr_connection_work_t     56      64      512       448       25088
qd_composite_t           112      64      256       192       21504
qdr_link_t               264      16       80        64       16896
qd_composed_field_t       64      64      256       192       12288
qdr_terminus_t            64      64      256       192       12288
qdr_delivery_ref_t        24      64      512       448       10752
qdr_link_ref_t            24      64      512       448       10752
qd_parsed_field_t         80     128      256       128       10240
qdr_action_t             160     256      320        64       10240
qd_link_t                 48      64      256       192        9216
qdr_error_t               24       0      320       320        7680
qd_deferred_call_t        32      64      256       192        6144

grand total increase from qdstat: 3144448
grand total increase from top:    4947968

Here is the script I used. (This input window is breaking some lines. >:-( )

#! /bin/bash
echo "NOTE: router should already be running."
INSTALL_ROOT=${SHACKLETON_ROOT}/install
PROTON_INSTALL_DIR=${INSTALL_ROOT}/proton
DISPATCH_INSTALL_DIR=${INSTALL_ROOT}/dispatch
QDSTAT=${DISPATCH_INSTALL_DIR}/bin/qdstat
export LD_LIBRARY_PATH=${DISPATCH_INSTALL_DIR}/lib64:${PROTON_INSTALL_DIR}/lib64
export PYTHONPATH=${DISPATCH_INSTALL_DIR}/lib/qpid-dispatch/python:${DISPATCH_INSTALL_DIR}/lib/python2.7/site-packages:${PROTON_INSTALL_DIR}/lib64/proton/bindings/python
ROUTER_PID=`ps -aef | grep qdrouterd | grep -v grep | awk '{print $2}'`
count=1
while [ $count -lt 1001 ]
do
    echo "==="
    echo "TEST $count"
    echo "==="
    count=$(( $count + 1 ))
    top -b -n 1 -p ${ROUTER_PID}
    ${QDSTAT} -m
    sleep 3
done

-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Commented] (PROTON-992) Proton's use of Cyrus SASL is not thread-safe.
[ https://issues.apache.org/jira/browse/PROTON-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15264603#comment-15264603 ] michael goulish commented on PROTON-992: Dispatch is not yet immune to this issue. Also, I think Proton needs to let the application handle initialization and shutdown of Cyrus SASL. I made a test that brings up a 6-router network, and randomly kills and restarts routers. I get a router core, usually within 5 iterations, because of this issue. Here is how I fixed it: 1. Let dispatch code call sasl_client_init() and sasl_server_init() at the top of qd_server_run(). And remove these calls from Proton. In keeping these calls to itself, Proton cannot prevent two threads from simultaneously getting into sasl_*_init(). SegV City. 2. Prevent proton from calling sasl_{client,server}_done(), in pni_sasl_impl_free(). Being thread-agnostic, Proton cannot possibly know when it's safe to dispose of the sasl object, which is being used by many threads. Both of those Cyrus calls affect global state by NULLing out a global pointer that stores the mechanisms string. With these changes, my test has now run to 400 iterations with no crash. > Proton's use of Cyrus SASL is not thread-safe. > -- > > Key: PROTON-992 > URL: https://issues.apache.org/jira/browse/PROTON-992 > Project: Qpid Proton > Issue Type: Bug > Components: proton-c >Affects Versions: 0.10 >Reporter: michael goulish >Assignee: Andrew Stitcher >Priority: Critical > > Documentation for the Cyrus SASL library says that the library is believed to > be thread-safe only if the code that uses it meets several requirements. > The requirements are: > * you supply mutex functions (see sasl_set_mutex()) > * you make no libsasl calls until sasl_client/server_init() completes > * no libsasl calls are made after sasl_done() is begun > * when using GSSAPI, you use a thread-safe GSS / Kerberos 5 library. > It says explicitly that that sasl_set* calls are not thread safe, since they > set global state. > The proton library makes calls to sasl_set* functions in : > pni_init_client() > pni_init_server(), and > pni_process_init() > Since those are internal functions, there is no way for code that uses Proton > to lock around those calls. > I think proton needs a new API call to let applications call > sasl_set_mutex(). Or something. > We probably also need other protections to meet the other requirements > specified in the Cyrus documentation (and quoted above). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
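A sketch of what step 1 looks like on the application side, using the real Cyrus SASL entry points. The pthread mutex callbacks answer the sasl_set_mutex() requirement quoted in the issue; the wrapper function name is illustrative:

{noformat}
#include <sasl/sasl.h>
#include <pthread.h>
#include <stdlib.h>

/* Mutex callbacks for Cyrus SASL, per the sasl_set_mutex() requirement. */
static void *mutex_alloc(void)
{
    pthread_mutex_t *m = malloc(sizeof *m);
    if (m) pthread_mutex_init(m, NULL);
    return m;
}
static int  mutex_lock(void *m)   { return pthread_mutex_lock(m); }
static int  mutex_unlock(void *m) { return pthread_mutex_unlock(m); }
static void mutex_free(void *m)   { pthread_mutex_destroy(m); free(m); }

/* Call exactly once, before any worker threads exist -- e.g. at the top
 * of qd_server_run() as described above -- and never call the Cyrus
 * *_done() functions while any thread might still be using the library. */
void app_cyrus_sasl_init(void)
{
    sasl_set_mutex(mutex_alloc, mutex_lock, mutex_unlock, mutex_free);
    sasl_client_init(NULL);               /* no client callbacks */
    sasl_server_init(NULL, "qdrouterd");  /* appname used for config lookup */
}
{noformat}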
[jira] [Commented] (DISPATCH-296) segfault on router startup
[ https://issues.apache.org/jira/browse/DISPATCH-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15260774#comment-15260774 ] michael goulish commented on DISPATCH-296:

I have also seen this crash, with the same frequency Gordon is describing. In my case, I have a network of 6 routers. I repeatedly kill one and replace it. After a few such kills and restarts, I see this crash.

After instrumenting the Cyrus SASL code, I see a bad situation just before the crash: two threads from the same process both inside the Cyrus fn sasl_client_init() within a few microseconds of each other. The Cyrus SASL code for the fn sasl_client_init() has a little logic to try to protect against multiple calls to the function -- but it will not work in a multi-threaded environment except by luck.

MDEBUG proton called sasl_client_init. PID 28668 TID 7f1ac85a01c0 TIME 1461781160.774368 <- different threads in same fn 7 usec apart
MDEBUG proton called sasl_client_init. PID 28668 TID 7f1abaca1700 TIME 1461781160.774375 <- just before crash in sasl_dispose
MDEBUG proton calling sasl_dispose. PID 28668 TID 7f1ac85a01c0 TIME 1461781160.77
MDEBUG proton calling sasl_dispose. PID 28668 TID 7f1abaca1700 TIME 1461781160.774532

> segfault on router startup
> --
>
> Key: DISPATCH-296
> URL: https://issues.apache.org/jira/browse/DISPATCH-296
> Project: Qpid Dispatch
> Issue Type: Bug
> Components: Container
> Affects Versions: 0.6
> Reporter: Gordon Sim
> Attachments: multiconnect.conf
>
>
> Starting up a router with a couple of connectors (connecting to qpidd
> instances in my case), the router occasionally (maybe one in five) crashes
> with a segfault.
> {noformat}
> (gdb) bt
> #0 0x7629c76e in sasl_client_add_plugin () from /lib64/libsasl2.so.3
> #1 0x7629cf58 in sasl_client_init () from /lib64/libsasl2.so.3
> #2 0x7796ecff in pni_init_client
> (transport=transport@entry=0x7fffdc008fc0) at
> /home/gordon/projects/proton/proton-c/src/sasl/cyrus_sasl.c:115
> #3 0x7796e87e in pn_do_mechanisms (transport=0x7fffdc008fc0,
> frame_type=, channel=, args=,
> payload=)
> at /home/gordon/projects/proton/proton-c/src/sasl/sasl.c:703
> #4 0x77959b26 in pni_dispatch_action (payload=0x7fffe96f2360,
> args=0x7fffdc0091c0, channel=0, frame_type=1 '\001', lcode=,
> transport=0x7fffdc008fc0)
> at /home/gordon/projects/proton/proton-c/src/dispatcher/dispatcher.c:74
> #5 pni_dispatch_frame (args=0x7fffdc0091c0, transport=0x7fffdc008fc0,
> frame=...)
at > /home/gordon/projects/proton/proton-c/src/dispatcher/dispatcher.c:116 > #6 pn_dispatcher_input (transport=0x7fffdc008fc0, bytes=0x7fffdc00f358 "", > available=0, batch=false, halt=0x7fffdc009144) at > /home/gordon/projects/proton/proton-c/src/dispatcher/dispatcher.c:135 > #7 0x7795fbba in transport_consume > (transport=transport@entry=0x7fffdc008fc0) at > /home/gordon/projects/proton/proton-c/src/transport/transport.c:1751 > #8 0x779630d2 in pn_transport_process > (transport=transport@entry=0x7fffdc008fc0, size=) at > /home/gordon/projects/proton/proton-c/src/transport/transport.c:2860 > #9 0x77bb08e3 in qdpn_connector_process (c=0x7fffdc0068c0) at > /home/gordon/projects/dispatch/src/posix/driver.c:761 > #10 0x77bc3a91 in process_connector (cxtr=0x7fffdc0068c0, > qd_server=0x702b50) at /home/gordon/projects/dispatch/src/server.c:683 > #11 thread_run (arg=0x87b9b0) at > /home/gordon/projects/dispatch/src/server.c:958 > #12 0x7772660a in start_thread () from /lib64/libpthread.so.0 > #13 0x76c8ba4d in clone () from /lib64/libc.so.6 > {noformat} > other threads: > {noformat} > (gdb) thread 1 > [Switching to thread 1 (Thread 0x77fd1180 (LWP 19319))] > #0 0x7772e89d in __lll_lock_wait () from /lib64/libpthread.so.0 > (gdb) bt > #0 0x7772e89d in __lll_lock_wait () from /lib64/libpthread.so.0 > #1 0x777289cd in pthread_mutex_lock () from /lib64/libpthread.so.0 > #2 0x77bb1239 in sys_mutex_lock (mutex=0x702da0) at > /home/gordon/projects/dispatch/src/posix/threading.c:70 > #3 0x77bc4723 in qd_timer (qd=qd@entry=0x604240, > cb=cb@entry=0x77bc11b0 , context=context@entry=0x702b50) at > /home/gordon/projects/dispatch/src/timer.c:89 > #4 0x77bc3f33 in qd_server_run (qd=0x604240) at > /home/gordon/projects/dispatch/src/server.c:1349 > #5 0x00401ac7 in main_process > (config_path=config_path@entry=0x7fffe090 > "./etc/qpid-dispatch/multiconnect.conf", > python_pkgdir=python_pkgdir@entry=0x402468 > "/home/gordon/projects/dispatch/installs/master/lib/qpid-dispatch/python", > fd=fd@entry=2) at /home/gordon/projects/dispatch/router/src/main.c:135 > #6 0x004017b7 in main (argc=3,
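For illustration of why the guard logic inside sasl_client_init() cannot work, here is a schematic of the pattern (not the actual Cyrus source; do_global_setup is a placeholder): a plain check-then-act flag lets two threads both observe "not initialized" before either one sets the flag, so the global setup runs twice.

{noformat}
#include <pthread.h>

static void do_global_setup(void);   /* placeholder for the real work */
static int initialized = 0;

int unsafe_init(void)
{
    if (initialized)       /* threads A and B can both read 0 here ... */
        return 0;
    initialized = 1;       /* ... before either one writes 1,          */
    do_global_setup();     /* so the global setup runs twice.  Boom.   */
    return 0;
}

/* A correct guard serializes the check and the setup: */
static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;

int safe_init(void)
{
    pthread_mutex_lock(&init_lock);
    if (!initialized) {
        do_global_setup();
        initialized = 1;
    }
    pthread_mutex_unlock(&init_lock);
    return 0;
}
{noformat}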
[jira] [Created] (DISPATCH-210) try an epoll-based driver ...
michael goulish created DISPATCH-210: Summary: try an epoll-based driver ... Key: DISPATCH-210 URL: https://issues.apache.org/jira/browse/DISPATCH-210 Project: Qpid Dispatch Issue Type: Improvement Reporter: michael goulish ...to improve scalability to large numbers of attached messaging apps. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
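For context, this is the general shape of an epoll-based driver loop -- a sketch only, not actual Dispatch code. The scalability win is that the kernel tracks the fd set, so the per-wakeup cost does not grow with the number of idle connections.

{noformat}
#include <sys/epoll.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_EVENTS 64

void run_epoll_loop(int listen_fd)
{
    int epfd = epoll_create1(0);
    if (epfd < 0) { perror("epoll_create1"); exit(1); }

    /* Register the listening socket for readability. */
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
    epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

    struct epoll_event events[MAX_EVENTS];
    for (;;) {
        int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
        for (int i = 0; i < n; i++) {
            /* Hand each ready fd to its connection handler here. */
        }
    }
}
{noformat}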
[jira] [Created] (DISPATCH-157) add sasl tests to dispatch unit tests
michael goulish created DISPATCH-157:
Summary: add sasl tests to dispatch unit tests
Key: DISPATCH-157
URL: https://issues.apache.org/jira/browse/DISPATCH-157
Project: Qpid Dispatch
Issue Type: Improvement
Components: Tests
Affects Versions: 0.5
Reporter: michael goulish
Assignee: michael goulish
Fix For: 0.5

Add a complete set of sasl tests to the Dispatch unit test framework. Ensure correct behavior for the cross-product of authenticatePeer := { no, yes, insecureOk } x saslMechanisms := { NONE, PLAIN, DIGEST-MD5, CRAM-MD5, GSSAPI, SRP }
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
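A sketch of how that cross-product could be enumerated to generate one test case per combination (the names and output format are illustrative, not the actual test framework):

{noformat}
#include <stdio.h>

int main(void)
{
    const char *auth_peer[] = { "no", "yes", "insecureOk" };
    const char *mechs[]     = { "NONE", "PLAIN", "DIGEST-MD5",
                                "CRAM-MD5", "GSSAPI", "SRP" };

    /* 3 x 6 = 18 combinations; each would get a generated router
     * config and a unit test. */
    for (int a = 0; a < 3; a++)
        for (int m = 0; m < 6; m++)
            printf("test: authenticatePeer=%s saslMechanisms=%s\n",
                   auth_peer[a], mechs[m]);
    return 0;
}
{noformat}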
[jira] [Created] (DISPATCH-139) adapt to proton changes to avoid crashes under high session count
michael goulish created DISPATCH-139:
Summary: adapt to proton changes to avoid crashes under high session count
Key: DISPATCH-139
URL: https://issues.apache.org/jira/browse/DISPATCH-139
Project: Qpid Dispatch
Issue Type: Improvement
Reporter: michael goulish

For high session counts ( e.g. 2^15 ) I implemented some changes in proton library code to avoid crashing in the library ( that was PROTON-864 ). But that means that the library code will sometimes return 0 rather than crashing. Alter dispatch code to Do the Right Thing in case of these new null return values.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
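The kind of defensive check this implies, sketched against the Proton C engine API (the recovery policy is hypothetical; pn_session() and pn_session_open() are real calls):

{noformat}
#include <proton/connection.h>
#include <proton/session.h>

/* After PROTON-864, pn_session() can return NULL instead of crashing
 * when session resources are exhausted; callers must handle that. */
pn_session_t *open_session_checked(pn_connection_t *conn)
{
    pn_session_t *ssn = pn_session(conn);
    if (!ssn)
        return NULL;      /* e.g. log and refuse, rather than segfault */
    pn_session_open(ssn);
    return ssn;
}
{noformat}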
[jira] [Assigned] (DISPATCH-139) adapt to proton changes to avoid crashes under high session count
[ https://issues.apache.org/jira/browse/DISPATCH-139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] michael goulish reassigned DISPATCH-139:
Assignee: michael goulish

adapt to proton changes to avoid crashes under high session count
-
Key: DISPATCH-139
URL: https://issues.apache.org/jira/browse/DISPATCH-139
Project: Qpid Dispatch
Issue Type: Improvement
Reporter: michael goulish
Assignee: michael goulish

For high session counts ( e.g. 2^15 ) I implemented some changes in proton library code to avoid crashing in the library ( that was PROTON-864 ). But that means that the library code will sometimes return 0 rather than crashing. Alter dispatch code to Do the Right Thing in case of these new null return values.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Created] (DISPATCH-140) adapt to proton changes for large number of link-handles
michael goulish created DISPATCH-140: Summary: adapt to proton changes for large number of link-handles Key: DISPATCH-140 URL: https://issues.apache.org/jira/browse/DISPATCH-140 Project: Qpid Dispatch Issue Type: Improvement Reporter: michael goulish Assignee: michael goulish For PROTON-886 I will be changing proton library code to honor handle-max when large numbers of links are created. There will probably be instances of proton library fns returning null. Make Dispatch changes to account for the proton changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Commented] (DISPATCH-117) SEG Fault when outgoing SSL connections fail
[ https://issues.apache.org/jira/browse/DISPATCH-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334815#comment-14334815 ] michael goulish commented on DISPATCH-117: -- This checkin also fixed a rare crash I was seeing in my 'topologist' testing (killing and restarting routers) -- which happened even when SSL was not involved. SEG Fault when outgoing SSL connections fail Key: DISPATCH-117 URL: https://issues.apache.org/jira/browse/DISPATCH-117 Project: Qpid Dispatch Issue Type: Bug Components: Container Affects Versions: 0.3 Reporter: Ted Ross Assignee: Ted Ross Priority: Critical Fix For: 0.4 Hat tip: Ken Giusti for isolating this bug When using SSL for outgoing connectors, a crash may occur when the connection fails. There is a race condition whereby a second thread can interfere with an outgoing connector before the cxtr_try_open function has completed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
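The general shape of the remedy for this class of race, as a sketch (all names here are illustrative; this is the pattern, not the actual DISPATCH-117 patch): finish all setup on the new connector while it is still private to the creating thread, and only then publish it to the shared list under the lock, so a second thread can never observe a half-open connector.

{noformat}
#include <pthread.h>
#include <stdlib.h>

/* Illustrative types, not the actual Dispatch structures. */
typedef struct connector { struct connector *next; int open; } connector_t;

typedef struct {
    pthread_mutex_t lock;
    connector_t    *connectors;    /* list that other threads walk */
} server_t;

static void connector_try_open(connector_t *c) { c->open = 1; }

void create_outgoing_connector(server_t *s)
{
    connector_t *c = calloc(1, sizeof *c);
    connector_try_open(c);          /* complete setup first, unpublished */

    pthread_mutex_lock(&s->lock);
    c->next = s->connectors;        /* publish only the fully-formed connector */
    s->connectors = c;
    pthread_mutex_unlock(&s->lock);
}
{noformat}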
[jira] [Created] (DISPATCH-113) expose NodeTracker::last_topology_change in management
michael goulish created DISPATCH-113: Summary: expose NodeTracker::last_topology_change in management Key: DISPATCH-113 URL: https://issues.apache.org/jira/browse/DISPATCH-113 Project: Qpid Dispatch Issue Type: Improvement Components: Router Node Affects Versions: 0.4 Reporter: michael goulish Priority: Minor NodeTracker is already keeping track of the last time it saw a topology change. I would like to expose that number to management so I can read it from my testing program and directly measure how long it takes the network to settle down after a topological change. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Resolved] (DISPATCH-106) pn link corruption after router restart
[ https://issues.apache.org/jira/browse/DISPATCH-106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] michael goulish resolved DISPATCH-106. -- Resolution: Fixed Committed revision 1657604. pn link corruption after router restart --- Key: DISPATCH-106 URL: https://issues.apache.org/jira/browse/DISPATCH-106 Project: Qpid Dispatch Issue Type: Bug Components: Router Node Affects Versions: 0.3 Reporter: michael goulish Fix For: 0.4 With the standard 6-node demo network, (A-D, X, Y) after killing and restarting node Y, I see a bad link on router D -- which causes D to crash. Here is sequence of events from logs of routers and the topologist testing program: 01:05:05.367 Killing router Y, pid 20074 01:05:05.367 Sleeping 30 seconds 01:05:35.367 Restarting router Y, pid 20120 01:05:38 Router D : last valid origins post to its log file : Node QDR.C valid origins: [] 01:05:46 Router D posts to its log file: Exited Router Flux Mode 01:06:05.368 checking for crash after node bounce ( no crash detected ) 01:06:17 last post to router D log file ROUTER_LS (trace) RCVD: RA(id=QDR.X area=0 inst=1422165872 ls_seq=2 mobile_seq=0) 01:06:35.369 second check for crash. (none detected) 01:06:35.370 getting topology ( Node D fails to respond. PID 20072 ) ( core file, timestamped 01:06 ) here is backtrace from router D's core file { #0 pn_string_get (string=0xfdfdfdfdbabecafe) at /home/mick/rh-qpid-proton/proton-c/src/object/string.c:120 #1 0x7ff73fa8e752 in qd_router_link_name (link=0x7ff72800b2d0) at /home/mick/dispatch/src/router_agent.c:112 #2 0x7ff73fa8e7dd in qd_entity_refresh_router_link (entity=0x7ff7300c9b50, impl=0x7ff72800b2d0) at /home/mick/dispatch/src/router_agent.c:120 #3 0x003e40805d8c in ffi_call_unix64 () from /lib64/libffi.so.6 #4 0x003e408056bc in ffi_call () from /lib64/libffi.so.6 #5 0x7ff737d2dc8b in _ctypes_callproc () from /usr/lib64/python2.7/lib-dynload/_ctypes.so #6 0x7ff737d27a85 in PyCFuncPtr_call () from /usr/lib64/python2.7/lib-dynload/_ctypes.so #7 0x0036df44a0d3 in PyObject_Call () from /lib64/libpython2.7.so.1.0 #8 0x0036df4de37c in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0 #9 0x0036df4e21dd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0 #10 0x0036df4e088f in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0 #11 0x0036df4e21dd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0 #12 0x0036df4e088f in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0 #13 0x0036df4e21dd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0 #14 0x0036df46f0d8 in ?? () from /lib64/libpython2.7.so.1.0 #15 0x0036df44a0d3 in PyObject_Call () from /lib64/libpython2.7.so.1.0 #16 0x0036df4590c5 in ?? () from /lib64/libpython2.7.so.1.0 #17 0x0036df44a0d3 in PyObject_Call () from /lib64/libpython2.7.so.1.0 #18 0x0036df44a1b5 in ?? 
() from /lib64/libpython2.7.so.1.0 #19 0x0036df44a29e in PyObject_CallFunction () from /lib64/libpython2.7.so.1.0 #20 0x7ff73fa8d77f in qd_io_rx_handler (context=0x7ff736321e68, msg=0x7ff728019bd0, link_id=0 at /home/mick/dispatch/src/python_embedded.c:519 #21 0x7ff73fa92533 in router_rx_handler (context=0x1db5fd0, link=0x7ff730008710, delivery=0x7ff73004cc50) at /home/mick/dispatch/src/router_node.c:922 #22 0x7ff73fa7fa16 in do_receive (pnd=0x1e359a0) at /home/mick/dispatch/src/container.c:221 #23 0x7ff73fa7fea3 in process_handler (container=0x1dbd6f0, unused=0x1e0a050, qd_conn=0x1e2c6a0) at /home/mick/dispatch/src/container.c:362 #24 0x7ff73fa80135 in handler (handler_context=0x1dbd6f0, conn_context=0x1e0a050, event=QD_CONN_EVENT_PROCESS, qd_conn=0x1e2c6a0) at /home/mick/dispatch/src/container.c:438 #25 0x7ff73fa98346 in process_connector (qd_server=0x1d78460, cxtr=0x1e1b9b0) at /home/mick/dispatch/src/server.c:322 #26 0x7ff73fa98c1f in thread_run (arg=0x1d70d30) at /home/mick/dispatch/src/server.c:546 #27 0x003e3dc07ee5 in start_thread () from /lib64/libpthread.so.0 ... } Let's go up to qd_router_link_name at /home/mick/dispatch/src/router_agent.c:112 (gdb) print * link $1 = { prev = 0x7ff72800b210, next = 0x7ff72800b390, mask_bit = 3, link_type = QD_LINK_ROUTER, link_direction = QD_OUTGOING, owning_addr = 0x1d7d6c0, waypoint = 0x0, link =
[jira] [Commented] (DISPATCH-106) pn link corruption after router restart
[ https://issues.apache.org/jira/browse/DISPATCH-106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14303225#comment-14303225 ] michael goulish commented on DISPATCH-106:

In server.c, the function thread_run() has this code:

if (qdpn_connector_failed(cxtr))
    qdpn_connector_close(cxtr);
else
    work_done = process_connector(qd_server, cxtr);

By removing the else we got my test to go to 148 iterations before failing. And the crash is much different from what I have been seeing. Before this change, the test almost always failed no later than iteration 3. So -- bug fixed.

Why: because when the connector has failed, there are still some events on it that need to be processed. When they get processed, the links associated with this connection get cleaned up properly. If you don't do this final processing of events on the dead connector, the dispatch code will still have dead links sitting around pointing to some memory that will (usually) get freed by proton. Boom.

pn link corruption after router restart --- Key: DISPATCH-106 URL: https://issues.apache.org/jira/browse/DISPATCH-106 Project: Qpid Dispatch Issue Type: Bug Components: Router Node Affects Versions: 0.3 Reporter: michael goulish Fix For: 0.4 With the standard 6-node demo network, (A-D, X, Y) after killing and restarting node Y, I see a bad link on router D -- which causes D to crash. Here is sequence of events from logs of routers and the topologist testing program: 01:05:05.367 Killing router Y, pid 20074 01:05:05.367 Sleeping 30 seconds 01:05:35.367 Restarting router Y, pid 20120 01:05:38 Router D : last valid origins post to its log file : Node QDR.C valid origins: [] 01:05:46 Router D posts to its log file: Exited Router Flux Mode 01:06:05.368 checking for crash after node bounce ( no crash detected ) 01:06:17 last post to router D log file ROUTER_LS (trace) RCVD: RA(id=QDR.X area=0 inst=1422165872 ls_seq=2 mobile_seq=0) 01:06:35.369 second check for crash. (none detected) 01:06:35.370 getting topology ( Node D fails to respond. PID 20072 ) ( core file, timestamped 01:06 ) here is backtrace from router D's core file { #0 pn_string_get (string=0xfdfdfdfdbabecafe) at /home/mick/rh-qpid-proton/proton-c/src/object/string.c:120 #1 0x7ff73fa8e752 in qd_router_link_name (link=0x7ff72800b2d0) at /home/mick/dispatch/src/router_agent.c:112 #2 0x7ff73fa8e7dd in qd_entity_refresh_router_link (entity=0x7ff7300c9b50, impl=0x7ff72800b2d0) at /home/mick/dispatch/src/router_agent.c:120 #3 0x003e40805d8c in ffi_call_unix64 () from /lib64/libffi.so.6 #4 0x003e408056bc in ffi_call () from /lib64/libffi.so.6 #5 0x7ff737d2dc8b in _ctypes_callproc () from /usr/lib64/python2.7/lib-dynload/_ctypes.so #6 0x7ff737d27a85 in PyCFuncPtr_call () from /usr/lib64/python2.7/lib-dynload/_ctypes.so #7 0x0036df44a0d3 in PyObject_Call () from /lib64/libpython2.7.so.1.0 #8 0x0036df4de37c in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0 #9 0x0036df4e21dd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0 #10 0x0036df4e088f in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0 #11 0x0036df4e21dd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0 #12 0x0036df4e088f in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0 #13 0x0036df4e21dd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0 #14 0x0036df46f0d8 in ?? () from /lib64/libpython2.7.so.1.0 #15 0x0036df44a0d3 in PyObject_Call () from /lib64/libpython2.7.so.1.0 #16 0x0036df4590c5 in ??
() from /lib64/libpython2.7.so.1.0 #17 0x0036df44a0d3 in PyObject_Call () from /lib64/libpython2.7.so.1.0 #18 0x0036df44a1b5 in ?? () from /lib64/libpython2.7.so.1.0 #19 0x0036df44a29e in PyObject_CallFunction () from /lib64/libpython2.7.so.1.0 #20 0x7ff73fa8d77f in qd_io_rx_handler (context=0x7ff736321e68, msg=0x7ff728019bd0, link_id=0 at /home/mick/dispatch/src/python_embedded.c:519 #21 0x7ff73fa92533 in router_rx_handler (context=0x1db5fd0, link=0x7ff730008710, delivery=0x7ff73004cc50) at /home/mick/dispatch/src/router_node.c:922 #22 0x7ff73fa7fa16 in do_receive (pnd=0x1e359a0) at /home/mick/dispatch/src/container.c:221 #23 0x7ff73fa7fea3 in process_handler (container=0x1dbd6f0, unused=0x1e0a050, qd_conn=0x1e2c6a0) at /home/mick/dispatch/src/container.c:362 #24 0x7ff73fa80135 in
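The fix described in the comment above (removing the else in thread_run()), restated as a before/after sketch abbreviated from the quoted code; the real change is in src/server.c:

{noformat}
/* Before: a failed connector was closed and then skipped entirely. */
if (qdpn_connector_failed(cxtr))
    qdpn_connector_close(cxtr);
else
    work_done = process_connector(qd_server, cxtr);

/* After: close it, but fall through so its remaining events are
 * processed and the links that reference it get cleaned up. */
if (qdpn_connector_failed(cxtr))
    qdpn_connector_close(cxtr);
work_done = process_connector(qd_server, cxtr);
{noformat}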
[jira] [Created] (DISPATCH-106) pn link corruption after router restart
michael goulish created DISPATCH-106: Summary: pn link corruption after router restart Key: DISPATCH-106 URL: https://issues.apache.org/jira/browse/DISPATCH-106 Project: Qpid Dispatch Issue Type: Bug Components: Router Node Affects Versions: 0.4 Reporter: michael goulish With the standard 6-node demo network, (A-D, X, Y) after killing and restarting node Y, I see a bad link on router D -- which causes D to crash. Here is sequence of events from logs of routers and the topologist testing program: 01:05:05.367 Killing router Y, pid 20074 01:05:05.367 Sleeping 30 seconds 01:05:35.367 Restarting router Y, pid 20120 01:05:38 Router D : last valid origins post to its log file : Node QDR.C valid origins: [] 01:05:46 Router D posts to its log file: Exited Router Flux Mode 01:06:05.368 checking for crash after node bounce ( no crash detected ) 01:06:17 last post to router D log file ROUTER_LS (trace) RCVD: RA(id=QDR.X area=0 inst=1422165872 ls_seq=2 mobile_seq=0) 01:06:35.369 second check for crash. (none detected) 01:06:35.370 getting topology ( Node D fails to respond. PID 20072 ) ( core file, timestamped 01:06 ) here is backtrace from router D's core file { #0 pn_string_get (string=0xfdfdfdfdbabecafe) at /home/mick/rh-qpid-proton/proton-c/src/object/string.c:120 #1 0x7ff73fa8e752 in qd_router_link_name (link=0x7ff72800b2d0) at /home/mick/dispatch/src/router_agent.c:112 #2 0x7ff73fa8e7dd in qd_entity_refresh_router_link (entity=0x7ff7300c9b50, impl=0x7ff72800b2d0) at /home/mick/dispatch/src/router_agent.c:120 #3 0x003e40805d8c in ffi_call_unix64 () from /lib64/libffi.so.6 #4 0x003e408056bc in ffi_call () from /lib64/libffi.so.6 #5 0x7ff737d2dc8b in _ctypes_callproc () from /usr/lib64/python2.7/lib-dynload/_ctypes.so #6 0x7ff737d27a85 in PyCFuncPtr_call () from /usr/lib64/python2.7/lib-dynload/_ctypes.so #7 0x0036df44a0d3 in PyObject_Call () from /lib64/libpython2.7.so.1.0 #8 0x0036df4de37c in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0 #9 0x0036df4e21dd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0 #10 0x0036df4e088f in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0 #11 0x0036df4e21dd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0 #12 0x0036df4e088f in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0 #13 0x0036df4e21dd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0 #14 0x0036df46f0d8 in ?? () from /lib64/libpython2.7.so.1.0 #15 0x0036df44a0d3 in PyObject_Call () from /lib64/libpython2.7.so.1.0 #16 0x0036df4590c5 in ?? () from /lib64/libpython2.7.so.1.0 #17 0x0036df44a0d3 in PyObject_Call () from /lib64/libpython2.7.so.1.0 #18 0x0036df44a1b5 in ?? 
() from /lib64/libpython2.7.so.1.0 #19 0x0036df44a29e in PyObject_CallFunction () from /lib64/libpython2.7.so.1.0 #20 0x7ff73fa8d77f in qd_io_rx_handler (context=0x7ff736321e68, msg=0x7ff728019bd0, link_id=0) at /home/mick/dispatch/src/python_embedded.c:519 #21 0x7ff73fa92533 in router_rx_handler (context=0x1db5fd0, link=0x7ff730008710, delivery=0x7ff73004cc50) at /home/mick/dispatch/src/router_node.c:922 #22 0x7ff73fa7fa16 in do_receive (pnd=0x1e359a0) at /home/mick/dispatch/src/container.c:221 #23 0x7ff73fa7fea3 in process_handler (container=0x1dbd6f0, unused=0x1e0a050, qd_conn=0x1e2c6a0) at /home/mick/dispatch/src/container.c:362 #24 0x7ff73fa80135 in handler (handler_context=0x1dbd6f0, conn_context=0x1e0a050, event=QD_CONN_EVENT_PROCESS, qd_conn=0x1e2c6a0) at /home/mick/dispatch/src/container.c:438 #25 0x7ff73fa98346 in process_connector (qd_server=0x1d78460, cxtr=0x1e1b9b0) at /home/mick/dispatch/src/server.c:322 #26 0x7ff73fa98c1f in thread_run (arg=0x1d70d30) at /home/mick/dispatch/src/server.c:546 #27 0x003e3dc07ee5 in start_thread () from /lib64/libpthread.so.0 ... } Let's go up to qd_router_link_name at /home/mick/dispatch/src/router_agent.c:112 (gdb) print * link $1 = { prev = 0x7ff72800b210, next = 0x7ff72800b390, mask_bit = 3, link_type = QD_LINK_ROUTER, link_direction = QD_OUTGOING, owning_addr = 0x1d7d6c0, waypoint = 0x0, link = 0x7ff7280099d0, connected_link = 0x0, ref = 0x7ff72800f350, target = 0x0, event_fifo = { head = 0x0, tail = 0x0, scratch = 0x0, size = 0 }, msg_fifo = { head =
[jira] [Updated] (DISPATCH-106) pn link corruption after router restart
[ https://issues.apache.org/jira/browse/DISPATCH-106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] michael goulish updated DISPATCH-106: - Description: With the standard 6-node demo network, (A-D, X, Y) after killing and restarting node Y, I see a bad link on router D -- which causes D to crash. Here is sequence of events from logs of routers and the topologist testing program: 01:05:05.367 Killing router Y, pid 20074 01:05:05.367 Sleeping 30 seconds 01:05:35.367 Restarting router Y, pid 20120 01:05:38 Router D : last valid origins post to its log file : Node QDR.C valid origins: [] 01:05:46 Router D posts to its log file: Exited Router Flux Mode 01:06:05.368 checking for crash after node bounce ( no crash detected ) 01:06:17 last post to router D log file ROUTER_LS (trace) RCVD: RA(id=QDR.X area=0 inst=1422165872 ls_seq=2 mobile_seq=0) 01:06:35.369 second check for crash. (none detected) 01:06:35.370 getting topology ( Node D fails to respond. PID 20072 ) ( core file, timestamped 01:06 ) here is backtrace from router D's core file { #0 pn_string_get (string=0xfdfdfdfdbabecafe) at /home/mick/rh-qpid-proton/proton-c/src/object/string.c:120 #1 0x7ff73fa8e752 in qd_router_link_name (link=0x7ff72800b2d0) at /home/mick/dispatch/src/router_agent.c:112 #2 0x7ff73fa8e7dd in qd_entity_refresh_router_link (entity=0x7ff7300c9b50, impl=0x7ff72800b2d0) at /home/mick/dispatch/src/router_agent.c:120 #3 0x003e40805d8c in ffi_call_unix64 () from /lib64/libffi.so.6 #4 0x003e408056bc in ffi_call () from /lib64/libffi.so.6 #5 0x7ff737d2dc8b in _ctypes_callproc () from /usr/lib64/python2.7/lib-dynload/_ctypes.so #6 0x7ff737d27a85 in PyCFuncPtr_call () from /usr/lib64/python2.7/lib-dynload/_ctypes.so #7 0x0036df44a0d3 in PyObject_Call () from /lib64/libpython2.7.so.1.0 #8 0x0036df4de37c in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0 #9 0x0036df4e21dd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0 #10 0x0036df4e088f in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0 #11 0x0036df4e21dd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0 #12 0x0036df4e088f in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0 #13 0x0036df4e21dd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0 #14 0x0036df46f0d8 in ?? () from /lib64/libpython2.7.so.1.0 #15 0x0036df44a0d3 in PyObject_Call () from /lib64/libpython2.7.so.1.0 #16 0x0036df4590c5 in ?? () from /lib64/libpython2.7.so.1.0 #17 0x0036df44a0d3 in PyObject_Call () from /lib64/libpython2.7.so.1.0 #18 0x0036df44a1b5 in ?? 
() from /lib64/libpython2.7.so.1.0 #19 0x0036df44a29e in PyObject_CallFunction () from /lib64/libpython2.7.so.1.0 #20 0x7ff73fa8d77f in qd_io_rx_handler (context=0x7ff736321e68, msg=0x7ff728019bd0, link_id=0 at /home/mick/dispatch/src/python_embedded.c:519 #21 0x7ff73fa92533 in router_rx_handler (context=0x1db5fd0, link=0x7ff730008710, delivery=0x7ff73004cc50) at /home/mick/dispatch/src/router_node.c:922 #22 0x7ff73fa7fa16 in do_receive (pnd=0x1e359a0) at /home/mick/dispatch/src/container.c:221 #23 0x7ff73fa7fea3 in process_handler (container=0x1dbd6f0, unused=0x1e0a050, qd_conn=0x1e2c6a0) at /home/mick/dispatch/src/container.c:362 #24 0x7ff73fa80135 in handler (handler_context=0x1dbd6f0, conn_context=0x1e0a050, event=QD_CONN_EVENT_PROCESS, qd_conn=0x1e2c6a0) at /home/mick/dispatch/src/container.c:438 #25 0x7ff73fa98346 in process_connector (qd_server=0x1d78460, cxtr=0x1e1b9b0) at /home/mick/dispatch/src/server.c:322 #26 0x7ff73fa98c1f in thread_run (arg=0x1d70d30) at /home/mick/dispatch/src/server.c:546 #27 0x003e3dc07ee5 in start_thread () from /lib64/libpthread.so.0 ... } Let's go up to qd_router_link_name at /home/mick/dispatch/src/router_agent.c:112 (gdb) print * link $1 = { prev = 0x7ff72800b210, next = 0x7ff72800b390, mask_bit = 3, link_type = QD_LINK_ROUTER, link_direction = QD_OUTGOING, owning_addr = 0x1d7d6c0, waypoint = 0x0, link = 0x7ff7280099d0, connected_link = 0x0, ref = 0x7ff72800f350, target = 0x0, event_fifo = { head = 0x0, tail = 0x0, scratch = 0x0, size = 0 }, msg_fifo = { head = 0x7ff73003c230, tail = 0x7ff73003bb70, scratch = 0x7ff73003b9f0, size = 102 } } (gdb) print * (link-link)
[jira] [Resolved] (DISPATCH-64) slow or sporadic memory leak
[ https://issues.apache.org/jira/browse/DISPATCH-64?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] michael goulish resolved DISPATCH-64.
Resolution: Fixed

This was traced to a problem in proton, which rafi fixed.

slow or sporadic memory leak
Key: DISPATCH-64
URL: https://issues.apache.org/jira/browse/DISPATCH-64
Project: Qpid Dispatch
Issue Type: Bug
Reporter: michael goulish

In long-term soak tests, I am seeing router memory grow by 1 megabyte every 4 or 5 minutes.

Test setup
===
1. single router on one box
2. 10 senders, 10 receivers on separate box.
3. each client handles 100 unique addresses.
4. while test is running, I run 'top' in a loop to see router memory usage (resident set size). I also run qdstat -m in a loop, to see the router's report on usage of various data structures.
5. clients all have a single connection for the duration of the test.
6. clients start once at beginning of test and do not stop until end. No new clients are started after the beginning.
7. no clients failed during the test.
8. no new addresses were added after test startup.

Observations
=
1. During a 64 minute period which started at least 15 minutes after the beginning of the test, memory usage (resident set size) as measured by 'top' grew from 96 to 109 megabytes.
2. Some of the data types reported by 'qdstat -m' increased. Here is the list (using numbers from the 'total' column of the qdstat report):

qd_connection_t         832 -- 896
qd_hash_handle_t       1408 -- 1600
qd_hash_item_t         1408 -- 1600
qd_link_t              1536 -- 1664
qd_log_entry_t         1152 -- 1216
qd_message_content_t  10256 -- 10272
qd_parsed_field_t       448 -- 1024
qd_router_link_ref_t   1408 -- 1600
qd_router_link_t       1536 -- 1664

3. The data structures that increased did *not* increase smoothly. For example, qd_hash_handle_t and qd_hash_item_t remained constant for 6 minutes before increasing.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Created] (DISPATCH-64) slow or sporadic memory leak
michael goulish created DISPATCH-64:
Summary: slow or sporadic memory leak
Key: DISPATCH-64
URL: https://issues.apache.org/jira/browse/DISPATCH-64
Project: Qpid Dispatch
Issue Type: Bug
Reporter: michael goulish

In long-term soak tests, I am seeing router memory grow by 1 megabyte every 4 or 5 minutes.

Test setup
===
1. single router on one box
2. 10 senders, 10 receivers on separate box.
3. each client handles 100 unique addresses.
4. while test is running, I run 'top' in a loop to see router memory usage (resident set size). I also run qdstat -m in a loop, to see the router's report on usage of various data structures.
5. clients all have a single connection for the duration of the test.
6. clients start once at beginning of test and do not stop until end. No new clients are started after the beginning.
7. no clients failed during the test.
8. no new addresses were added after test startup.

Observations
=
1. During a 64 minute period which started at least 15 minutes after the beginning of the test, memory usage (resident set size) as measured by 'top' grew from 96 to 109 megabytes.
2. Some of the data types reported by 'qdstat -m' increased. Here is the list (using numbers from the 'total' column of the qdstat report):

qd_connection_t         832 -- 896
qd_hash_handle_t       1408 -- 1600
qd_hash_item_t         1408 -- 1600
qd_link_t              1536 -- 1664
qd_log_entry_t         1152 -- 1216
qd_message_content_t  10256 -- 10272
qd_parsed_field_t       448 -- 1024
qd_router_link_ref_t   1408 -- 1600
qd_router_link_t       1536 -- 1664

3. The data structures that increased did *not* increase smoothly. For example, qd_hash_handle_t and qd_hash_item_t remained constant for 6 minutes before increasing.
-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Commented] (DISPATCH-64) slow or sporadic memory leak
[ https://issues.apache.org/jira/browse/DISPATCH-64?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073643#comment-14073643 ] michael goulish commented on DISPATCH-64:

Yes, RSS is increasing much more smoothly -- 1 MB every 4 or 5 minutes. I have 16 threads in the router.

slow or sporadic memory leak
Key: DISPATCH-64
URL: https://issues.apache.org/jira/browse/DISPATCH-64
Project: Qpid Dispatch
Issue Type: Bug
Reporter: michael goulish

In long-term soak tests, I am seeing router memory grow by 1 megabyte every 4 or 5 minutes.

Test setup
===
1. single router on one box
2. 10 senders, 10 receivers on separate box.
3. each client handles 100 unique addresses.
4. while test is running, I run 'top' in a loop to see router memory usage (resident set size). I also run qdstat -m in a loop, to see the router's report on usage of various data structures.
5. clients all have a single connection for the duration of the test.
6. clients start once at beginning of test and do not stop until end. No new clients are started after the beginning.
7. no clients failed during the test.
8. no new addresses were added after test startup.

Observations
=
1. During a 64 minute period which started at least 15 minutes after the beginning of the test, memory usage (resident set size) as measured by 'top' grew from 96 to 109 megabytes.
2. Some of the data types reported by 'qdstat -m' increased. Here is the list (using numbers from the 'total' column of the qdstat report):

qd_connection_t         832 -- 896
qd_hash_handle_t       1408 -- 1600
qd_hash_item_t         1408 -- 1600
qd_link_t              1536 -- 1664
qd_log_entry_t         1152 -- 1216
qd_message_content_t  10256 -- 10272
qd_parsed_field_t       448 -- 1024
qd_router_link_ref_t   1408 -- 1600
qd_router_link_t       1536 -- 1664

3. The data structures that increased did *not* increase smoothly. For example, qd_hash_handle_t and qd_hash_item_t remained constant for 6 minutes before increasing.
-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Closed] (QPID-5910) Throughput regression relative to 0.14
[ https://issues.apache.org/jira/browse/QPID-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] michael goulish closed QPID-5910.
Resolution: Fixed
Fixed in rev 1612559.

Throughput regression relative to 0.14
--
Key: QPID-5910
URL: https://issues.apache.org/jira/browse/QPID-5910
Project: Qpid
Issue Type: Bug
Affects Versions: 0.22
Reporter: michael goulish
Assignee: michael goulish
Fix For: 0.29

If you use qpid-latency-test, hold message size constant, and gradually increase the sending rate (in several tests), you will sooner or later reach a point at which the messaging system's ability to handle throughput saturates. When that happens, latency will go sky-high. (I have producer flow-control turned off to be able to compare vs. older code.)

The latest code reaches throughput saturation significantly earlier than older code (i.e. at a lower sending rate). Also, using 'perf' to help analyze the code, recent code is executing significantly fewer instructions per second than older code. This probably indicates that some parts of the code are spending too much time *while a lock is held* -- thus preventing other threads from fulfilling their destiny, and having an effect on overall throughput.
-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
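The lock-scope pathology described above, reduced to a generic sketch (illustrative code, not the QPID-5910 patch): do the expensive work outside the critical section, so other threads are blocked only for the shared update itself.

{noformat}
#include <pthread.h>
#include <stdio.h>
#include <string.h>

static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;
static char shared_buf[256];

/* Anti-pattern: slow formatting work done while holding the lock
 * stalls every other thread that needs shared_lock. */
void update_slow(const char *msg)
{
    pthread_mutex_lock(&shared_lock);
    char line[256];
    snprintf(line, sizeof line, "entry: %s", msg);   /* slow work ...   */
    strcpy(shared_buf, line);                        /* ... plus update */
    pthread_mutex_unlock(&shared_lock);
}

/* Better: prepare outside the lock; hold it only for the update. */
void update_fast(const char *msg)
{
    char line[256];
    snprintf(line, sizeof line, "entry: %s", msg);   /* no lock held */

    pthread_mutex_lock(&shared_lock);
    strcpy(shared_buf, line);                        /* minimal critical section */
    pthread_mutex_unlock(&shared_lock);
}
{noformat}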
[jira] [Closed] (QPID-5734) message loss in qpid client
[ https://issues.apache.org/jira/browse/QPID-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] michael goulish closed QPID-5734.
Resolution: Fixed
This is superseded by QPID-5737, which has been fixed.

message loss in qpid client
---
Key: QPID-5734
URL: https://issues.apache.org/jira/browse/QPID-5734
Project: Qpid
Issue Type: Bug
Reporter: michael goulish

Using the latest qpid code as of 25 Apr 2014. In my qpid-messaging client, I do not ask for an unreliable link:

std::string sender_address = x;
Sender sender = session.createSender ( sender_address );

I call sender.send() 1000 times, each time to a different address. The call returns, apparently successful every time -- no throws or anything -- but my receivers do not get all messages. The messages are going through a dispatch router -- but I have now successfully traced the qpid-messaging sender, and I see that the missing messages are simply never transferred out of the sender -- so they never get to the router.
-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Commented] (QPID-5733) qpid-messaging client does not honor settle-without-accept
[ https://issues.apache.org/jira/browse/QPID-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984443#comment-13984443 ] michael goulish commented on QPID-5733:

Forgot to add -- this is with the latest qpid trunk as of morning (EDT) 25 Apr 2014.

qpid-messaging client does not honor settle-without-accept
--
Key: QPID-5733
URL: https://issues.apache.org/jira/browse/QPID-5733
Project: Qpid
Issue Type: Bug
Components: C++ Client
Reporter: michael goulish

I have a qpid-messaging based sender, and a proton messenger based receiver, with a dispatch router in the middle. In my sender, if I do this:

sender.send ( msg, 1 )

the sender locks up immediately and hangs. With tracing, I see that it is getting back a disposition frame for this message with settled=true -- but there is no explicit accept. That's when it locks up. If I alter my proton messenger based receiver to explicitly accept the message, then the test runs to completion with no problem.
-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Created] (QPID-5733) qpid-messaging client does not honor settle-without-accept
michael goulish created QPID-5733:
Summary: qpid-messaging client does not honor settle-without-accept
Key: QPID-5733
URL: https://issues.apache.org/jira/browse/QPID-5733
Project: Qpid
Issue Type: Bug
Components: C++ Client
Reporter: michael goulish

I have a qpid-messaging based sender, and a proton messenger based receiver, with a dispatch router in the middle. In my sender, if I do this:

sender.send ( msg, 1 )

the sender locks up immediately and hangs. With tracing, I see that it is getting back a disposition frame for this message with settled=true -- but there is no explicit accept. That's when it locks up. If I alter my proton messenger based receiver to explicitly accept the message, then the test runs to completion with no problem.
-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
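The receiver-side workaround mentioned in the description, sketched with the Messenger C API (error handling omitted; pn_messenger_recv, pn_messenger_get, pn_messenger_incoming_tracker, and pn_messenger_accept are real calls):

{noformat}
#include <proton/message.h>
#include <proton/messenger.h>

/* Receive one message and explicitly accept it, so the sender sees an
 * ACCEPTED outcome instead of bare settlement. */
void receive_and_accept(pn_messenger_t *m)
{
    pn_message_t *msg = pn_message();

    pn_messenger_recv(m, 1);                      /* wait for a message */
    pn_messenger_get(m, msg);                     /* dequeue it         */

    pn_tracker_t t = pn_messenger_incoming_tracker(m);
    pn_messenger_accept(m, t, 0);                 /* explicit accept    */

    pn_message_free(msg);
}
{noformat}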
[jira] [Created] (QPID-5734) message loss in qpid client
michael goulish created QPID-5734:
Summary: message loss in qpid client
Key: QPID-5734
URL: https://issues.apache.org/jira/browse/QPID-5734
Project: Qpid
Issue Type: Bug
Reporter: michael goulish

Using the latest qpid code as of 25 Apr 2014. In my qpid-messaging client, I do not ask for an unreliable link:

std::string sender_address = x;
Sender sender = session.createSender ( sender_address );

I call sender.send() 1000 times, each time to a different address. The call returns, apparently successful every time -- no throws or anything -- but my receivers do not get all messages. The messages are going through a dispatch router -- but I have now successfully traced the qpid-messaging sender, and I see that the missing messages are simply never transferred out of the sender -- so they never get to the router.
-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Commented] (QPID-5734) message loss in qpid client
[ https://issues.apache.org/jira/browse/QPID-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984606#comment-13984606 ] michael goulish commented on QPID-5734:

The receivers are proton messenger based. I can reproduce this behavior whether or not the receivers explicitly accept the messages.

message loss in qpid client
---
Key: QPID-5734
URL: https://issues.apache.org/jira/browse/QPID-5734
Project: Qpid
Issue Type: Bug
Reporter: michael goulish

Using the latest qpid code as of 25 Apr 2014. In my qpid-messaging client, I do not ask for an unreliable link:

std::string sender_address = x;
Sender sender = session.createSender ( sender_address );

I call sender.send() 1000 times, each time to a different address. The call returns, apparently successful every time -- no throws or anything -- but my receivers do not get all messages. The messages are going through a dispatch router -- but I have now successfully traced the qpid-messaging sender, and I see that the missing messages are simply never transferred out of the sender -- so they never get to the router.
-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Created] (DISPATCH-45) starting clients too rapidly causes connection failures
michael goulish created DISPATCH-45:
Summary: starting clients too rapidly causes connection failures
Key: DISPATCH-45
URL: https://issues.apache.org/jira/browse/DISPATCH-45
Project: Qpid Dispatch
Issue Type: Bug
Components: Router Node
Affects Versions: 0.2
Reporter: michael goulish

I don't know if this should be a code change, or an extra warning issued by the router, or just a Note To Users of some kind, but I'm putting it here so as not to lose track of it. If I start too many clients too rapidly, all trying to connect to the same router, some of them will fail. My clients are very simple, not attempting any retries. When this shows up, it looks like an error in the client, and users will probably hunt around for the cause. It can be avoided by simply putting occasional pauses in my client-launching script. Looks like some kind of backlog problem.
-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
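If the backlog speculation in the last line is right, the knob involved is the backlog argument to listen(2): connections arriving faster than accept() drains them overflow the queue and get refused. A generic sketch (not Dispatch's actual listener code; error handling omitted):

{noformat}
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

int make_listener(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(port);

    bind(fd, (struct sockaddr *)&addr, sizeof addr);

    /* A small backlog here is what makes a burst of rapid connects
     * fail; SOMAXCONN (capped by net.core.somaxconn) widens it. */
    listen(fd, SOMAXCONN);
    return fd;
}
{noformat}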
[jira] [Created] (DISPATCH-32) Undeliverable messages should get released.
michael goulish created DISPATCH-32:
Summary: Undeliverable messages should get released.
Key: DISPATCH-32
URL: https://issues.apache.org/jira/browse/DISPATCH-32
Project: Qpid Dispatch
Issue Type: Bug
Components: Router Node
Affects Versions: 0.2
Environment: cold, snowy.
Reporter: michael goulish

I have a test in which I make a 6-router network, then repeatedly kill and restart nodes. To determine when the network is ready to rock, I send messages to each node that I expect to find in the network. All messages are sent through the one node that I am connected to. At first, some of those messages are undeliverable. This is expected, since I just deliberately messed up the network.

The problem is that, for those undeliverables, I never get back any kind of disposition. For the good ones, I get 'settled'. For the undeliverable ones, I get nothing. This means that I cannot close my session. If I created the sender on it this way:

Sender sender = session.createSender(mgmt);

then it will not close. I can work around the problem by creating the sender this way:

Sender sender = session.createSender(mgmt; {link:{reliability:unreliable}});

...but we should still get back dispositions for all messages.
-- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
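What "released" means at the protocol level, sketched with the Proton C engine API (this shows the generic mechanism, not the specific router change this issue asks for):

{noformat}
#include <proton/delivery.h>
#include <proton/disposition.h>

/* When a message cannot be forwarded, settle the delivery with an
 * explicit RELEASED outcome so the sender gets a disposition and can
 * close its session. */
void release_undeliverable(pn_delivery_t *dlv)
{
    pn_delivery_update(dlv, PN_RELEASED);   /* set the outcome */
    pn_delivery_settle(dlv);                /* settle it       */
}
{noformat}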