[ https://issues.apache.org/jira/browse/PROTON-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17445250#comment-17445250 ]
Ken Giusti commented on PROTON-2466:
------------------------------------

This is a difficult issue to reproduce. In my experience it can take a few hours, and the resulting log files are huge.

To reproduce:

# Check out the head of the qdrouter 1.18.x branch.
# Back out the pointer-clearing patch that prevents the crash from occurring:
## commit 6734891419fcafdbc87d40eca269d07821c1b813 DISPATCH-2286: reset the raw conn context when handling disconnect
# Run two routers using the attached configurations (qdrouterd-A.conf, qdrouterd-B.conf):
## rm -f qdrouterd-A-log.txt ; qdrouterd -c qdrouterd-A.conf &
## rm -f qdrouterd-B-log.txt ; qdrouterd -c qdrouterd-B.conf &
# Install iperf3.
# Spawn an iperf3 server for the router to connect to:
## iperf3 -s -p 8080 &
# Run iperf3 clients in a loop to generate traffic:
## while iperf3 -c 127.0.0.1 -p 8000 -t 5 -P 8; do echo "OK"; sleep 2; done
# Wait for the crash.

> raw connection posts wake events after disconnect event is handled
> ------------------------------------------------------------------
>
>                 Key: PROTON-2466
>                 URL: https://issues.apache.org/jira/browse/PROTON-2466
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: proton-c
>    Affects Versions: proton-c-0.36.0
>            Reporter: Ken Giusti
>            Priority: Major
>         Attachments: qdrouterd-A.conf, qdrouterd-B.conf
>
>
> While running TCP stress tests against qdrouterd, a crash occurred. The crash
> was caused by a stale pointer dereference.
> The qdrouterd code has been patched to properly clear the pointer and check
> for null in the affected code path. However...
> ... the access occurred while processing a PN_RAW_CONNECTION_WAKE event that
> arrived on a raw connection *after* a PN_RAW_CONNECTION_DISCONNECTED event
> had already arrived on the same raw connection.
> IIUC the PN_RAW_CONNECTION_DISCONNECTED event is supposed to be the last
> event generated on a raw connection, and once that event has been handled the
> raw connection is released. If that is correct, then the arrival of the
> following WAKE event is a bug.
> Here is the log output from the router just prior to the crash (filtered on
> the affected connection):
> $ tail C140.txt
> 2021-11-16 17:11:10.925728 -0500 TCP_ADAPTOR (debug) [C140] PN_RAW_CONNECTION_WAKE connector
> 2021-11-16 17:11:10.926990 -0500 TCP_ADAPTOR (debug) [C140] PN_RAW_CONNECTION_WAKE connector
> 2021-11-16 17:11:10.927001 -0500 TCP_ADAPTOR (debug) [C140] PN_RAW_CONNECTION_READ connector Event
> 2021-11-16 17:11:10.927034 -0500 TCP_ADAPTOR (debug) [C140] PN_RAW_CONNECTION_READ Read 0 bytes. Total read 0 bytes
> 2021-11-16 17:11:10.927596 -0500 TCP_ADAPTOR (debug) [C140] PN_RAW_CONNECTION_WRITTEN connector pn_raw_connection_take_written_buffers wrote 32768 bytes. Total written 36929573 bytes
> 2021-11-16 17:11:10.928207 -0500 TCP_ADAPTOR (debug) [C140][L322] PN_RAW_CONNECTION_CLOSED_READ connector
> 2021-11-16 17:11:10.928591 -0500 TCP_ADAPTOR (debug) [C140] PN_RAW_CONNECTION_CLOSED_WRITE connector
> 2021-11-16 17:11:10.929160 -0500 TCP_ADAPTOR (debug) [C140] PN_RAW_CONNECTION_WRITTEN connector pn_raw_connection_take_written_buffers wrote 32768 bytes. Total written 36962341 bytes
> *2021-11-16 17:11:10.929410 -0500 TCP_ADAPTOR (info) [C140] PN_RAW_CONNECTION_DISCONNECTED connector*
> *2021-11-16 17:11:10.929915 -0500 TCP_ADAPTOR (debug) [C140] PN_RAW_CONNECTION_WAKE connector*

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
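The crash can take hours to trigger and the logs grow very large. One possible way to spot the ordering violation described above is to scan the log once and report any raw-connection event logged for a connection after that connection's PN_RAW_CONNECTION_DISCONNECTED line.

The sketch below assumes the TCP_ADAPTOR line format shown in the excerpt: one event per line, with the connection id in the first `[C<n>]` tag. The function name `find_events_after_disconnect` is hypothetical and is not part of qdrouterd or Proton.

```sh
#!/bin/sh
# Hypothetical helper (not part of qdrouterd or Proton): print every
# PN_RAW_CONNECTION_* log line recorded for a connection *after* that
# connection's PN_RAW_CONNECTION_DISCONNECTED line.
# Assumption: one event per line, connection id in the first "[C<n>]" tag,
# as in the TCP_ADAPTOR debug output quoted in PROTON-2466.
find_events_after_disconnect() {
    awk '
    match($0, /\[C[0-9]+]/) {
        conn = substr($0, RSTART, RLENGTH)   # e.g. "[C140]"
        # An event for a connection already marked as disconnected
        # violates the expected ordering, so report it.
        if ((conn in disconnected) && $0 ~ /PN_RAW_CONNECTION_/)
            print
        # Remember connections whose DISCONNECTED event has been logged.
        if ($0 ~ /PN_RAW_CONNECTION_DISCONNECTED/)
            disconnected[conn] = 1
    }' "$1"
}
```

For example, after sourcing the function, run `find_events_after_disconnect qdrouterd-A-log.txt`. This assumes the router writes its log to that file, as the `rm -f` commands in the reproduction steps suggest. An empty result means no connection logged an event after its DISCONNECTED event. For the excerpt above, the final PN_RAW_CONNECTION_WAKE line would be reported.

The scan is a single awk pass that keeps only a set of disconnected connection ids in memory, so it should cope with very large log files.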