[ https://issues.apache.org/jira/browse/PROTON-639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rohan McGovern updated PROTON-639:
----------------------------------
Description:
If I try to connect to a closed port with a messenger, pn_messenger_recv
prints error messages to stderr and then spins at high CPU usage, rather than
returning an error as expected.
The behavior seems to depend on the kernel version. I have a RHEL 6.5 machine
that reproduces the problem reliably with kernel 2.6.32-431.1.2.el6.x86_64
but not with 3.10.28-1.el6.elrepo.x86_64.
It is easy to reproduce using the "recv" example in the qpid-proton
sources.
{noformat:title=kernel 2.6.32 - broken}
$ build/examples/messenger/c/recv amqp://127.0.0.1:1
recv: Connection refused
[0x63d8e0]:ERROR amqp:connection:framing-error SASL header mismatch: ''
CONNECTION ERROR connection aborted (remote)
# hangs at this point with high CPU usage
{noformat}
Compare with the behavior on a later kernel version, which seems right:
{noformat:title=kernel 3.10.28 - expected behavior}
$ build/examples/messenger/c/recv amqp://127.0.0.1:1
recv: Connection refused
[0x15af8e0]:ERROR amqp:connection:framing-error SASL header mismatch: ''
CONNECTION ERROR connection aborted (remote)
send: Broken pipe
/home/rmcgover/src/qpid-proton/examples/messenger/c/recv.c:132: no valid sources
# exits with exit code 1
{noformat}
Here's a sample backtrace when the hang is occurring:
{noformat}
(gdb) bt
#0 0x00007ffff7ffea11 in clock_gettime ()
#1 0x0000003a51e03e46 in clock_gettime () from /lib64/librt.so.1
#2 0x00007ffff7de6b5e in pn_i_now () from /home/rmcgover/src/qpid-proton/build/proton-c/libqpid-proton.so.2
#3 0x00007ffff7de4c06 in pn_selector_select () from /home/rmcgover/src/qpid-proton/build/proton-c/libqpid-proton.so.2
#4 0x00007ffff7ddf736 in pni_wait () from /home/rmcgover/src/qpid-proton/build/proton-c/libqpid-proton.so.2
#5 0x00007ffff7ddf869 in pn_messenger_tsync () from /home/rmcgover/src/qpid-proton/build/proton-c/libqpid-proton.so.2
#6 0x00007ffff7ddf8df in pn_messenger_sync () from /home/rmcgover/src/qpid-proton/build/proton-c/libqpid-proton.so.2
#7 0x00007ffff7de1676 in pn_messenger_recv () from /home/rmcgover/src/qpid-proton/build/proton-c/libqpid-proton.so.2
#8 0x00000000004014b2 in main ()
{noformat}
There's a while(true) loop in pn_messenger_tsync that never seems to exit.
strace also shows the process calling poll repeatedly.
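The repeated poll calls are consistent with a loop that keeps polling a descriptor in an error or hang-up state without acting on it. Whether that is the actual cause here is an assumption, not something confirmed from the Proton sources. The standalone sketch below is not Proton code: it uses a pipe with a closed writer as a stand-in for the refused connection, and the function names (make_hung_up_fd, count_error_wakeups, poll_until_error) are hypothetical.

```c
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: returns the read end of a pipe whose write end has
 * already been closed, so the descriptor stays in a hang-up state.
 * Returns -1 if the pipe cannot be created. */
int make_hung_up_fd(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    close(fds[1]);  /* no writer left: poll() reports POLLHUP on fds[0] */
    return fds[0];
}

/* Spinning pattern: poll the descriptor `iterations` times and ignore the
 * error flags. Returns how many calls reported POLLERR or POLLHUP, that is,
 * returned with an event instead of timing out. */
int count_error_wakeups(int fd, int iterations, int timeout_ms)
{
    int wakeups = 0;
    for (int i = 0; i < iterations; i++) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
        int n = poll(&pfd, 1, timeout_ms);
        if (n > 0 && (pfd.revents & (POLLERR | POLLHUP)))
            wakeups++;  /* flag seen, but the loop carries on regardless */
    }
    return wakeups;
}

/* Escaping pattern: stop at the first failure or error flag.
 * Returns the number of poll() calls that were made. */
int poll_until_error(int fd, int max_iterations, int timeout_ms)
{
    int calls = 0;
    while (calls < max_iterations) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
        int n = poll(&pfd, 1, timeout_ms);
        calls++;
        if (n < 0)
            break;      /* poll() itself failed */
        if (n > 0 && (pfd.revents & (POLLERR | POLLHUP | POLLNVAL)))
            break;      /* descriptor is dead: leave the loop */
    }
    return calls;
}
```

On Linux, calling count_error_wakeups(make_hung_up_fd(), 5, 1000) should report a wake-up on every iteration without any of the one-second timeouts elapsing, which is the busy-loop shape seen in strace. poll_until_error on the same descriptor stops after its first call.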
> pn_messenger_recv hangs / spins on connection refused
> -----------------------------------------------------
>
> Key: PROTON-639
> URL: https://issues.apache.org/jira/browse/PROTON-639
> Project: Qpid Proton
> Issue Type: Bug
> Components: proton-c
> Affects Versions: 0.7, 0.8
> Environment: Red Hat Enterprise Linux 6.5
> kernel: 2.6.32-431.1.2.el6.x86_64
> qpid-proton 0.7 and 9939b8a990cd53c1b5e099c083bdcf61ad22232b git-svn-id:
> https://svn.apache.org/repos/asf/qpid/proton/trunk@1613151
> 13f79535-47bb-0310-9956-ffa450edef68
> Reporter: Rohan McGovern
--
This message was sent by Atlassian JIRA
(v6.2#6252)