Rohan McGovern created PROTON-639:
-------------------------------------

             Summary: pn_messenger_recv hangs / spins on connection refused
                 Key: PROTON-639
                 URL: https://issues.apache.org/jira/browse/PROTON-639
             Project: Qpid Proton
          Issue Type: Bug
          Components: proton-c
    Affects Versions: 0.7, 0.8
         Environment: Red Hat Enterprise Linux 6.5
kernel: 2.6.32-431.1.2.el6.x86_64
qpid-proton 0.7 and git commit 9939b8a990cd53c1b5e099c083bdcf61ad22232b
(git-svn-id: https://svn.apache.org/repos/asf/qpid/proton/trunk@1613151 
13f79535-47bb-0310-9956-ffa450edef68)
            Reporter: Rohan McGovern


If I try to connect to a closed port with a messenger, pn_messenger_recv 
outputs messages to stderr and then spins at high CPU usage, rather than 
returning with an error as expected.

This seems to depend on the kernel version.  I have a RHEL 6.5 machine which 
demonstrates this problem reliably when using kernel 2.6.32-431.1.2.el6.x86_64 
but not when using 3.10.28-1.el6.elrepo.x86_64.

This can be easily reproduced using the "recv" example in the qpid-proton 
sources.

{noformat:title=kernel 2.6.32 - broken}
$ build/examples/messenger/c/recv amqp://127.0.0.1:1
recv: Connection refused
[0x63d8e0]:ERROR amqp:connection:framing-error SASL header mismatch: ''
CONNECTION ERROR connection aborted (remote)
# hangs at this point with high CPU usage
{noformat}

Compare with the behavior on a later kernel version, which seems right:

{noformat:title=kernel 3.10.28 - expected behavior}
$ build/examples/messenger/c/recv amqp://127.0.0.1:1
recv: Connection refused
[0x15af8e0]:ERROR amqp:connection:framing-error SASL header mismatch: ''
CONNECTION ERROR connection aborted (remote)
send: Broken pipe
/home/rmcgover/src/qpid-proton/examples/messenger/c/recv.c:132: no valid sources
# exits with exit code 1
{noformat}

Here's a sample backtrace when the hang is occurring:

{noformat}
(gdb) bt
#0  0x00007ffff7ffea11 in clock_gettime ()
#1  0x0000003a51e03e46 in clock_gettime () from /lib64/librt.so.1
#2  0x00007ffff7de6b5e in pn_i_now () from 
/home/rmcgover/src/qpid-proton/build/proton-c/libqpid-proton.so.2
#3  0x00007ffff7de4c06 in pn_selector_select () from 
/home/rmcgover/src/qpid-proton/build/proton-c/libqpid-proton.so.2
#4  0x00007ffff7ddf736 in pni_wait () from 
/home/rmcgover/src/qpid-proton/build/proton-c/libqpid-proton.so.2
#5  0x00007ffff7ddf869 in pn_messenger_tsync () from 
/home/rmcgover/src/qpid-proton/build/proton-c/libqpid-proton.so.2
#6  0x00007ffff7ddf8df in pn_messenger_sync () from 
/home/rmcgover/src/qpid-proton/build/proton-c/libqpid-proton.so.2
#7  0x00007ffff7de1676 in pn_messenger_recv () from 
/home/rmcgover/src/qpid-proton/build/proton-c/libqpid-proton.so.2
#8  0x00000000004014b2 in main ()
{noformat}

pn_messenger_tsync contains a while(true) loop that apparently never 
exits.  strace also shows the process calling poll repeatedly.
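
To illustrate the kind of loop I suspect is involved, here is a standalone 
sketch.  It is not Proton's code: fake_messenger_t, fake_process and sync_loop 
are made-up names, and the iteration cap exists only so the spin can be 
observed.  It assumes the wait loop tests only a "done" predicate; if a failed 
connection never makes that predicate true and each pass returns immediately, 
the loop spins unless it also checks for the error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for messenger state; not Proton's real struct. */
typedef struct {
    bool connection_failed; /* set once the transport has reported an error */
    int  pending;           /* messages the caller is still waiting for */
} fake_messenger_t;

/* Hypothetical stand-in for one poll-and-process pass.  With a refused
 * connection there is nothing to do, so it returns immediately. */
static void fake_process(fake_messenger_t *m) { (void)m; }

/* Returns 0 when the predicate is satisfied, -1 when an error is noticed,
 * and -2 when max_iterations passes elapse without either (a spin). */
static int sync_loop(fake_messenger_t *m, bool check_error, long max_iterations)
{
    assert(m != NULL && max_iterations > 0);
    for (long i = 0; i < max_iterations; i++) {
        if (m->pending == 0)
            return 0;                 /* predicate satisfied */
        if (check_error && m->connection_failed)
            return -1;                /* bail out instead of waiting again */
        fake_process(m);              /* returns at once: no progress made */
    }
    return -2;                        /* would loop forever without the cap */
}
```

With connection_failed set and pending nonzero, sync_loop(&m, false, 1000) 
uses up all its iterations, while sync_loop(&m, true, 1000) returns -1 on the 
first pass.  Whether the real loop lacks such a check is an assumption on my 
part.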



