[jira] [Commented] (AMQ-6414) The nio+ssl transports can block and hang on connection in certain situations
[ https://issues.apache.org/jira/browse/AMQ-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15887972#comment-15887972 ] Christopher L. Shannon commented on AMQ-6414: - [~gtully], sounds good. I've been running the patch since September and it has been working well. No more timeout issues on any of the nio+ssl transports and the countdown latch has prevented a concurrent read. > The nio+ssl transports can block and hang on connection in certain situations > - > > Key: AMQ-6414 > URL: https://issues.apache.org/jira/browse/AMQ-6414 > Project: ActiveMQ > Issue Type: Bug > Components: Broker, Transport >Affects Versions: 5.14.0 >Reporter: Christopher L. Shannon >Assignee: Christopher L. Shannon > Fix For: 5.14.1, 5.15.0 > > > The nio+ssl transports can hang on initial connection and never read from the > socket after the SSL handshake for certain conditions. This behavior is most > evident when using the auto+nio+ssl transport for a network bridge between 2 > brokers, however I also saw the issue for the normal nio+ssl transport when > running the NetworkAsyncStartTest and even the amqp+nio+ssl transport. > After debugging I found that the issue is that the onSelect method of the > registered callback, which calls the serviceRead() method, is not always > getting triggered. I believe that the root of the problem is that even > though a selector is registered with a SelectionKey.OP_READ, there is no > guarantee that the selected set is correct which is what the SelectorWorker > uses to detect if the operation is ready. The SelectionKey documentation > specifically states that the ready set is a hint but not a guarantee that the > channel is ready. This seems to only effect the SSL transport (not normal > NIO), probably because a read selection was already done once to unwrap the > SSL transport > More info: > https://docs.oracle.com/javase/8/docs/api/java/nio/channels/SelectionKey.html > The fix for this is to trigger the selectRead() after the transport finishes > starting up. (needs to be in a new thread specifically for OpenWire to allow > the wireformat negotiation to not block on startup). This will work for the > SSL transport specifically since we know the transport is read to read from > the the channel after starting up. We know this because the SSL handshake > already took place which means we've already read from the channel. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (AMQ-6414) The nio+ssl transports can block and hang on connection in certain situations
[ https://issues.apache.org/jira/browse/AMQ-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15887825#comment-15887825 ] Gary Tully commented on AMQ-6414: - [~cshannon] I think it is ok as it is now, so long as we don't get a concurrent read which your initialized latch sorts. If I get a chance to explore the auto scenario we can revisit. > The nio+ssl transports can block and hang on connection in certain situations > - > > Key: AMQ-6414 > URL: https://issues.apache.org/jira/browse/AMQ-6414 > Project: ActiveMQ > Issue Type: Bug > Components: Broker, Transport >Affects Versions: 5.14.0 >Reporter: Christopher L. Shannon >Assignee: Christopher L. Shannon > Fix For: 5.14.1, 5.15.0 > > > The nio+ssl transports can hang on initial connection and never read from the > socket after the SSL handshake for certain conditions. This behavior is most > evident when using the auto+nio+ssl transport for a network bridge between 2 > brokers, however I also saw the issue for the normal nio+ssl transport when > running the NetworkAsyncStartTest and even the amqp+nio+ssl transport. > After debugging I found that the issue is that the onSelect method of the > registered callback, which calls the serviceRead() method, is not always > getting triggered. I believe that the root of the problem is that even > though a selector is registered with a SelectionKey.OP_READ, there is no > guarantee that the selected set is correct which is what the SelectorWorker > uses to detect if the operation is ready. The SelectionKey documentation > specifically states that the ready set is a hint but not a guarantee that the > channel is ready. This seems to only effect the SSL transport (not normal > NIO), probably because a read selection was already done once to unwrap the > SSL transport > More info: > https://docs.oracle.com/javase/8/docs/api/java/nio/channels/SelectionKey.html > The fix for this is to trigger the selectRead() after the transport finishes > starting up. (needs to be in a new thread specifically for OpenWire to allow > the wireformat negotiation to not block on startup). This will work for the > SSL transport specifically since we know the transport is read to read from > the the channel after starting up. We know this because the SSL handshake > already took place which means we've already read from the channel. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (AMQ-6414) The nio+ssl transports can block and hang on connection in certain situations
[ https://issues.apache.org/jira/browse/AMQ-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886147#comment-15886147 ] Christopher L. Shannon commented on AMQ-6414: - [~gtully], thanks for digging deeper into this to see the root cause. I had primarily noticed it with the auto+nio+ssl transport but then saw it happen in other cases which is when I discovered the need to trigger an extra read to finish reading the bytes. I still think the async start is needed when using auto+nio+ssl otherwise it will get stuck. For example, using NetworkReconnectSslNioTest that you added, If I change the doInit() method in NIOSSLTransport to the following it works fine: {code:java} protected void doInit() throws Exception { if (inputBuffer.position() > 0 && inputBuffer.hasRemaining() && wireFormat instanceof OpenWireFormat) { taskRunnerFactory.execute(new Runnable() { @Override public void run() { //Need to start in new thread to let startup finish first //We can trigger a read because we know the channel is ready since the SSL handshake //already happened serviceRead(); initialized.countDown(); } }); } else { initialized.countDown(); } } {code} However, if I then update your test to use the auto+nio+ssl transport it will no longer work anymore even with the above change at as it needs to be async so the detection doesn't get stuck. So, we could either leave things the way they currently are now (with the async always scheduled) or update it so that async is only always scheduled for auto+nio+ssl and conditionally for regular nio+ssl transports. > The nio+ssl transports can block and hang on connection in certain situations > - > > Key: AMQ-6414 > URL: https://issues.apache.org/jira/browse/AMQ-6414 > Project: ActiveMQ > Issue Type: Bug > Components: Broker, Transport >Affects Versions: 5.14.0 >Reporter: Christopher L. Shannon >Assignee: Christopher L. Shannon > Fix For: 5.14.1, 5.15.0 > > > The nio+ssl transports can hang on initial connection and never read from the > socket after the SSL handshake for certain conditions. This behavior is most > evident when using the auto+nio+ssl transport for a network bridge between 2 > brokers, however I also saw the issue for the normal nio+ssl transport when > running the NetworkAsyncStartTest and even the amqp+nio+ssl transport. > After debugging I found that the issue is that the onSelect method of the > registered callback, which calls the serviceRead() method, is not always > getting triggered. I believe that the root of the problem is that even > though a selector is registered with a SelectionKey.OP_READ, there is no > guarantee that the selected set is correct which is what the SelectorWorker > uses to detect if the operation is ready. The SelectionKey documentation > specifically states that the ready set is a hint but not a guarantee that the > channel is ready. This seems to only effect the SSL transport (not normal > NIO), probably because a read selection was already done once to unwrap the > SSL transport > More info: > https://docs.oracle.com/javase/8/docs/api/java/nio/channels/SelectionKey.html > The fix for this is to trigger the selectRead() after the transport finishes > starting up. (needs to be in a new thread specifically for OpenWire to allow > the wireformat negotiation to not block on startup). This will work for the > SSL transport specifically since we know the transport is read to read from > the the channel after starting up. We know this because the SSL handshake > already took place which means we've already read from the channel. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (AMQ-6414) The nio+ssl transports can block and hang on connection in certain situations
[ https://issues.apache.org/jira/browse/AMQ-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886124#comment-15886124 ] ASF subversion and git services commented on AMQ-6414: -- Commit b6521e292b20788eef44f3a7810d8e4281b677dc in activemq's branch refs/heads/master from [~gtully] [ https://git-wip-us.apache.org/repos/asf?p=activemq.git;h=b6521e2 ] [AMQ-6414] additional logging that helped identify root cause > The nio+ssl transports can block and hang on connection in certain situations > - > > Key: AMQ-6414 > URL: https://issues.apache.org/jira/browse/AMQ-6414 > Project: ActiveMQ > Issue Type: Bug > Components: Broker, Transport >Affects Versions: 5.14.0 >Reporter: Christopher L. Shannon >Assignee: Christopher L. Shannon > Fix For: 5.14.1, 5.15.0 > > > The nio+ssl transports can hang on initial connection and never read from the > socket after the SSL handshake for certain conditions. This behavior is most > evident when using the auto+nio+ssl transport for a network bridge between 2 > brokers, however I also saw the issue for the normal nio+ssl transport when > running the NetworkAsyncStartTest and even the amqp+nio+ssl transport. > After debugging I found that the issue is that the onSelect method of the > registered callback, which calls the serviceRead() method, is not always > getting triggered. I believe that the root of the problem is that even > though a selector is registered with a SelectionKey.OP_READ, there is no > guarantee that the selected set is correct which is what the SelectorWorker > uses to detect if the operation is ready. The SelectionKey documentation > specifically states that the ready set is a hint but not a guarantee that the > channel is ready. This seems to only effect the SSL transport (not normal > NIO), probably because a read selection was already done once to unwrap the > SSL transport > More info: > https://docs.oracle.com/javase/8/docs/api/java/nio/channels/SelectionKey.html > The fix for this is to trigger the selectRead() after the transport finishes > starting up. (needs to be in a new thread specifically for OpenWire to allow > the wireformat negotiation to not block on startup). This will work for the > SSL transport specifically since we know the transport is read to read from > the the channel after starting up. We know this because the SSL handshake > already took place which means we've already read from the channel. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (AMQ-6414) The nio+ssl transports can block and hang on connection in certain situations
[ https://issues.apache.org/jira/browse/AMQ-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886106#comment-15886106 ] Gary Tully commented on AMQ-6414: - I was digging into this fix from another angle, hence the additional test. The issue is that doing the ssl handshake reads the first openwire command, so there is some remaining data to read once the handshake completes. wireformat negotiation times out and the brokerInfo never gets sent and it hangs. The async read in a different thread can be conditional on: {code} if (inputBuffer.position() > 0 && inputBuffer.hasRemaining() && wireFormat instanceof OpenWireFormat) {code} however doing a blocking read is also fine for openwire because of the negotiation that will follow. Turns out there is similar logic in mqtt and stomp (without the sync around negotiation): https://github.com/apache/activemq/blob/da9fedead4078cc82efb32e15d8d9cd53c8e82dc/activemq-mqtt/src/main/java/org/apache/activemq/transport/mqtt/MQTTNIOSSLTransport.java#L53 As an aside, the blocking around openwire negotiation relates to the need to send each sides preferred options with default format so that it will be properly marshalled/unmarshalled *before* options are applied. > The nio+ssl transports can block and hang on connection in certain situations > - > > Key: AMQ-6414 > URL: https://issues.apache.org/jira/browse/AMQ-6414 > Project: ActiveMQ > Issue Type: Bug > Components: Broker, Transport >Affects Versions: 5.14.0 >Reporter: Christopher L. Shannon >Assignee: Christopher L. Shannon > Fix For: 5.14.1, 5.15.0 > > > The nio+ssl transports can hang on initial connection and never read from the > socket after the SSL handshake for certain conditions. This behavior is most > evident when using the auto+nio+ssl transport for a network bridge between 2 > brokers, however I also saw the issue for the normal nio+ssl transport when > running the NetworkAsyncStartTest and even the amqp+nio+ssl transport. > After debugging I found that the issue is that the onSelect method of the > registered callback, which calls the serviceRead() method, is not always > getting triggered. I believe that the root of the problem is that even > though a selector is registered with a SelectionKey.OP_READ, there is no > guarantee that the selected set is correct which is what the SelectorWorker > uses to detect if the operation is ready. The SelectionKey documentation > specifically states that the ready set is a hint but not a guarantee that the > channel is ready. This seems to only effect the SSL transport (not normal > NIO), probably because a read selection was already done once to unwrap the > SSL transport > More info: > https://docs.oracle.com/javase/8/docs/api/java/nio/channels/SelectionKey.html > The fix for this is to trigger the selectRead() after the transport finishes > starting up. (needs to be in a new thread specifically for OpenWire to allow > the wireformat negotiation to not block on startup). This will work for the > SSL transport specifically since we know the transport is read to read from > the the channel after starting up. We know this because the SSL handshake > already took place which means we've already read from the channel. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (AMQ-6414) The nio+ssl transports can block and hang on connection in certain situations
[ https://issues.apache.org/jira/browse/AMQ-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886083#comment-15886083 ] ASF subversion and git services commented on AMQ-6414: -- Commit a1f6261fb248fd2d39b78e7f2245352009394975 in activemq's branch refs/heads/master from [~gtully] [ https://git-wip-us.apache.org/repos/asf?p=activemq.git;h=a1f6261 ] [AMQ-6414] additional test that reproduces and validates > The nio+ssl transports can block and hang on connection in certain situations > - > > Key: AMQ-6414 > URL: https://issues.apache.org/jira/browse/AMQ-6414 > Project: ActiveMQ > Issue Type: Bug > Components: Broker, Transport >Affects Versions: 5.14.0 >Reporter: Christopher L. Shannon >Assignee: Christopher L. Shannon > Fix For: 5.14.1, 5.15.0 > > > The nio+ssl transports can hang on initial connection and never read from the > socket after the SSL handshake for certain conditions. This behavior is most > evident when using the auto+nio+ssl transport for a network bridge between 2 > brokers, however I also saw the issue for the normal nio+ssl transport when > running the NetworkAsyncStartTest and even the amqp+nio+ssl transport. > After debugging I found that the issue is that the onSelect method of the > registered callback, which calls the serviceRead() method, is not always > getting triggered. I believe that the root of the problem is that even > though a selector is registered with a SelectionKey.OP_READ, there is no > guarantee that the selected set is correct which is what the SelectorWorker > uses to detect if the operation is ready. The SelectionKey documentation > specifically states that the ready set is a hint but not a guarantee that the > channel is ready. This seems to only effect the SSL transport (not normal > NIO), probably because a read selection was already done once to unwrap the > SSL transport > More info: > https://docs.oracle.com/javase/8/docs/api/java/nio/channels/SelectionKey.html > The fix for this is to trigger the selectRead() after the transport finishes > starting up. (needs to be in a new thread specifically for OpenWire to allow > the wireformat negotiation to not block on startup). This will work for the > SSL transport specifically since we know the transport is read to read from > the the channel after starting up. We know this because the SSL handshake > already took place which means we've already read from the channel. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (AMQ-6414) The nio+ssl transports can block and hang on connection in certain situations
[ https://issues.apache.org/jira/browse/AMQ-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15549428#comment-15549428 ] Christopher L. Shannon commented on AMQ-6414: - It could happen for earlier versions also so upgrading is probably a good idea. However, in my testing it seemed to happen pretty rarely and certain things (like using the auto transport) made it more likely so if you haven't seen it happen yet then it may not happen in your environment. > The nio+ssl transports can block and hang on connection in certain situations > - > > Key: AMQ-6414 > URL: https://issues.apache.org/jira/browse/AMQ-6414 > Project: ActiveMQ > Issue Type: Bug > Components: Broker, Transport >Affects Versions: 5.14.0 >Reporter: Christopher L. Shannon >Assignee: Christopher L. Shannon > Fix For: 5.14.1, 5.15.0 > > > The nio+ssl transports can hang on initial connection and never read from the > socket after the SSL handshake for certain conditions. This behavior is most > evident when using the auto+nio+ssl transport for a network bridge between 2 > brokers, however I also saw the issue for the normal nio+ssl transport when > running the NetworkAsyncStartTest and even the amqp+nio+ssl transport. > After debugging I found that the issue is that the onSelect method of the > registered callback, which calls the serviceRead() method, is not always > getting triggered. I believe that the root of the problem is that even > though a selector is registered with a SelectionKey.OP_READ, there is no > guarantee that the selected set is correct which is what the SelectorWorker > uses to detect if the operation is ready. The SelectionKey documentation > specifically states that the ready set is a hint but not a guarantee that the > channel is ready. This seems to only effect the SSL transport (not normal > NIO), probably because a read selection was already done once to unwrap the > SSL transport > More info: > https://docs.oracle.com/javase/8/docs/api/java/nio/channels/SelectionKey.html > The fix for this is to trigger the selectRead() after the transport finishes > starting up. (needs to be in a new thread specifically for OpenWire to allow > the wireformat negotiation to not block on startup). This will work for the > SSL transport specifically since we know the transport is read to read from > the the channel after starting up. We know this because the SSL handshake > already took place which means we've already read from the channel. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AMQ-6414) The nio+ssl transports can block and hang on connection in certain situations
[ https://issues.apache.org/jira/browse/AMQ-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15549415#comment-15549415 ] Bruce Dodson commented on AMQ-6414: --- Reviewing the changes in e.g. NIOSSLTransport.java, considering your description of why it occurs, and comparing to the 5.13.4 source code, it appears this bug could affect earlier versions as well - or did it surface only in 5.14.0 due to other changes elsewhere? (In other words, more to the point, if we're using 5.13.4 and nio+ssl, is it advisable to upgrade to 5.14.1?) Thanks! > The nio+ssl transports can block and hang on connection in certain situations > - > > Key: AMQ-6414 > URL: https://issues.apache.org/jira/browse/AMQ-6414 > Project: ActiveMQ > Issue Type: Bug > Components: Broker, Transport >Affects Versions: 5.14.0 >Reporter: Christopher L. Shannon >Assignee: Christopher L. Shannon > Fix For: 5.14.1, 5.15.0 > > > The nio+ssl transports can hang on initial connection and never read from the > socket after the SSL handshake for certain conditions. This behavior is most > evident when using the auto+nio+ssl transport for a network bridge between 2 > brokers, however I also saw the issue for the normal nio+ssl transport when > running the NetworkAsyncStartTest and even the amqp+nio+ssl transport. > After debugging I found that the issue is that the onSelect method of the > registered callback, which calls the serviceRead() method, is not always > getting triggered. I believe that the root of the problem is that even > though a selector is registered with a SelectionKey.OP_READ, there is no > guarantee that the selected set is correct which is what the SelectorWorker > uses to detect if the operation is ready. The SelectionKey documentation > specifically states that the ready set is a hint but not a guarantee that the > channel is ready. This seems to only effect the SSL transport (not normal > NIO), probably because a read selection was already done once to unwrap the > SSL transport > More info: > https://docs.oracle.com/javase/8/docs/api/java/nio/channels/SelectionKey.html > The fix for this is to trigger the selectRead() after the transport finishes > starting up. (needs to be in a new thread specifically for OpenWire to allow > the wireformat negotiation to not block on startup). This will work for the > SSL transport specifically since we know the transport is read to read from > the the channel after starting up. We know this because the SSL handshake > already took place which means we've already read from the channel. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AMQ-6414) The nio+ssl transports can block and hang on connection in certain situations
[ https://issues.apache.org/jira/browse/AMQ-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15467519#comment-15467519 ] ASF subversion and git services commented on AMQ-6414: -- Commit 679f0cfd311c362aa7821763067d9c9fa9bdc801 in activemq's branch refs/heads/activemq-5.14.x from [~cshannon] [ https://git-wip-us.apache.org/repos/asf?p=activemq.git;h=679f0cf ] https://issues.apache.org/jira/browse/AMQ-6414 Changing the nio+ssl transports to trigger a serviceRead after start up to prevent blocking. The prevents the channels from not reading in certain cases, most notably with the auto+nio+ssl transport when used for a network bridge. Also added a couple tests and changed a network bridge test to test out auto+nio+ssl. (cherry picked from commit ed0e786b6002633411037923fb28a075489e442b) > The nio+ssl transports can block and hang on connection in certain situations > - > > Key: AMQ-6414 > URL: https://issues.apache.org/jira/browse/AMQ-6414 > Project: ActiveMQ > Issue Type: Bug > Components: Broker, Transport >Affects Versions: 5.14.0 >Reporter: Christopher L. Shannon >Assignee: Christopher L. Shannon > Fix For: 5.14.1, 5.15.0 > > > The nio+ssl transports can hang on initial connection and never read from the > socket after the SSL handshake for certain conditions. This behavior is most > evident when using the auto+nio+ssl transport for a network bridge between 2 > brokers, however I also saw the issue for the normal nio+ssl transport when > running the NetworkAsyncStartTest and even the amqp+nio+ssl transport. > After debugging I found that the issue is that the onSelect method of the > registered callback, which calls the serviceRead() method, is not always > getting triggered. I believe that the root of the problem is that even > though a selector is registered with a SelectionKey.OP_READ, there is no > guarantee that the selected set is correct which is what the SelectorWorker > uses to detect if the operation is ready. The SelectionKey documentation > specifically states that the ready set is a hint but not a guarantee that the > channel is ready. This seems to only effect the SSL transport (not normal > NIO), probably because a read selection was already done once to unwrap the > SSL transport > More info: > https://docs.oracle.com/javase/8/docs/api/java/nio/channels/SelectionKey.html > The fix for this is to trigger the selectRead() after the transport finishes > starting up. (needs to be in a new thread specifically for OpenWire to allow > the wireformat negotiation to not block on startup). This will work for the > SSL transport specifically since we know the transport is read to read from > the the channel after starting up. We know this because the SSL handshake > already took place which means we've already read from the channel. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AMQ-6414) The nio+ssl transports can block and hang on connection in certain situations
[ https://issues.apache.org/jira/browse/AMQ-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15455807#comment-15455807 ] Christopher L. Shannon commented on AMQ-6414: - I will let CI run and make sure all of the tests are still good and then close this out and backport the fix to 5.14.x. > The nio+ssl transports can block and hang on connection in certain situations > - > > Key: AMQ-6414 > URL: https://issues.apache.org/jira/browse/AMQ-6414 > Project: ActiveMQ > Issue Type: Bug > Components: Broker, Transport >Affects Versions: 5.14.0 >Reporter: Christopher L. Shannon >Assignee: Christopher L. Shannon > > The nio+ssl transports can hang on initial connection and never read from the > socket after the SSL handshake for certain conditions. This behavior is most > evident when using the auto+nio+ssl transport for a network bridge between 2 > brokers, however I also saw the issue for the normal nio+ssl transport when > running the NetworkAsyncStartTest and even the amqp+nio+ssl transport. > After debugging I found that the issue is that the onSelect method of the > registered callback, which calls the serviceRead() method, is not always > getting triggered. I believe that the root of the problem is that even > though a selector is registered with a SelectionKey.OP_READ, there is no > guarantee that the selected set is correct which is what the SelectorWorker > uses to detect if the operation is ready. The SelectionKey documentation > specifically states that the ready set is a hint but not a guarantee that the > channel is ready. This seems to only effect the SSL transport (not normal > NIO), probably because a read selection was already done once to unwrap the > SSL transport > More info: > https://docs.oracle.com/javase/8/docs/api/java/nio/channels/SelectionKey.html > The fix for this is to trigger the selectRead() after the transport finishes > starting up. (needs to be in a new thread specifically for OpenWire to allow > the wireformat negotiation to not block on startup). This will work for the > SSL transport specifically since we know the transport is read to read from > the the channel after starting up. We know this because the SSL handshake > already took place which means we've already read from the channel. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AMQ-6414) The nio+ssl transports can block and hang on connection in certain situations
[ https://issues.apache.org/jira/browse/AMQ-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15455803#comment-15455803 ] ASF subversion and git services commented on AMQ-6414: -- Commit ed0e786b6002633411037923fb28a075489e442b in activemq's branch refs/heads/master from [~cshannon] [ https://git-wip-us.apache.org/repos/asf?p=activemq.git;h=ed0e786 ] https://issues.apache.org/jira/browse/AMQ-6414 Changing the nio+ssl transports to trigger a serviceRead after start up to prevent blocking. The prevents the channels from not reading in certain cases, most notably with the auto+nio+ssl transport when used for a network bridge. Also added a couple tests and changed a network bridge test to test out auto+nio+ssl. > The nio+ssl transports can block and hang on connection in certain situations > - > > Key: AMQ-6414 > URL: https://issues.apache.org/jira/browse/AMQ-6414 > Project: ActiveMQ > Issue Type: Bug > Components: Broker, Transport >Affects Versions: 5.14.0 >Reporter: Christopher L. Shannon >Assignee: Christopher L. Shannon > > The nio+ssl transports can hang on initial connection and never read from the > socket after the SSL handshake for certain conditions. This behavior is most > evident when using the auto+nio+ssl transport for a network bridge between 2 > brokers, however I also saw the issue for the normal nio+ssl transport when > running the NetworkAsyncStartTest and even the amqp+nio+ssl transport. > After debugging I found that the issue is that the onSelect method of the > registered callback, which calls the serviceRead() method, is not always > getting triggered. I believe that the root of the problem is that even > though a selector is registered with a SelectionKey.OP_READ, there is no > guarantee that the selected set is correct which is what the SelectorWorker > uses to detect if the operation is ready. The SelectionKey documentation > specifically states that the ready set is a hint but not a guarantee that the > channel is ready. This seems to only effect the SSL transport (not normal > NIO), probably because a read selection was already done once to unwrap the > SSL transport > More info: > https://docs.oracle.com/javase/8/docs/api/java/nio/channels/SelectionKey.html > The fix for this is to trigger the selectRead() after the transport finishes > starting up. (needs to be in a new thread specifically for OpenWire to allow > the wireformat negotiation to not block on startup). This will work for the > SSL transport specifically since we know the transport is read to read from > the the channel after starting up. We know this because the SSL handshake > already took place which means we've already read from the channel. -- This message was sent by Atlassian JIRA (v6.3.4#6332)