[jira] [Commented] (QPIDJMS-392) Stalled Service Bus Subscriptions, Reconnect Loops

2018-06-18 Thread Paul Rogalinski (JIRA)


[ 
https://issues.apache.org/jira/browse/QPIDJMS-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16515508#comment-16515508
 ] 

Paul Rogalinski commented on QPIDJMS-392:
-

[~gemmellr]

We managed to catch another incident, I have attached new log files to this 
issue (logs-with-frames-2018-06-17.tar.gz). The stdout variant does contain 
additional _PN_TRACE_FRM_ debug messages but misses detailed stack traces. I 
hope this helps to evaluate the issue.

Thanks!

> Stalled Service Bus Subscriptions, Reconnect Loops
> --
>
> Key: QPIDJMS-392
> URL: https://issues.apache.org/jira/browse/QPIDJMS-392
> Project: Qpid JMS
>  Issue Type: Bug
>  Components: qpid-jms-client
>Affects Versions: 0.32.0
> Environment: * Azure App Service
>  * Spring 1.5.12 via embedded web server / standalone
>  * java.runtime.version "1.8.0_144-b01"
>  * java.vm.version "25.144-b01"
>  * java.vm.vendor "Azul Systems, Inc."
>  * os.arch "amd64"
>  * os.name "Windows Server 2016"
>Reporter: Paul Rogalinski
>Priority: Major
> Attachments: logs-with-frames-2018-06-17.tar.gz, qpid-log.txt
>
>
> We are experiencing connection recovery loops after an random (hours to days) 
> amount of time on JMS connection subscribed to a Service Bus Topic via 
> (durable) Subscription. The application does connect to two different service 
> bus instances at the same time.
> We had to use the failover:(amqp:...) URL pattern in order to actually see 
> something in the log files as the FailoverProvider is a bit more verbose 
> (TRACE level). When using plain ampqs:// connection URL pattern the only way 
> to tell a hanging/stalled subscription was to monitor the number of 
> active/unconsumed Messages in a Service Bus' Subscription using a different 
> client. Regardless of the Provider implementation, the problem of stalled 
> subscriptions persisted.
> Log observations so far:
> After an initial disconnect:
> {code:java}
> o.a.qpid.jms.provider.amqp.AmqpProvider  : New Proton Event: 
> CONNECTION_REMOTE_CLOSE
> (...)
> Failover: the provider reports failure: The connection was inactive for more 
> than the allowed 30 milliseconds and is closed by container 
> 'LinkTracker'. TrackingId:c2925ed17c9549b29a825cc581852d1a_G6, 
> SystemTracker:gateway9, Timestamp:5/1/2018 6:11:44 PM [condition = 
> amqp:connection:forced]
> {code}
> the application will enter into a reconnect/connection recovery loop with an 
> interval of one minute.
> {code:java}
> 18:11:44.670Z Failover: the provider reports failure: The connection was 
> inactive for more than the allowed 30 milliseconds and is closed by 
> container 'LinkTracker'. TrackingId:c2925ed17c9549b29a825cc581852d1a_G6, 
> SystemTracker:gateway9, Timestamp:5/1/2018 6:11:44 PM [condition = 
> amqp:connection:forced]
> 18:12:44.789Z Failover: the provider reports failure: The connection was 
> inactive for more than the allowed 6 milliseconds and is closed by 
> container 'LinkTracker'. TrackingId:50b008fdb56f443b947d76ded87691a9_G5, 
> SystemTracker:gateway9, Timestamp:5/1/2018 6:12:44 PM [condition = 
> amqp:connection:forced]
> 18:13:44.918Z Failover: the provider reports failure: The connection was 
> inactive for more than the allowed 6 milliseconds and is closed by 
> container 'LinkTracker'. TrackingId:a2832475e8ab4702b6189d9218588132_G15, 
> SystemTracker:gateway9, Timestamp:5/1/2018 6:13:44 PM [condition = 
> amqp:connection:forced]
> {code}
> In between we see the connection being recovered along with log messages as:
> {code:java}
> org.apache.qpid.jms.JmsConnection    : Connection 
> ID:8061906b-13e9-46a3-b6d0-fd227b339f67:1 is finalizing recovery
> {code}
> followed by
> {code:java}
> o.a.qpid.jms.provider.amqp.AmqpProvider  : New Proton Event: 
> SESSION_REMOTE_CLOSE
> {code}
> after a minute.
> A full cycle is attached in the log file, URLs are anonymized.
> In general we can see the following sequence of Proton-Events repeating:
> {code:java}
> 23:58:53.676Z CONNECTION_INIT
> 23:58:53.722Z CONNECTION_LOCAL_OPEN
> 23:58:53.722Z CONNECTION_REMOTE_OPEN
> 23:58:53.722Z SESSION_INIT
> 23:58:53.722Z SESSION_LOCAL_OPEN
> 23:58:53.738Z SESSION_REMOTE_OPEN
> 23:59:53.690Z SESSION_REMOTE_CLOSE
> 23:59:53.690Z SESSION_LOCAL_CLOSE
> 23:59:53.690Z SESSION_FINAL
> 23:59:53.706Z CONNECTION_REMOTE_CLOSE
> 23:59:53.706Z TRANSPORT_TAIL_CLOSED
> 23:59:53.706Z CONNECTION_LOCAL_CLOSE
> 23:59:53.706Z TRANSPORT_HEAD_CLOSED
> 23:59:53.706Z TRANSPORT_CLOSED
> {code}
> We did also open a ticket with Microsoft / Azure as the disconnects seemed to 
> originate from the Service Bus instance (SESSION_REMOTE_CLOSE) where we did 
> learn, that the Service Bus has a hardwired TTL of 5 minutes before inactive 
> Subscriptions will be terminated. According to Azure-Support the issue is 
> most 

[jira] [Updated] (QPIDJMS-392) Stalled Service Bus Subscriptions, Reconnect Loops

2018-06-18 Thread Paul Rogalinski (JIRA)


 [ 
https://issues.apache.org/jira/browse/QPIDJMS-392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogalinski updated QPIDJMS-392:

Attachment: logs-with-frames-2018-06-17.tar.gz

> Stalled Service Bus Subscriptions, Reconnect Loops
> --
>
> Key: QPIDJMS-392
> URL: https://issues.apache.org/jira/browse/QPIDJMS-392
> Project: Qpid JMS
>  Issue Type: Bug
>  Components: qpid-jms-client
>Affects Versions: 0.32.0
> Environment: * Azure App Service
>  * Spring 1.5.12 via embedded web server / standalone
>  * java.runtime.version "1.8.0_144-b01"
>  * java.vm.version "25.144-b01"
>  * java.vm.vendor "Azul Systems, Inc."
>  * os.arch "amd64"
>  * os.name "Windows Server 2016"
>Reporter: Paul Rogalinski
>Priority: Major
> Attachments: logs-with-frames-2018-06-17.tar.gz, qpid-log.txt
>
>
> We are experiencing connection recovery loops after an random (hours to days) 
> amount of time on JMS connection subscribed to a Service Bus Topic via 
> (durable) Subscription. The application does connect to two different service 
> bus instances at the same time.
> We had to use the failover:(amqp:...) URL pattern in order to actually see 
> something in the log files as the FailoverProvider is a bit more verbose 
> (TRACE level). When using plain ampqs:// connection URL pattern the only way 
> to tell a hanging/stalled subscription was to monitor the number of 
> active/unconsumed Messages in a Service Bus' Subscription using a different 
> client. Regardless of the Provider implementation, the problem of stalled 
> subscriptions persisted.
> Log observations so far:
> After an initial disconnect:
> {code:java}
> o.a.qpid.jms.provider.amqp.AmqpProvider  : New Proton Event: 
> CONNECTION_REMOTE_CLOSE
> (...)
> Failover: the provider reports failure: The connection was inactive for more 
> than the allowed 30 milliseconds and is closed by container 
> 'LinkTracker'. TrackingId:c2925ed17c9549b29a825cc581852d1a_G6, 
> SystemTracker:gateway9, Timestamp:5/1/2018 6:11:44 PM [condition = 
> amqp:connection:forced]
> {code}
> the application will enter into a reconnect/connection recovery loop with an 
> interval of one minute.
> {code:java}
> 18:11:44.670Z Failover: the provider reports failure: The connection was 
> inactive for more than the allowed 30 milliseconds and is closed by 
> container 'LinkTracker'. TrackingId:c2925ed17c9549b29a825cc581852d1a_G6, 
> SystemTracker:gateway9, Timestamp:5/1/2018 6:11:44 PM [condition = 
> amqp:connection:forced]
> 18:12:44.789Z Failover: the provider reports failure: The connection was 
> inactive for more than the allowed 6 milliseconds and is closed by 
> container 'LinkTracker'. TrackingId:50b008fdb56f443b947d76ded87691a9_G5, 
> SystemTracker:gateway9, Timestamp:5/1/2018 6:12:44 PM [condition = 
> amqp:connection:forced]
> 18:13:44.918Z Failover: the provider reports failure: The connection was 
> inactive for more than the allowed 6 milliseconds and is closed by 
> container 'LinkTracker'. TrackingId:a2832475e8ab4702b6189d9218588132_G15, 
> SystemTracker:gateway9, Timestamp:5/1/2018 6:13:44 PM [condition = 
> amqp:connection:forced]
> {code}
> In between we see the connection being recovered along with log messages as:
> {code:java}
> org.apache.qpid.jms.JmsConnection    : Connection 
> ID:8061906b-13e9-46a3-b6d0-fd227b339f67:1 is finalizing recovery
> {code}
> followed by
> {code:java}
> o.a.qpid.jms.provider.amqp.AmqpProvider  : New Proton Event: 
> SESSION_REMOTE_CLOSE
> {code}
> after a minute.
> A full cycle is attached in the log file, URLs are anonymized.
> In general we can see the following sequence of Proton-Events repeating:
> {code:java}
> 23:58:53.676Z CONNECTION_INIT
> 23:58:53.722Z CONNECTION_LOCAL_OPEN
> 23:58:53.722Z CONNECTION_REMOTE_OPEN
> 23:58:53.722Z SESSION_INIT
> 23:58:53.722Z SESSION_LOCAL_OPEN
> 23:58:53.738Z SESSION_REMOTE_OPEN
> 23:59:53.690Z SESSION_REMOTE_CLOSE
> 23:59:53.690Z SESSION_LOCAL_CLOSE
> 23:59:53.690Z SESSION_FINAL
> 23:59:53.706Z CONNECTION_REMOTE_CLOSE
> 23:59:53.706Z TRANSPORT_TAIL_CLOSED
> 23:59:53.706Z CONNECTION_LOCAL_CLOSE
> 23:59:53.706Z TRANSPORT_HEAD_CLOSED
> 23:59:53.706Z TRANSPORT_CLOSED
> {code}
> We did also open a ticket with Microsoft / Azure as the disconnects seemed to 
> originate from the Service Bus instance (SESSION_REMOTE_CLOSE) where we did 
> learn, that the Service Bus has a hardwired TTL of 5 minutes before inactive 
> Subscriptions will be terminated. According to Azure-Support the issue is 
> most likely related to Qpid connection handling. This does at least explain 
> the first / initial timeout of 30 seen in the logs. The connection / 
> client is set up as follows
> URLs:
> {code:java}
> failover:(amqps://instance1.servicebus.windows.net)?amqp.idleTimeout=12000
> 

[jira] [Commented] (QPIDJMS-392) Stalled Service Bus Subscriptions, Reconnect Loops

2018-06-13 Thread Paul Rogalinski (JIRA)


[ 
https://issues.apache.org/jira/browse/QPIDJMS-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510979#comment-16510979
 ] 

Paul Rogalinski commented on QPIDJMS-392:
-

Thanks for the clarification. Configured the application to log both variants. 
Will come back as soon we have collected data of a new incident.

> Stalled Service Bus Subscriptions, Reconnect Loops
> --
>
> Key: QPIDJMS-392
> URL: https://issues.apache.org/jira/browse/QPIDJMS-392
> Project: Qpid JMS
>  Issue Type: Bug
>  Components: qpid-jms-client
>Affects Versions: 0.32.0
> Environment: * Azure App Service
>  * Spring 1.5.12 via embedded web server / standalone
>  * java.runtime.version "1.8.0_144-b01"
>  * java.vm.version "25.144-b01"
>  * java.vm.vendor "Azul Systems, Inc."
>  * os.arch "amd64"
>  * os.name "Windows Server 2016"
>Reporter: Paul Rogalinski
>Priority: Major
> Attachments: qpid-log.txt
>
>
> We are experiencing connection recovery loops after an random (hours to days) 
> amount of time on JMS connection subscribed to a Service Bus Topic via 
> (durable) Subscription. The application does connect to two different service 
> bus instances at the same time.
> We had to use the failover:(amqp:...) URL pattern in order to actually see 
> something in the log files as the FailoverProvider is a bit more verbose 
> (TRACE level). When using plain ampqs:// connection URL pattern the only way 
> to tell a hanging/stalled subscription was to monitor the number of 
> active/unconsumed Messages in a Service Bus' Subscription using a different 
> client. Regardless of the Provider implementation, the problem of stalled 
> subscriptions persisted.
> Log observations so far:
> After an initial disconnect:
> {code:java}
> o.a.qpid.jms.provider.amqp.AmqpProvider  : New Proton Event: 
> CONNECTION_REMOTE_CLOSE
> (...)
> Failover: the provider reports failure: The connection was inactive for more 
> than the allowed 30 milliseconds and is closed by container 
> 'LinkTracker'. TrackingId:c2925ed17c9549b29a825cc581852d1a_G6, 
> SystemTracker:gateway9, Timestamp:5/1/2018 6:11:44 PM [condition = 
> amqp:connection:forced]
> {code}
> the application will enter into a reconnect/connection recovery loop with an 
> interval of one minute.
> {code:java}
> 18:11:44.670Z Failover: the provider reports failure: The connection was 
> inactive for more than the allowed 30 milliseconds and is closed by 
> container 'LinkTracker'. TrackingId:c2925ed17c9549b29a825cc581852d1a_G6, 
> SystemTracker:gateway9, Timestamp:5/1/2018 6:11:44 PM [condition = 
> amqp:connection:forced]
> 18:12:44.789Z Failover: the provider reports failure: The connection was 
> inactive for more than the allowed 6 milliseconds and is closed by 
> container 'LinkTracker'. TrackingId:50b008fdb56f443b947d76ded87691a9_G5, 
> SystemTracker:gateway9, Timestamp:5/1/2018 6:12:44 PM [condition = 
> amqp:connection:forced]
> 18:13:44.918Z Failover: the provider reports failure: The connection was 
> inactive for more than the allowed 6 milliseconds and is closed by 
> container 'LinkTracker'. TrackingId:a2832475e8ab4702b6189d9218588132_G15, 
> SystemTracker:gateway9, Timestamp:5/1/2018 6:13:44 PM [condition = 
> amqp:connection:forced]
> {code}
> In between we see the connection being recovered along with log messages as:
> {code:java}
> org.apache.qpid.jms.JmsConnection    : Connection 
> ID:8061906b-13e9-46a3-b6d0-fd227b339f67:1 is finalizing recovery
> {code}
> followed by
> {code:java}
> o.a.qpid.jms.provider.amqp.AmqpProvider  : New Proton Event: 
> SESSION_REMOTE_CLOSE
> {code}
> after a minute.
> A full cycle is attached in the log file, URLs are anonymized.
> In general we can see the following sequence of Proton-Events repeating:
> {code:java}
> 23:58:53.676Z CONNECTION_INIT
> 23:58:53.722Z CONNECTION_LOCAL_OPEN
> 23:58:53.722Z CONNECTION_REMOTE_OPEN
> 23:58:53.722Z SESSION_INIT
> 23:58:53.722Z SESSION_LOCAL_OPEN
> 23:58:53.738Z SESSION_REMOTE_OPEN
> 23:59:53.690Z SESSION_REMOTE_CLOSE
> 23:59:53.690Z SESSION_LOCAL_CLOSE
> 23:59:53.690Z SESSION_FINAL
> 23:59:53.706Z CONNECTION_REMOTE_CLOSE
> 23:59:53.706Z TRANSPORT_TAIL_CLOSED
> 23:59:53.706Z CONNECTION_LOCAL_CLOSE
> 23:59:53.706Z TRANSPORT_HEAD_CLOSED
> 23:59:53.706Z TRANSPORT_CLOSED
> {code}
> We did also open a ticket with Microsoft / Azure as the disconnects seemed to 
> originate from the Service Bus instance (SESSION_REMOTE_CLOSE) where we did 
> learn, that the Service Bus has a hardwired TTL of 5 minutes before inactive 
> Subscriptions will be terminated. According to Azure-Support the issue is 
> most likely related to Qpid connection handling. This does at least explain 
> the first / initial timeout of 30 seen in the logs. The connection / 
> client is set up as follows
> URLs:

[jira] [Commented] (QPIDJMS-392) Stalled Service Bus Subscriptions, Reconnect Loops

2018-06-13 Thread Paul Rogalinski (JIRA)


[ 
https://issues.apache.org/jira/browse/QPIDJMS-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510954#comment-16510954
 ] 

Paul Rogalinski commented on QPIDJMS-392:
-

[~gemmellr]

_amqp.traceFrames=true_ URL parameter seems to be only effective within 
Unit-Tests and not for API consumers. According to the usages of setTraceFrames 
-> updateTracer in AmqpProvider there are no other calls to that code path 
other than from Unit Tests. _PN_TRACE_FRM_ works on the other hand. We will be 
collecting data frames logged by using the latter method hoping for the issue 
to crop up again soon. If there is any reason to use the ProtocolTracer defined 
in AmqpProvider - please let me know. I would need to shortwire that in a 
forked version of qpid-jms-client and deploy a customized variant of the 
application. 

Thank you for your time.

> Stalled Service Bus Subscriptions, Reconnect Loops
> --
>
> Key: QPIDJMS-392
> URL: https://issues.apache.org/jira/browse/QPIDJMS-392
> Project: Qpid JMS
>  Issue Type: Bug
>  Components: qpid-jms-client
>Affects Versions: 0.32.0
> Environment: * Azure App Service
>  * Spring 1.5.12 via embedded web server / standalone
>  * java.runtime.version "1.8.0_144-b01"
>  * java.vm.version "25.144-b01"
>  * java.vm.vendor "Azul Systems, Inc."
>  * os.arch "amd64"
>  * os.name "Windows Server 2016"
>Reporter: Paul Rogalinski
>Priority: Major
> Attachments: qpid-log.txt
>
>
> We are experiencing connection recovery loops after an random (hours to days) 
> amount of time on JMS connection subscribed to a Service Bus Topic via 
> (durable) Subscription. The application does connect to two different service 
> bus instances at the same time.
> We had to use the failover:(amqp:...) URL pattern in order to actually see 
> something in the log files as the FailoverProvider is a bit more verbose 
> (TRACE level). When using plain ampqs:// connection URL pattern the only way 
> to tell a hanging/stalled subscription was to monitor the number of 
> active/unconsumed Messages in a Service Bus' Subscription using a different 
> client. Regardless of the Provider implementation, the problem of stalled 
> subscriptions persisted.
> Log observations so far:
> After an initial disconnect:
> {code:java}
> o.a.qpid.jms.provider.amqp.AmqpProvider  : New Proton Event: 
> CONNECTION_REMOTE_CLOSE
> (...)
> Failover: the provider reports failure: The connection was inactive for more 
> than the allowed 30 milliseconds and is closed by container 
> 'LinkTracker'. TrackingId:c2925ed17c9549b29a825cc581852d1a_G6, 
> SystemTracker:gateway9, Timestamp:5/1/2018 6:11:44 PM [condition = 
> amqp:connection:forced]
> {code}
> the application will enter into a reconnect/connection recovery loop with an 
> interval of one minute.
> {code:java}
> 18:11:44.670Z Failover: the provider reports failure: The connection was 
> inactive for more than the allowed 30 milliseconds and is closed by 
> container 'LinkTracker'. TrackingId:c2925ed17c9549b29a825cc581852d1a_G6, 
> SystemTracker:gateway9, Timestamp:5/1/2018 6:11:44 PM [condition = 
> amqp:connection:forced]
> 18:12:44.789Z Failover: the provider reports failure: The connection was 
> inactive for more than the allowed 6 milliseconds and is closed by 
> container 'LinkTracker'. TrackingId:50b008fdb56f443b947d76ded87691a9_G5, 
> SystemTracker:gateway9, Timestamp:5/1/2018 6:12:44 PM [condition = 
> amqp:connection:forced]
> 18:13:44.918Z Failover: the provider reports failure: The connection was 
> inactive for more than the allowed 6 milliseconds and is closed by 
> container 'LinkTracker'. TrackingId:a2832475e8ab4702b6189d9218588132_G15, 
> SystemTracker:gateway9, Timestamp:5/1/2018 6:13:44 PM [condition = 
> amqp:connection:forced]
> {code}
> In between we see the connection being recovered along with log messages as:
> {code:java}
> org.apache.qpid.jms.JmsConnection    : Connection 
> ID:8061906b-13e9-46a3-b6d0-fd227b339f67:1 is finalizing recovery
> {code}
> followed by
> {code:java}
> o.a.qpid.jms.provider.amqp.AmqpProvider  : New Proton Event: 
> SESSION_REMOTE_CLOSE
> {code}
> after a minute.
> A full cycle is attached in the log file, URLs are anonymized.
> In general we can see the following sequence of Proton-Events repeating:
> {code:java}
> 23:58:53.676Z CONNECTION_INIT
> 23:58:53.722Z CONNECTION_LOCAL_OPEN
> 23:58:53.722Z CONNECTION_REMOTE_OPEN
> 23:58:53.722Z SESSION_INIT
> 23:58:53.722Z SESSION_LOCAL_OPEN
> 23:58:53.738Z SESSION_REMOTE_OPEN
> 23:59:53.690Z SESSION_REMOTE_CLOSE
> 23:59:53.690Z SESSION_LOCAL_CLOSE
> 23:59:53.690Z SESSION_FINAL
> 23:59:53.706Z CONNECTION_REMOTE_CLOSE
> 23:59:53.706Z TRANSPORT_TAIL_CLOSED
> 23:59:53.706Z CONNECTION_LOCAL_CLOSE
> 23:59:53.706Z TRANSPORT_HEAD_CLOSED
> 23:59:53.706Z 

[jira] [Commented] (QPIDJMS-392) Stalled Service Bus Subscriptions, Reconnect Loops

2018-06-12 Thread Paul Rogalinski (JIRA)


[ 
https://issues.apache.org/jira/browse/QPIDJMS-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509733#comment-16509733
 ] 

Paul Rogalinski commented on QPIDJMS-392:
-

Sorry for not taking that issue onto the mailing list first, I followed the 
directions at 
[https://qpid.apache.org/issues.html|https://qpid.apache.org/issues.html#report-a-bug]

I will be collecting more logs as you suggested and update the issue 
accordingly.

Thank you for your time!

> Stalled Service Bus Subscriptions, Reconnect Loops
> --
>
> Key: QPIDJMS-392
> URL: https://issues.apache.org/jira/browse/QPIDJMS-392
> Project: Qpid JMS
>  Issue Type: Bug
>  Components: qpid-jms-client
>Affects Versions: 0.32.0
> Environment: * Azure App Service
>  * Spring 1.5.12 via embedded web server / standalone
>  * java.runtime.version "1.8.0_144-b01"
>  * java.vm.version "25.144-b01"
>  * java.vm.vendor "Azul Systems, Inc."
>  * os.arch "amd64"
>  * os.name "Windows Server 2016"
>Reporter: Paul Rogalinski
>Priority: Major
> Attachments: qpid-log.txt
>
>
> We are experiencing connection recovery loops after an random (hours to days) 
> amount of time on JMS connection subscribed to a Service Bus Topic via 
> (durable) Subscription. The application does connect to two different service 
> bus instances at the same time.
> We had to use the failover:(amqp:...) URL pattern in order to actually see 
> something in the log files as the FailoverProvider is a bit more verbose 
> (TRACE level). When using plain ampqs:// connection URL pattern the only way 
> to tell a hanging/stalled subscription was to monitor the number of 
> active/unconsumed Messages in a Service Bus' Subscription using a different 
> client. Regardless of the Provider implementation, the problem of stalled 
> subscriptions persisted.
> Log observations so far:
> After an initial disconnect:
> {code:java}
> o.a.qpid.jms.provider.amqp.AmqpProvider  : New Proton Event: 
> CONNECTION_REMOTE_CLOSE
> (...)
> Failover: the provider reports failure: The connection was inactive for more 
> than the allowed 30 milliseconds and is closed by container 
> 'LinkTracker'. TrackingId:c2925ed17c9549b29a825cc581852d1a_G6, 
> SystemTracker:gateway9, Timestamp:5/1/2018 6:11:44 PM [condition = 
> amqp:connection:forced]
> {code}
> the application will enter into a reconnect/connection recovery loop with an 
> interval of one minute.
> {code:java}
> 18:11:44.670Z Failover: the provider reports failure: The connection was 
> inactive for more than the allowed 30 milliseconds and is closed by 
> container 'LinkTracker'. TrackingId:c2925ed17c9549b29a825cc581852d1a_G6, 
> SystemTracker:gateway9, Timestamp:5/1/2018 6:11:44 PM [condition = 
> amqp:connection:forced]
> 18:12:44.789Z Failover: the provider reports failure: The connection was 
> inactive for more than the allowed 6 milliseconds and is closed by 
> container 'LinkTracker'. TrackingId:50b008fdb56f443b947d76ded87691a9_G5, 
> SystemTracker:gateway9, Timestamp:5/1/2018 6:12:44 PM [condition = 
> amqp:connection:forced]
> 18:13:44.918Z Failover: the provider reports failure: The connection was 
> inactive for more than the allowed 6 milliseconds and is closed by 
> container 'LinkTracker'. TrackingId:a2832475e8ab4702b6189d9218588132_G15, 
> SystemTracker:gateway9, Timestamp:5/1/2018 6:13:44 PM [condition = 
> amqp:connection:forced]
> {code}
> In between we see the connection being recovered along with log messages as:
> {code:java}
> org.apache.qpid.jms.JmsConnection    : Connection 
> ID:8061906b-13e9-46a3-b6d0-fd227b339f67:1 is finalizing recovery
> {code}
> followed by
> {code:java}
> o.a.qpid.jms.provider.amqp.AmqpProvider  : New Proton Event: 
> SESSION_REMOTE_CLOSE
> {code}
> after a minute.
> A full cycle is attached in the log file, URLs are anonymized.
> In general we can see the following sequence of Proton-Events repeating:
> {code:java}
> 23:58:53.676Z CONNECTION_INIT
> 23:58:53.722Z CONNECTION_LOCAL_OPEN
> 23:58:53.722Z CONNECTION_REMOTE_OPEN
> 23:58:53.722Z SESSION_INIT
> 23:58:53.722Z SESSION_LOCAL_OPEN
> 23:58:53.738Z SESSION_REMOTE_OPEN
> 23:59:53.690Z SESSION_REMOTE_CLOSE
> 23:59:53.690Z SESSION_LOCAL_CLOSE
> 23:59:53.690Z SESSION_FINAL
> 23:59:53.706Z CONNECTION_REMOTE_CLOSE
> 23:59:53.706Z TRANSPORT_TAIL_CLOSED
> 23:59:53.706Z CONNECTION_LOCAL_CLOSE
> 23:59:53.706Z TRANSPORT_HEAD_CLOSED
> 23:59:53.706Z TRANSPORT_CLOSED
> {code}
> We did also open a ticket with Microsoft / Azure as the disconnects seemed to 
> originate from the Service Bus instance (SESSION_REMOTE_CLOSE) where we did 
> learn, that the Service Bus has a hardwired TTL of 5 minutes before inactive 
> Subscriptions will be terminated. According to Azure-Support the issue is 
> most likely related to Qpid connection 

[jira] [Updated] (QPIDJMS-392) Stalled Service Bus Subscriptions, Reconnect Loops

2018-06-12 Thread Paul Rogalinski (JIRA)


 [ 
https://issues.apache.org/jira/browse/QPIDJMS-392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogalinski updated QPIDJMS-392:

Description: 
We are experiencing connection recovery loops after an random (hours to days) 
amount of time on JMS connection subscribed to a Service Bus Topic via 
(durable) Subscription. The application does connect to two different service 
bus instances at the same time.

We had to use the failover:(amqp:...) URL pattern in order to actually see 
something in the log files as the FailoverProvider is a bit more verbose (TRACE 
level). When using plain ampqs:// connection URL pattern the only way to tell a 
hanging/stalled subscription was to monitor the number of active/unconsumed 
Messages in a Service Bus' Subscription using a different client. Regardless of 
the Provider implementation, the problem of stalled subscriptions persisted.

Log observations so far:

After an initial disconnect:
{code:java}
o.a.qpid.jms.provider.amqp.AmqpProvider  : New Proton Event: 
CONNECTION_REMOTE_CLOSE
(...)
Failover: the provider reports failure: The connection was inactive for more 
than the allowed 30 milliseconds and is closed by container 'LinkTracker'. 
TrackingId:c2925ed17c9549b29a825cc581852d1a_G6, SystemTracker:gateway9, 
Timestamp:5/1/2018 6:11:44 PM [condition = amqp:connection:forced]
{code}
the application will enter into a reconnect/connection recovery loop with an 
interval of one minute.
{code:java}
18:11:44.670Z Failover: the provider reports failure: The connection was 
inactive for more than the allowed 30 milliseconds and is closed by 
container 'LinkTracker'. TrackingId:c2925ed17c9549b29a825cc581852d1a_G6, 
SystemTracker:gateway9, Timestamp:5/1/2018 6:11:44 PM [condition = 
amqp:connection:forced]
18:12:44.789Z Failover: the provider reports failure: The connection was 
inactive for more than the allowed 6 milliseconds and is closed by 
container 'LinkTracker'. TrackingId:50b008fdb56f443b947d76ded87691a9_G5, 
SystemTracker:gateway9, Timestamp:5/1/2018 6:12:44 PM [condition = 
amqp:connection:forced]
18:13:44.918Z Failover: the provider reports failure: The connection was 
inactive for more than the allowed 6 milliseconds and is closed by 
container 'LinkTracker'. TrackingId:a2832475e8ab4702b6189d9218588132_G15, 
SystemTracker:gateway9, Timestamp:5/1/2018 6:13:44 PM [condition = 
amqp:connection:forced]
{code}
In between we see the connection being recovered along with log messages as:
{code:java}
org.apache.qpid.jms.JmsConnection    : Connection 
ID:8061906b-13e9-46a3-b6d0-fd227b339f67:1 is finalizing recovery
{code}
followed by
{code:java}
o.a.qpid.jms.provider.amqp.AmqpProvider  : New Proton Event: 
SESSION_REMOTE_CLOSE
{code}
after a minute.

A full cycle is attached in the log file, URLs are anonymized.

In general we can see the following sequence of Proton-Events repeating:
{code:java}
23:58:53.676Z CONNECTION_INIT
23:58:53.722Z CONNECTION_LOCAL_OPEN
23:58:53.722Z CONNECTION_REMOTE_OPEN
23:58:53.722Z SESSION_INIT
23:58:53.722Z SESSION_LOCAL_OPEN
23:58:53.738Z SESSION_REMOTE_OPEN
23:59:53.690Z SESSION_REMOTE_CLOSE
23:59:53.690Z SESSION_LOCAL_CLOSE
23:59:53.690Z SESSION_FINAL
23:59:53.706Z CONNECTION_REMOTE_CLOSE
23:59:53.706Z TRANSPORT_TAIL_CLOSED
23:59:53.706Z CONNECTION_LOCAL_CLOSE
23:59:53.706Z TRANSPORT_HEAD_CLOSED
23:59:53.706Z TRANSPORT_CLOSED
{code}
We did also open a ticket with Microsoft / Azure as the disconnects seemed to 
originate from the Service Bus instance (SESSION_REMOTE_CLOSE) where we did 
learn, that the Service Bus has a hardwired TTL of 5 minutes before inactive 
Subscriptions will be terminated. According to Azure-Support the issue is most 
likely related to Qpid connection handling. This does at least explain the 
first / initial timeout of 30 seen in the logs. The connection / client is 
set up as follows

URLs:
{code:java}
failover:(amqps://instance1.servicebus.windows.net)?amqp.idleTimeout=12000
failover:(amqps://instance2.servicebus.windows.net)?amqp.idleTimeout=12000
{code}
Connection Factories:
{code:java}
JmsConnectionFactory qpidConnectionFactory = new 
JmsConnectionFactory(sbUsername, sbPassword, sbAmqpRemoteUri);
 qpidConnectionFactory.setReceiveLocalOnly(true);
qpidConnectionFactory.setClientID(UUID.randomUUID().toString()); {code}
JmsListenerContainerFactories:
{code:java}
DefaultJmsListenerContainerFactory jmsListenerContainerFactory = new 
DefaultJmsListenerContainerFactory();
jmsListenerContainerFactory.setConnectionFactory(jmsConnectionFactory);
jmsListenerContainerFactory.setPubSubDomain(true);
jmsListenerContainerFactory.setSubscriptionDurable(true);
jmsListenerContainerFactory.setSessionAcknowledgeMode(Session.CLIENT_ACKNOWLEDGE);
return jmsListenerContainerFactory;{code}
And Spring-JMS Listeners:
{code:java}
@Component
public class MyListener {
@Autowired
JMSSupport jmsSupport;
 

[jira] [Updated] (QPIDJMS-392) Stalled Service Bus Subscriptions, Reconnect Loops

2018-06-12 Thread Paul Rogalinski (JIRA)


 [ 
https://issues.apache.org/jira/browse/QPIDJMS-392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogalinski updated QPIDJMS-392:

Description: 
We are experiencing connection recovery loops after an random (hours to days) 
amount of time on JMS connection subscribed to a Service Bus Topic via 
(durable) Subscription. The application does connect to two different service 
bus instances at the same time.

We had to use the failover:(amqp:...) URL pattern in order to actually see 
something in the log files as the FailoverProvider is a bit more verbose (TRACE 
level). When using plain ampqs:// connection URL pattern the only way to tell a 
hanging/stalled subscription was to monitor the number of active/unconsumed 
Messages in a Service Bus' Subscription using a different client. Regardless of 
the Provider implementation, the problem of stalled subscriptions persisted.

Log observations so far:

After an initial disconnect:
{code:java}
o.a.qpid.jms.provider.amqp.AmqpProvider  : New Proton Event: 
CONNECTION_REMOTE_CLOSE
(...)
Failover: the provider reports failure: The connection was inactive for more 
than the allowed 30 milliseconds and is closed by container 'LinkTracker'. 
TrackingId:c2925ed17c9549b29a825cc581852d1a_G6, SystemTracker:gateway9, 
Timestamp:5/1/2018 6:11:44 PM [condition = amqp:connection:forced]
{code}
the application will enter into a reconnect/connection recovery loop with an 
interval of one minute.
{code:java}
18:11:44.670Z Failover: the provider reports failure: The connection was 
inactive for more than the allowed 30 milliseconds and is closed by 
container 'LinkTracker'. TrackingId:c2925ed17c9549b29a825cc581852d1a_G6, 
SystemTracker:gateway9, Timestamp:5/1/2018 6:11:44 PM [condition = 
amqp:connection:forced]
18:12:44.789Z Failover: the provider reports failure: The connection was 
inactive for more than the allowed 6 milliseconds and is closed by 
container 'LinkTracker'. TrackingId:50b008fdb56f443b947d76ded87691a9_G5, 
SystemTracker:gateway9, Timestamp:5/1/2018 6:12:44 PM [condition = 
amqp:connection:forced]
18:13:44.918Z Failover: the provider reports failure: The connection was 
inactive for more than the allowed 6 milliseconds and is closed by 
container 'LinkTracker'. TrackingId:a2832475e8ab4702b6189d9218588132_G15, 
SystemTracker:gateway9, Timestamp:5/1/2018 6:13:44 PM [condition = 
amqp:connection:forced]
{code}
In between we see the connection being recovered along with log messages as:
{code:java}
org.apache.qpid.jms.JmsConnection    : Connection 
ID:8061906b-13e9-46a3-b6d0-fd227b339f67:1 is finalizing recovery
{code}
followed by
{code:java}
o.a.qpid.jms.provider.amqp.AmqpProvider  : New Proton Event: 
SESSION_REMOTE_CLOSE
{code}
after a minute.

A full cycle is attached in the log file, URLs are anonymized.

In general we can see the following sequence of Proton-Events repeating:
{code:java}
23:58:53.676Z CONNECTION_INIT
23:58:53.722Z CONNECTION_LOCAL_OPEN
23:58:53.722Z CONNECTION_REMOTE_OPEN
23:58:53.722Z SESSION_INIT
23:58:53.722Z SESSION_LOCAL_OPEN
23:58:53.738Z SESSION_REMOTE_OPEN
23:59:53.690Z SESSION_REMOTE_CLOSE
23:59:53.690Z SESSION_LOCAL_CLOSE
23:59:53.690Z SESSION_FINAL
23:59:53.706Z CONNECTION_REMOTE_CLOSE
23:59:53.706Z TRANSPORT_TAIL_CLOSED
23:59:53.706Z CONNECTION_LOCAL_CLOSE
23:59:53.706Z TRANSPORT_HEAD_CLOSED
23:59:53.706Z TRANSPORT_CLOSED
{code}
We did also open a ticket with Microsoft / Azure as the disconnects seemed to 
originate from the Service Bus instance (SESSION_REMOTE_CLOSE) where we did 
learn, that the Service Bus has a hardwired TTL of 5 minutes before inactive 
Subscriptions will be terminated. According to Azure-Support the issue is most 
likely related to Qpid connection handling. This does at least explain the 
first / initial timeout of 30 seen in the logs. The connection / client is 
set up as follows

URLs:
{code:java}
failover:(amqps://instance1.servicebus.windows.net)?amqp.idleTimeout=12000
failover:(amqps://instance2.servicebus.windows.net)?amqp.idleTimeout=12000
{code}
Connection Factories:
{code:java}
JmsConnectionFactory qpidConnectionFactory = new 
JmsConnectionFactory(sbUsername, sbPassword, sbAmqpRemoteUri);
 
qpidConnectionFactory.setReceiveLocalOnly(true);
qpidConnectionFactory.setClientID(UUID.randomUUID().toString()); {code}
JmsListenerContainerFactories:
{code:java}
DefaultJmsListenerContainerFactory jmsListenerContainerFactory = new 
DefaultJmsListenerContainerFactory();
jmsListenerContainerFactory.setConnectionFactory(jmsConnectionFactory);
jmsListenerContainerFactory.setPubSubDomain(true);
jmsListenerContainerFactory.setSubscriptionDurable(true);
jmsListenerContainerFactory.setSessionAcknowledgeMode(Session.CLIENT_ACKNOWLEDGE);
return jmsListenerContainerFactory;{code}
And Spring-JMS Listeners:
{code:java}
@Component
public class MyListener {
@Autowired
JMSSupport jmsSupport;
 

[jira] [Created] (QPIDJMS-392) Stalled Service Bus Subscriptions, Reconnect Loops

2018-06-12 Thread Paul Rogalinski (JIRA)
Paul Rogalinski created QPIDJMS-392:
---

 Summary: Stalled Service Bus Subscriptions, Reconnect Loops
 Key: QPIDJMS-392
 URL: https://issues.apache.org/jira/browse/QPIDJMS-392
 Project: Qpid JMS
  Issue Type: Bug
  Components: qpid-jms-client
Affects Versions: 0.32.0
 Environment: * Azure App Service
 * Spring 1.5.12 via embedded web server / standalone
 * java.runtime.version "1.8.0_144-b01"
 * java.vm.version "25.144-b01"
 * java.vm.vendor "Azul Systems, Inc."
 * os.arch "amd64"
 * os.name "Windows Server 2016"
Reporter: Paul Rogalinski
 Attachments: qpid-log.txt

We are experiencing connection recovery loops after an random (hours to days) 
amount of time on JMS connection subscribed to a Service Bus Topic via 
(durable) Subscription. The application does connect to two different service 
bus instances at the same time. 

We had to use the failover:(amqp:...) URL pattern in order to actually see 
something in the log files as the FailoverProvider is a bit more verbose (TRACE 
level). When using plain ampqs:// connection URL pattern the only way to tell a 
hanging/stalled subscription was to monitor the number of active/unconsumed 
Messages in a Service Bus' Subscription using a different client. Regardless of 
the Provider implementation, the problem of stalled subscriptions persisted.

Log observations so far:

After an initial disconnect:

 
{code:java}
o.a.qpid.jms.provider.amqp.AmqpProvider  : New Proton Event: 
CONNECTION_REMOTE_CLOSE
(...)
Failover: the provider reports failure: The connection was inactive for more 
than the allowed 30 milliseconds and is closed by container 'LinkTracker'. 
TrackingId:c2925ed17c9549b29a825cc581852d1a_G6, SystemTracker:gateway9, 
Timestamp:5/1/2018 6:11:44 PM [condition = amqp:connection:forced]
{code}
the application will enter into a reconnect/connection recovery loop with an 
interval of one minute.

 

 
{code:java}
18:11:44.670Z Failover: the provider reports failure: The connection was 
inactive for more than the allowed 30 milliseconds and is closed by 
container 'LinkTracker'. TrackingId:c2925ed17c9549b29a825cc581852d1a_G6, 
SystemTracker:gateway9, Timestamp:5/1/2018 6:11:44 PM [condition = 
amqp:connection:forced]
18:12:44.789Z Failover: the provider reports failure: The connection was 
inactive for more than the allowed 6 milliseconds and is closed by 
container 'LinkTracker'. TrackingId:50b008fdb56f443b947d76ded87691a9_G5, 
SystemTracker:gateway9, Timestamp:5/1/2018 6:12:44 PM [condition = 
amqp:connection:forced]
18:13:44.918Z Failover: the provider reports failure: The connection was 
inactive for more than the allowed 6 milliseconds and is closed by 
container 'LinkTracker'. TrackingId:a2832475e8ab4702b6189d9218588132_G15, 
SystemTracker:gateway9, Timestamp:5/1/2018 6:13:44 PM [condition = 
amqp:connection:forced]
{code}
In between we see the connection being recovered along with log messages as:
{code:java}
org.apache.qpid.jms.JmsConnection    : Connection 
ID:8061906b-13e9-46a3-b6d0-fd227b339f67:1 is finalizing recovery
{code}
followed by
{code:java}
o.a.qpid.jms.provider.amqp.AmqpProvider  : New Proton Event: 
SESSION_REMOTE_CLOSE
{code}
after a minute.

A full cycle is attached in the log file, URLs are anonymized.

In general we can see the following sequence of Proton-Events repeating:

 

 
{code:java}
23:58:53.676Z CONNECTION_INIT
23:58:53.722Z CONNECTION_LOCAL_OPEN
23:58:53.722Z CONNECTION_REMOTE_OPEN
23:58:53.722Z SESSION_INIT
23:58:53.722Z SESSION_LOCAL_OPEN
23:58:53.738Z SESSION_REMOTE_OPEN
23:59:53.690Z SESSION_REMOTE_CLOSE
23:59:53.690Z SESSION_LOCAL_CLOSE
23:59:53.690Z SESSION_FINAL
23:59:53.706Z CONNECTION_REMOTE_CLOSE
23:59:53.706Z TRANSPORT_TAIL_CLOSED
23:59:53.706Z CONNECTION_LOCAL_CLOSE
23:59:53.706Z TRANSPORT_HEAD_CLOSED
23:59:53.706Z TRANSPORT_CLOSED
{code}
We did also open a ticket with Microsoft / Azure as the disconnects seemed to 
originate from the Service Bus instance (SESSION_REMOTE_CLOSE) where we did 
learn, that the Service Bus has a hardwired TTL of 5 minutes before inactive 
Subscriptions will be terminated. According to Azure-Support the issue is most 
likely related to Qpid connection handling. This does at least explain the 
first / initial timeout of 30 seen in the logs. The connection / client is 
set up as follows

URLs:

 
{code:java}
failover:(amqps://instance1.servicebus.windows.net)?amqp.idleTimeout=12000
failover:(amqps://instance2.servicebus.windows.net)?amqp.idleTimeout=12000
{code}
Connection Factories:

 

 
{code:java}
JmsConnectionFactory qpidConnectionFactory = new 
JmsConnectionFactory(sbUsername, sbPassword, sbAmqpRemoteUri);
 
qpidConnectionFactory.setReceiveLocalOnly(true);
qpidConnectionFactory.setClientID(UUID.randomUUID().toString()); {code}
 

 

JmsListenerContainerFactories:
{code:java}
DefaultJmsListenerContainerFactory