[jira] [Updated] (ARTEMIS-2180) Host and broker runs out of memory when stopping a backup in a cluster

2018-11-21 Thread Simon Chalmers (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARTEMIS-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Chalmers updated ARTEMIS-2180:

Description: 
When running a live-backup cluster pair and stopping the backup, the live 
broker starts leaking memory, runs out of memory and core dumps.

This occurs during a performance test when the broker is under load; the slave 
has been running successfully but is then terminated whilst the broker is still 
under load.

Core dump attached.



  was:
When running a live-backup cluster pair and stopping the backup, the live 
broker starts leaking memory, runs out of memory and core dumps.

Core dump attached.




> Host and broker runs out of memory when stopping a backup in a cluster
> --
>
> Key: ARTEMIS-2180
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2180
> Project: ActiveMQ Artemis
>  Issue Type: Bug
>  Components: AMQP
>Affects Versions: 2.6.3
>Reporter: Simon Chalmers
>Priority: Critical
> Attachments: hs_err_pid15534.log
>
>
> When running a live-backup cluster pair and stopping the backup, the live 
> broker starts leaking memory, runs out of memory and core dumps.
> This occurs during a performance test when the broker is under load; the 
> slave has been running successfully but is then terminated whilst the broker 
> is still under load.
> Core dump attached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARTEMIS-2180) Host and broker runs out of memory when stopping a backup in a cluster

2018-11-21 Thread Simon Chalmers (JIRA)
Simon Chalmers created ARTEMIS-2180:
---

 Summary: Host and broker runs out of memory when stopping a backup 
in a cluster
 Key: ARTEMIS-2180
 URL: https://issues.apache.org/jira/browse/ARTEMIS-2180
 Project: ActiveMQ Artemis
  Issue Type: Bug
  Components: AMQP
Affects Versions: 2.6.3
Reporter: Simon Chalmers
 Attachments: hs_err_pid15534.log

When running a live-backup cluster pair and stopping the backup, the live 
broker starts leaking memory, runs out of memory and core dumps.

Core dump attached.







[jira] [Commented] (AMQ-7098) Need enhacement for the existing queue

2018-11-21 Thread Bala Subramanyam Mangipudi (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695575#comment-16695575
 ] 

Bala Subramanyam Mangipudi commented on AMQ-7098:
-

Please suggest a solution.

> Need enhacement for the existing queue
> --
>
> Key: AMQ-7098
> URL: https://issues.apache.org/jira/browse/AMQ-7098
> Project: ActiveMQ
>  Issue Type: Improvement
>  Components: activemq-camel
>Affects Versions: 5.14.0
>Reporter: Bala Subramanyam Mangipudi
>Priority: Minor
> Attachments: Requirements.docx
>
>
> Please see the requirements document attached





[jira] [Reopened] (AMQ-7104) java.io.IOException: Input/output error on ActiveMQ

2018-11-21 Thread Siddardha (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-7104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddardha reopened AMQ-7104:


Hi Christopher/Team,

We are using SMB for shared storage. Is SMB supported by ActiveMQ?

Regards,

K.R.Siddardha

> java.io.IOException: Input/output error on ActiveMQ
> ---
>
> Key: AMQ-7104
> URL: https://issues.apache.org/jira/browse/AMQ-7104
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.3
>Reporter: Siddardha
>Priority: Major
> Fix For: 5.15.3
>
> Attachments: activemq.log
>
>
> A 'java.io.IOException: Input/output error' on an ActiveMQ cluster setup 
> (master/slave) is stopping the BrokerService. For the same reason, the 
> service also does not fail over to the other node.
> Active MQ Version: apache-activemq-5.15.3
> Java Version: 1.8.0_181
> OS: RHEL 7.4
>  
>  
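Shared file system master/slave in ActiveMQ 5.x relies on an exclusive lock on a file in the shared data directory, so whether SMB is usable largely depends on whether the share honors OS-level file locks across hosts. The following is a minimal, hypothetical probe (the class name `SharedLockProbe` and the `lock` file name are assumptions; this is not ActiveMQ's own locker) that tries to take, and optionally hold, an exclusive `FileChannel` lock. It is a sketch for checking the file system, not a statement of what ActiveMQ supports.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical probe, not ActiveMQ's locker: checks whether an exclusive
// OS-level lock can be taken on a file in a (possibly shared) directory.
final class SharedLockProbe {

    /**
     * Tries to take an exclusive lock on lockFile and hold it for holdMillis.
     * Returns false if another process already holds the lock. On file systems
     * without lock support, tryLock() may throw an IOException instead.
     */
    static boolean tryExclusiveLock(Path lockFile, long holdMillis)
            throws IOException, InterruptedException {
        try (FileChannel channel = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            FileLock lock = channel.tryLock();
            if (lock == null) {
                return false; // another process holds the lock
            }
            try {
                if (holdMillis > 0) {
                    Thread.sleep(holdMillis); // give a second node time to try
                }
            } finally {
                lock.release();
            }
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        // Optional args: <shared directory> <hold time in ms>
        Path dir = args.length > 0 ? Paths.get(args[0]) : Files.createTempDirectory("lockprobe");
        long holdMillis = args.length > 1 ? Long.parseLong(args[1]) : 0L;
        boolean acquired = tryExclusiveLock(dir.resolve("lock"), holdMillis);
        System.out.println("exclusive lock acquired in " + dir + ": " + acquired);
    }
}
```

Started on one node with a long hold time and then on a second node against the same SMB directory, the second run should report `false` if cross-host locking works; `true` on both, or an `IOException`, suggests the share cannot safely arbitrate master/slave.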





[jira] [Commented] (AMQ-7108) Scheduled job scheduling occasionally fails with java.lang.ClassCastException: org.apache.activemq.store.kahadb.data.KahaTraceCommand cannot be cast to org.apache.activem

2018-11-21 Thread Timothy Bish (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695443#comment-16695443
 ] 

Timothy Bish commented on AMQ-7108:
---

You need to test using the latest release (5.15.8) as your version is quite old 
and is not supported.  It's rather likely this issue has already been fixed in 
a newer release. 

> Scheduled job scheduling occasionally fails with 
> java.lang.ClassCastException: 
> org.apache.activemq.store.kahadb.data.KahaTraceCommand cannot be cast to 
> org.apache.activemq.store.kahadb.data.KahaAddScheduledJobCommand
> 
>
> Key: AMQ-7108
> URL: https://issues.apache.org/jira/browse/AMQ-7108
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: Job Scheduler, KahaDB
>Affects Versions: 5.11.1
> Environment: *OS*: Linux ip-10-88-2-47 3.13.0-125-generic #174-Ubuntu 
> SMP Mon Jul 10 18:51:24 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
> *AMQ*: 5.11.1
> *glusterfs*: 3.8.15
> *java*: 1.8.0_144-b01
>  
>Reporter: Ben D'Herville
>Priority: Minor
> Attachments: activemq.xml
>
>
> *Issue*
> We have been running the stack with the same stated versions mentioned in 
> this ticket's environment details for over a year.  Recently we have 
> experienced ActiveMQ outages as a result of the exception below.  There has 
> been an increase in load, but not a significant one.
> Scheduled messages are being used as a polling mechanism in a workflow 
> system.  A workflow will execute a step then schedule a small object message, 
> containing just an identifier, to check for a result in a number of seconds.  
> The consumer of the message will schedule another message if the task is not 
> complete yet.  We also have a batch process that can start a number of 
> workflows concurrently, usually on the order of hundreds, which means hundreds
> of scheduled messages can be created concurrently.
> The most recent failure occurred with a batch of only 64.
> We haven't noticed any data loss as a result of this, but it does cause a 
> complete outage of our production system.
> *Recovery*
> The only way we have been able to recover from this error is to restart the 
> broker.  KahaDB is configured to check for a corrupt journal, which should 
> recover the journal on startup, but it is unclear whether the journal is 
> actually corrupted.
> *Reproduction*
> We have tried to reproduce this issue using the ActiveMQ performance Maven 
> plugin but were unable to.  We had 1 producers concurrently schedule 500 
> messages each.  
>  
> Note there are two stack traces here.  The second one is possibly a 
> side-effect of the first one.
> {code:java}
> 2018-Nov-21 15:16:48 JobScheduler:JMS 
> [org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl] ERROR: JMS 
> Failed to schedule job
> java.lang.ClassCastException: 
> org.apache.activemq.store.kahadb.data.KahaTraceCommand cannot be cast to 
> org.apache.activemq.store.kahadb.data.KahaAddScheduledJobCommand
> at 
> org.apache.activemq.store.kahadb.scheduler.JobSchedulerStoreImpl.getPayload(JobSchedulerStoreImpl.java:529)
> at 
> org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.fireJob(JobSchedulerImpl.java:789)
> at 
> org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.mainLoop(JobSchedulerImpl.java:720)
> at 
> org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.run(JobSchedulerImpl.java:673)
> at java.lang.Thread.run(Thread.java:748)
> 2018-Nov-21 15:24:45 ActiveMQ Transport: tcp:///10.88.2.113:54256@61616 
> [org.apache.activemq.transaction.LocalTransaction] WARN: POST COMMIT FAILED:
> java.lang.NullPointerException
> at 
> org.apache.activemq.store.kahadb.AbstractKahaDBStore.store(AbstractKahaDBStore.java:482)
> at 
> org.apache.activemq.store.kahadb.AbstractKahaDBStore.store(AbstractKahaDBStore.java:394)
> at 
> org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.doSchedule(JobSchedulerImpl.java:264)
> at 
> org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.schedule(JobSchedulerImpl.java:99)
> at 
> org.apache.activemq.broker.scheduler.SchedulerBroker.doSchedule(SchedulerBroker.java:197)
> at 
> org.apache.activemq.broker.scheduler.SchedulerBroker.access$000(SchedulerBroker.java:48)
> at 
> org.apache.activemq.broker.scheduler.SchedulerBroker$1.afterCommit(SchedulerBroker.java:162)
> at 
> org.apache.activemq.transaction.Transaction.fireAfterCommit(Transaction.java:126)
> at 
> org.apache.activemq.transaction.Transaction.doPostCommit(Transaction.java:195)
> at 
> 

[jira] [Commented] (ARTEMIS-2178) Support routing-type configuration on core bridge

2018-11-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-2178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695373#comment-16695373
 ] 

ASF GitHub Bot commented on ARTEMIS-2178:
-

Github user jbertram commented on the issue:

https://github.com/apache/activemq-artemis/pull/2438
  
I thought I had already added docs, but I must have confused the divert docs 
with these.  I have now added a section to the docs for this.


> Support routing-type configuration on core bridge
> -
>
> Key: ARTEMIS-2178
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2178
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Justin Bertram
>Assignee: Justin Bertram
>Priority: Major
>
> {{MULTICAST}} messages forwarded by a core bridge will not be routed to any 
> {{ANYCAST}} queues and vice-versa.  Diverts have the ability to configure how 
> routing-type is treated.  Core bridges should support this same kind of 
> functionality.
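To make the problem concrete, here is a small stdlib-only model of the behaviour described above. The enum names and the `RoutingTypeSketch` class are hypothetical and do not mirror Artemis's own classes; the four modes (ANYCAST, MULTICAST, STRIP, PASS) follow the routing-type options diverts offer, and PASS (forward the original routing type unchanged) is assumed here to be what the bridge effectively does before this change.

```java
import java.util.Optional;

// Hypothetical model; these names do not mirror Artemis classes.
enum MessageRoutingType { ANYCAST, MULTICAST }

// The four options a divert offers; the question is which one a bridge applies.
enum BridgeRoutingMode { ANYCAST, MULTICAST, STRIP, PASS }

final class RoutingTypeSketch {

    /**
     * Routing type carried by the forwarded copy of a message.
     * An empty result means "stripped": the target address decides (not modelled).
     */
    static Optional<MessageRoutingType> forwardedRoutingType(
            Optional<MessageRoutingType> original, BridgeRoutingMode mode) {
        switch (mode) {
            case ANYCAST:
                return Optional.of(MessageRoutingType.ANYCAST);
            case MULTICAST:
                return Optional.of(MessageRoutingType.MULTICAST);
            case STRIP:
                return Optional.empty();
            case PASS:
            default:
                return original; // keep whatever the source message had
        }
    }

    /** The restriction from the issue: a typed message only reaches queues of the same type. */
    static boolean canReachQueue(MessageRoutingType messageType, MessageRoutingType queueType) {
        return messageType == queueType;
    }

    public static void main(String[] args) {
        Optional<MessageRoutingType> original = Optional.of(MessageRoutingType.MULTICAST);
        // PASS: a MULTICAST message still cannot land in an ANYCAST queue on the target.
        Optional<MessageRoutingType> passed = forwardedRoutingType(original, BridgeRoutingMode.PASS);
        System.out.println("PASS reaches ANYCAST queue: "
                + canReachQueue(passed.get(), MessageRoutingType.ANYCAST));
        // Forcing ANYCAST on the bridge lets it through.
        Optional<MessageRoutingType> forced = forwardedRoutingType(original, BridgeRoutingMode.ANYCAST);
        System.out.println("ANYCAST reaches ANYCAST queue: "
                + canReachQueue(forced.get(), MessageRoutingType.ANYCAST));
    }
}
```

Making the mode configurable per bridge, as diverts already allow, is what lets a MULTICAST source feed ANYCAST queues on the far side.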





[jira] [Commented] (ARTEMIS-2178) Support routing-type configuration on core bridge

2018-11-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-2178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695372#comment-16695372
 ] 

ASF GitHub Bot commented on ARTEMIS-2178:
-

Github user jbertram commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/2438#discussion_r235566861
  
--- Diff: 
artemis-server/src/main/java/org/apache/activemq/artemis/core/config/BridgeConfiguration.java
 ---
@@ -75,6 +76,8 @@
// The bridge shouldn't be sending blocking anyways
private long callTimeout = ActiveMQClient.DEFAULT_CALL_TIMEOUT;
 
+   private ComponentConfigurationRoutingType routingType = 
ComponentConfigurationRoutingType.valueOf(ActiveMQDefaultConfiguration.getDefaultDivertRoutingType());
--- End diff --

Nice catch. Fixed.


> Support routing-type configuration on core bridge
> -
>
> Key: ARTEMIS-2178
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2178
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Justin Bertram
>Assignee: Justin Bertram
>Priority: Major
>
> {{MULTICAST}} messages forwarded by a core bridge will not be routed to any 
> {{ANYCAST}} queues and vice-versa.  Diverts have the ability to configure how 
> routing-type is treated.  Core bridges should support this same kind of 
> functionality.





[jira] [Commented] (AMQ-7108) Scheduled job scheduling occasionally fails with java.lang.ClassCastException: org.apache.activemq.store.kahadb.data.KahaTraceCommand cannot be cast to org.apache.activem

2018-11-21 Thread Ben D'Herville (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695340#comment-16695340
 ] 

Ben D'Herville commented on AMQ-7108:
-

Just a thought, and I haven't looked into the ActiveMQ code in detail, but it 
seems the data KahaDB is reading back is incorrect, which may indicate that a 
race condition occurred while writing the data to disk.  For example, is it 
possible that a KahaAddScheduledJobCommand and a KahaTraceCommand were stored at the same time 
and were given the same primary key?
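The hypothesis can be illustrated with a toy model. The sketch below is purely hypothetical and does not reflect how KahaDB actually allocates journal locations (which may well be synchronized): it replays one unlucky interleaving in which two writers read the same "next location" before either advances it, so a trace record overwrites a job record and the typed read fails with the same kind of `ClassCastException`. All class and method names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Toy model of the hypothesis only; KahaDB's real journal is not modelled here.
final class JournalRaceSketch {

    // Stand-ins for the two command types named in the stack trace.
    static final class AddScheduledJob {
        final String jobId;
        AddScheduledJob(String jobId) { this.jobId = jobId; }
    }

    static final class Trace {
        final String message;
        Trace(String message) { this.message = message; }
    }

    private final Map<Long, Object> records = new HashMap<>();
    private long nextLocation = 0;                      // racy: read, then advance
    private final AtomicLong atomicNext = new AtomicLong();

    long readNextLocation() { return nextLocation; }
    void advance() { nextLocation++; }
    long allocateAtomically() { return atomicNext.getAndIncrement(); }

    void write(long location, Object record) { records.put(location, record); }

    // Like the failing read: assumes the record at this location is a job command.
    AddScheduledJob readJob(long location) { return (AddScheduledJob) records.get(location); }

    /** Replays one bad interleaving; returns true if the typed read fails. */
    static boolean replayRacyInterleaving() {
        JournalRaceSketch journal = new JournalRaceSketch();
        long a = journal.readNextLocation();   // writer A reads location 0
        long b = journal.readNextLocation();   // writer B reads 0 before A advances
        journal.advance();
        journal.advance();
        journal.write(a, new AddScheduledJob("job-1"));
        journal.write(b, new Trace("trace"));  // same location: overwrites the job
        try {
            journal.readJob(a);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    /** With a single atomic allocation each writer gets its own location. */
    static boolean atomicAllocationKeepsRecordsApart() {
        JournalRaceSketch journal = new JournalRaceSketch();
        long a = journal.allocateAtomically();
        long b = journal.allocateAtomically();
        journal.write(a, new AddScheduledJob("job-1"));
        journal.write(b, new Trace("trace"));
        return a != b && "job-1".equals(journal.readJob(a).jobId);
    }

    public static void main(String[] args) {
        System.out.println("racy allocation -> ClassCastException: " + replayRacyInterleaving());
        System.out.println("atomic allocation keeps records apart: " + atomicAllocationKeepsRecordsApart());
    }
}
```

Allocating each location with one atomic operation (`AtomicLong.getAndIncrement`) closes this particular window; whether anything like it exists in KahaDB would need to be confirmed against the actual code.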

> Scheduled job scheduling occasionally fails with 
> java.lang.ClassCastException: 
> org.apache.activemq.store.kahadb.data.KahaTraceCommand cannot be cast to 
> org.apache.activemq.store.kahadb.data.KahaAddScheduledJobCommand
> 
>
> Key: AMQ-7108
> URL: https://issues.apache.org/jira/browse/AMQ-7108
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: Job Scheduler, KahaDB
>Affects Versions: 5.11.1
> Environment: *OS*: Linux ip-10-88-2-47 3.13.0-125-generic #174-Ubuntu 
> SMP Mon Jul 10 18:51:24 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
> *AMQ*: 5.11.1
> *glusterfs*: 3.8.15
> *java*: 1.8.0_144-b01
>  
>Reporter: Ben D'Herville
>Priority: Major
> Attachments: activemq.xml
>
>
> *Issue*
> We have been running the stack with the same stated versions mentioned in 
> this ticket's environment details for over a year.  Recently we have 
> experienced ActiveMQ outages as a result of the exception below.  There has 
> been an increase in load, but not a significant one.
> Scheduled messages are being used as a polling mechanism in a workflow 
> system.  A workflow will execute a step then schedule a small object message, 
> containing just an identifier, to check for a result in a number of seconds.  
> The consumer of the message will schedule another message if the task is not 
> complete yet.  We also have a batch process that can start a number of 
> workflows concurrently, usually on the order of hundreds, which means hundreds
> of scheduled messages can be created concurrently.
> The most recent failure occurred with a batch of only 64.
> We haven't noticed any data loss as a result of this, but it does cause a 
> complete outage of our production system.
> *Recovery*
> The only way we have been able to recover from this error is to restart the 
> broker.  KahaDB is configured to check for a corrupt journal, which should 
> recover the journal on startup, but it is unclear whether the journal is 
> actually corrupted.
> *Reproduction*
> We have tried to reproduce this issue using the ActiveMQ performance Maven 
> plugin but were unable to.  We had 1 producers concurrently schedule 500 
> messages each.  
>  
> Note there are two stack traces here.  The second one is possibly a 
> side-effect of the first one.
> {code:java}
> 2018-Nov-21 15:16:48 JobScheduler:JMS 
> [org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl] ERROR: JMS 
> Failed to schedule job
> java.lang.ClassCastException: 
> org.apache.activemq.store.kahadb.data.KahaTraceCommand cannot be cast to 
> org.apache.activemq.store.kahadb.data.KahaAddScheduledJobCommand
> at 
> org.apache.activemq.store.kahadb.scheduler.JobSchedulerStoreImpl.getPayload(JobSchedulerStoreImpl.java:529)
> at 
> org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.fireJob(JobSchedulerImpl.java:789)
> at 
> org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.mainLoop(JobSchedulerImpl.java:720)
> at 
> org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.run(JobSchedulerImpl.java:673)
> at java.lang.Thread.run(Thread.java:748)
> 2018-Nov-21 15:24:45 ActiveMQ Transport: tcp:///10.88.2.113:54256@61616 
> [org.apache.activemq.transaction.LocalTransaction] WARN: POST COMMIT FAILED:
> java.lang.NullPointerException
> at 
> org.apache.activemq.store.kahadb.AbstractKahaDBStore.store(AbstractKahaDBStore.java:482)
> at 
> org.apache.activemq.store.kahadb.AbstractKahaDBStore.store(AbstractKahaDBStore.java:394)
> at 
> org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.doSchedule(JobSchedulerImpl.java:264)
> at 
> org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.schedule(JobSchedulerImpl.java:99)
> at 
> org.apache.activemq.broker.scheduler.SchedulerBroker.doSchedule(SchedulerBroker.java:197)
> at 
> org.apache.activemq.broker.scheduler.SchedulerBroker.access$000(SchedulerBroker.java:48)
> at 
> org.apache.activemq.broker.scheduler.SchedulerBroker$1.afterCommit(SchedulerBroker.java:162)
> at 
> 

[jira] [Updated] (AMQ-7108) Scheduled job scheduling occasionally fails with java.lang.ClassCastException: org.apache.activemq.store.kahadb.data.KahaTraceCommand cannot be cast to org.apache.activemq.

2018-11-21 Thread Ben D'Herville (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-7108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben D'Herville updated AMQ-7108:

Attachment: activemq.xml

> Scheduled job scheduling occasionally fails with 
> java.lang.ClassCastException: 
> org.apache.activemq.store.kahadb.data.KahaTraceCommand cannot be cast to 
> org.apache.activemq.store.kahadb.data.KahaAddScheduledJobCommand
> 
>
> Key: AMQ-7108
> URL: https://issues.apache.org/jira/browse/AMQ-7108
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: Job Scheduler, KahaDB
>Affects Versions: 5.11.1
> Environment: *OS*: Linux ip-10-88-2-47 3.13.0-125-generic #174-Ubuntu 
> SMP Mon Jul 10 18:51:24 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
> *AMQ*: 5.11.1
> *glusterfs*: 3.8.15
> *java*: 1.8.0_144-b01
>  
>Reporter: Ben D'Herville
>Priority: Major
> Attachments: activemq.xml
>
>
> *Issue*
> We have been running the stack with the same stated versions mentioned in 
> this ticket's environment details for over a year.  Recently we have 
> experienced ActiveMQ outages as a result of the exception below.  There has 
> been an increase in load, but not a significant one.
> Scheduled messages are being used as a polling mechanism in a workflow 
> system.  A workflow will execute a step then schedule a small object message, 
> containing just an identifier, to check for a result in a number of seconds.  
> The consumer of the message will schedule another message if the task is not 
> complete yet.  We also have a batch process that can start a number of 
> workflows concurrently, usually on the order of hundreds, which means hundreds
> of scheduled messages can be created concurrently.
> The most recent failure occurred with a batch of only 64.
> We haven't noticed any data loss as a result of this, but it does cause a 
> complete outage of our production system.
> *Recovery*
> The only way we have been able to recover from this error is to restart the 
> broker.  KahaDB is configured to check for a corrupt journal, which should 
> recover the journal on startup, but it is unclear whether the journal is 
> actually corrupted.
> *Reproduction*
> We have tried to reproduce this issue using the ActiveMQ performance Maven 
> plugin but were unable to.  We had 1 producers concurrently schedule 500 
> messages each.  
>  
> Note there are two stack traces here.  The second one is possibly a 
> side-effect of the first one.
> 2018-Nov-21 15:16:48 JobScheduler:JMS 
> [org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl] ERROR: JMS 
> Failed to schedule job
> java.lang.ClassCastException: 
> org.apache.activemq.store.kahadb.data.KahaTraceCommand cannot be cast to 
> org.apache.activemq.store.kahadb.data.KahaAddScheduledJobCommand
> at 
> org.apache.activemq.store.kahadb.scheduler.JobSchedulerStoreImpl.getPayload(JobSchedulerStoreImpl.java:529)
> at 
> org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.fireJob(JobSchedulerImpl.java:789)
> at 
> org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.mainLoop(JobSchedulerImpl.java:720)
> at 
> org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.run(JobSchedulerImpl.java:673)
> at java.lang.Thread.run(Thread.java:748)
> 2018-Nov-21 15:24:45 ActiveMQ Transport: tcp:///10.88.2.113:54256@61616 
> [org.apache.activemq.transaction.LocalTransaction] WARN: POST COMMIT FAILED:
> java.lang.NullPointerException
> at 
> org.apache.activemq.store.kahadb.AbstractKahaDBStore.store(AbstractKahaDBStore.java:482)
> at 
> org.apache.activemq.store.kahadb.AbstractKahaDBStore.store(AbstractKahaDBStore.java:394)
> at 
> org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.doSchedule(JobSchedulerImpl.java:264)
> at 
> org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.schedule(JobSchedulerImpl.java:99)
> at 
> org.apache.activemq.broker.scheduler.SchedulerBroker.doSchedule(SchedulerBroker.java:197)
> at 
> org.apache.activemq.broker.scheduler.SchedulerBroker.access$000(SchedulerBroker.java:48)
> at 
> org.apache.activemq.broker.scheduler.SchedulerBroker$1.afterCommit(SchedulerBroker.java:162)
> at 
> org.apache.activemq.transaction.Transaction.fireAfterCommit(Transaction.java:126)
> at 
> org.apache.activemq.transaction.Transaction.doPostCommit(Transaction.java:195)
> at 
> org.apache.activemq.transaction.Transaction$2.call(Transaction.java:55)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> org.apache.activemq.store.kahadb.MessageDatabase.store(MessageDatabase.java:990)
> at 
> 

[jira] [Updated] (AMQ-7108) Scheduled job scheduling occasionally fails with java.lang.ClassCastException: org.apache.activemq.store.kahadb.data.KahaTraceCommand cannot be cast to org.apache.activemq.

2018-11-21 Thread Ben D'Herville (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-7108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben D'Herville updated AMQ-7108:

Description: 
*Issue*

We have been running the stack with the same stated versions mentioned in this 
ticket's environment details for over a year.  Recently we have experienced 
ActiveMQ outages as a result of the exception below.  There has been an increase 
in load, but not a significant one.

Scheduled messages are being used as a polling mechanism in a workflow system.  
A workflow will execute a step then schedule a small object message, containing 
just an identifier, to check for a result in a number of seconds.  The consumer 
of the message will schedule another message if the task is not complete yet.  
We also have a batch process that can start a number of workflows concurrently, 
usually on the order of hundreds, which means hundreds of scheduled messages can be 
created concurrently.
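The polling pattern described above can be sketched in plain Java. In ActiveMQ itself the delay would come from the broker's scheduler (for example via the `ScheduledMessage.AMQ_SCHEDULED_DELAY` message property, with scheduler support enabled on the broker); here a `ScheduledExecutorService` stands in for the broker so the control flow runs without one. The class name and the `maxChecks` cap are additions of this sketch, not something the report describes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

// Stdlib stand-in for broker-side scheduled delivery; not an ActiveMQ API.
final class PollingWorkflowSketch {

    /**
     * Schedules a check for workflowId every delayMillis until isComplete
     * returns true (or maxChecks is reached). Returns the number of checks made.
     */
    static int pollUntilComplete(String workflowId, Predicate<String> isComplete,
                                 long delayMillis, int maxChecks) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger checks = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(1);
        try {
            scheduleCheck(scheduler, workflowId, isComplete, delayMillis, maxChecks, checks, done);
            done.await();
            return checks.get();
        } finally {
            scheduler.shutdownNow();
        }
    }

    // One "scheduled message": it carries only the workflow identifier.
    private static void scheduleCheck(ScheduledExecutorService scheduler, String workflowId,
                                      Predicate<String> isComplete, long delayMillis, int maxChecks,
                                      AtomicInteger checks, CountDownLatch done) {
        scheduler.schedule(() -> {
            int n = checks.incrementAndGet();
            boolean finished;
            try {
                finished = isComplete.test(workflowId) || n >= maxChecks;
            } catch (RuntimeException e) {
                finished = true; // stop rather than leave the caller waiting forever
            }
            if (finished) {
                done.countDown();
            } else {
                // Task not complete yet: the consumer schedules another check.
                scheduleCheck(scheduler, workflowId, isComplete, delayMillis, maxChecks, checks, done);
            }
        }, delayMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger calls = new AtomicInteger();
        // Pretend the workflow step finishes on the third check.
        int checks = pollUntilComplete("wf-1", id -> calls.incrementAndGet() >= 3, 100, 50);
        System.out.println("checks made: " + checks);
    }
}
```

Each check is self-rescheduling, so a batch of N workflows keeps roughly N scheduled messages in flight at any time, which matches the concurrency the report describes.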

The most recent failure occurred with a batch of only 64.

We haven't noticed any data loss as a result of this, but it does cause a 
complete outage of our production system.

*Recovery*

The only way we have been able to recover from this error is to restart the 
broker.  KahaDB is configured to check for a corrupt journal, which should recover 
the journal on startup, but it is unclear whether the journal is actually corrupted.

*Reproduction*

We have tried to reproduce this issue using the ActiveMQ performance Maven 
plugin but were unable to.  We had 1 producers concurrently schedule 500 
messages each.  
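A reproduction harness along these lines might look like the following sketch. `ConcurrentScheduleHarness` and its `ScheduleCall` hook are hypothetical: in a real test the hook would wrap a JMS producer sending one scheduled message to the broker, whereas here any callback can be plugged in. A start gate releases all producers at once to maximise contention, and the harness reports how many schedule calls threw.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ConcurrentScheduleHarness {

    /** Hypothetical hook; a real test would send one scheduled JMS message here. */
    interface ScheduleCall {
        void schedule(String payload) throws Exception;
    }

    /** Releases all producers at once and returns how many schedule calls threw. */
    static int run(int producers, int messagesPerProducer, ScheduleCall call)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(producers);
        CountDownLatch startGate = new CountDownLatch(1);
        List<Future<Integer>> results = new ArrayList<>();
        try {
            for (int p = 0; p < producers; p++) {
                final int producerId = p;
                results.add(pool.submit(() -> {
                    startGate.await(); // maximise contention on the first sends
                    int failures = 0;
                    for (int i = 0; i < messagesPerProducer; i++) {
                        try {
                            // Payload mimics the "small message containing just an identifier".
                            call.schedule("workflow-" + producerId + "-" + i);
                        } catch (Exception e) {
                            failures++;
                        }
                    }
                    return failures;
                }));
            }
            startGate.countDown();
            int totalFailures = 0;
            for (Future<Integer> result : results) {
                totalFailures += result.get();
            }
            return totalFailures;
        } catch (ExecutionException e) {
            throw new IllegalStateException("producer task failed", e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> sent = Collections.synchronizedList(new ArrayList<>());
        int failures = run(4, 500, sent::add);
        System.out.println("sent=" + sent.size() + " failures=" + failures);
    }
}
```

The 500 messages per producer mirror the report; the producer count in `main` is only an illustrative choice.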

 

{{Note there are two stack traces here.  The second one is possibly a 
side-effect of the first one.}}
{code:java}
2018-Nov-21 15:16:48 JobScheduler:JMS 
[org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl] ERROR: JMS Failed 
to schedule job
java.lang.ClassCastException: 
org.apache.activemq.store.kahadb.data.KahaTraceCommand cannot be cast to 
org.apache.activemq.store.kahadb.data.KahaAddScheduledJobCommand
at 
org.apache.activemq.store.kahadb.scheduler.JobSchedulerStoreImpl.getPayload(JobSchedulerStoreImpl.java:529)
at 
org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.fireJob(JobSchedulerImpl.java:789)
at 
org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.mainLoop(JobSchedulerImpl.java:720)
at 
org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.run(JobSchedulerImpl.java:673)
at java.lang.Thread.run(Thread.java:748)

2018-Nov-21 15:24:45 ActiveMQ Transport: tcp:///10.88.2.113:54256@61616 
[org.apache.activemq.transaction.LocalTransaction] WARN: POST COMMIT FAILED:
java.lang.NullPointerException
at 
org.apache.activemq.store.kahadb.AbstractKahaDBStore.store(AbstractKahaDBStore.java:482)
at 
org.apache.activemq.store.kahadb.AbstractKahaDBStore.store(AbstractKahaDBStore.java:394)
at 
org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.doSchedule(JobSchedulerImpl.java:264)
at 
org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.schedule(JobSchedulerImpl.java:99)
at 
org.apache.activemq.broker.scheduler.SchedulerBroker.doSchedule(SchedulerBroker.java:197)
at 
org.apache.activemq.broker.scheduler.SchedulerBroker.access$000(SchedulerBroker.java:48)
at 
org.apache.activemq.broker.scheduler.SchedulerBroker$1.afterCommit(SchedulerBroker.java:162)
at 
org.apache.activemq.transaction.Transaction.fireAfterCommit(Transaction.java:126)
at 
org.apache.activemq.transaction.Transaction.doPostCommit(Transaction.java:195)
at 
org.apache.activemq.transaction.Transaction$2.call(Transaction.java:55)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
org.apache.activemq.store.kahadb.MessageDatabase.store(MessageDatabase.java:990)
at 
org.apache.activemq.store.kahadb.MessageDatabase.store(MessageDatabase.java:957)
at 
org.apache.activemq.store.kahadb.KahaDBTransactionStore.commit(KahaDBTransactionStore.java:298)
at 
org.apache.activemq.transaction.LocalTransaction.commit(LocalTransaction.java:70)
at 
org.apache.activemq.broker.TransactionBroker.commitTransaction(TransactionBroker.java:253){code}

  was:
*Issue*

We have been running the stack with the same stated versions mentioned in this 
ticket's environment details for over a year.  Recently we have experienced 
ActiveMQ outages as a result of the exception below.  There has been an increase 
in load, but not a significant one.

Scheduled messages are being used as a polling mechanism in a workflow system.  
A workflow will execute a step then schedule a small object message, containing 
just an identifier, to check for a result in a number of seconds.  The consumer 
of the message will schedule another message if the task is not complete yet.  
We also have a batch process that can start a number of workflows concurrently, 
usually on the order of hundreds, which means hundreds 

[jira] [Updated] (AMQ-7108) Scheduled job scheduling occasionally fails with java.lang.ClassCastException: org.apache.activemq.store.kahadb.data.KahaTraceCommand cannot be cast to org.apache.activemq.

2018-11-21 Thread Ben D'Herville (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-7108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben D'Herville updated AMQ-7108:

Description: 
*Issue*

We have been running the stack with the same stated versions mentioned in this 
ticket's environment details for over a year.  Recently we have experienced 
ActiveMQ outages as a result of the exception below.  There has been an increase 
in load, but not a significant one.

Scheduled messages are being used as a polling mechanism in a workflow system.  
A workflow will execute a step then schedule a small object message, containing 
just an identifier, to check for a result in a number of seconds.  The consumer 
of the message will schedule another message if the task is not complete yet.  
We also have a batch process that can start a number of workflows concurrently, 
usually on the order of hundreds, which means hundreds of scheduled messages can be 
created concurrently.

The most recent failure occurred with a batch of only 64.

We haven't noticed any data loss as a result of this, but it does cause a 
complete outage of our production system.

*Recovery*

The only way we have been able to recover from this error is to restart the 
broker.  KahaDB is configured to check for a corrupt journal, which should recover 
the journal on startup, but it is unclear whether the journal is actually corrupted.

*Reproduction*

We have tried to reproduce this issue using the ActiveMQ performance Maven 
plugin but were unable to.  We had 1 producers concurrently schedule 500 
messages each.  

 
Note there are two stack traces here.  The second one is possibly a side-effect 
of the first one.
{code:java}
2018-Nov-21 15:16:48 JobScheduler:JMS 
[org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl] ERROR: JMS Failed 
to schedule job
java.lang.ClassCastException: 
org.apache.activemq.store.kahadb.data.KahaTraceCommand cannot be cast to 
org.apache.activemq.store.kahadb.data.KahaAddScheduledJobCommand
at 
org.apache.activemq.store.kahadb.scheduler.JobSchedulerStoreImpl.getPayload(JobSchedulerStoreImpl.java:529)
at 
org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.fireJob(JobSchedulerImpl.java:789)
at 
org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.mainLoop(JobSchedulerImpl.java:720)
at 
org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.run(JobSchedulerImpl.java:673)
at java.lang.Thread.run(Thread.java:748)

2018-Nov-21 15:24:45 ActiveMQ Transport: tcp:///10.88.2.113:54256@61616 
[org.apache.activemq.transaction.LocalTransaction] WARN: POST COMMIT FAILED:
java.lang.NullPointerException
at 
org.apache.activemq.store.kahadb.AbstractKahaDBStore.store(AbstractKahaDBStore.java:482)
at 
org.apache.activemq.store.kahadb.AbstractKahaDBStore.store(AbstractKahaDBStore.java:394)
at 
org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.doSchedule(JobSchedulerImpl.java:264)
at 
org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.schedule(JobSchedulerImpl.java:99)
at 
org.apache.activemq.broker.scheduler.SchedulerBroker.doSchedule(SchedulerBroker.java:197)
at 
org.apache.activemq.broker.scheduler.SchedulerBroker.access$000(SchedulerBroker.java:48)
at 
org.apache.activemq.broker.scheduler.SchedulerBroker$1.afterCommit(SchedulerBroker.java:162)
at 
org.apache.activemq.transaction.Transaction.fireAfterCommit(Transaction.java:126)
at 
org.apache.activemq.transaction.Transaction.doPostCommit(Transaction.java:195)
at 
org.apache.activemq.transaction.Transaction$2.call(Transaction.java:55)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
org.apache.activemq.store.kahadb.MessageDatabase.store(MessageDatabase.java:990)
at 
org.apache.activemq.store.kahadb.MessageDatabase.store(MessageDatabase.java:957)
at 
org.apache.activemq.store.kahadb.KahaDBTransactionStore.commit(KahaDBTransactionStore.java:298)
at 
org.apache.activemq.transaction.LocalTransaction.commit(LocalTransaction.java:70)
at 
org.apache.activemq.broker.TransactionBroker.commitTransaction(TransactionBroker.java:253){code}

  was:
*Issue*

We have been running the stack with the same stated versions mentioned in this 
ticket's environment details for over a year.  Recently we have experienced 
ActiveMQ outages as a result of the exception below.  There has been an increase 
in load, but not a significant one.

Scheduled messages are being used as a polling mechanism in a workflow system.  
A workflow will execute a step then schedule a small object message, containing 
just an identifier, to check for a result in a number of seconds.  The consumer 
of the message will schedule another message if the task is not complete yet.  
We also have a batch process that can start a number of workflows concurrently, 
usually on the order of hundreds, which means hundreds 

[jira] [Updated] (AMQ-7108) Scheduled job scheduling occasionally fails with java.lang.ClassCastException: org.apache.activemq.store.kahadb.data.KahaTraceCommand cannot be cast to org.apache.activemq.

2018-11-21 Thread Ben D'Herville (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-7108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben D'Herville updated AMQ-7108:

Description: 
*Issue*

We have been running the stack with the same stated versions mentioned in this 
ticket's environment details for over a year.  Recently we have experienced 
ActiveMQ outages as a result of the exception below.  There has been an increase 
in load, but not a significant one.

Scheduled messages are being used as a polling mechanism in a workflow system.  
A workflow will execute a step then schedule a small object message, containing 
just an identifier, to check for a result in a number of seconds.  The consumer 
of the message will schedule another message if the task is not complete yet.  
We also have a batch process that can start a number of workflows concurrently, 
usually on the order of hundreds, which means hundreds of scheduled messages can be 
created concurrently.

The most recent failure occurred with a batch of only 64.

We haven't noticed any data loss as a result of this, but it does cause a
complete outage of our production system.

*Recovery*

The only way we have been able to recover from this error is to restart the
broker.  KahaDB is configured to check for a corrupt journal, which would
recover the journal on startup, but it is unclear whether the journal is
actually corrupted.

*Reproduction*

We have tried to reproduce this issue using the ActiveMQ performance Maven
plugin but were unable to.  We had 1 producers concurrently schedule 500 
messages each.  

 

Note there are two stack traces here.  The second one is possibly a
side-effect of the first one.
{code}
2018-Nov-21 15:16:48 JobScheduler:JMS [org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl] ERROR: JMS Failed to schedule job
java.lang.ClassCastException: org.apache.activemq.store.kahadb.data.KahaTraceCommand cannot be cast to org.apache.activemq.store.kahadb.data.KahaAddScheduledJobCommand
at org.apache.activemq.store.kahadb.scheduler.JobSchedulerStoreImpl.getPayload(JobSchedulerStoreImpl.java:529)
at org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.fireJob(JobSchedulerImpl.java:789)
at org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.mainLoop(JobSchedulerImpl.java:720)
at org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.run(JobSchedulerImpl.java:673)
at java.lang.Thread.run(Thread.java:748)
{code}
{code}
2018-Nov-21 15:24:45 ActiveMQ Transport: tcp:///10.88.2.113:54256@61616 [org.apache.activemq.transaction.LocalTransaction] WARN: POST COMMIT FAILED:java.lang.NullPointerException
at org.apache.activemq.store.kahadb.AbstractKahaDBStore.store(AbstractKahaDBStore.java:482)
at org.apache.activemq.store.kahadb.AbstractKahaDBStore.store(AbstractKahaDBStore.java:394)
at org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.doSchedule(JobSchedulerImpl.java:264)
at org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.schedule(JobSchedulerImpl.java:99)
at org.apache.activemq.broker.scheduler.SchedulerBroker.doSchedule(SchedulerBroker.java:197)
at org.apache.activemq.broker.scheduler.SchedulerBroker.access$000(SchedulerBroker.java:48)
at org.apache.activemq.broker.scheduler.SchedulerBroker$1.afterCommit(SchedulerBroker.java:162)
at org.apache.activemq.transaction.Transaction.fireAfterCommit(Transaction.java:126)
at org.apache.activemq.transaction.Transaction.doPostCommit(Transaction.java:195)
at org.apache.activemq.transaction.Transaction$2.call(Transaction.java:55)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.activemq.store.kahadb.MessageDatabase.store(MessageDatabase.java:990)
at org.apache.activemq.store.kahadb.MessageDatabase.store(MessageDatabase.java:957)
at org.apache.activemq.store.kahadb.KahaDBTransactionStore.commit(KahaDBTransactionStore.java:298)
at org.apache.activemq.transaction.LocalTransaction.commit(LocalTransaction.java:70)
at org.apache.activemq.broker.TransactionBroker.commitTransaction(TransactionBroker.java:253)
{code}

  was:
*Issue*

We have been running the stack with the versions stated in this ticket's
environment details for over a year.  Recently we have experienced ActiveMQ
outages as a result of the exception below.  Load has increased, but not
significantly.

Scheduled messages are being used as a polling mechanism in a workflow system.  
A workflow will execute a step, then schedule a small object message, containing
just an identifier, to check for a result in a number of seconds.  The consumer
of the message will schedule another message if the task is not complete yet.
We also have a batch process that can start a number of workflows concurrently,
usually on the order of hundreds, which means hundreds of scheduled messages 

[jira] [Commented] (AMQ-7108) Scheduled job scheduling occasionally fails with java.lang.ClassCastException: org.apache.activemq.store.kahadb.data.KahaTraceCommand cannot be cast to org.apache.activem

2018-11-21 Thread Ben D'Herville (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695334#comment-16695334
 ] 

Ben D'Herville commented on AMQ-7108:
-

I've attached our broker configuration.

> Scheduled job scheduling occasionally fails with 
> java.lang.ClassCastException: 
> org.apache.activemq.store.kahadb.data.KahaTraceCommand cannot be cast to 
> org.apache.activemq.store.kahadb.data.KahaAddScheduledJobCommand
> 
>
> Key: AMQ-7108
> URL: https://issues.apache.org/jira/browse/AMQ-7108
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: Job Scheduler, KahaDB
>Affects Versions: 5.11.1
> Environment: *OS*: Linux ip-10-88-2-47 3.13.0-125-generic #174-Ubuntu 
> SMP Mon Jul 10 18:51:24 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
> *AMQ*: 5.11.1
> *glusterfs*: 3.8.15
> *java*: 1.8.0_144-b01
>  
>Reporter: Ben D'Herville
>Priority: Major
> Attachments: activemq.xml
>
>
> *Issue*
> We have been running the stack with the versions stated in this ticket's
> environment details for over a year.  Recently we have experienced ActiveMQ
> outages as a result of the exception below.  Load has increased, but not
> significantly.
> Scheduled messages are being used as a polling mechanism in a workflow 
> system.  A workflow will execute a step, then schedule a small object message,
> containing just an identifier, to check for a result in a number of seconds.
> The consumer of the message will schedule another message if the task is not
> complete yet.  We also have a batch process that can start a number of
> workflows concurrently, usually on the order of hundreds, which means hundreds
> of scheduled messages can be created concurrently.
> The most recent failure occurred with a batch of only 64.
> We haven't noticed any data loss as a result of this, but it does cause a
> complete outage of our production system.
> *Recovery*
> The only way we have been able to recover from this error is to restart the
> broker.  KahaDB is configured to check for a corrupt journal, which would
> recover the journal on startup, but it is unclear whether the journal is
> actually corrupted.
> *Reproduction*
> We have tried to reproduce this issue using the ActiveMQ performance Maven
> plugin but were unable to.  We had 1 producers concurrently schedule 500 
> messages each.  
>  
> Note there are two stack traces here.  The second one is possibly a 
> side-effect of the first one.
> 2018-Nov-21 15:16:48 JobScheduler:JMS [org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl] ERROR: JMS Failed to schedule job
> java.lang.ClassCastException: org.apache.activemq.store.kahadb.data.KahaTraceCommand cannot be cast to org.apache.activemq.store.kahadb.data.KahaAddScheduledJobCommand
> at org.apache.activemq.store.kahadb.scheduler.JobSchedulerStoreImpl.getPayload(JobSchedulerStoreImpl.java:529)
> at org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.fireJob(JobSchedulerImpl.java:789)
> at org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.mainLoop(JobSchedulerImpl.java:720)
> at org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.run(JobSchedulerImpl.java:673)
> at java.lang.Thread.run(Thread.java:748)
> 2018-Nov-21 15:24:45 ActiveMQ Transport: tcp:///10.88.2.113:54256@61616 [org.apache.activemq.transaction.LocalTransaction] WARN: POST COMMIT FAILED:java.lang.NullPointerException
> at org.apache.activemq.store.kahadb.AbstractKahaDBStore.store(AbstractKahaDBStore.java:482)
> at org.apache.activemq.store.kahadb.AbstractKahaDBStore.store(AbstractKahaDBStore.java:394)
> at org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.doSchedule(JobSchedulerImpl.java:264)
> at org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.schedule(JobSchedulerImpl.java:99)
> at org.apache.activemq.broker.scheduler.SchedulerBroker.doSchedule(SchedulerBroker.java:197)
> at org.apache.activemq.broker.scheduler.SchedulerBroker.access$000(SchedulerBroker.java:48)
> at org.apache.activemq.broker.scheduler.SchedulerBroker$1.afterCommit(SchedulerBroker.java:162)
> at org.apache.activemq.transaction.Transaction.fireAfterCommit(Transaction.java:126)
> at org.apache.activemq.transaction.Transaction.doPostCommit(Transaction.java:195)
> at org.apache.activemq.transaction.Transaction$2.call(Transaction.java:55)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> 

[jira] [Updated] (AMQ-7108) Scheduled job scheduling occasionally fails with java.lang.ClassCastException: org.apache.activemq.store.kahadb.data.KahaTraceCommand cannot be cast to org.apache.activemq.

2018-11-21 Thread Ben D'Herville (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-7108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben D'Herville updated AMQ-7108:

Description: 
*Issue*

We have been running the stack with the versions stated in this ticket's
environment details for over a year.  Recently we have experienced ActiveMQ
outages as a result of the exception below.  Load has increased, but not
significantly.

Scheduled messages are being used as a polling mechanism in a workflow system.  
A workflow will execute a step, then schedule a small object message, containing
just an identifier, to check for a result in a number of seconds.  The consumer
of the message will schedule another message if the task is not complete yet.
We also have a batch process that can start a number of workflows concurrently,
usually on the order of hundreds, which means hundreds of scheduled messages can
be created concurrently.

The most recent failure occurred with a batch of only 64.

We haven't noticed any data loss as a result of this, but it does cause a
complete outage of our production system.

*Recovery*

The only way we have been able to recover from this error is to restart the
broker.  KahaDB is configured to check for a corrupt journal, which would
recover the journal on startup, but it is unclear whether the journal is
actually corrupted.

*Reproduction*

We have tried to reproduce this issue using the ActiveMQ performance Maven
plugin but were unable to.  We had 1 producers concurrently schedule 500 
messages each.  

 

Note there are two stack traces here.  The second one is possibly a
side-effect of the first one.
{code}
2018-Nov-21 15:16:48 JobScheduler:JMS [org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl] ERROR: JMS Failed to schedule job
java.lang.ClassCastException: org.apache.activemq.store.kahadb.data.KahaTraceCommand cannot be cast to org.apache.activemq.store.kahadb.data.KahaAddScheduledJobCommand
at org.apache.activemq.store.kahadb.scheduler.JobSchedulerStoreImpl.getPayload(JobSchedulerStoreImpl.java:529)
at org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.fireJob(JobSchedulerImpl.java:789)
at org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.mainLoop(JobSchedulerImpl.java:720)
at org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.run(JobSchedulerImpl.java:673)
at java.lang.Thread.run(Thread.java:748)
{code}
{code}
2018-Nov-21 15:24:45 ActiveMQ Transport: tcp:///10.88.2.113:54256@61616 [org.apache.activemq.transaction.LocalTransaction] WARN: POST COMMIT FAILED:java.lang.NullPointerException
at org.apache.activemq.store.kahadb.AbstractKahaDBStore.store(AbstractKahaDBStore.java:482)
at org.apache.activemq.store.kahadb.AbstractKahaDBStore.store(AbstractKahaDBStore.java:394)
at org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.doSchedule(JobSchedulerImpl.java:264)
at org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.schedule(JobSchedulerImpl.java:99)
at org.apache.activemq.broker.scheduler.SchedulerBroker.doSchedule(SchedulerBroker.java:197)
at org.apache.activemq.broker.scheduler.SchedulerBroker.access$000(SchedulerBroker.java:48)
at org.apache.activemq.broker.scheduler.SchedulerBroker$1.afterCommit(SchedulerBroker.java:162)
at org.apache.activemq.transaction.Transaction.fireAfterCommit(Transaction.java:126)
at org.apache.activemq.transaction.Transaction.doPostCommit(Transaction.java:195)
at org.apache.activemq.transaction.Transaction$2.call(Transaction.java:55)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.activemq.store.kahadb.MessageDatabase.store(MessageDatabase.java:990)
at org.apache.activemq.store.kahadb.MessageDatabase.store(MessageDatabase.java:957)
at org.apache.activemq.store.kahadb.KahaDBTransactionStore.commit(KahaDBTransactionStore.java:298)
at org.apache.activemq.transaction.LocalTransaction.commit(LocalTransaction.java:70)
at org.apache.activemq.broker.TransactionBroker.commitTransaction(TransactionBroker.java:253)
{code}

  was:
*Issue*

We have been running the stack with the versions stated in this ticket's
environment details for over a year.  Recently we have experienced ActiveMQ
outages as a result of the exception below.  Load has increased, but not
significantly.

Scheduled messages are being used as a polling mechanism in a workflow system.  
A workflow will execute a step, then schedule a small object message, containing
just an identifier, to check for a result in a number of seconds.  The consumer
of the message will schedule another message if the task is not complete yet.
We also have a batch process that can start a number of workflows concurrently,
usually on the order of hundreds, which means hundreds of scheduled messages can
be created concurrently.

The most recent 

[jira] [Created] (AMQ-7108) Scheduled job scheduling occasionally fails with java.lang.ClassCastException: org.apache.activemq.store.kahadb.data.KahaTraceCommand cannot be cast to org.apache.activemq.

2018-11-21 Thread Ben D'Herville (JIRA)
Ben D'Herville created AMQ-7108:
---

 Summary: Scheduled job scheduling occasionally fails with 
java.lang.ClassCastException: 
org.apache.activemq.store.kahadb.data.KahaTraceCommand cannot be cast to 
org.apache.activemq.store.kahadb.data.KahaAddScheduledJobCommand
 Key: AMQ-7108
 URL: https://issues.apache.org/jira/browse/AMQ-7108
 Project: ActiveMQ
  Issue Type: Bug
  Components: Job Scheduler, KahaDB
Affects Versions: 5.11.1
 Environment: *OS*: Linux ip-10-88-2-47 3.13.0-125-generic #174-Ubuntu 
SMP Mon Jul 10 18:51:24 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

*AMQ*: 5.11.1

*glusterfs*: 3.8.15

*java*: 1.8.0_144-b01

 
Reporter: Ben D'Herville


*Issue*

We have been running the stack with the versions stated in this ticket's
environment details for over a year.  Recently we have experienced ActiveMQ
outages as a result of the exception below.  Load has increased, but not
significantly.

Scheduled messages are being used as a polling mechanism in a workflow system.  
A workflow will execute a step, then schedule a small object message, containing
just an identifier, to check for a result in a number of seconds.  The consumer
of the message will schedule another message if the task is not complete yet.
We also have a batch process that can start a number of workflows concurrently,
usually on the order of hundreds, which means hundreds of scheduled messages can
be created concurrently.
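The poll-and-reschedule pattern described above can be modelled in plain Java. This is a minimal, hypothetical sketch, not the reporter's code: with ActiveMQ 5.x the delay would come from broker-side scheduled messages (the producer sets the `AMQ_SCHEDULED_DELAY` long property on each message and the broker runs with `schedulerSupport="true"`), whereas here a JDK `ScheduledExecutorService` stands in for the broker's job scheduler, and `FakeTask` is an invented placeholder for the external work being polled.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the workflow polling loop; a JDK scheduler stands in
// for ActiveMQ's broker-side job scheduler.
final class PollingSketch {

    // Invented placeholder for the external task: it reports completion on
    // the N-th check.
    static final class FakeTask {
        private final int checksUntilDone;
        private final AtomicInteger checks = new AtomicInteger();

        FakeTask(int checksUntilDone) {
            this.checksUntilDone = checksUntilDone;
        }

        boolean isComplete() {
            return checks.incrementAndGet() >= checksUntilDone;
        }

        int checks() {
            return checks.get();
        }
    }

    // What the consumer does with each poll message: check the task and,
    // if it is not complete yet, schedule another poll after a delay.
    static void poll(ScheduledExecutorService scheduler, FakeTask task,
                     long delayMillis, CountDownLatch done) {
        if (task.isComplete()) {
            done.countDown();
        } else {
            scheduler.schedule(() -> poll(scheduler, task, delayMillis, done),
                    delayMillis, TimeUnit.MILLISECONDS);
        }
    }

    // Starts a whole batch of workflows at once, so many polls are pending
    // concurrently. Returns the total number of checks performed, or -1 if
    // the batch did not finish within 10 seconds.
    static int runBatch(int workflows, int checksUntilDone, long delayMillis) {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(4);
        CountDownLatch done = new CountDownLatch(workflows);
        FakeTask[] tasks = new FakeTask[workflows];
        for (int i = 0; i < workflows; i++) {
            FakeTask task = new FakeTask(checksUntilDone);
            tasks[i] = task;
            scheduler.schedule(() -> poll(scheduler, task, delayMillis, done),
                    delayMillis, TimeUnit.MILLISECONDS);
        }
        try {
            if (!done.await(10, TimeUnit.SECONDS)) {
                return -1;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        } finally {
            scheduler.shutdownNow();
        }
        int total = 0;
        for (FakeTask task : tasks) {
            total += task.checks();
        }
        return total;
    }
}
```

With a batch of 64 workflows whose tasks complete on the third check, `runBatch(64, 3, 10)` performs 192 checks in total, because each workflow stops rescheduling as soon as its task reports completion.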

The most recent failure occurred with a batch of only 64.

We haven't noticed any data loss as a result of this, but it does cause a
complete outage of our production system.

*Recovery*

The only way we have been able to recover from this error is to restart the
broker.  KahaDB is configured to check for a corrupt journal, which would
recover the journal on startup, but it is unclear whether the journal is
actually corrupted.

*Reproduction*

We have tried to reproduce this issue using the ActiveMQ performance Maven
plugin but were unable to.  We had 1 producers concurrently schedule 500 
messages each.  

 

Note there are two stack traces here.  The second one is possibly a side-effect 
of the first one.
2018-Nov-21 15:16:48 JobScheduler:JMS [org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl] ERROR: JMS Failed to schedule job
java.lang.ClassCastException: org.apache.activemq.store.kahadb.data.KahaTraceCommand cannot be cast to org.apache.activemq.store.kahadb.data.KahaAddScheduledJobCommand
at org.apache.activemq.store.kahadb.scheduler.JobSchedulerStoreImpl.getPayload(JobSchedulerStoreImpl.java:529)
at org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.fireJob(JobSchedulerImpl.java:789)
at org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.mainLoop(JobSchedulerImpl.java:720)
at org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.run(JobSchedulerImpl.java:673)
at java.lang.Thread.run(Thread.java:748)

2018-Nov-21 15:24:45 ActiveMQ Transport: tcp:///10.88.2.113:54256@61616 [org.apache.activemq.transaction.LocalTransaction] WARN: POST COMMIT FAILED:java.lang.NullPointerException
at org.apache.activemq.store.kahadb.AbstractKahaDBStore.store(AbstractKahaDBStore.java:482)
at org.apache.activemq.store.kahadb.AbstractKahaDBStore.store(AbstractKahaDBStore.java:394)
at org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.doSchedule(JobSchedulerImpl.java:264)
at org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.schedule(JobSchedulerImpl.java:99)
at org.apache.activemq.broker.scheduler.SchedulerBroker.doSchedule(SchedulerBroker.java:197)
at org.apache.activemq.broker.scheduler.SchedulerBroker.access$000(SchedulerBroker.java:48)
at org.apache.activemq.broker.scheduler.SchedulerBroker$1.afterCommit(SchedulerBroker.java:162)
at org.apache.activemq.transaction.Transaction.fireAfterCommit(Transaction.java:126)
at org.apache.activemq.transaction.Transaction.doPostCommit(Transaction.java:195)
at org.apache.activemq.transaction.Transaction$2.call(Transaction.java:55)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.activemq.store.kahadb.MessageDatabase.store(MessageDatabase.java:990)
at org.apache.activemq.store.kahadb.MessageDatabase.store(MessageDatabase.java:957)
at org.apache.activemq.store.kahadb.KahaDBTransactionStore.commit(KahaDBTransactionStore.java:298)
at org.apache.activemq.transaction.LocalTransaction.commit(LocalTransaction.java:70)
at org.apache.activemq.broker.TransactionBroker.commitTransaction(TransactionBroker.java:253)





[jira] [Commented] (AMQ-7107) QueueBrowsingTest and UsageBlockedDispatchTest are failing with ConcurrentStoreAndDispachQueues=false

2018-11-21 Thread Jamie goodyear (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695285#comment-16695285
 ] 

Jamie goodyear commented on AMQ-7107:
-

Thank you for the patch submission,

I'll take a gander.

> QueueBrowsingTest and UsageBlockedDispatchTest are failing with 
> ConcurrentStoreAndDispachQueues=false
> -
>
> Key: AMQ-7107
> URL: https://issues.apache.org/jira/browse/AMQ-7107
> Project: ActiveMQ
>  Issue Type: Test
>Reporter: Alan Protasio
>Priority: Minor
>
> Hi,
> I was working towards https://issues.apache.org/jira/browse/AMQ-7028 and 
> after my change, QueueBrowsingTest and UsageBlockedDispatchTest started failing.
> QueueBrowsingTest was changed by 
> https://issues.apache.org/jira/browse/AMQ-4495 and it tests whether a full 
> page was paged in by the cursor.
> The problem is that this test was only succeeding due to how 
> ConcurrentStoreAndDispachQueues=true is implemented. When this flag is set to 
> true, we increase the memory usage when the async task starts and decrease it 
> when the task is done:
> https://github.com/alanprot/activemq/blob/master/activemq-broker/src/main/java/org/apache/activemq/broker/region/Queue.java#L897
> So, imagine this timeline:
> 1. Send message 1
> 2. The cursor gets full and the cache is disabled
> 3. Message 1 finishes and the memory is freed
> 4. Messages 2 to 100 are sent and the cache is skipped
> 5. We browse the queue and the cursor can page in messages because the 
> cursorMemory is not full
> Now with ConcurrentStoreAndDispachQueues=false:
> 1. Send message 1
> 2. Send message 2
> 3. The cursor gets full and the cache is disabled
> 4. Messages 3 to 100 are sent and the cache is skipped (memory still full)
> 5. We browse the queue and the cursor cannot page in messages because the 
> cursorMemory is full
> So, in order to make this test work with 
> ConcurrentStoreAndDispachQueues=false, I made a simple change to it: after 
> sending all the messages, consume one of them and make sure that the cursor 
> has memory to page in messages.
> A similar thing is happening with UsageBlockedDispatchTest.
> I created 2 more tests that repeat these tests with 
> ConcurrentStoreAndDispachQueues=false, adjusted a little so that they work 
> with this flag set to false.





[jira] [Commented] (AMQ-7107) QueueBrowsingTest and UsageBlockedDispatchTest are failing with ConcurrentStoreAndDispachQueues=false

2018-11-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695263#comment-16695263
 ] 

ASF GitHub Bot commented on AMQ-7107:
-

GitHub user alanprot opened a pull request:

https://github.com/apache/activemq/pull/318

AMQ-7107 - Make QueueBrowsingTest and UsageBlockedDispatchTest succeed with ConcurrentStoreAndDispachQueues=false

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/alanprot/activemq AMQ-7107

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/activemq/pull/318.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #318


commit 2eda327abac8bd092e094b313314682cd9269e36
Author: Alan Protasio 
Date:   2018-11-21T19:01:19Z

AMQ-7107 - Make QueueBrowsingTest and UsageBlockedDispatchTest succeed with 
ConcurrentStoreAndDispachQueues=false




> QueueBrowsingTest and UsageBlockedDispatchTest are failing with 
> ConcurrentStoreAndDispachQueues=false
> -
>
> Key: AMQ-7107
> URL: https://issues.apache.org/jira/browse/AMQ-7107
> Project: ActiveMQ
>  Issue Type: Test
>Reporter: Alan Protasio
>Priority: Minor
>
> Hi,
> I was working towards https://issues.apache.org/jira/browse/AMQ-7028 and 
> after my change, QueueBrowsingTest and UsageBlockedDispatchTest started failing.
> QueueBrowsingTest was changed by 
> https://issues.apache.org/jira/browse/AMQ-4495 and it tests whether a full 
> page was paged in by the cursor.
> The problem is that this test was only succeeding due to how 
> ConcurrentStoreAndDispachQueues=true is implemented. When this flag is set to 
> true, we increase the memory usage when the async task starts and decrease it 
> when the task is done:
> https://github.com/alanprot/activemq/blob/master/activemq-broker/src/main/java/org/apache/activemq/broker/region/Queue.java#L897
> So, imagine this timeline:
> 1. Send message 1
> 2. The cursor gets full and the cache is disabled
> 3. Message 1 finishes and the memory is freed
> 4. Messages 2 to 100 are sent and the cache is skipped
> 5. We browse the queue and the cursor can page in messages because the 
> cursorMemory is not full
> Now with ConcurrentStoreAndDispachQueues=false:
> 1. Send message 1
> 2. Send message 2
> 3. The cursor gets full and the cache is disabled
> 4. Messages 3 to 100 are sent and the cache is skipped (memory still full)
> 5. We browse the queue and the cursor cannot page in messages because the 
> cursorMemory is full
> So, in order to make this test work with 
> ConcurrentStoreAndDispachQueues=false, I made a simple change to it: after 
> sending all the messages, consume one of them and make sure that the cursor 
> has memory to page in messages.
> A similar thing is happening with UsageBlockedDispatchTest.
> I created 2 more tests that repeat these tests with 
> ConcurrentStoreAndDispachQueues=false, adjusted a little so that they work 
> with this flag set to false.





[jira] [Created] (AMQ-7107) QueueBrowsingTest and UsageBlockedDispatchTest are failing with ConcurrentStoreAndDispachQueues=false

2018-11-21 Thread Alan Protasio (JIRA)
Alan Protasio created AMQ-7107:
--

 Summary: QueueBrowsingTest and UsageBlockedDispatchTest are 
failing with ConcurrentStoreAndDispachQueues=false
 Key: AMQ-7107
 URL: https://issues.apache.org/jira/browse/AMQ-7107
 Project: ActiveMQ
  Issue Type: Test
Reporter: Alan Protasio


Hi,

I was working towards https://issues.apache.org/jira/browse/AMQ-7028 and after 
my change, QueueBrowsingTest and UsageBlockedDispatchTest started failing.

QueueBrowsingTest was changed by https://issues.apache.org/jira/browse/AMQ-4495 
and it tests whether a full page was paged in by the cursor.
The problem is that this test was only succeeding due to how 
ConcurrentStoreAndDispachQueues=true is implemented. When this flag is set to 
true, we increase the memory usage when the async task starts and decrease it 
when the task is done:


https://github.com/alanprot/activemq/blob/master/activemq-broker/src/main/java/org/apache/activemq/broker/region/Queue.java#L897

So, imagine this timeline:

1. Send message 1

2. The cursor gets full and the cache is disabled

3. Message 1 finishes and the memory is freed

4. Messages 2 to 100 are sent and the cache is skipped

5. We browse the queue and the cursor can page in messages because the 
cursorMemory is not full

Now with ConcurrentStoreAndDispachQueues=false:

1. Send message 1

2. Send message 2

3. The cursor gets full and the cache is disabled

4. Messages 3 to 100 are sent and the cache is skipped (memory still full)

5. We browse the queue and the cursor cannot page in messages because the 
cursorMemory is full

So, in order to make this test work with ConcurrentStoreAndDispachQueues=false, 
I made a simple change to it: after sending all the messages, consume one of 
them and make sure that the cursor has memory to page in messages.

A similar thing is happening with UsageBlockedDispatchTest.

I created 2 more tests that repeat these tests with 
ConcurrentStoreAndDispachQueues=false, adjusted a little so that they work with 
this flag set to false.
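The two timelines above can be replayed with a deliberately simplified toy model. It is not ActiveMQ's cursor or `Queue` code: memory is counted in whole messages, the limits are hypothetical, and the async store task is assumed to finish before the next send, so with concurrent store and dispatch the memory it took is released straight away.

```java
// Toy model of the AMQ-7107 timelines; NOT ActiveMQ's cursor implementation.
// Memory is counted in whole messages and the limit is hypothetical.
final class CursorMemoryModel {
    private final int limit;
    private int usage;
    private boolean cacheEnabled = true;

    CursorMemoryModel(int limit) {
        this.limit = limit;
    }

    // Sends one message. While the cache is enabled the message takes one unit
    // of cursor memory; once usage reaches the limit the cache is disabled and
    // later messages skip it. With concurrentStoreAndDispatch the unit is
    // released when the (here: immediately finished) async store task is done.
    void send(boolean concurrentStoreAndDispatch) {
        if (!cacheEnabled) {
            return; // cache skipped
        }
        usage++;
        if (usage >= limit) {
            cacheEnabled = false; // cursor full, cache disabled
        }
        if (concurrentStoreAndDispatch) {
            usage--; // async task done, memory freed
        }
    }

    // Consuming a cached message frees its unit of memory.
    void consumeOne() {
        if (usage > 0) {
            usage--;
        }
    }

    // Browsing can page messages in only while cursor memory is not full.
    boolean canPageIn() {
        return usage < limit;
    }

    static boolean browseAfterSends(boolean concurrent, int messages,
                                    int limit, boolean consumeOneFirst) {
        CursorMemoryModel model = new CursorMemoryModel(limit);
        for (int i = 0; i < messages; i++) {
            model.send(concurrent);
        }
        if (consumeOneFirst) {
            model.consumeOne();
        }
        return model.canPageIn();
    }
}
```

With a limit of one message, the concurrent case fills the cursor once (step 2) and frees it again (step 3), so `browseAfterSends(true, 100, 1, false)` is true. Without concurrent store and dispatch and a limit of two, the cursor stays full (`false`) until one message is consumed, which mirrors the change made to the tests.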





[jira] [Commented] (ARTEMIS-1828) Add option in CLI to specify a routing type for queues

2018-11-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695256#comment-16695256
 ] 

ASF GitHub Bot commented on ARTEMIS-1828:
-

Github user rakstar commented on the issue:

https://github.com/apache/activemq-artemis/pull/2038
  
@michaelandrepearce no problem at all. Thanks for accepting my patch! :)


> Add option in CLI to specify a routing type for queues
> --
>
> Key: ARTEMIS-1828
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1828
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Affects Versions: 2.5.0
>Reporter: Justin Bertram
>Priority: Major
> Fix For: 2.6.0
>
>
> Optionally specify a queue routing type when creating a broker, e.g.:
> {noformat}
> create --queues myqueue,mytopic:multicast
> {noformat}
> Defaults to anycast if unspecified.
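A minimal sketch of how a `--queues` value such as `myqueue,mytopic:multicast` could be interpreted, assuming each entry has the form `name[:routingType]` and that queue names contain no colon. This is an illustration only, not the Artemis CLI's parser; the local `RoutingType` enum simply mirrors the two routing types named in the issue.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Illustrative parser for a "name[:routingType],..." queue list; not the
// Artemis CLI implementation. Assumes queue names contain no ':'.
final class QueueSpecParser {

    // Local stand-in for the two routing types mentioned in the issue.
    enum RoutingType { ANYCAST, MULTICAST }

    static Map<String, RoutingType> parse(String spec) {
        Map<String, RoutingType> queues = new LinkedHashMap<>();
        for (String entry : spec.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            int colon = trimmed.indexOf(':');
            if (colon < 0) {
                // No routing type given: default to anycast.
                queues.put(trimmed, RoutingType.ANYCAST);
            } else {
                String name = trimmed.substring(0, colon);
                String type = trimmed.substring(colon + 1).toUpperCase(Locale.ROOT);
                // valueOf throws IllegalArgumentException for unknown types.
                queues.put(name, RoutingType.valueOf(type));
            }
        }
        return queues;
    }
}
```

For the example in the issue, `parse("myqueue,mytopic:multicast")` maps `myqueue` to ANYCAST (the default) and `mytopic` to MULTICAST.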





[jira] [Commented] (ARTEMIS-2178) Support routing-type configuration on core bridge

2018-11-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-2178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695225#comment-16695225
 ] 

ASF GitHub Bot commented on ARTEMIS-2178:
-

Github user michaelandrepearce commented on the issue:

https://github.com/apache/activemq-artemis/pull/2438
  
This looks like a nice new feature. Some docs, maybe?

Does this mean you could have a bridge that takes a message from an anycast 
queue and sends it to a multicast queue if the bridge's routing type is set to 
ComponentConfigurationRoutingType.MULTICAST?




> Support routing-type configuration on core bridge
> -
>
> Key: ARTEMIS-2178
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2178
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Justin Bertram
>Assignee: Justin Bertram
>Priority: Major
>
> {{MULTICAST}} messages forwarded by a core bridge will not be routed to any 
> {{ANYCAST}} queues and vice-versa.  Diverts have the ability to configure how 
> routing-type is treated.  Core bridges should support this same kind of 
> functionality.
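One way to read the proposal, sketched as a hypothetical model rather than the code in the pull request: the bridge's configured routing type decides which routing type a forwarded message carries. The four options are assumed to mirror the `ANYCAST`, `MULTICAST`, `STRIP` and `PASS` values that diverts already offer; both enums are local stand-ins, and `null` simply stands for "no routing type on the message".

```java
// Hypothetical model of applying a bridge-level routing-type setting to
// forwarded messages; not the Artemis implementation.
final class BridgeRoutingSketch {

    // Routing type carried by a message (stand-in).
    enum RoutingType { ANYCAST, MULTICAST }

    // Configuration options, assumed to mirror those available on diverts.
    enum ConfiguredRoutingType { ANYCAST, MULTICAST, STRIP, PASS }

    // Returns the routing type the forwarded message carries; null means the
    // routing type has been removed from the message.
    static RoutingType forwardedRoutingType(RoutingType original,
                                            ConfiguredRoutingType configured) {
        switch (configured) {
            case ANYCAST:
                return RoutingType.ANYCAST;   // force anycast
            case MULTICAST:
                return RoutingType.MULTICAST; // force multicast
            case STRIP:
                return null;                  // drop the routing type
            case PASS:
                return original;              // keep whatever the message had
            default:
                throw new IllegalArgumentException("Unknown option: " + configured);
        }
    }
}
```

In this model an anycast message forwarded with `MULTICAST` leaves the bridge as multicast, while `PASS` leaves the original routing type untouched.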





[jira] [Commented] (ARTEMIS-2178) Support routing-type configuration on core bridge

2018-11-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-2178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695224#comment-16695224
 ] 

ASF GitHub Bot commented on ARTEMIS-2178:
-

Github user michaelandrepearce commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/2438#discussion_r235527477
  
--- Diff: 
artemis-server/src/main/java/org/apache/activemq/artemis/core/config/BridgeConfiguration.java
 ---
@@ -75,6 +76,8 @@
// The bridge shouldn't be sending blocking anyways
private long callTimeout = ActiveMQClient.DEFAULT_CALL_TIMEOUT;
 
+   private ComponentConfigurationRoutingType routingType = 
ComponentConfigurationRoutingType.valueOf(ActiveMQDefaultConfiguration.getDefaultDivertRoutingType());
--- End diff --

Should this not be the default bridge routing type? (It looks like this picks 
up the divert default.)


> Support routing-type configuration on core bridge
> -
>
> Key: ARTEMIS-2178
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2178
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Justin Bertram
>Assignee: Justin Bertram
>Priority: Major
>
> {{MULTICAST}} messages forwarded by a core bridge will not be routed to any 
> {{ANYCAST}} queues and vice-versa.  Diverts have the ability to configure how 
> routing-type is treated.  Core bridges should support this same kind of 
> functionality.





[jira] [Commented] (ARTEMIS-2179) Add management method to get cluster-connection names

2018-11-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695216#comment-16695216
 ] 

ASF GitHub Bot commented on ARTEMIS-2179:
-

Github user michaelandrepearce commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/2439#discussion_r235525451
  
--- Diff: 
artemis-server/src/main/java/org/apache/activemq/artemis/core/management/impl/ActiveMQServerControlImpl.java
 ---
@@ -950,6 +950,24 @@ public String updateQueue(String name,
   }
}
 
+   @Override
+   public String[] getClusterConnectionNames() {
+  checkStarted();
+
+  clearIO();
+  try {
+ List<String> names = new ArrayList<>();
--- End diff --

Using Java 8 lambdas, this could be:

```
return server.getClusterManager().getClusterConnections().stream().map(ClusterConnection::getName).toArray(String[]::new);
```
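The loop in the diff and the suggested stream pipeline can be compared side by side in a self-contained form. `ClusterConnection` below is a hypothetical one-method stand-in for the Artemis interface (only `getName()` is used), and the connections are passed in as a plain `Collection` instead of being obtained from `server.getClusterManager()`.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

final class ClusterConnectionNames {

    // Hypothetical stand-in for Artemis's ClusterConnection; only the name matters.
    interface ClusterConnection {
        String getName();
    }

    // Loop style, as started in the diff.
    static String[] namesWithLoop(Collection<ClusterConnection> connections) {
        List<String> names = new ArrayList<>();
        for (ClusterConnection connection : connections) {
            names.add(connection.getName());
        }
        return names.toArray(new String[0]);
    }

    // Stream style, as suggested in the review comment.
    static String[] namesWithStream(Collection<ClusterConnection> connections) {
        return connections.stream()
                .map(ClusterConnection::getName)
                .toArray(String[]::new);
    }
}
```

Both versions return the names in the collection's iteration order, so for a `Set` the order is whatever that set provides.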



> Add management method to get cluster-connection names
> -
>
> Key: ARTEMIS-2179
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2179
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Reporter: Justin Bertram
>Assignee: Justin Bertram
>Priority: Major
>
> To invoke management operations on many resources (e.g. queues, addresses, 
> bridges, diverts, cluster-connections, etc.) you need to know the _name_ of 
> the resource. Many of these resource types have management methods to get 
> their names (e.g. {{ActiveMQServerControl}} has {{getQueueNames()}}, 
> {{getAddressNames()}}, {{getDivertNames()}}, {{getBridgeNames}}, etc.).  
> However, cluster-connections do not have such a management method.





[jira] [Commented] (ARTEMIS-1828) Add option in CLI to specify a routing type for queues

2018-11-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695212#comment-16695212
 ] 

ASF GitHub Bot commented on ARTEMIS-1828:
-

Github user michaelandrepearce commented on the issue:

https://github.com/apache/activemq-artemis/pull/2038
  
@rakstar thanks for the contribution, and again thanks for the patience.


> Add option in CLI to specify a routing type for queues
> --
>
> Key: ARTEMIS-1828
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1828
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Affects Versions: 2.5.0
>Reporter: Justin Bertram
>Priority: Major
> Fix For: 2.6.0
>
>
> Optionally specify a queue routing type when creating a broker, e.g.:
> {noformat}
> create --queues myqueue,mytopic:multicast
> {noformat}
> Defaults to anycast if unspecified.
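Here is a hedged sketch of how a value like `myqueue,mytopic:multicast` might be split into queue names and routing types, with anycast as the default. `QueuesOptionSketch` and `parseQueues` are hypothetical and are not the CLI's actual parsing code. The sketch also assumes queue names contain no colons.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class QueuesOptionSketch {

    enum RoutingType { ANYCAST, MULTICAST }

    // Hypothetical parser for a --queues value: a comma-separated list of "name" or
    // "name:routingType" entries. Entries without a routing type default to ANYCAST.
    static Map<String, RoutingType> parseQueues(String option) {
        Map<String, RoutingType> queues = new LinkedHashMap<>(); // keeps command-line order
        for (String entry : option.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray commas such as "a,,b"
            }
            int colon = trimmed.indexOf(':');
            if (colon < 0) {
                queues.put(trimmed, RoutingType.ANYCAST);
            } else {
                String name = trimmed.substring(0, colon);
                String type = trimmed.substring(colon + 1).trim().toUpperCase(Locale.ROOT);
                // valueOf rejects anything other than ANYCAST or MULTICAST with IllegalArgumentException.
                queues.put(name, RoutingType.valueOf(type));
            }
        }
        return queues;
    }

    public static void main(String[] args) {
        System.out.println(parseQueues("myqueue,mytopic:multicast"));
        // prints {myqueue=ANYCAST, mytopic=MULTICAST}
    }
}
```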





[jira] [Commented] (ARTEMIS-1828) Add option in CLI to specify a routing type for queues

2018-11-21 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695207#comment-16695207
 ] 

ASF subversion and git services commented on ARTEMIS-1828:
--

Commit 92fcff5ff48d12cf6d951213172515a995ef7d82 in activemq-artemis's branch 
refs/heads/master from King Ramos
[ https://git-wip-us.apache.org/repos/asf?p=activemq-artemis.git;h=92fcff5 ]

ARTEMIS-1828 CLI option for queue's routing-type

Optionally specify a queue routing type when creating a broker. Example:
"create --queues myqueue,mytopic:multicast". Defaults to anycast if
unspecified.




[jira] [Commented] (ARTEMIS-1828) Add option in CLI to specify a routing type for queues

2018-11-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695209#comment-16695209
 ] 

ASF GitHub Bot commented on ARTEMIS-1828:
-

Github user asfgit closed the pull request at:

https://github.com/apache/activemq-artemis/pull/2038




[jira] [Resolved] (AMQ-7106) TransportConnection pendingStop support during start is broken

2018-11-21 Thread Gary Tully (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-7106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Tully resolved AMQ-7106.
-
Resolution: Fixed

The same fix pattern could be applied to the stop/stopping choreography; however, 
that problem goes deeper because of the interaction with the service stopper 
state, etc., and it gets very involved quickly. This is a small step with a 
specific intent and scope.

> TransportConnection pendingStop support during start is broken
> --
>
> Key: AMQ-7106
> URL: https://issues.apache.org/jira/browse/AMQ-7106
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: Transport
>Affects Versions: 5.15.0
> Environment: mqtt nio ssl
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0
>
>
> If start blocks, the inactivity monitor can kick in. The intent is that it 
> can see the starting state and initiate a pending stop and complete.
> The current synchronization breaks this because the start code holds a lock 
> for its entire duration, which erroneously forces the stopAsync code to block.
> This means there is one blocked inactivity monitor thread per blocked 
> starting connection, which very quickly leads to too many threads.
> A blocked SSL handshake demonstrates the problem.
> {code:java}
> "MQTTInactivityMonitor Async Task: java.util.concurrent.ThreadPoolExecutor$Worker@7c1745f4[State = -1, empty queue]" #4412 daemon prio=5 os_prio=0 tid=0x7fd524f2c800 nid=0x622d waiting for monitor entry [0x7fd393776000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
> 	at org.apache.activemq.broker.TransportConnection.stopAsync(TransportConnection.java:1136)
> 	- waiting to lock <0x0004dc8059b0> (a org.apache.activemq.broker.jmx.ManagedTransportConnection)
> 	at org.apache.activemq.broker.jmx.ManagedTransportConnection.stopAsync(ManagedTransportConnection.java:66)
> 	at org.apache.activemq.broker.TransportConnection.stopAsync(TransportConnection.java:1131)
> 	at org.apache.activemq.broker.TransportConnection.serviceTransportException(TransportConnection.java:235)
> 	at org.apache.activemq.broker.TransportConnection$1.onException(TransportConnection.java:206)
> 	at org.apache.activemq.transport.TransportFilter.onException(TransportFilter.java:101)
> 	at org.apache.activemq.transport.mqtt.MQTTInactivityMonitor.onException(MQTTInactivityMonitor.java:196)
> 	at org.apache.activemq.transport.mqtt.MQTTInactivityMonitor$1$1.run(MQTTInactivityMonitor.java:81)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers:
> 	- <0x0004d7803c60> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> {code}
> {code:java}
> "ActiveMQ BrokerService[BB] Task-51959" #4414527 daemon prio=5 os_prio=0 tid=0x7fd5f83b9800 nid=0x1846 runnable [0x7fd3873b3000]
>    java.lang.Thread.State: RUNNABLE
> 	at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> 	at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
> 	at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
> 	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
> 	- locked <0x0004dc805830> (a sun.nio.ch.Util$3)
> 	- locked <0x0004dc805820> (a java.util.Collections$UnmodifiableSet)
> 	- locked <0x0004dc805840> (a sun.nio.ch.EPollSelectorImpl)
> 	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
> 	at org.apache.activemq.transport.nio.NIOSSLTransport.doHandshake(NIOSSLTransport.java:380)
> 	at org.apache.activemq.transport.nio.NIOSSLTransport.initializeStreams(NIOSSLTransport.java:137)
> 	at org.apache.activemq.transport.mqtt.MQTTNIOSSLTransport.initializeStreams(MQTTNIOSSLTransport.java:46)
> 	at org.apache.activemq.transport.tcp.TcpTransport.connect(TcpTransport.java:519)
> 	at org.apache.activemq.transport.nio.NIOTransport.doStart(NIOTransport.java:160)
> 	at org.apache.activemq.transport.nio.NIOSSLTransport.doStart(NIOSSLTransport.java:412)
> 	at org.apache.activemq.util.ServiceSupport.start(ServiceSupport.java:55)
> 	at org.apache.activemq.transport.TransportFilter.start(TransportFilter.java:58)
> 	at org.apache.activemq.transport.mqtt.MQTTTransportFilter.start(MQTTTransportFilter.java:157)
> 	at org.apache.activemq.transport.mqtt.MQTTInactivityMonitor.start(MQTTInactivityMonitor.java:148)
> at 
> 

[jira] [Commented] (AMQ-7106) TransportConnection pendingStop support during start is broken

2018-11-21 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694528#comment-16694528
 ] 

ASF subversion and git services commented on AMQ-7106:
--

Commit 8cc0c5ad6c85381cf6bbeaf179086d451d96650e in activemq's branch 
refs/heads/master from gtully
[ https://git-wip-us.apache.org/repos/asf?p=activemq.git;h=8cc0c5a ]

AMQ-7106 - fix pending stop support by avoiding sync through single shared 
status var - fix and test
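Below is a minimal sketch of the pattern the commit message describes. Instead of `start()` and `stopAsync()` synchronizing on the connection, both coordinate through a single atomic status variable. A stop requested mid-start is therefore recorded as pending, and `start()` completes it once its blocking work returns. The state names, `PendingStopSketch` and `releaseResources()` are hypothetical. This is not the code from commit 8cc0c5a.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

public class PendingStopSketch {

    // Hypothetical lifecycle states; the real TransportConnection uses its own fields.
    enum State { NEW, STARTING, STARTED, PENDING_STOP, STOPPED }

    // The single shared status variable: every transition is a compare-and-set, no monitor.
    private final AtomicReference<State> state = new AtomicReference<>(State.NEW);

    State state() {
        return state.get();
    }

    // Runs the possibly blocking start work (e.g. an SSL handshake) without holding a lock.
    void start(Runnable blockingStartWork) {
        if (!state.compareAndSet(State.NEW, State.STARTING)) {
            return; // already started, or stopped before it ever started
        }
        blockingStartWork.run();
        if (!state.compareAndSet(State.STARTING, State.STARTED)) {
            // The only other way out of STARTING is PENDING_STOP: finish that stop here.
            state.set(State.STOPPED);
            releaseResources();
        }
    }

    // Never blocks: while a start is in progress it only records a pending stop.
    void stopAsync() {
        while (true) {
            State current = state.get();
            switch (current) {
                case STARTING:
                    if (state.compareAndSet(State.STARTING, State.PENDING_STOP)) {
                        return; // start() completes the stop when its blocking work returns
                    }
                    break; // state changed concurrently; re-read it
                case PENDING_STOP:
                case STOPPED:
                    return; // a stop is already pending or done
                default: // NEW or STARTED: nothing else is in flight, so stop directly
                    if (state.compareAndSet(current, State.STOPPED)) {
                        releaseResources();
                        return;
                    }
            }
        }
    }

    private void releaseResources() {
        // Hypothetical placeholder: close the transport, unregister the connection, etc.
    }

    public static void main(String[] args) throws InterruptedException {
        PendingStopSketch connection = new PendingStopSketch();
        CountDownLatch inStart = new CountDownLatch(1);
        CountDownLatch handshakeDone = new CountDownLatch(1);
        Thread starter = new Thread(() -> connection.start(() -> {
            inStart.countDown();
            try {
                handshakeDone.await(); // simulates a handshake that is stuck
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }));
        starter.start();
        inStart.await();
        connection.stopAsync(); // returns immediately; there is no monitor to wait for
        System.out.println(connection.state()); // prints PENDING_STOP
        handshakeDone.countDown();
        starter.join();
        System.out.println(connection.state()); // prints STOPPED
    }
}
```

Because `stopAsync()` only ever performs compare-and-set operations, an inactivity monitor thread that calls it during a stuck handshake returns immediately instead of parking on the connection's monitor. The trade-off is that every state transition has to be written as an explicit CAS.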



[jira] [Updated] (AMQ-7106) TransportConnection pendingStop support during start is broken

2018-11-21 Thread Gary Tully (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-7106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Tully updated AMQ-7106:

Description: 
If start blocks, the inactivity monitor can kick in. The intent is that it can 
see the starting state and initiate a pending stop and complete.
The current synchronization breaks this because the start code holds a lock for 
its entire duration, which erroneously forces the stopAsync code to block.
This means there is one blocked inactivity monitor thread per blocked starting 
connection, which very quickly leads to too many threads.
A blocked SSL handshake demonstrates the problem.
{code:java}
"MQTTInactivityMonitor Async Task: java.util.concurrent.ThreadPoolExecutor$Worker@7c1745f4[State = -1, empty queue]" #4412 daemon prio=5 os_prio=0 tid=0x7fd524f2c800 nid=0x622d waiting for monitor entry [0x7fd393776000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.activemq.broker.TransportConnection.stopAsync(TransportConnection.java:1136)
	- waiting to lock <0x0004dc8059b0> (a org.apache.activemq.broker.jmx.ManagedTransportConnection)
	at org.apache.activemq.broker.jmx.ManagedTransportConnection.stopAsync(ManagedTransportConnection.java:66)
	at org.apache.activemq.broker.TransportConnection.stopAsync(TransportConnection.java:1131)
	at org.apache.activemq.broker.TransportConnection.serviceTransportException(TransportConnection.java:235)
	at org.apache.activemq.broker.TransportConnection$1.onException(TransportConnection.java:206)
	at org.apache.activemq.transport.TransportFilter.onException(TransportFilter.java:101)
	at org.apache.activemq.transport.mqtt.MQTTInactivityMonitor.onException(MQTTInactivityMonitor.java:196)
	at org.apache.activemq.transport.mqtt.MQTTInactivityMonitor$1$1.run(MQTTInactivityMonitor.java:81)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
	- <0x0004d7803c60> (a java.util.concurrent.ThreadPoolExecutor$Worker)
{code}
{code:java}
"ActiveMQ BrokerService[BB] Task-51959" #4414527 daemon prio=5 os_prio=0 tid=0x7fd5f83b9800 nid=0x1846 runnable [0x7fd3873b3000]
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
	at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
	at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
	- locked <0x0004dc805830> (a sun.nio.ch.Util$3)
	- locked <0x0004dc805820> (a java.util.Collections$UnmodifiableSet)
	- locked <0x0004dc805840> (a sun.nio.ch.EPollSelectorImpl)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
	at org.apache.activemq.transport.nio.NIOSSLTransport.doHandshake(NIOSSLTransport.java:380)
	at org.apache.activemq.transport.nio.NIOSSLTransport.initializeStreams(NIOSSLTransport.java:137)
	at org.apache.activemq.transport.mqtt.MQTTNIOSSLTransport.initializeStreams(MQTTNIOSSLTransport.java:46)
	at org.apache.activemq.transport.tcp.TcpTransport.connect(TcpTransport.java:519)
	at org.apache.activemq.transport.nio.NIOTransport.doStart(NIOTransport.java:160)
	at org.apache.activemq.transport.nio.NIOSSLTransport.doStart(NIOSSLTransport.java:412)
	at org.apache.activemq.util.ServiceSupport.start(ServiceSupport.java:55)
	at org.apache.activemq.transport.TransportFilter.start(TransportFilter.java:58)
	at org.apache.activemq.transport.mqtt.MQTTTransportFilter.start(MQTTTransportFilter.java:157)
	at org.apache.activemq.transport.mqtt.MQTTInactivityMonitor.start(MQTTInactivityMonitor.java:148)
	at org.apache.activemq.transport.TransportFilter.start(TransportFilter.java:58)
	at org.apache.activemq.broker.TransportConnection.start(TransportConnection.java:1066)
	- locked <0x0004dc8059b0> (a org.apache.activemq.broker.jmx.ManagedTransportConnection)
	at org.apache.activemq.broker.TransportConnector$1$1.run(TransportConnector.java:218)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
	- <0x0004dc805a78> (a java.util.concurrent.ThreadPoolExecutor$Worker)
{code}
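The sketch below reproduces the failure mode in isolation. It uses a hypothetical `LockingConnection` whose `start()` and `stopAsync()` share one monitor, which is what the description above says the pre-fix code effectively does. A latch stands in for the stuck SSL handshake. The class and method names are illustrative only and are not ActiveMQ's.

```java
import java.util.concurrent.CountDownLatch;

public class BlockedStopSketch {

    // Hypothetical stand-in for the pre-fix behaviour: start() and stopAsync() share one
    // monitor, so while start() is stuck the inactivity monitor's stopAsync() cannot enter.
    static final class LockingConnection {
        synchronized void start(CountDownLatch handshakeDone) throws InterruptedException {
            handshakeDone.await(); // stands in for an SSL handshake parked in the selector
        }

        synchronized void stopAsync() {
            // Would schedule the stop; cannot run until start() releases the monitor.
        }
    }

    // Polls the thread's state until it matches or the timeout (in ms) expires.
    static boolean awaitState(Thread thread, Thread.State expected, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (thread.getState() != expected) {
            if (System.currentTimeMillis() > deadline) {
                return false;
            }
            Thread.sleep(10);
        }
        return true;
    }

    // Returns true if stopAsync() ends up BLOCKED while start() holds the monitor.
    static boolean stopAsyncBlocksWhileStartIsStuck() {
        LockingConnection connection = new LockingConnection();
        CountDownLatch handshakeDone = new CountDownLatch(1);
        Thread starter = new Thread(() -> {
            try {
                connection.start(handshakeDone);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "starter");
        Thread monitor = new Thread(connection::stopAsync, "inactivity-monitor");
        try {
            starter.start();
            awaitState(starter, Thread.State.WAITING, 5_000); // starter now owns the monitor
            monitor.start();
            return awaitState(monitor, Thread.State.BLOCKED, 5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            handshakeDone.countDown(); // let start() finish so both threads can exit
        }
    }

    public static void main(String[] args) {
        System.out.println(stopAsyncBlocksWhileStartIsStuck()); // prints true
    }
}
```

The second thread ends up in the same `BLOCKED (on object monitor)` state as the `MQTTInactivityMonitor Async Task` thread in the dump. In the broker there is one such thread for every stuck connection.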

  was:
If start blocks, the inactivity monitor can kick in. The intent is that it can 
see the starting state and initiate a pending stop and complete.
The current synchronization breaks this b/c the start code holds a lock for its 
duration and the which 

[jira] [Created] (AMQ-7106) TransportConnection pendingStop support during start is broken

2018-11-21 Thread Gary Tully (JIRA)
Gary Tully created AMQ-7106:
---

 Summary: TransportConnection pendingStop support during start is 
broken
 Key: AMQ-7106
 URL: https://issues.apache.org/jira/browse/AMQ-7106
 Project: ActiveMQ
  Issue Type: Bug
  Components: Transport
Affects Versions: 5.15.0
 Environment: mqtt nio ssl
Reporter: Gary Tully
Assignee: Gary Tully
 Fix For: 5.16.0


If start blocks, the inactivity monitor can kick in. The intent is that it can 
see the starting state and initiate a pending stop and complete.
The current synchronization breaks this because the start code holds a lock for 
its duration, which requires the stopAsync code to block in error.
This means there is a blocked inactivity monitor thread per blocked starting 
connection.
A blocked SSL handshake can demonstrate this.

{code}
"MQTTInactivityMonitor Async Task: java.util.concurrent.ThreadPoolExecutor$Worker@7c1745f4[State = -1, empty queue]" #4412 daemon prio=5 os_prio=0 tid=0x7fd524f2c800 nid=0x622d waiting for monitor entry [0x7fd393776000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.activemq.broker.TransportConnection.stopAsync(TransportConnection.java:1136)
	- waiting to lock <0x0004dc8059b0> (a org.apache.activemq.broker.jmx.ManagedTransportConnection)
	at org.apache.activemq.broker.jmx.ManagedTransportConnection.stopAsync(ManagedTransportConnection.java:66)
	at org.apache.activemq.broker.TransportConnection.stopAsync(TransportConnection.java:1131)
	at org.apache.activemq.broker.TransportConnection.serviceTransportException(TransportConnection.java:235)
	at org.apache.activemq.broker.TransportConnection$1.onException(TransportConnection.java:206)
	at org.apache.activemq.transport.TransportFilter.onException(TransportFilter.java:101)
	at org.apache.activemq.transport.mqtt.MQTTInactivityMonitor.onException(MQTTInactivityMonitor.java:196)
	at org.apache.activemq.transport.mqtt.MQTTInactivityMonitor$1$1.run(MQTTInactivityMonitor.java:81)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
	- <0x0004d7803c60> (a java.util.concurrent.ThreadPoolExecutor$Worker)
{code}
{code}
"ActiveMQ BrokerService[BB] Task-51959" #4414527 daemon prio=5 os_prio=0 tid=0x7fd5f83b9800 nid=0x1846 runnable [0x7fd3873b3000]
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
	at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
	at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
	- locked <0x0004dc805830> (a sun.nio.ch.Util$3)
	- locked <0x0004dc805820> (a java.util.Collections$UnmodifiableSet)
	- locked <0x0004dc805840> (a sun.nio.ch.EPollSelectorImpl)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
	at org.apache.activemq.transport.nio.NIOSSLTransport.doHandshake(NIOSSLTransport.java:380)
	at org.apache.activemq.transport.nio.NIOSSLTransport.initializeStreams(NIOSSLTransport.java:137)
	at org.apache.activemq.transport.mqtt.MQTTNIOSSLTransport.initializeStreams(MQTTNIOSSLTransport.java:46)
	at org.apache.activemq.transport.tcp.TcpTransport.connect(TcpTransport.java:519)
	at org.apache.activemq.transport.nio.NIOTransport.doStart(NIOTransport.java:160)
	at org.apache.activemq.transport.nio.NIOSSLTransport.doStart(NIOSSLTransport.java:412)
	at org.apache.activemq.util.ServiceSupport.start(ServiceSupport.java:55)
	at org.apache.activemq.transport.TransportFilter.start(TransportFilter.java:58)
	at org.apache.activemq.transport.mqtt.MQTTTransportFilter.start(MQTTTransportFilter.java:157)
	at org.apache.activemq.transport.mqtt.MQTTInactivityMonitor.start(MQTTInactivityMonitor.java:148)
	at org.apache.activemq.transport.TransportFilter.start(TransportFilter.java:58)
	at org.apache.activemq.broker.TransportConnection.start(TransportConnection.java:1066)
	- locked <0x0004dc8059b0> (a org.apache.activemq.broker.jmx.ManagedTransportConnection)
	at org.apache.activemq.broker.TransportConnector$1$1.run(TransportConnector.java:218)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
	- <0x0004dc805a78> (a java.util.concurrent.ThreadPoolExecutor$Worker)
{code}


