[jira] [Updated] (AMQ-7106) TransportConnection pendingStop support during start is broken

2018-11-21 Thread Gary Tully (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-7106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Tully updated AMQ-7106:

Description: 
If start blocks, the inactivity monitor can kick in. The intent is that it can 
see the starting state and initiate a pending stop and complete.
The current synchronization breaks this because the start code holds a lock for 
its duration, which forces the stopAsync code to block in error.
This means there is a blocked inactivity monitor thread for every blocked starting 
connection... very quickly there are too many threads.
A blocked SSL handshake demonstrates the problem.
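For illustration, a minimal sketch of the locking pattern at issue and the intended pendingStop behaviour (hypothetical class and method names, not the actual TransportConnection code); the thread dumps follow below.
{code:java}
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch, not the actual TransportConnection code: start() may block
// (e.g. on an SSL handshake), so stopAsync() should only record a pending stop
// instead of contending for the monitor held for the whole duration of start().
class ConnectionSketch {
    private final AtomicBoolean starting = new AtomicBoolean(false);
    private volatile boolean pendingStop;

    public void start() throws Exception {
        starting.set(true);
        try {
            transportStart();          // may block for a long time
        } finally {
            starting.set(false);
        }
        if (pendingStop) {             // honour a stop requested while we were starting
            doStop();
        }
    }

    public void stopAsync() {
        if (starting.get()) {
            pendingStop = true;        // do not block the inactivity monitor thread
            return;
        }
        doStop();
    }

    private void transportStart() throws Exception { /* transport/filter start chain */ }

    private void doStop() { /* dispatch the real stop work */ }
}
{code}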
{code:java}
"MQTTInactivityMonitor Async Task: 
java.util.concurrent.ThreadPoolExecutor$Worker@7c1745f4[State = -1, empty 
queue]" #4412 daemon prio=5 os_prio=0 tid=0x7fd524f2c800 nid=0x622d waiting 
for monitor entry [0x7fd393776000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.activemq.broker.TransportConnection.stopAsync(TransportConnection.java:1136)
- waiting to lock <0x0004dc8059b0> (a 
org.apache.activemq.broker.jmx.ManagedTransportConnection)
at 
org.apache.activemq.broker.jmx.ManagedTransportConnection.stopAsync(ManagedTransportConnection.java:66)
at 
org.apache.activemq.broker.TransportConnection.stopAsync(TransportConnection.java:1131)
at 
org.apache.activemq.broker.TransportConnection.serviceTransportException(TransportConnection.java:235)
at 
org.apache.activemq.broker.TransportConnection$1.onException(TransportConnection.java:206)
at 
org.apache.activemq.transport.TransportFilter.onException(TransportFilter.java:101)
at 
org.apache.activemq.transport.mqtt.MQTTInactivityMonitor.onException(MQTTInactivityMonitor.java:196)
at 
org.apache.activemq.transport.mqtt.MQTTInactivityMonitor$1$1.run(MQTTInactivityMonitor.java:81)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
- <0x0004d7803c60> (a 
java.util.concurrent.ThreadPoolExecutor$Worker)
{code}
{code:java}
"ActiveMQ BrokerService[BB] Task-51959" #4414527 daemon prio=5 os_prio=0 
tid=0x7fd5f83b9800 nid=0x1846 runnable [0x7fd3873b3000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <0x0004dc805830> (a sun.nio.ch.Util$3)
- locked <0x0004dc805820> (a java.util.Collections$UnmodifiableSet)
- locked <0x0004dc805840> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at 
org.apache.activemq.transport.nio.NIOSSLTransport.doHandshake(NIOSSLTransport.java:380)
at 
org.apache.activemq.transport.nio.NIOSSLTransport.initializeStreams(NIOSSLTransport.java:137)
at 
org.apache.activemq.transport.mqtt.MQTTNIOSSLTransport.initializeStreams(MQTTNIOSSLTransport.java:46)
at 
org.apache.activemq.transport.tcp.TcpTransport.connect(TcpTransport.java:519)
at 
org.apache.activemq.transport.nio.NIOTransport.doStart(NIOTransport.java:160)
at 
org.apache.activemq.transport.nio.NIOSSLTransport.doStart(NIOSSLTransport.java:412)
at org.apache.activemq.util.ServiceSupport.start(ServiceSupport.java:55)
at 
org.apache.activemq.transport.TransportFilter.start(TransportFilter.java:58)
at 
org.apache.activemq.transport.mqtt.MQTTTransportFilter.start(MQTTTransportFilter.java:157)
at 
org.apache.activemq.transport.mqtt.MQTTInactivityMonitor.start(MQTTInactivityMonitor.java:148)
at 
org.apache.activemq.transport.TransportFilter.start(TransportFilter.java:58)
at 
org.apache.activemq.broker.TransportConnection.start(TransportConnection.java:1066)
- locked <0x0004dc8059b0> (a 
org.apache.activemq.broker.jmx.ManagedTransportConnection)
at 
org.apache.activemq.broker.TransportConnector$1$1.run(TransportConnector.java:218)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
- <0x0004dc805a78> (a 
java.util.concurrent.ThreadPoolExecutor$Worker){code}

  was:
If start blocks, the inactivity monitor can kick in. The intent is that it can 
see the starting state and initiate a pending stop and complete.
The current synchronization breaks this b/c the start code holds a lock for its 
duration and the which 

[jira] [Created] (AMQ-7106) TransportConnection pendingStop support during start is broken

2018-11-21 Thread Gary Tully (JIRA)
Gary Tully created AMQ-7106:
---

 Summary: TransportConnection pendingStop support during start is 
broken
 Key: AMQ-7106
 URL: https://issues.apache.org/jira/browse/AMQ-7106
 Project: ActiveMQ
  Issue Type: Bug
  Components: Transport
Affects Versions: 5.15.0
 Environment: mqtt nio ssl
Reporter: Gary Tully
Assignee: Gary Tully
 Fix For: 5.16.0


If start blocks, the inactivity monitor can kick in. The intent is that it can 
see the starting state and initiate a pending stop and complete.
The current synchronization breaks this b/c the start code holds a lock for its 
duration, which requires the stopAsync code to block in error.
This means there is a blocked inactivity monitor thread per blocked starting 
connection.
A blocked SSL handshake demonstrates the problem.

{code}
"MQTTInactivityMonitor Async Task: 
java.util.concurrent.ThreadPoolExecutor$Worker@7c1745f4[State = -1, empty 
queue]" #4412 daemon prio=5 os_prio=0 tid=0x7fd524f2c800 nid=0x622d waiting 
for monitor entry [0x7fd393776000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.activemq.broker.TransportConnection.stopAsync(TransportConnection.java:1136)
- waiting to lock <0x0004dc8059b0> (a 
org.apache.activemq.broker.jmx.ManagedTransportConnection)
at 
org.apache.activemq.broker.jmx.ManagedTransportConnection.stopAsync(ManagedTransportConnection.java:66)
at 
org.apache.activemq.broker.TransportConnection.stopAsync(TransportConnection.java:1131)
at 
org.apache.activemq.broker.TransportConnection.serviceTransportException(TransportConnection.java:235)
at 
org.apache.activemq.broker.TransportConnection$1.onException(TransportConnection.java:206)
at 
org.apache.activemq.transport.TransportFilter.onException(TransportFilter.java:101)
at 
org.apache.activemq.transport.mqtt.MQTTInactivityMonitor.onException(MQTTInactivityMonitor.java:196)
at 
org.apache.activemq.transport.mqtt.MQTTInactivityMonitor$1$1.run(MQTTInactivityMonitor.java:81)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
- <0x0004d7803c60> (a 
java.util.concurrent.ThreadPoolExecutor$Worker)
{code}
{code}"ActiveMQ BrokerService[BB] Task-51959" #4414527 daemon prio=5 os_prio=0 
tid=0x7fd5f83b9800 nid=0x1846 runnable [0x7fd3873b3000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <0x0004dc805830> (a sun.nio.ch.Util$3)
- locked <0x0004dc805820> (a java.util.Collections$UnmodifiableSet)
- locked <0x0004dc805840> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at 
org.apache.activemq.transport.nio.NIOSSLTransport.doHandshake(NIOSSLTransport.java:380)
at 
org.apache.activemq.transport.nio.NIOSSLTransport.initializeStreams(NIOSSLTransport.java:137)
at 
org.apache.activemq.transport.mqtt.MQTTNIOSSLTransport.initializeStreams(MQTTNIOSSLTransport.java:46)
at 
org.apache.activemq.transport.tcp.TcpTransport.connect(TcpTransport.java:519)
at 
org.apache.activemq.transport.nio.NIOTransport.doStart(NIOTransport.java:160)
at 
org.apache.activemq.transport.nio.NIOSSLTransport.doStart(NIOSSLTransport.java:412)
at org.apache.activemq.util.ServiceSupport.start(ServiceSupport.java:55)
at 
org.apache.activemq.transport.TransportFilter.start(TransportFilter.java:58)
at 
org.apache.activemq.transport.mqtt.MQTTTransportFilter.start(MQTTTransportFilter.java:157)
at 
org.apache.activemq.transport.mqtt.MQTTInactivityMonitor.start(MQTTInactivityMonitor.java:148)
at 
org.apache.activemq.transport.TransportFilter.start(TransportFilter.java:58)
at 
org.apache.activemq.broker.TransportConnection.start(TransportConnection.java:1066)
- locked <0x0004dc8059b0> (a 
org.apache.activemq.broker.jmx.ManagedTransportConnection)
at 
org.apache.activemq.broker.TransportConnector$1$1.run(TransportConnector.java:218)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
- <0x0004dc805a78> (a 
java.util.concurrent.ThreadPoolExecutor$Worker){code}



--

[jira] [Resolved] (AMQ-7102) managementContext suppressMBean filters registration but still tracks objects as registered in error

2018-11-16 Thread Gary Tully (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-7102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Tully resolved AMQ-7102.
-
Resolution: Fixed

> managementContext suppressMBean filters registration but still tracks objects 
> as registered in error
> 
>
> Key: AMQ-7102
> URL: https://issues.apache.org/jira/browse/AMQ-7102
> Project: ActiveMQ
>  Issue Type: Improvement
>  Components: JMX
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0
>
>
> The suppressMbean feature could be better if we don't unnecessarily track the 
> suppressed MBean registrations in our copy-on-write set.
> Under load, the additional add/remove work can be a significant 
> burden.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AMQ-7102) managementContext suppressMBean filters registration but still tracks objects as registered in error

2018-11-15 Thread Gary Tully (JIRA)
Gary Tully created AMQ-7102:
---

 Summary: managementContext suppressMBean filters registration but 
still tracks objects as registered in error
 Key: AMQ-7102
 URL: https://issues.apache.org/jira/browse/AMQ-7102
 Project: ActiveMQ
  Issue Type: Improvement
  Components: JMX
Affects Versions: 5.15.0
Reporter: Gary Tully
Assignee: Gary Tully
 Fix For: 5.16.0


The suppressMbean feature could be better if we don't unnecessarily track the 
suppressed MBean registrations in our copy-on-write set.
Under load, the additional add/remove work can be a significant burden.
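A minimal sketch of the intended behaviour (hypothetical names, not the actual ManagementContext code): when the suppressMBean filter matches, skip the registration and never add the name to the copy-on-write tracking set.
{code:java}
import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;
import javax.management.ObjectName;

// Illustrative only; the real logic lives in org.apache.activemq.broker.jmx.ManagementContext.
class SuppressingRegistry {
    private final Set<ObjectName> registeredMBeanNames = new CopyOnWriteArraySet<>();

    void registerMBean(Object mbean, ObjectName name) throws Exception {
        if (isSuppressed(name)) {
            // Suppressed: neither register nor track, so the add/remove cost is avoided.
            return;
        }
        doRegister(mbean, name);          // actual JMX registration
        registeredMBeanNames.add(name);   // track only what was really registered
    }

    boolean isSuppressed(ObjectName name) { return false; /* evaluate the suppressMBean filter */ }

    void doRegister(Object mbean, ObjectName name) throws Exception { /* MBeanServer call */ }
}
{code}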



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-7096) ActiveMQ lose messages on broker restarts

2018-11-14 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686573#comment-16686573
 ] 

Gary Tully commented on AMQ-7096:
-

Please provide a standalone test case that asserts that the bug is in ActiveMQ. 
There are loads of tests in the activemq-unit-tests module that can provide 
inspiration.
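For reference, a bare-bones skeleton of such a standalone test (illustrative only; the queue name, counts and restart point are placeholders): start an embedded persistent broker, produce over a failover URL, restart the broker mid-flight, then consume and assert nothing was lost.
{code:java}
import javax.jms.Connection;
import javax.jms.MessageProducer;
import javax.jms.Session;
import org.apache.activemq.ActiveMQConnectionFactory;
import org.apache.activemq.broker.BrokerService;

public class BrokerRestartSketch {

    public static void main(String[] args) throws Exception {
        BrokerService broker = createBroker();
        broker.start();

        ActiveMQConnectionFactory factory =
                new ActiveMQConnectionFactory("failover:(tcp://localhost:61616)");
        Connection connection = factory.createConnection();
        connection.start();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageProducer producer = session.createProducer(session.createQueue("TEST.RESTART"));

        for (int i = 0; i < 1000; i++) {
            producer.send(session.createTextMessage("msg-" + i));
            if (i == 500) {                  // restart the broker mid-flight
                broker.stop();
                broker.waitUntilStopped();
                broker = createBroker();     // same data directory, so KahaDB is reused
                broker.start();
            }
        }

        // ...consume from TEST.RESTART and assert the expected number of messages arrived...
        connection.close();
        broker.stop();
    }

    private static BrokerService createBroker() throws Exception {
        BrokerService broker = new BrokerService();
        broker.setPersistent(true);          // KahaDB in the default data directory
        broker.addConnector("tcp://localhost:61616");
        return broker;
    }
}
{code}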

> ActiveMQ lose messages on broker restarts
> -
>
> Key: AMQ-7096
> URL: https://issues.apache.org/jira/browse/AMQ-7096
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: Broker
>Affects Versions: 5.15.7
>Reporter: Veaceslav Doina
>Priority: Critical
>
> Hello,
> We recently performed an ActiveMQ reliability test in order to verify whether it can 
> guarantee zero message loss. Results show that if the broker is restarted in 
> the process of producing/consuming, we lose messages.
> Testing environment: 
> AWS
> Hardware: 
> |Broker|m5.large|2vCPU / 8GiB|
> |Producer/Consumer|t3.large|2vCPU / 8GiB|
> Software:
> OS: CentOS Linux release 7.5.1804 
> Java: Oracle JDK SE 8 Update 192
> ActiveMQ: ActiveMQ 5.15.7
> Client: [JmsTools|https://github.com/erik-wramner/JmsTools]
> Default ActiveMQ configuration was used with only [unique DLQ 
> configuration|http://activemq.apache.org/message-redelivery-and-dlq-handling.html].
>  
>  
> The results and a project with all data required to reproduce the issue may be 
> found on GitHub: [https://github.com/veaceslavdoina/messages-brokers-testing]
> Discussion on the Mailing list: 
> [http://activemq.2283324.n4.nabble.com/ActiveMQ-and-Artemis-reliability-Messages-lost-tt4744881.html]
>  
> Thank you!
>  
> Slava.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AMQ-7009) ActiveMQ stop delivering messages

2018-11-14 Thread Gary Tully (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Tully resolved AMQ-7009.
-
   Resolution: Fixed
 Assignee: Gary Tully
Fix Version/s: 5.16.0

Thanks for the test case and suggested fix. A little tweak to the test case to 
better manage memory usage made it clear that your fix is good.

> ActiveMQ stop delivering messages
> -
>
> Key: AMQ-7009
> URL: https://issues.apache.org/jira/browse/AMQ-7009
> Project: ActiveMQ
>  Issue Type: Bug
>Affects Versions: 5.15.4
>Reporter: Nezih BEN FREDJ
>Assignee: Gary Tully
>Priority: Critical
> Fix For: 5.16.0
>
> Attachments: MemoryMessageStore.java, 
> MemoryMessageStoreQueueCursorTest.java, activemq.log
>
>
> We have a problem with ActiveMQ 5.15.4: it stops delivering messages to clients.
> This happens completely randomly and we are not able to provide a test case that 
> reproduces the problem.
> ActiveMQ is configured to use the « memoryPersistenceAdapter ».
>  
> When examining memory dumps, we identified an incoherent situation in the class 
> MemoryMessageStore that can lead to it no longer delivering messages. When:
>  - MemoryMessageStore.recoverNextMessages(...) is called,
>  - « lastBatchId » is not null, and
>  - « messageTable » does not contain any entry with « lastBatchId » as key,
> no messages are recovered.
>  
> We added logs to try to understand how it is possible to have a non-null 
> « lastBatchId » and a « messageTable » with no entry having « lastBatchId » as 
> key.
>  
> We noticed that the method setBatch is called with a non-null « messageId » 
> parameter, but no message with this id is inserted in « messageTable ». After 
> that, when the method MemoryMessageStore.recoverNextMessages(...) is called, 
> it does nothing and no messages are recovered, and the system goes into what 
> looks like an endless loop.
>  
> We are testing a temporary workaround of resetting « lastBatchId » to null when 
> the incoherent situation is detected.
>  
> Can you help us resolve the problem at the source and not just work around it?
> I attached the modified source code of the class MemoryMessageStore and an extract 
> of the log file.
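A simplified sketch of the reporter's temporary workaround (not the actual MemoryMessageStore source; types are simplified): if lastBatchId refers to an id that is no longer in messageTable, reset it so recovery starts from the beginning instead of looping forever.
{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only; the real class is org.apache.activemq.store.memory.MemoryMessageStore.
class MemoryStoreSketch {
    private final Map<String, Object> messageTable = new LinkedHashMap<>();
    private String lastBatchId;

    void recoverNextMessages(int maxReturned, java.util.List<Object> listener) {
        synchronized (messageTable) {
            if (lastBatchId != null && !messageTable.containsKey(lastBatchId)) {
                // Incoherent state described in AMQ-7009: the batch marker refers to a
                // message that was never added (or already removed). Reset it so the
                // scan below starts from the beginning rather than recovering nothing.
                lastBatchId = null;
            }
            boolean pastLastBatch = lastBatchId == null;
            int count = 0;
            for (Map.Entry<String, Object> entry : messageTable.entrySet()) {
                if (!pastLastBatch) {
                    pastLastBatch = entry.getKey().equals(lastBatchId);
                } else if (count < maxReturned) {
                    listener.add(entry.getValue());   // hand the message to the recovery listener
                    lastBatchId = entry.getKey();     // remember where the next batch starts
                    count++;
                }
            }
        }
    }
}
{code}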



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (AMQ-6903) Issue With Master and Slave Mode

2018-11-13 Thread Gary Tully (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-6903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Tully closed AMQ-6903.
---
Resolution: Not A Bug

The answer is in the error message from the log; you have to start there before 
opening critical issues!

2018-11-07 11:01:02,115 | ERROR | Failed to start Apache ActiveMQ (broker-1, 
ID:broker-1-43211-1541588461667-1:1) | org.apache.activemq.broker.BrokerService 
| main
org.apache.activemq.ConfigurationException: File system space reported by: 
/efs/activemq-data/kahadb was negative, possibly a huge file system, set a sane 
usage.total to provide some guidance

You even referenced the issue that had the fix.
See the bottom of: 
https://cwiki.apache.org/confluence/display/ACTIVEMQ/Producer+Flow+Control

> Issue With Master and Slave Mode
> 
>
> Key: AMQ-6903
> URL: https://issues.apache.org/jira/browse/AMQ-6903
> Project: ActiveMQ
>  Issue Type: Bug
>Affects Versions: 5.15.2
>Reporter: CHAKRADHAR REDDY
>Priority: Critical
>
> Hi All,
> We have installed Apache ActiveMQ in master/slave mode. We could see that the slave 
> does not come up immediately after the master is shut down, and we could see the 
> exception below.
>  
> FYI: We have many messages in queues, and the slave does not come up until we clear 
> the data in the KahaDB location; once we have cleaned the data in the KahaDB location, 
> there are no messages in the queues any longer.
> We have lost all messages from the queues. Is there any option to recover messages 
> from the db.log etc.. files in the KahaDB location?
> We received the transport errors below; this is the reason we shut down the 
> master to bring the slave up.
> 2018-01-13 04:33:04,032 | ERROR | Could not accept connection from tcp://IP 
> of server containing interfaces accessing AMQ server:52808 : {} | 
> org.apache.activemq.broker.TransportConnector | ActiveMQ BrokerServ
> ice[ip] Task-15549
> java.lang.IllegalStateException: Timer already cancelled.
> at java.util.Timer.sched(Timer.java:397)[:1.8.0_151]
> at java.util.Timer.schedule(Timer.java:193)[:1.8.0_151]
> at 
> org.apache.activemq.transport.AbstractInactivityMonitor.startConnectCheckTask(AbstractInactivityMonitor.java:425)[activemq-client-5.15.2.jar:5.15.2]
> at org.apache.activemq.transport.AbstractInactivityMonitor.startConne
> After shutting down the master, the slave did not come up, with the error below, 
> and we lost all data in the queues.
> 2018-02-21 20:09:18,135 | ERROR | File system space reported by: 
> /DBA/EFS/activemq/kahadb was negative, possibly a huge file system, set a 
> sane usage.total to provide some guidance | 
> org.apache.activemq.broker.BrokerService | main
> 2018-02-21 20:09:18,139 | ERROR | Failed to start Apache ActiveMQ (IP, ID:DNS 
> of activemq server) | org.apache.activemq.broker.BrokerService | main
> org.apache.activemq.ConfigurationException: File system space reported by: 
> /DBA/EFS/activemq/kahadb was negative, possibly a huge file system, set a 
> sane usage.total to provide some guidance
> at 
> org.apache.activemq.broker.BrokerService.checkUsageLimit(BrokerService.java:2092)[activemq-broker-5.15.2.jar:5.15.2]
> at 
> org.apache.activemq.broker.BrokerService.checkStoreUsageLimits(BrokerService.java:2029)[activemq-broker-5.15.2.jar:5.15.2]
> at 
> org.apache.activemq.broker.BrokerService.checkStoreSystemUsageLimits(BrokerService.java:2202)[activemq-broker-5.15.2.jar:5.15.2]
> at 
> org.apache.activemq.broker.BrokerService.doStartBroker(BrokerService.java:777)[activemq-broker-5.15.2.jar:5.15.2]
> at 
> org.apache.activemq.broker.BrokerService.startBroker(BrokerService.java:733)[activemq-broker-5.15.2.jar:5.15.2]
> at 
> org.apache.activemq.broker.BrokerService.start(BrokerService.java:636)[activemq-broker-5.15.2.jar:5.15.2]
> at 
> org.apache.activemq.xbean.XBeanBrokerService.afterPropertiesSet(XBeanBrokerService.java:73)[activemq-spring-5.15.2.jar:5.15.2]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)[:1.8.0_151]
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)[:1.8.0_151]
> :



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-7091) O(n) Memory consumption when broker has inactive durable subscribes causing OOM

2018-11-12 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16684081#comment-16684081
 ] 

Gary Tully commented on AMQ-7091:
-

[~tabish121] thanks. [~jgenender] Probably no need for any config option then; 
let it run as is.

> O(n) Memory consumption when broker has inactive durable subscribes causing 
> OOM
> ---
>
> Key: AMQ-7091
> URL: https://issues.apache.org/jira/browse/AMQ-7091
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.7
>Reporter: Alan Protasio
>Priority: Major
> Attachments: After.png, Before.png, 
> InactiveDurableSubscriberTest.java, memoryAllocation.jpg
>
>
> Hi :D
> One of our brokers was bouncing indefinitely due to OOM even though the load 
> (TPS) was pretty low.
> Getting the memory dump I could see that almost 90% of the memory was being 
> used by 
> [messageReferences|https://github.com/apache/activemq/blob/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/MessageDatabase.java#L2368]
>  TreeMap to keep track of which messages were already acked by all subscribers 
> in order to delete them.
> This seems to be a problem: if the broker has an inactive durable 
> subscriber, the memory footprint increases proportionally (O(n)) with the number 
> of messages sent to the topic in question, causing the broker to die due to OOM 
> sooner or later (the high memory footprint continues even after a restart).
> You can find attached (memoryAllocation.jpg) a screen shot showing my broker 
> using 90% of the memory to keep track of those messages, making it barely 
> usable.
> Looking at the code, I could make a change to switch the messageReferences to 
> use a BTreeIndex:
> final TreeMap messageReferences = new TreeMap<>();
>  + BTreeIndex messageReferences;
> Making this change, the memory allocation of the broker stabilized and the 
> broker didn't run OOM anymore.
> Attached you can see the code that I used to reproduce this scenario, also 
> the memory utilization (HEAP and GC graphs) before and after this change.
> Before the change the broker died in 5 minutes and I could send 48. After 
> the change the broker was still pretty healthy after 5 minutes and I could 
> send 2265000 messages to the topic (almost 5x more, due to high GC pauses).
>  
> All tests are passing: mvn clean install -Dactivemq.tests=all



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-7091) O(n) Memory consumption when broker has inactive durable subscribes causing OOM

2018-11-12 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683923#comment-16683923
 ] 

Gary Tully commented on AMQ-7091:
-

[~tabish121] Hi Tim, you added the messageReferences cache way back in 
https://github.com/apache/activemq/commit/943db3c3cb12b4c4504b4966135cf9a0cc69f0ba
 - this PR proposes to remove that cache b/c it can grow unbounded with the 
number of pending messages; even if it is just two Longs per message, it can be 
significant.
Can you recall the use case that caused that cache to be introduced? I ask b/c 
it is under threat with this change.
It may be that the need for the cache is offset by the changes to what is in 
the index, or it may be that with the change to what is stored, the cache is 
more important.
My guess is that if it is important you may recall, hence the redirect.

> O(n) Memory consumption when broker has inactive durable subscribes causing 
> OOM
> ---
>
> Key: AMQ-7091
> URL: https://issues.apache.org/jira/browse/AMQ-7091
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.7
>Reporter: Alan Protasio
>Priority: Major
> Attachments: After.png, Before.png, 
> InactiveDurableSubscriberTest.java, memoryAllocation.jpg
>
>
> Hi :D
> One of our brokers was bouncing indefinitely due to OOM even though the load 
> (TPS) was pretty low.
> Getting the memory dump I could see that almost 90% of the memory was being 
> used by 
> [messageReferences|https://github.com/apache/activemq/blob/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/MessageDatabase.java#L2368]
>  TreeMap to keep track of which messages were already acked by all subscribers 
> in order to delete them.
> This seems to be a problem: if the broker has an inactive durable 
> subscriber, the memory footprint increases proportionally (O(n)) with the number 
> of messages sent to the topic in question, causing the broker to die due to OOM 
> sooner or later (the high memory footprint continues even after a restart).
> You can find attached (memoryAllocation.jpg) a screen shot showing my broker 
> using 90% of the memory to keep track of those messages, making it barely 
> usable.
> Looking at the code, I could make a change to switch the messageReferences to 
> use a BTreeIndex:
> final TreeMap messageReferences = new TreeMap<>();
>  + BTreeIndex messageReferences;
> Making this change, the memory allocation of the broker stabilized and the 
> broker didn't run OOM anymore.
> Attached you can see the code that I used to reproduce this scenario, also 
> the memory utilization (HEAP and GC graphs) before and after this change.
> Before the change the broker died in 5 minutes and I could send 48. After 
> the change the broker was still pretty healthy after 5 minutes and I could 
> send 2265000 messages to the topic (almost 5x more, due to high GC pauses).
>  
> All tests are passing: mvn clean install -Dactivemq.tests=all



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-7094) AuditLog should show the target of the operation and not just arguments

2018-11-08 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679675#comment-16679675
 ] 

Gary Tully commented on AMQ-7094:
-

new log:
{code}
2018-11-07 17:50:42,118 [on(2)-127.0.0.1] - INFO  audit 
 - admin ended org.apache.activemq.broker.jmx.QueueView.purge[] on myTestQueue 
at 07-11-2018 17:50:42,118{code}

> AuditLog should show the target of the operation and not just arguments
> ---
>
> Key: AMQ-7094
> URL: https://issues.apache.org/jira/browse/AMQ-7094
> Project: ActiveMQ
>  Issue Type: Improvement
>  Components: Broker, JMX
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0
>
>
> When we enable the audit log (http://activemq.apache.org/audit-logging.html), 
> some of the operations don't have any arguments that identify the target (for 
> example, purge() or resetStatistics()). The output of the audit log doesn't show 
> the queue name.
> {code}
> 2018-08-31 11:19:43,803 | INFO | (6)-127.0.0.1 | audit | 
> vemq.broker.util.DefaultAuditLog 27 |  anonymous called 
> org.apache.activemq.broker.jmx.QueueView.purge[] at 31-08-2018 11:19:43,801 
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (AMQ-7094) AuditLog should show the target of the operation and not just arguments

2018-11-08 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679675#comment-16679675
 ] 

Gary Tully edited comment on AMQ-7094 at 11/8/18 12:18 PM:
---

new log:
{code}
2018-11-07 17:50:42,118 [on(2)-127.0.0.1] - INFO  audit   - admin called 
org.apache.activemq.broker.jmx.QueueView.purge[] on myTestQueue at 07-11-2018 
17:50:42,118{code}


was (Author: gtully):
new log:
{code}
2018-11-07 17:50:42,118 [on(2)-127.0.0.1] - INFO  audit 
 - admin ended org.apache.activemq.broker.jmx.QueueView.purge[] on myTestQueue 
at 07-11-2018 17:50:42,118{code}

> AuditLog should show the target of the operation and not just arguments
> ---
>
> Key: AMQ-7094
> URL: https://issues.apache.org/jira/browse/AMQ-7094
> Project: ActiveMQ
>  Issue Type: Improvement
>  Components: Broker, JMX
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0
>
>
> When we enable the audit log (http://activemq.apache.org/audit-logging.html), 
> some of the operations don't have any arguments that identify the target (for 
> example, purge() or resetStatistics()). The output of the audit log doesn't show 
> the queue name.
> {code}
> 2018-08-31 11:19:43,803 | INFO | (6)-127.0.0.1 | audit | 
> vemq.broker.util.DefaultAuditLog 27 |  anonymous called 
> org.apache.activemq.broker.jmx.QueueView.purge[] at 31-08-2018 11:19:43,801 
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AMQ-7094) AuditLog should show the target of the operation and not just arguments

2018-11-08 Thread Gary Tully (JIRA)
Gary Tully created AMQ-7094:
---

 Summary: AuditLog should show the target of the operation and not 
just arguments
 Key: AMQ-7094
 URL: https://issues.apache.org/jira/browse/AMQ-7094
 Project: ActiveMQ
  Issue Type: Improvement
  Components: Broker, JMX
Affects Versions: 5.15.0
Reporter: Gary Tully
Assignee: Gary Tully
 Fix For: 5.16.0


When we enable the audit log (http://activemq.apache.org/audit-logging.html), 
some of the operations don't have any arguments that identify the target (for 
example, purge() or resetStatistics()). The output of the audit log doesn't show the 
queue name.

{code}
2018-08-31 11:19:43,803 | INFO | (6)-127.0.0.1 | audit | 
vemq.broker.util.DefaultAuditLog 27 |  anonymous called 
org.apache.activemq.broker.jmx.QueueView.purge[] at 31-08-2018 11:19:43,801 
{code}
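The queue name is available from the JMX ObjectName the operation is invoked on (its destinationName key property), so the audit entry can include the target. An illustrative sketch only, not the actual DefaultAuditLog change:
{code:java}
import javax.management.ObjectName;

// Illustrative only: derive an "on <target>" suffix for the audit message.
class AuditTargetSketch {
    static String describeTarget(ObjectName invokedOn) {
        // ActiveMQ destination MBeans carry the destination in the destinationName key property.
        String destination = invokedOn.getKeyProperty("destinationName");
        return destination != null ? " on " + destination : "";
    }
}
{code}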



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AMQ-7094) AuditLog should show the target of the operation and not just arguments

2018-11-08 Thread Gary Tully (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-7094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Tully resolved AMQ-7094.
-
Resolution: Fixed

> AuditLog should show the target of the operation and not just arguments
> ---
>
> Key: AMQ-7094
> URL: https://issues.apache.org/jira/browse/AMQ-7094
> Project: ActiveMQ
>  Issue Type: Improvement
>  Components: Broker, JMX
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0
>
>
> When we enable the audit log (http://activemq.apache.org/audit-logging.html), 
> some of the operations don't have any arguments that identify the target (for 
> example, purge() or resetStatistics()). The output of the audit log doesn't show 
> the queue name.
> {code}
> 2018-08-31 11:19:43,803 | INFO | (6)-127.0.0.1 | audit | 
> vemq.broker.util.DefaultAuditLog 27 |  anonymous called 
> org.apache.activemq.broker.jmx.QueueView.purge[] at 31-08-2018 11:19:43,801 
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-7091) O(n) Memory consumption when broker has inactive durable subscribes causing OOM

2018-11-07 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678790#comment-16678790
 ] 

Gary Tully commented on AMQ-7091:
-

[~alanprot] you have to ask the computer to see if there is any negative 
performance impact. This reference count in memory is an optimisation 
introduced for a performance issue, probably a typical fanout case with lots of 
subs. See the linked issue. This issue is about that cache being unbounded in 
terms of memory usage; the solution is to have the data in the pageFile and 
pageCache such that it can get flushed from memory, at the cost of accessing 
it from pages in normal operation.

I would like to see the actual trade-off in publish/ack latency quantified. It 
may well be negligible, but we need to respect the original use case to verify.

In addition, if this warrants a version update, the auto migration path needs 
to be validated.

> O(n) Memory consumption when broker has inactive durable subscribes causing 
> OOM
> ---
>
> Key: AMQ-7091
> URL: https://issues.apache.org/jira/browse/AMQ-7091
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.7
>Reporter: Alan Protasio
>Priority: Major
> Attachments: After.png, Before.png, 
> InactiveDurableSubscriberTest.java, memoryAllocation.jpg
>
>
> Hi :D
> One of our brokers was bouncing indefinitely due to OOM even though the load 
> (TPS) was pretty low.
> Getting the memory dump I could see that almost 90% of the memory was being 
> used by 
> [messageReferences|https://github.com/apache/activemq/blob/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/MessageDatabase.java#L2368]
>  TreeMap to keep track of which messages were already acked by all subscribers 
> in order to delete them.
> This seems to be a problem: if the broker has an inactive durable 
> subscriber, the memory footprint increases proportionally (O(n)) with the number 
> of messages sent to the topic in question, causing the broker to die due to OOM 
> sooner or later (the high memory footprint continues even after a restart).
> You can find attached (memoryAllocation.jpg) a screen shot showing my broker 
> using 90% of the memory to keep track of those messages, making it barely 
> usable.
> Looking at the code, I could make a change to switch the messageReferences to 
> use a BTreeIndex:
> final TreeMap messageReferences = new TreeMap<>();
>  + BTreeIndex messageReferences;
> Making this change, the memory allocation of the broker stabilized and the 
> broker didn't run OOM anymore.
> Attached you can see the code that I used to reproduce this scenario, also 
> the memory utilization (HEAP and GC graphs) before and after this change.
> Before the change the broker died in 5 minutes and I could send 48. After 
> the change the broker was still pretty healthy after 5 minutes and I could 
> send 2265000 messages to the topic (almost 5x more, due to high GC pauses).
>  
> All tests are passing: mvn clean install -Dactivemq.tests=all



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-7082) KahaDB index, recover free pages in parallel with start

2018-11-07 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678090#comment-16678090
 ] 

Gary Tully commented on AMQ-7082:
-

[~alanprot] I added an additional condition at the end of your new test to 
verify that newly freed pages can be reused, and reworked the fix.

This ensures that there is no need for further growth while recovery is in 
progress, which is better. I think we are finally good.

> KahaDB index, recover free pages in parallel with start
> ---
>
> Key: AMQ-7082
> URL: https://issues.apache.org/jira/browse/AMQ-7082
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0, 5.15.8
>
>
> AMQ-6590 fixes free page loss through recovery. The recovery process can be 
> time-consuming, which prevents fast failover; doing recovery on shutdown is 
> preferable, but it is still not ideal b/c it will hold onto the KahaDB lock. 
> It can also stall shutdown unexpectedly.
> AMQ-7080 is going to tackle checkpointing the free list. This should help 
> avoid the need for recovery but it may still be necessary. If the perf hit is 
> significant this may need to be optional.
> There will still be the need to walk the index to find the free list.
> It is possible to run with no free list and grow, and we can do that while we 
> recover the free list in parallel, then merge the two at a safe point. This 
> we can do at startup.
> In cases where the disk is the bottleneck this won't help much, but it will 
> help failover and it will help shutdown, with a bit of luck the recovery will 
> complete before we stop.
>  
> Initially I thought this would be too complex, but if we concede some growth 
> while we recover, i.e. start with an empty free list, it should be straightforward 
> to merge with a recovered one.
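A rough sketch of the idea in the last paragraph (hypothetical names, not the real PageFile API): start with an empty free list, recover pre-existing free pages on a background thread bounded by the nextFreePage id captured at startup, and merge the result at a safe point under the index lock.
{code:java}
import java.util.BitSet;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of "recover the free list in parallel with start" (AMQ-7082).
// Names like scanFreePages/pageIsFreeOnDisk are illustrative, not the real PageFile API.
class FreeListRecoverySketch {
    private final BitSet freeList = new BitSet();   // starts empty: new allocations grow the file
    private final Object indexLock = new Object();

    void startWithBackgroundRecovery(long nextFreePageAtStart) {
        CompletableFuture
            .supplyAsync(() -> scanFreePages(nextFreePageAtStart))
            .thenAccept(recovered -> {
                synchronized (indexLock) {          // the "safe point": merge under the index lock
                    freeList.or(recovered);
                }
            });
    }

    // Only pages that existed before start are candidates; anything allocated at or
    // beyond nextFreePageAtStart after startup is in use and must not be reclaimed.
    private BitSet scanFreePages(long limit) {
        BitSet recovered = new BitSet();
        for (long pageId = 0; pageId < limit; pageId++) {
            if (pageIsFreeOnDisk(pageId)) {
                recovered.set((int) pageId);
            }
        }
        return recovered;
    }

    private boolean pageIsFreeOnDisk(long pageId) { return false; /* walk the index / page headers */ }
}
{code}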



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (AMQ-7082) KahaDB index, recover free pages in parallel with start

2018-11-07 Thread Gary Tully (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Tully reopened AMQ-7082:
-

Peeking at the latest additions to the fix.

> KahaDB index, recover free pages in parallel with start
> ---
>
> Key: AMQ-7082
> URL: https://issues.apache.org/jira/browse/AMQ-7082
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0, 5.15.8
>
>
> AMQ-6590 fixes free page loss through recovery. The recovery process can be 
> time-consuming, which prevents fast failover; doing recovery on shutdown is 
> preferable, but it is still not ideal b/c it will hold onto the KahaDB lock. 
> It can also stall shutdown unexpectedly.
> AMQ-7080 is going to tackle checkpointing the free list. This should help 
> avoid the need for recovery but it may still be necessary. If the perf hit is 
> significant this may need to be optional.
> There will still be the need to walk the index to find the free list.
> It is possible to run with no free list and grow, and we can do that while we 
> recover the free list in parallel, then merge the two at a safe point. This 
> we can do at startup.
> In cases where the disk is the bottleneck this won't help much, but it will 
> help failover and it will help shutdown, with a bit of luck the recovery will 
> complete before we stop.
>  
> Initially I thought this would be too complex, but if we concede some growth 
> while we recover, i.e. start with an empty free list, it should be straightforward 
> to merge with a recovered one.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AMQ-7082) KahaDB index, recover free pages in parallel with start

2018-11-06 Thread Gary Tully (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Tully resolved AMQ-7082.
-
Resolution: Fixed

Fix is good, thanks [~alanprot]

> KahaDB index, recover free pages in parallel with start
> ---
>
> Key: AMQ-7082
> URL: https://issues.apache.org/jira/browse/AMQ-7082
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0, 5.15.8
>
>
> AMQ-6590 fixes free page loss through recovery. The recovery process can be 
> time-consuming, which prevents fast failover; doing recovery on shutdown is 
> preferable, but it is still not ideal b/c it will hold onto the KahaDB lock. 
> It can also stall shutdown unexpectedly.
> AMQ-7080 is going to tackle checkpointing the free list. This should help 
> avoid the need for recovery but it may still be necessary. If the perf hit is 
> significant this may need to be optional.
> There will still be the need to walk the index to find the free list.
> It is possible to run with no free list and grow, and we can do that while we 
> recover the free list in parallel, then merge the two at a safe point. This 
> we can do at startup.
> In cases where the disk is the bottleneck this won't help much, but it will 
> help failover and it will help shutdown, with a bit of luck the recovery will 
> complete before we stop.
>  
> Initially I thought this would be too complex, but if we concede some growth 
> while we recover, i.e. start with an empty free list, it should be straightforward 
> to merge with a recovered one.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-7091) O(n) Memory consumption when broker has inactive durable subscribes causing OOM

2018-11-06 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16676855#comment-16676855
 ] 

Gary Tully commented on AMQ-7091:
-

That info does not need to be persisted; it is duplicate info. I would like to 
understand the tradeoff - that in-memory reference map was introduced as an 
optimisation via https://issues.apache.org/jira/browse/AMQ-3467 in 2011, but is 
in memory only to be fast. It may be possible to squash that map to a 
sequenceSet of offsets that share a reference count, which would address the memory 
usage without the need to persist to disk.
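To make the squashing idea concrete, an illustrative sketch using plain JDK collections (not the actual KahaDB types): group message sequence ids by their shared reference count so contiguous runs collapse into ranges instead of one map entry per message.
{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only, not the KahaDB types: collapse per-message reference counts into
// ranges of contiguous sequence ids that share the same count.
class RefCountRangesSketch {
    // refCount -> (first sequence id of range -> last sequence id of range)
    private final Map<Integer, TreeMap<Long, Long>> rangesByCount = new HashMap<>();

    void add(long sequenceId, int refCount) {
        TreeMap<Long, Long> ranges = rangesByCount.computeIfAbsent(refCount, k -> new TreeMap<>());
        Map.Entry<Long, Long> previous = ranges.floorEntry(sequenceId);
        if (previous != null && previous.getValue() == sequenceId - 1) {
            ranges.put(previous.getKey(), sequenceId);  // extend the previous contiguous range
        } else {
            ranges.put(sequenceId, sequenceId);         // start a new single-element range
        }
        // (merging with a following range and decrement/remove handling elided)
    }
}
{code}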

 

The implications of additional page writes on every add/ack may be significant 
for the simple fanout (when users keep up) case. Typically the broker has 
plenty of memory for this sort of thing.

 

> O(n) Memory consumption when broker has inactive durable subscribes causing 
> OOM
> ---
>
> Key: AMQ-7091
> URL: https://issues.apache.org/jira/browse/AMQ-7091
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.7
>Reporter: Alan Protasio
>Priority: Major
> Attachments: After.png, Before.png, 
> InactiveDurableSubscriberTest.java, memoryAllocation.jpg
>
>
> Hi :D
> One of our brokers was bouncing indefinitely due to OOM even though the load 
> (TPS) was pretty low.
> Getting the memory dump I could see that almost 90% of the memory was being 
> used by 
> [messageReferences|https://github.com/apache/activemq/blob/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/MessageDatabase.java#L2368]
>  TreeMap to keep track of what messages were already acked by all Subscribes 
> in order to delete them.
> This seems to be a problem as if the broker has an inactive durable 
> subscribe, the memory footprint increase proportionally (O) with the number 
> of messages sent to the topic in question, causing the broker to die due OOM 
> sooner or later (the high memory footprint continue even after a restart).
> You can find attached (memoryAllocation.jpg) a screen shot showing my broker 
> using 90% of the memory to keep track of those messages, making it barely 
> usable.
> Looking at the code, I could make a change to switch the messageReferences to 
> use a BTreeIndex:
> final TreeMap messageReferences = new TreeMap<>();
>  + BTreeIndex messageReferences;
> Making this change, the memory allocation of the broker stabilized and the 
> broker didn't run OOM anymore.
> Attached you can see the code that I used to reproduce this scenario, also 
> the memory utilization (HEAP and GC graphs) before and after this change.
> Before the change the broker died in 5 minutes and I could send 48. After 
> the change the broker was still pretty healthy after 5 minutes and I could 
> send 2265000 messages to the topic (almost 5x more, due to high GC pauses).
>  
> All tests are passing: mvn clean install -Dactivemq.tests=all



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-7082) KahaDB index, recover free pages in parallel with start

2018-11-06 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16676576#comment-16676576
 ] 

Gary Tully commented on AMQ-7082:
-

The window is small; however, with something like message expiry, and all 
messages expiring while the broker is down, there can be a lot of page file 
churn immediately after start. In addition, if the recovery is very long, the 
window gets larger.

> KahaDB index, recover free pages in parallel with start
> ---
>
> Key: AMQ-7082
> URL: https://issues.apache.org/jira/browse/AMQ-7082
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0, 5.15.7
>
>
> AMQ-6590 fixes free page loss through recovery. The recovery process can be 
> time-consuming, which prevents fast failover; doing recovery on shutdown is 
> preferable, but it is still not ideal b/c it will hold onto the KahaDB lock. 
> It can also stall shutdown unexpectedly.
> AMQ-7080 is going to tackle checkpointing the free list. This should help 
> avoid the need for recovery but it may still be necessary. If the perf hit is 
> significant this may need to be optional.
> There will still be the need to walk the index to find the free list.
> It is possible to run with no free list and grow, and we can do that while we 
> recover the free list in parallel, then merge the two at a safe point. This 
> we can do at startup.
> In cases where the disk is the bottleneck this won't help much, but it will 
> help failover and it will help shutdown, with a bit of luck the recovery will 
> complete before we stop.
>  
> Initially I thought this would be too complex, but if we concede some growth 
> while we recover, i.e. start with an empty free list, it should be straightforward 
> to merge with a recovered one.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (AMQ-7082) KahaDB index, recover free pages in parallel with start

2018-11-06 Thread Gary Tully (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Tully reopened AMQ-7082:
-

[~alanprot] pointed out an issue that can cause the concurrent free page 
recovery to walk on newly allocated and freed pages or pages in the process of 
being used.

The recovery processing needs to terminate at the nextFreePage id that exists 
at start. Everything after that is not recoverable; it is in use!

> KahaDB index, recover free pages in parallel with start
> ---
>
> Key: AMQ-7082
> URL: https://issues.apache.org/jira/browse/AMQ-7082
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0, 5.15.7
>
>
> AMQ-6590 fixes free page loss through recovery. The recovery process can be 
> time-consuming, which prevents fast failover; doing recovery on shutdown is 
> preferable, but it is still not ideal b/c it will hold onto the KahaDB lock. 
> It can also stall shutdown unexpectedly.
> AMQ-7080 is going to tackle checkpointing the free list. This should help 
> avoid the need for recovery but it may still be necessary. If the perf hit is 
> significant this may need to be optional.
> There will still be the need to walk the index to find the free list.
> It is possible to run with no free list and grow, and we can do that while we 
> recover the free list in parallel, then merge the two at a safe point. This 
> we can do at startup.
> In cases where the disk is the bottleneck this won't help much, but it will 
> help failover and it will help shutdown, with a bit of luck the recovery will 
> complete before we stop.
>  
> Initially I thought this would be too complex, but if we concede some growth 
> while we recover, i.e. start with an empty free list, it should be straightforward 
> to merge with a recovered one.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-7085) Queue.start() does not call systemUsage.start() so TempStore usage is not handled correctly

2018-11-06 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16676375#comment-16676375
 ] 

Gary Tully commented on AMQ-7085:
-

[~sits] even for a small change, it is best to include a test case to validate 
the use case and to protect your change into the future. It also makes it easy 
to integrate b/c the bug becomes trivial to understand with the test case in 
code.

> Queue.start() does not call systemUsage.start() so TempStore usage is not 
> handled correctly
> ---
>
> Key: AMQ-7085
> URL: https://issues.apache.org/jira/browse/AMQ-7085
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: Broker
>Affects Versions: 5.15.6
>Reporter: David Sitsky
>Priority: Major
> Attachments: AMQ-7085.patch
>
>
> I have an application using ActiveMQ and have a situation where a producer is 
> blocked with this log message due to the temp usage limit being hit:
> {noformat}
> [ActiveMQ Transport: tcp:///aaa.bbb.ccc.ddd:65119@64759] 82567721 INFO  
> org.apache.activemq.broker.region.Queue - 
> Usage(Main:temp:queue://aaabbb:temp) percentUsage=99%, usage=61771109932, 
> limit=61771104256, percentUsageMinDelta=1%;Parent:Usage(Main:temp) 
> percentUsage=100%, usage=61771109932, limit=61771104256, 
> percentUsageMinDelta=1%: Temp Store is Full (99% of 61771104256). Stopping 
> producer (ID:a-3:1:1:1) to prevent flooding queue://aaabbb. See 
> http://activemq.apache.org/producer-flow-control.html for more info (blocking 
> for: 8512s){noformat}
> In the past I have been able to use JConsole and update the broker's 
> TempLimit value to a higher value to allow things to continue.
>  
> However, on this occasion, the messages above, when output again, show that the 
> parent's temp limit has been updated but the child's limit (the queue's) is 
> unchanged. So it seems the broker's TempUsage does not know about the 
> queue's TempUsage. 
>  
> Looking at the code, it seems a child Usage class has to call start() in 
> order for this parent -> children link to be established and for parent limit 
> changes to be propagated down to children. However, the Queue start() method 
> doesn't call systemUsage.getTempUsage().start() for some reason (or even just 
> systemUsage.start()).
>  
> Is this a bug?
>  
> DestinationView sadly does not expose setTempLimit(), so that wasn't an 
> option either.
>  
> From Queue:
> {code:java}
> @Override
>     public void start() throws Exception {
>         if (started.compareAndSet(false, true)) {
>             if (memoryUsage != null) {
>                 memoryUsage.start();
>             }
>             if (systemUsage.getStoreUsage() != null) {
>                 systemUsage.getStoreUsage().start();
>             }
>             systemUsage.getMemoryUsage().addUsageListener(this);
>             messages.start();
>             if (getExpireMessagesPeriod() > 0) {
>                 scheduler.executePeriodically(expireMessagesTask, 
> getExpireMessagesPeriod());
>             }
>             doPageIn(false);
>         }
>     }{code}
>    
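For reference, a sketch of the change the reporter suggests (the actual proposal is in the attached AMQ-7085.patch): start the temp usage alongside the store usage so the parent -> child listener link is established.
{code:java}
// Suggested addition inside Queue.start(), mirroring the existing store usage handling.
if (systemUsage.getTempUsage() != null) {
    systemUsage.getTempUsage().start();
}
{code}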



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AMQ-7088) mqtt deadlock on creation/removal of destinations (virtual topic)

2018-11-01 Thread Gary Tully (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-7088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Tully resolved AMQ-7088.
-
Resolution: Fixed

>  mqtt deadlock on creation/removal of destinations (virtual topic)
> --
>
> Key: AMQ-7088
> URL: https://issues.apache.org/jira/browse/AMQ-7088
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: Broker, MQTT
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0
>
>
> {code}
> "ActiveMQ NIO Worker 7" waiting for ownable synchronizer 0x000405d1fce0, 
> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
>  which is held by "ActiveMQ NIO Worker 5"
> "ActiveMQ NIO Worker 5": waiting to lock monitor 0x7fdfd001d3e8 (object 
> 0x000405231968, a 
> org.apache.activemq.broker.region.virtual.VirtualDestinationInterceptor),
>  which is held by "ActiveMQ NIO Worker 7"
> {code}
> Here are the stack traces:
> {code}
> "ActiveMQ NIO Worker 7":
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x000405d1fce0> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>  at 
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
>  at 
> org.apache.activemq.broker.region.AbstractRegion.addDestination(AbstractRegion.java:132)
>  at 
> org.apache.activemq.broker.region.RegionBroker.addDestination(RegionBroker.java:348)
>  at 
> org.apache.activemq.broker.region.virtual.VirtualTopic.create(VirtualTopic.java:100)
>  at 
> org.apache.activemq.broker.region.virtual.VirtualDestinationInterceptor.create(VirtualDestinationInterceptor.java:83)
>  - locked <0x000405231968> (a 
> org.apache.activemq.broker.region.virtual.VirtualDestinationInterceptor)
>  at 
> org.apache.activemq.broker.region.CompositeDestinationInterceptor.create(CompositeDestinationInterceptor.java:52)
>  at 
> org.apache.activemq.broker.region.RegionBroker.addConsumer(RegionBroker.java:423)
>  at 
> org.apache.activemq.broker.jmx.ManagedRegionBroker.addConsumer(ManagedRegionBroker.java:243)
>  at org.apache.activemq.broker.BrokerFilter.addConsumer(BrokerFilter.java:103)
>  at 
> org.apache.activemq.advisory.AdvisoryBroker.addConsumer(AdvisoryBroker.java:130)
>  at org.apache.activemq.broker.BrokerFilter.addConsumer(BrokerFilter.java:103)
>  at org.apache.activemq.broker.BrokerFilter.addConsumer(BrokerFilter.java:103)
>  at org.apache.activemq.broker.BrokerFilter.addConsumer(BrokerFilter.java:103)
>  at 
> org.apache.activemq.broker.MutableBrokerFilter.addConsumer(MutableBrokerFilter.java:108)
>  at org.apache.activemq.broker.BrokerFilter.addConsumer(BrokerFilter.java:103)
>  at org.apache.activemq.broker.BrokerFilter.addConsumer(BrokerFilter.java:103)
>  at 
> org.apache.activemq.security.AuthorizationBroker.addConsumer(AuthorizationBroker.java:183)
>  at 
> org.apache.activemq.broker.MutableBrokerFilter.addConsumer(MutableBrokerFilter.java:108)
>  at 
> org.apache.activemq.broker.TransportConnection.processAddConsumer(TransportConnection.java:696)
>  at org.apache.activemq.command.ConsumerInfo.visit(ConsumerInfo.java:352)
>  at 
> org.apache.activemq.broker.TransportConnection.service(TransportConnection.java:326)
>  at 
> org.apache.activemq.broker.TransportConnection$1.onCommand(TransportConnection.java:190)
>  at 
> org.apache.activemq.transport.MutexTransport.onCommand(MutexTransport.java:45)
>  at 
> org.apache.activemq.transport.mqtt.MQTTInactivityMonitor.onCommand(MQTTInactivityMonitor.java:162)
>  at 
> org.apache.activemq.transport.mqtt.MQTTTransportFilter.sendToActiveMQ(MQTTTransportFilter.java:106)
>  at 
> org.apache.activemq.transport.mqtt.MQTTProtocolConverter.sendToActiveMQ(MQTTProtocolConverter.java:181)
>  at 
> org.apache.activemq.transport.mqtt.strategy.AbstractMQTTSubscriptionStrategy.doSubscribe(AbstractMQTTSubscriptionStrategy.java:210)
>  at 
> org.apache.activemq.transport.mqtt.strategy.MQTTVirtualTopicSubscriptionStrategy.onSubscribe(MQTTVirtualTopicSubscriptionStrategy.java:118)
>  at 
> org.apache.activemq.transport.mqtt.strategy.AbstractMQTTSubscriptionStrategy.onSubscribe(AbstractMQTTSubscriptionStrategy.java:118)
>  at 
> org.apache.activemq.transport.mqtt.MQTTProtocolConverter.onSubscribe(MQTTProtocolConverter.java:362)
>  at 
> 

[jira] [Created] (AMQ-7088) mqtt deadlock on creation/removal of destinations (virtual topic)

2018-11-01 Thread Gary Tully (JIRA)
Gary Tully created AMQ-7088:
---

 Summary:  mqtt deadlock on creation/removal of destinations 
(virtual topic)
 Key: AMQ-7088
 URL: https://issues.apache.org/jira/browse/AMQ-7088
 Project: ActiveMQ
  Issue Type: Bug
  Components: Broker, MQTT
Affects Versions: 5.15.0
Reporter: Gary Tully
Assignee: Gary Tully
 Fix For: 5.16.0


{code}
"ActiveMQ NIO Worker 7" waiting for ownable synchronizer 0x000405d1fce0, (a 
java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
 which is held by "ActiveMQ NIO Worker 5"

"ActiveMQ NIO Worker 5": waiting to lock monitor 0x7fdfd001d3e8 (object 
0x000405231968, a 
org.apache.activemq.broker.region.virtual.VirtualDestinationInterceptor),
 which is held by "ActiveMQ NIO Worker 7"
{code}

Here are the stack traces:

{code}
"ActiveMQ NIO Worker 7":
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for <0x000405d1fce0> (a 
java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
 at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
 at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
 at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
 at 
java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
 at 
org.apache.activemq.broker.region.AbstractRegion.addDestination(AbstractRegion.java:132)
 at 
org.apache.activemq.broker.region.RegionBroker.addDestination(RegionBroker.java:348)
 at 
org.apache.activemq.broker.region.virtual.VirtualTopic.create(VirtualTopic.java:100)
 at 
org.apache.activemq.broker.region.virtual.VirtualDestinationInterceptor.create(VirtualDestinationInterceptor.java:83)
 - locked <0x000405231968> (a 
org.apache.activemq.broker.region.virtual.VirtualDestinationInterceptor)
 at 
org.apache.activemq.broker.region.CompositeDestinationInterceptor.create(CompositeDestinationInterceptor.java:52)
 at 
org.apache.activemq.broker.region.RegionBroker.addConsumer(RegionBroker.java:423)
 at 
org.apache.activemq.broker.jmx.ManagedRegionBroker.addConsumer(ManagedRegionBroker.java:243)
 at org.apache.activemq.broker.BrokerFilter.addConsumer(BrokerFilter.java:103)
 at 
org.apache.activemq.advisory.AdvisoryBroker.addConsumer(AdvisoryBroker.java:130)
 at org.apache.activemq.broker.BrokerFilter.addConsumer(BrokerFilter.java:103)
 at org.apache.activemq.broker.BrokerFilter.addConsumer(BrokerFilter.java:103)
 at org.apache.activemq.broker.BrokerFilter.addConsumer(BrokerFilter.java:103)
 at 
org.apache.activemq.broker.MutableBrokerFilter.addConsumer(MutableBrokerFilter.java:108)
 at org.apache.activemq.broker.BrokerFilter.addConsumer(BrokerFilter.java:103)
 at org.apache.activemq.broker.BrokerFilter.addConsumer(BrokerFilter.java:103)
 at 
org.apache.activemq.security.AuthorizationBroker.addConsumer(AuthorizationBroker.java:183)
 at 
org.apache.activemq.broker.MutableBrokerFilter.addConsumer(MutableBrokerFilter.java:108)
 at 
org.apache.activemq.broker.TransportConnection.processAddConsumer(TransportConnection.java:696)
 at org.apache.activemq.command.ConsumerInfo.visit(ConsumerInfo.java:352)
 at 
org.apache.activemq.broker.TransportConnection.service(TransportConnection.java:326)
 at 
org.apache.activemq.broker.TransportConnection$1.onCommand(TransportConnection.java:190)
 at 
org.apache.activemq.transport.MutexTransport.onCommand(MutexTransport.java:45)
 at 
org.apache.activemq.transport.mqtt.MQTTInactivityMonitor.onCommand(MQTTInactivityMonitor.java:162)
 at 
org.apache.activemq.transport.mqtt.MQTTTransportFilter.sendToActiveMQ(MQTTTransportFilter.java:106)
 at 
org.apache.activemq.transport.mqtt.MQTTProtocolConverter.sendToActiveMQ(MQTTProtocolConverter.java:181)
 at 
org.apache.activemq.transport.mqtt.strategy.AbstractMQTTSubscriptionStrategy.doSubscribe(AbstractMQTTSubscriptionStrategy.java:210)
 at 
org.apache.activemq.transport.mqtt.strategy.MQTTVirtualTopicSubscriptionStrategy.onSubscribe(MQTTVirtualTopicSubscriptionStrategy.java:118)
 at 
org.apache.activemq.transport.mqtt.strategy.AbstractMQTTSubscriptionStrategy.onSubscribe(AbstractMQTTSubscriptionStrategy.java:118)
 at 
org.apache.activemq.transport.mqtt.MQTTProtocolConverter.onSubscribe(MQTTProtocolConverter.java:362)
 at 
org.apache.activemq.transport.mqtt.MQTTProtocolConverter.onMQTTCommand(MQTTProtocolConverter.java:212)
 at 
org.apache.activemq.transport.mqtt.MQTTTransportFilter.onCommand(MQTTTransportFilter.java:94)
 at 
org.apache.activemq.transport.TransportSupport.doConsume(TransportSupport.java:83)
 at 
org.apache.activemq.transport.nio.NIOSSLTransport.doConsume(NIOSSLTransport.java:440)
 at 

[jira] [Resolved] (AMQ-7086) KahaDB - optionally perform expensive gc run on shutdown

2018-10-31 Thread Gary Tully (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Tully resolved AMQ-7086.
-
Resolution: Fixed

New KahaDB boolean attribute:

cleanupOnStop - defaults to true, preserving the existing behaviour.

When disabled, the gc/cleanup iteration on broker stop is skipped, which speeds 
up shutdown.

The use case is a really large store or scheduler store where a full index 
traversal is not cheap.
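
A minimal sketch of flipping this programmatically; the setter name here is 
assumed from the new attribute (in XML it is the cleanupOnStop attribute on the 
kahaDB element):

{code:java}
import org.apache.activemq.broker.BrokerService;
import org.apache.activemq.store.kahadb.KahaDBPersistenceAdapter;

public class FastStopBroker {
    public static void main(String[] args) throws Exception {
        BrokerService broker = new BrokerService();

        KahaDBPersistenceAdapter kahaDB = new KahaDBPersistenceAdapter();
        kahaDB.setCleanupOnStop(false); // assumed setter: skip the gc/cleanup pass on stop
        broker.setPersistenceAdapter(kahaDB);

        broker.start();
        // ... run the broker ...
        broker.stop(); // no full index traversal during shutdown
    }
}
{code}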

> KahaDB - optionally perform expensive gc run on shutdown
> 
>
> Key: AMQ-7086
> URL: https://issues.apache.org/jira/browse/AMQ-7086
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0
>
>
> when looking at the speed of broker.stop with kahadb and the scheduler store. 
> There is a full gc run, which can be expensive as the whole index needs to be 
> traversed.
> Fast stop/restart is important for fast failover. Leaving gc for runtime, 
> where it has an effect on latency in the normal way, rather than 
> availability, is better.
>  
> I am wondering if there is a use case for gc only at shutdown if the 
> cleanupInterval <= 0, indicating that there were no gc at runtime. The 
> alternative is adding another boolean to the config or adding that back in if 
> the need arises.
> I am leaning towards just removing the gc call during shutdown.
>  
> Note: matching the indexCacheSize to the index file size, trading off with 
> memory, does help to speed up the index (read) traversal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AMQ-7086) KahaDB - optionally perform expensive gc run on shutdown

2018-10-31 Thread Gary Tully (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Tully updated AMQ-7086:

Summary: KahaDB - optionally perform expensive gc run on shutdown  (was: 
KahaDB - don't perform expensive gc run on shutdown)

> KahaDB - optionally perform expensive gc run on shutdown
> 
>
> Key: AMQ-7086
> URL: https://issues.apache.org/jira/browse/AMQ-7086
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0
>
>
> when looking at the speed of broker.stop with kahadb and the scheduler store. 
> There is a full gc run, which can be expensive as the whole index needs to be 
> traversed.
> Fast stop/restart is important for fast failover. Leaving gc for runtime, 
> where it has an effect on latency in the normal way, rather than 
> availability, is better.
>  
> I am wondering if there is a use case for gc only at shutdown if the 
> cleanupInterval <= 0, indicating that there were no gc at runtime. The 
> alternative is adding another boolean to the config or adding that back in if 
> the need arises.
> I am leaning towards just removing the gc call during shutdown.
>  
> Note: matching the indexCacheSize to the index file size, trading off with 
> memory, does help to speed up the index (read) traversal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-7086) KahaDB - don't perform expensive gc run on shutdown

2018-10-25 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663646#comment-16663646
 ] 

Gary Tully commented on AMQ-7086:
-

The checkpoint=0 use case comes from AMQ-3646.

I think disabling gc on shutdown will need to be configurable.

> KahaDB - don't perform expensive gc run on shutdown
> ---
>
> Key: AMQ-7086
> URL: https://issues.apache.org/jira/browse/AMQ-7086
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0
>
>
> when looking at the speed of broker.stop with kahadb and the scheduler store. 
> There is a full gc run, which can be expensive as the whole index needs to be 
> traversed.
> Fast stop/restart is important for fast failover. Leaving gc for runtime, 
> where it has an effect on latency in the normal way, rather than 
> availability, is better.
>  
> I am wondering if there is a use case for gc only at shutdown if the 
> cleanupInterval <= 0, indicating that there were no gc at runtime. The 
> alternative is adding another boolean to the config or adding that back in if 
> the need arises.
> I am leaning towards just removing the gc call during shutdown.
>  
> Note: matching the indexCacheSize to the index file size, trading off with 
> memory, does help to speed up the index (read) traversal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AMQ-7086) KahaDB - don't perform expensive gc run on shutdown

2018-10-25 Thread Gary Tully (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Tully updated AMQ-7086:

Description: 
when looking at the speed of broker.stop with kahadb and the scheduler store. 
There is a full gc run, which can be expensive as the whole index needs to be 
traversed.

Fast stop/restart is important for fast failover. Leaving gc for runtime, where 
it has an effect on latency in the normal way, rather than availability, is 
better.

 

I am wondering if there is a use case for gc only at shutdown if the 
cleanupInterval <= 0, indicating that there were no gc at runtime. The 
alternative is adding another boolean to the config or adding that back in if 
the need arises.

I am leaning towards just removing the gc call during shutdown.

 

Note: matching the indexCacheSize to the index file size, trading off with 
memory, does help to speed up the index (read) traversal.

  was:
when looking at the speed of broker.stop with kahadb and the scheduler store. 
There is a full gc run, which can be expensive as the whole index needs to be 
traversed.

Fast stop/restart is important for fast failover. Leaving gc for runtime, where 
it has an effect on latency rather than availability is better.

 

I am wondering if there is a use case for gc only at shutdown if the 
cleanupInterval <= 0, indicating that there were no gc at runtime. The 
alternative is adding another boolean to the config or adding that back in if 
the need arises.

I am leaning towards just removing the gc call during shutdown.

 

Note: matching the indexCacheSize to the index file size, trading off with 
memory, does help to speed up the index (read) traversal.


> KahaDB - don't perform expensive gc run on shutdown
> ---
>
> Key: AMQ-7086
> URL: https://issues.apache.org/jira/browse/AMQ-7086
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0
>
>
> when looking at the speed of broker.stop with kahadb and the scheduler store. 
> There is a full gc run, which can be expensive as the whole index needs to be 
> traversed.
> Fast stop/restart is important for fast failover. Leaving gc for runtime, 
> where it has an effect on latency in the normal way, rather than 
> availability, is better.
>  
> I am wondering if there is a use case for gc only at shutdown if the 
> cleanupInterval <= 0, indicating that there were no gc at runtime. The 
> alternative is adding another boolean to the config or adding that back in if 
> the need arises.
> I am leaning towards just removing the gc call during shutdown.
>  
> Note: matching the indexCacheSize to the index file size, trading off with 
> memory, does help to speed up the index (read) traversal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AMQ-7086) KahaDB - don't perform expensive gc run on shutdown

2018-10-25 Thread Gary Tully (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Tully updated AMQ-7086:

Description: 
when looking at the speed of broker.stop with kahadb and the scheduler store. 
There is a full gc run, which can be expensive as the whole index needs to be 
traversed.

Fast stop/restart is important for fast failover. Leaving gc for runtime, where 
it has an effect on latency rather than availability is better.

 

I am wondering if there is a use case for gc only at shutdown if the 
cleanupInterval <= 0, indicating that there were no gc at runtime. The 
alternative is adding another boolean to the config or adding that back in if 
the need arises.

I am leaning towards just removing the gc call during shutdown.

 

Note: matching the indexCacheSize to the index file size, trading off with 
memory, does help to speed up the index (read) traversal.

  was:
when looking at the speed of broker.stop with kahadb and the scheduler store. 
There is a full gc run, which can be expensive as the whole index needs to be 
traversed.

Fast stop/restart is important for fast failover. Leaving gc for to runtime, 
where it has an effect on latency rather than availability is better.

 

I am wondering if there is a use case for gc only at shutdown if the 
cleanupInterval <= 0, indicating that there were no gc at runtime. The 
alternative is adding another boolean to the config or adding that back in if 
the need arises.

I am leaning towards just removing the gc call during shutdown.

 

Note: matching the indexCacheSize to the index file size, trading off with 
memory, does help to speed up the index (read) traversal.


> KahaDB - don't perform expensive gc run on shutdown
> ---
>
> Key: AMQ-7086
> URL: https://issues.apache.org/jira/browse/AMQ-7086
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0
>
>
> when looking at the speed of broker.stop with kahadb and the scheduler store. 
> There is a full gc run, which can be expensive as the whole index needs to be 
> traversed.
> Fast stop/restart is important for fast failover. Leaving gc for runtime, 
> where it has an effect on latency rather than availability is better.
>  
> I am wondering if there is a use case for gc only at shutdown if the 
> cleanupInterval <= 0, indicating that there were no gc at runtime. The 
> alternative is adding another boolean to the config or adding that back in if 
> the need arises.
> I am leaning towards just removing the gc call during shutdown.
>  
> Note: matching the indexCacheSize to the index file size, trading off with 
> memory, does help to speed up the index (read) traversal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AMQ-7086) KahaDB - don't perform expensive gc run on shutdown

2018-10-25 Thread Gary Tully (JIRA)
Gary Tully created AMQ-7086:
---

 Summary: KahaDB - don't perform expensive gc run on shutdown
 Key: AMQ-7086
 URL: https://issues.apache.org/jira/browse/AMQ-7086
 Project: ActiveMQ
  Issue Type: Bug
  Components: KahaDB
Affects Versions: 5.15.0
Reporter: Gary Tully
Assignee: Gary Tully
 Fix For: 5.16.0


when looking at the speed of broker.stop with kahadb and the scheduler store. 
There is a full gc run, which can be expensive as the whole index needs to be 
traversed.

Fast stop/restart is important for fast failover. Leaving gc for to runtime, 
where it has an effect on latency rather than availability is better.

 

I am wondering if there is a use case for gc only at shutdown if the 
cleanupInterval <= 0, indicating that there were no gc at runtime. The 
alternative is adding another boolean to the config or adding that back in if 
the need arises.

I am leaning towards just removing the gc call during shutdown.

 

Note: matching the indexCacheSize to the index file size, trading off with 
memory, does help to speed up the index (read) traversal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-6690) Protect against JMX move/copy operations onto self

2018-10-24 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-6690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662345#comment-16662345
 ] 

Gary Tully commented on AMQ-6690:
-

[~nopius] 

That is not good. For some context: there were related problems with duplicate 
detection and concurrency. Some folks were hitting the JMX operation in 
parallel on large queues and the stats were all over the place. When duplicate 
detection is in place (as it is by default), the audit will catch every resend 
as a duplicate.

When I looked at the code it was clear that moving to the same destination was 
never considered, hence my move to clarify the behaviour.

See https://issues.apache.org/jira/browse/AMQ-6703, which has a test for move, 
purge, move back.

That approach will work fine post 5.15.0.

For the journal, there is ack compaction that will allow log files to get 
reclaimed out of sequence. That may also help.

If you really need the ability to moveToSameDest, there is some work to do to 
have it work reliably.

 

> Protect against JMX move/copy operations onto self
> --
>
> Key: AMQ-6690
> URL: https://issues.apache.org/jira/browse/AMQ-6690
> Project: ActiveMQ
>  Issue Type: Improvement
>  Components: JMX
>Affects Versions: 5.14.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.15.0
>
>
> The move and copy jmx operations are intended to move/copy messages to 
> another destination. However this is not enforce and if the move/copy 
> destination is the same queue the logs fill with duplicate warnings etc and 
> the stats can get out of sync if the operation is repeated or concurrent.
> Best to cut this off at the pass with a return return code indicating nothing 
> was moved/copied.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AMQ-7084) Kahadb pagefile, allocated and unused pages from read only transactions are leaked

2018-10-23 Thread Gary Tully (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-7084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Tully resolved AMQ-7084.
-
Resolution: Fixed

> Kahadb pagefile, allocated and unused pages from read only transactions are 
> leaked
> --
>
> Key: AMQ-7084
> URL: https://issues.apache.org/jira/browse/AMQ-7084
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0
>
>
> The work in AMQ-7080 to checkpoint the free list has identified a case where 
> the free list is not updated to reflect the disk state of an allocated free 
> page. This can lead to a free page leak in error. In practice this pattern of 
> usage is not widely used, but it is relevant when the free page list is being 
> tracked and failure can be arbitrary. It is important that an accurate 
> reflection of the current state is being checkpointed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-7080) Keep track of free pages - Update db.free file during checkpoints

2018-10-23 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16660787#comment-16660787
 ] 

Gary Tully commented on AMQ-7080:
-

[~alanprot] I opened https://issues.apache.org/jira/browse/AMQ-7084 to track 
and fix the issue with the freePage list being out of sync with the disk.

> Keep track of free pages - Update db.free file during checkpoints
> -
>
> Key: AMQ-7080
> URL: https://issues.apache.org/jira/browse/AMQ-7080
> Project: ActiveMQ
>  Issue Type: Improvement
>  Components: KahaDB
>Affects Versions: 5.15.6
>Reporter: Alan Protasio
>Assignee: Jean-Baptiste Onofré
>Priority: Major
> Fix For: 5.16.0, 5.15.7
>
> Attachments: AMQ-7080-freeList-update.diff
>
>
> In a event of an unclean shutdown, Activemq loses the information about the 
> free pages in the index. In order to recover this information, ActiveMQ read 
> the whole index during shutdown searching for free pages and then save the 
> db.free file. This operation can take a long time, making the failover 
> slower. (during the shutdown, activemq will still hold the lock).
> From http://activemq.apache.org/shared-file-system-master-slave.html
> {quote}"If you have a SAN or shared file system it can be used to provide 
> high availability such that if a broker is killed, another broker can take 
> over immediately."
> {quote}
> Is important to note if the shutdown takes more than ACTIVEMQ_KILL_MAXSECONDS 
> seconds, any following shutdown will be unclean. This broker will stay in 
> this state unless the index is deleted (this state means that every failover 
> will take more then ACTIVEMQ_KILL_MAXSECONDS, so, if you increase this time 
> to 5 minutes, you fail over can take more than 5 minutes).
>  
> In order to prevent ActiveMQ reading the whole index file to search for free 
> pages, we can keep track of those on every Checkpoint. In order to do that we 
> need to be sure that db.data and db.free are in sync. To achieve that we can 
> have a attribute in the db.free page that is referenced by the db.data.
> So during the checkpoint we have:
> 1 - Save db.free and give a freePageUniqueId
> 2 - Save this freePageUniqueId in the db.data (metadata)
> In a crash, we can see if the db.data has the same freePageUniqueId as the 
> db.free. If this is the case we can safely use the free page information 
> contained in the db.free
> Now, the only way to read the whole index file again is IF the crash happens 
> btw step 1 and 2 (what is very unlikely).
> The drawback of this implementation is that we will have to save db.free 
> during the checkpoint, what can possibly increase the checkpoint time.
> Is also important to note that we CAN (and should) have stale data in db.free 
> as it is referencing stale db.data:
> Imagine the timeline:
> T0 -> P1, P2 and P3 are free.
> T1 -> Checkpoint
> T2 -> P1 got occupied.
> T3 -> Crash
> In the current scenario after the  Pagefile#load the P1 will be free and then 
> the replay will mark P1 as occupied or will occupied another page (now that 
> the recovery of free pages is done on shutdown)
> This change only make sure that db.data and db.free are in sync and showing 
> the reality in T1 (checkpoint), If they are in sync we can trust the db.free.
> This is a really fast draft of what i'm suggesting... If you guys agree, i 
> can create the proper patch after:
> [https://github.com/alanprot/activemq/commit/18036ef7214ef0eaa25c8650f40644dd8b4632a5]
>  
> This is related to https://issues.apache.org/jira/browse/AMQ-6590



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AMQ-7084) Kahadb pagefile, allocated and unused pages from read only transactions are leaked

2018-10-23 Thread Gary Tully (JIRA)
Gary Tully created AMQ-7084:
---

 Summary: Kahadb pagefile, allocated and unused pages from read 
only transactions are leaked
 Key: AMQ-7084
 URL: https://issues.apache.org/jira/browse/AMQ-7084
 Project: ActiveMQ
  Issue Type: Bug
  Components: KahaDB
Affects Versions: 5.15.0
Reporter: Gary Tully
Assignee: Gary Tully
 Fix For: 5.16.0


The work in AMQ-7080 to checkpoint the free list has identified a case where 
the free list is not updated to reflect the on-disk state of an allocated free 
page. This can erroneously leak a free page. In practice this usage pattern is 
rare, but it matters when the free page list is being tracked and a failure can 
happen at any point: the checkpointed free list must be an accurate reflection 
of the current state.
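
A minimal sketch of the scenario, assuming the KahaDB PageFile/Transaction API 
(the directory and class name here are illustrative):

{code:java}
import java.io.File;

import org.apache.activemq.store.kahadb.disk.page.Page;
import org.apache.activemq.store.kahadb.disk.page.PageFile;
import org.apache.activemq.store.kahadb.disk.page.Transaction;

public class ReadOnlyTxAllocation {
    public static void main(String[] args) throws Exception {
        PageFile pageFile = new PageFile(new File("target/page-file-demo"), "db");
        pageFile.load();

        // Allocate a page in a transaction but never write application data to
        // it before committing: the page leaves the in-memory free list, yet
        // its on-disk type can remain the free marker.
        Transaction tx = pageFile.tx();
        Page<?> page = tx.allocate();
        tx.commit();

        // Without the fix, a clean shutdown persists db.free without this page,
        // so it is never handed out again - a leaked page.
        System.out.println("allocated page " + page.getPageId());

        pageFile.unload();
    }
}
{code}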



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-7080) Keep track of free pages - Update db.free file during checkpoints

2018-10-22 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659158#comment-16659158
 ] 

Gary Tully commented on AMQ-7080:
-

 

[~alanprot] peek at [^AMQ-7080-freeList-update.diff] 

That will sort out the update of the free list for tx allocations that don't 
write any application data.

> Keep track of free pages - Update db.free file during checkpoints
> -
>
> Key: AMQ-7080
> URL: https://issues.apache.org/jira/browse/AMQ-7080
> Project: ActiveMQ
>  Issue Type: Improvement
>  Components: KahaDB
>Affects Versions: 5.15.6
>Reporter: Alan Protasio
>Assignee: Jean-Baptiste Onofré
>Priority: Major
> Fix For: 5.16.0, 5.15.7
>
> Attachments: AMQ-7080-freeList-update.diff
>
>
> In a event of an unclean shutdown, Activemq loses the information about the 
> free pages in the index. In order to recover this information, ActiveMQ read 
> the whole index during shutdown searching for free pages and then save the 
> db.free file. This operation can take a long time, making the failover 
> slower. (during the shutdown, activemq will still hold the lock).
> From http://activemq.apache.org/shared-file-system-master-slave.html
> {quote}"If you have a SAN or shared file system it can be used to provide 
> high availability such that if a broker is killed, another broker can take 
> over immediately."
> {quote}
> Is important to note if the shutdown takes more than ACTIVEMQ_KILL_MAXSECONDS 
> seconds, any following shutdown will be unclean. This broker will stay in 
> this state unless the index is deleted (this state means that every failover 
> will take more then ACTIVEMQ_KILL_MAXSECONDS, so, if you increase this time 
> to 5 minutes, you fail over can take more than 5 minutes).
>  
> In order to prevent ActiveMQ reading the whole index file to search for free 
> pages, we can keep track of those on every Checkpoint. In order to do that we 
> need to be sure that db.data and db.free are in sync. To achieve that we can 
> have a attribute in the db.free page that is referenced by the db.data.
> So during the checkpoint we have:
> 1 - Save db.free and give a freePageUniqueId
> 2 - Save this freePageUniqueId in the db.data (metadata)
> In a crash, we can see if the db.data has the same freePageUniqueId as the 
> db.free. If this is the case we can safely use the free page information 
> contained in the db.free
> Now, the only way to read the whole index file again is IF the crash happens 
> btw step 1 and 2 (what is very unlikely).
> The drawback of this implementation is that we will have to save db.free 
> during the checkpoint, what can possibly increase the checkpoint time.
> Is also important to note that we CAN (and should) have stale data in db.free 
> as it is referencing stale db.data:
> Imagine the timeline:
> T0 -> P1, P2 and P3 are free.
> T1 -> Checkpoint
> T2 -> P1 got occupied.
> T3 -> Crash
> In the current scenario after the  Pagefile#load the P1 will be free and then 
> the replay will mark P1 as occupied or will occupied another page (now that 
> the recovery of free pages is done on shutdown)
> This change only make sure that db.data and db.free are in sync and showing 
> the reality in T1 (checkpoint), If they are in sync we can trust the db.free.
> This is a really fast draft of what i'm suggesting... If you guys agree, i 
> can create the proper patch after:
> [https://github.com/alanprot/activemq/commit/18036ef7214ef0eaa25c8650f40644dd8b4632a5]
>  
> This is related to https://issues.apache.org/jira/browse/AMQ-6590



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AMQ-7080) Keep track of free pages - Update db.free file during checkpoints

2018-10-22 Thread Gary Tully (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-7080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Tully updated AMQ-7080:

Attachment: AMQ-7080-freeList-update.diff

> Keep track of free pages - Update db.free file during checkpoints
> -
>
> Key: AMQ-7080
> URL: https://issues.apache.org/jira/browse/AMQ-7080
> Project: ActiveMQ
>  Issue Type: Improvement
>  Components: KahaDB
>Affects Versions: 5.15.6
>Reporter: Alan Protasio
>Assignee: Jean-Baptiste Onofré
>Priority: Major
> Fix For: 5.16.0, 5.15.7
>
> Attachments: AMQ-7080-freeList-update.diff
>
>
> In a event of an unclean shutdown, Activemq loses the information about the 
> free pages in the index. In order to recover this information, ActiveMQ read 
> the whole index during shutdown searching for free pages and then save the 
> db.free file. This operation can take a long time, making the failover 
> slower. (during the shutdown, activemq will still hold the lock).
> From http://activemq.apache.org/shared-file-system-master-slave.html
> {quote}"If you have a SAN or shared file system it can be used to provide 
> high availability such that if a broker is killed, another broker can take 
> over immediately."
> {quote}
> Is important to note if the shutdown takes more than ACTIVEMQ_KILL_MAXSECONDS 
> seconds, any following shutdown will be unclean. This broker will stay in 
> this state unless the index is deleted (this state means that every failover 
> will take more then ACTIVEMQ_KILL_MAXSECONDS, so, if you increase this time 
> to 5 minutes, you fail over can take more than 5 minutes).
>  
> In order to prevent ActiveMQ reading the whole index file to search for free 
> pages, we can keep track of those on every Checkpoint. In order to do that we 
> need to be sure that db.data and db.free are in sync. To achieve that we can 
> have a attribute in the db.free page that is referenced by the db.data.
> So during the checkpoint we have:
> 1 - Save db.free and give a freePageUniqueId
> 2 - Save this freePageUniqueId in the db.data (metadata)
> In a crash, we can see if the db.data has the same freePageUniqueId as the 
> db.free. If this is the case we can safely use the free page information 
> contained in the db.free
> Now, the only way to read the whole index file again is IF the crash happens 
> btw step 1 and 2 (what is very unlikely).
> The drawback of this implementation is that we will have to save db.free 
> during the checkpoint, what can possibly increase the checkpoint time.
> Is also important to note that we CAN (and should) have stale data in db.free 
> as it is referencing stale db.data:
> Imagine the timeline:
> T0 -> P1, P2 and P3 are free.
> T1 -> Checkpoint
> T2 -> P1 got occupied.
> T3 -> Crash
> In the current scenario after the  Pagefile#load the P1 will be free and then 
> the replay will mark P1 as occupied or will occupied another page (now that 
> the recovery of free pages is done on shutdown)
> This change only make sure that db.data and db.free are in sync and showing 
> the reality in T1 (checkpoint), If they are in sync we can trust the db.free.
> This is a really fast draft of what i'm suggesting... If you guys agree, i 
> can create the proper patch after:
> [https://github.com/alanprot/activemq/commit/18036ef7214ef0eaa25c8650f40644dd8b4632a5]
>  
> This is related to https://issues.apache.org/jira/browse/AMQ-6590



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-7082) KahaDB index, recover free pages in parallel with start

2018-10-22 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659033#comment-16659033
 ] 

Gary Tully commented on AMQ-7082:
-

[~cshannon] yep, good catch, thanks. It is the lazySet semantic I was after, 
but I can't avoid calling it!
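
For reference, a tiny illustration of the lazySet semantic (the field and class 
names here are invented, not the patched code):

{code:java}
import java.util.concurrent.atomic.AtomicReference;

public class LazyPublish {

    // lazySet performs an ordered store without the cost of a full volatile
    // write barrier; readers calling get() will eventually observe the value.
    private final AtomicReference<long[]> recoveredFreeList = new AtomicReference<>();

    void publish(long[] pages) {
        recoveredFreeList.lazySet(pages);
    }

    long[] current() {
        return recoveredFreeList.get();
    }
}
{code}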

> KahaDB index, recover free pages in parallel with start
> ---
>
> Key: AMQ-7082
> URL: https://issues.apache.org/jira/browse/AMQ-7082
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0, 5.15.7
>
>
> AMQ-6590 fixes free page loss through recovery. The recover process can be 
> timely, which prevents fast failover, doing recovery on shutdown is 
> preferable, but it is still not ideal b/c it will hold onto the kahadb lock. 
> It also can stall shutdown unexpectedly.
> AMQ-7080 is going to tackle checkpointing the free list. This should help 
> avoid the need for recovery but it may still be necessary. If the perf hit is 
> significant this may need to be optional.
> There will still be the need to walk the index to find the free list.
> It is possible to run with no free list and grow, and we can do that while we 
> recover the free list in parallel, then merge the two at a safe point. This 
> we can do at startup.
> In cases where the disk is the bottleneck this won't help much, but it will 
> help failover and it will help shutdown, with a bit of luck the recovery will 
> complete before we stop.
>  
> Initially I thought this would be too complex, but if we concede some growth 
> while we recover, ie: start with an empty free list, it is should be straight 
> forward to merge with a recovered one.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-7080) Keep track of free pages - Update db.free file during checkpoints

2018-10-22 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658770#comment-16658770
 ] 

Gary Tully commented on AMQ-7080:
-

{quote}When we call tx.allocate the pages are not included in the 
PageFile.freeList but they still marked as free inside (page.getType() == 
Page.PAGE_FREE_TYPE). So, on a clean shutdown db.free is saved and after the 
restart those pages are not in freeList. In a unclean shutdown, all the index 
is read and those pages are added in the freelist. 
{quote}
 

This looks like a bug. On tx.commit any newly allocated page (i.e. one that 
moves off the free list) should be written to reflect the state change. On 
shutdown there will be a flush to ensure those writes hit disk, so that any 
re-read of the index to calculate the free list will find those pages 
allocated.

 

> Keep track of free pages - Update db.free file during checkpoints
> -
>
> Key: AMQ-7080
> URL: https://issues.apache.org/jira/browse/AMQ-7080
> Project: ActiveMQ
>  Issue Type: Improvement
>  Components: KahaDB
>Affects Versions: 5.15.6
>Reporter: Alan Protasio
>Assignee: Jean-Baptiste Onofré
>Priority: Major
> Fix For: 5.16.0, 5.15.7
>
>
> In a event of an unclean shutdown, Activemq loses the information about the 
> free pages in the index. In order to recover this information, ActiveMQ read 
> the whole index during shutdown searching for free pages and then save the 
> db.free file. This operation can take a long time, making the failover 
> slower. (during the shutdown, activemq will still hold the lock).
> From http://activemq.apache.org/shared-file-system-master-slave.html
> {quote}"If you have a SAN or shared file system it can be used to provide 
> high availability such that if a broker is killed, another broker can take 
> over immediately."
> {quote}
> Is important to note if the shutdown takes more than ACTIVEMQ_KILL_MAXSECONDS 
> seconds, any following shutdown will be unclean. This broker will stay in 
> this state unless the index is deleted (this state means that every failover 
> will take more then ACTIVEMQ_KILL_MAXSECONDS, so, if you increase this time 
> to 5 minutes, you fail over can take more than 5 minutes).
>  
> In order to prevent ActiveMQ reading the whole index file to search for free 
> pages, we can keep track of those on every Checkpoint. In order to do that we 
> need to be sure that db.data and db.free are in sync. To achieve that we can 
> have a attribute in the db.free page that is referenced by the db.data.
> So during the checkpoint we have:
> 1 - Save db.free and give a freePageUniqueId
> 2 - Save this freePageUniqueId in the db.data (metadata)
> In a crash, we can see if the db.data has the same freePageUniqueId as the 
> db.free. If this is the case we can safely use the free page information 
> contained in the db.free
> Now, the only way to read the whole index file again is IF the crash happens 
> btw step 1 and 2 (what is very unlikely).
> The drawback of this implementation is that we will have to save db.free 
> during the checkpoint, what can possibly increase the checkpoint time.
> Is also important to note that we CAN (and should) have stale data in db.free 
> as it is referencing stale db.data:
> Imagine the timeline:
> T0 -> P1, P2 and P3 are free.
> T1 -> Checkpoint
> T2 -> P1 got occupied.
> T3 -> Crash
> In the current scenario after the  Pagefile#load the P1 will be free and then 
> the replay will mark P1 as occupied or will occupied another page (now that 
> the recovery of free pages is done on shutdown)
> This change only make sure that db.data and db.free are in sync and showing 
> the reality in T1 (checkpoint), If they are in sync we can trust the db.free.
> This is a really fast draft of what i'm suggesting... If you guys agree, i 
> can create the proper patch after:
> [https://github.com/alanprot/activemq/commit/18036ef7214ef0eaa25c8650f40644dd8b4632a5]
>  
> This is related to https://issues.apache.org/jira/browse/AMQ-6590



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-7082) KahaDB index, recover free pages in parallel with start

2018-10-19 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656986#comment-16656986
 ] 

Gary Tully commented on AMQ-7082:
-

I did not want to block normal work by synchronizing on what the recovery 
thread is doing; I think there would be constant churn there.

Typically all operations on the pageFile hold the index lock, so it can be 
mostly sync free (the exception being the optional async writer thread).

It may be worth a test though, to validate :)

 

> KahaDB index, recover free pages in parallel with start
> ---
>
> Key: AMQ-7082
> URL: https://issues.apache.org/jira/browse/AMQ-7082
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0
>
>
> AMQ-6590 fixes free page loss through recovery. The recover process can be 
> timely, which prevents fast failover, doing recovery on shutdown is 
> preferable, but it is still not ideal b/c it will hold onto the kahadb lock. 
> It also can stall shutdown unexpectedly.
> AMQ-7080 is going to tackle checkpointing the free list. This should help 
> avoid the need for recovery but it may still be necessary. If the perf hit is 
> significant this may need to be optional.
> There will still be the need to walk the index to find the free list.
> It is possible to run with no free list and grow, and we can do that while we 
> recover the free list in parallel, then merge the two at a safe point. This 
> we can do at startup.
> In cases where the disk is the bottleneck this won't help much, but it will 
> help failover and it will help shutdown, with a bit of luck the recovery will 
> complete before we stop.
>  
> Initially I thought this would be too complex, but if we concede some growth 
> while we recover, ie: start with an empty free list, it is should be straight 
> forward to merge with a recovered one.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-7080) Keep track of free pages - Update db.free file during checkpoints

2018-10-19 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656919#comment-16656919
 ] 

Gary Tully commented on AMQ-7080:
-

With https://issues.apache.org/jira/browse/AMQ-7082 I think the need for any 
change around ACTIVEMQ_KILL_MAXSECONDS goes away.

> Keep track of free pages - Update db.free file during checkpoints
> -
>
> Key: AMQ-7080
> URL: https://issues.apache.org/jira/browse/AMQ-7080
> Project: ActiveMQ
>  Issue Type: Improvement
>  Components: KahaDB
>Affects Versions: 5.15.6
>Reporter: Alan Protasio
>Priority: Major
>
> In a event of an unclean shutdown, Activemq loses the information about the 
> free pages in the index. In order to recover this information, ActiveMQ read 
> the whole index during shutdown searching for free pages and then save the 
> db.free file. This operation can take a long time, making the failover 
> slower. (during the shutdown, activemq will still hold the lock).
> From http://activemq.apache.org/shared-file-system-master-slave.html
> {quote}"If you have a SAN or shared file system it can be used to provide 
> high availability such that if a broker is killed, another broker can take 
> over immediately."
> {quote}
> Is important to note if the shutdown takes more than ACTIVEMQ_KILL_MAXSECONDS 
> seconds, any following shutdown will be unclean. This broker will stay in 
> this state unless the index is deleted (this state means that every failover 
> will take more then ACTIVEMQ_KILL_MAXSECONDS, so, if you increase this time 
> to 5 minutes, you fail over can take more than 5 minutes).
>  
> In order to prevent ActiveMQ reading the whole index file to search for free 
> pages, we can keep track of those on every Checkpoint. In order to do that we 
> need to be sure that db.data and db.free are in sync. To achieve that we can 
> have a attribute in the db.free page that is referenced by the db.data.
> So during the checkpoint we have:
> 1 - Save db.free and give a freePageUniqueId
> 2 - Save this freePageUniqueId in the db.data (metadata)
> In a crash, we can see if the db.data has the same freePageUniqueId as the 
> db.free. If this is the case we can safely use the free page information 
> contained in the db.free
> Now, the only way to read the whole index file again is IF the crash happens 
> btw step 1 and 2 (what is very unlikely).
> The drawback of this implementation is that we will have to save db.free 
> during the checkpoint, what can possibly increase the checkpoint time.
> Is also important to note that we CAN (and should) have stale data in db.free 
> as it is referencing stale db.data:
> Imagine the timeline:
> T0 -> P1, P2 and P3 are free.
> T1 -> Checkpoint
> T2 -> P1 got occupied.
> T3 -> Crash
> In the current scenario after the  Pagefile#load the P1 will be free and then 
> the replay will mark P1 as occupied or will occupied another page (now that 
> the recovery of free pages is done on shutdown)
> This change only make sure that db.data and db.free are in sync and showing 
> the reality in T1 (checkpoint), If they are in sync we can trust the db.free.
> This is a really fast draft of what i'm suggesting... If you guys agree, i 
> can create the proper patch after:
> [https://github.com/alanprot/activemq/commit/18036ef7214ef0eaa25c8650f40644dd8b4632a5]
>  
> This is related to https://issues.apache.org/jira/browse/AMQ-6590



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AMQ-7082) KahaDB index, recover free pages in parallel with start

2018-10-19 Thread Gary Tully (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Tully resolved AMQ-7082.
-
Resolution: Fixed

The recovery is now back in the start phase; if it completes we are good, 
otherwise we try again the next time.

 

[~jgoodyear] - this is a take on the parallel approach; I think it makes good 
sense.

[~alanprot] - there is still a good case for checkpointing, which will reduce 
the full replay window, but I think the perf impact will be the key determinant 
there.

It would be good to gauge the impact of the second reader in your case over 
NFS; there may be a need to slow down the recovery thread so that it does not 
hog the disk or CPU.

 

> KahaDB index, recover free pages in parallel with start
> ---
>
> Key: AMQ-7082
> URL: https://issues.apache.org/jira/browse/AMQ-7082
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0
>
>
> AMQ-6590 fixes free page loss through recovery. The recover process can be 
> timely, which prevents fast failover, doing recovery on shutdown is 
> preferable, but it is still not ideal b/c it will hold onto the kahadb lock. 
> It also can stall shutdown unexpectedly.
> AMQ-7080 is going to tackle checkpointing the free list. This should help 
> avoid the need for recovery but it may still be necessary. If the perf hit is 
> significant this may need to be optional.
> There will still be the need to walk the index to find the free list.
> It is possible to run with no free list and grow, and we can do that while we 
> recover the free list in parallel, then merge the two at a safe point. This 
> we can do at startup.
> In cases where the disk is the bottleneck this won't help much, but it will 
> help failover and it will help shutdown, with a bit of luck the recovery will 
> complete before we stop.
>  
> Initially I thought this would be too complex, but if we concede some growth 
> while we recover, ie: start with an empty free list, it is should be straight 
> forward to merge with a recovered one.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (AMQ-7081) After AMQ-7079 abortSlowAckConsumerStrategy aborts connection by default

2018-10-19 Thread Gary Tully (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-7081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Tully closed AMQ-7081.
---

> After AMQ-7079 abortSlowAckConsumerStrategy aborts connection by default
> 
>
> Key: AMQ-7081
> URL: https://issues.apache.org/jira/browse/AMQ-7081
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: Broker
>Reporter: Simon Lundstrom
>Assignee: Gary Tully
>Priority: Major
> Attachments: AMQ7081Test.java
>
>
> The fix of AMQ-7079 introduced a breaking change bug since the default value 
> of {{maxSlowCount=-1}} was longer enough for {{abortSlowAckConsumerStrategy}} 
> to just configure the slow consumer detection but it started to disconnect 
> the consumer as well.\{{}}
> Setting {{maxSlowDuration="-1"}} doesn't disconnect the consumer though but I 
> don't think we should change the old default behavior.
> Pre AMQ-7079 fix:
> {code:xml}
> <abortSlowAckConsumerStrategy maxSlowCount="-1" />
> {code}
> worked before in just detecting a slow consumer. consumer was *not* 
> disconnected.
> After AMQ-7079 fix:
> {code:xml}
> <abortSlowAckConsumerStrategy maxSlowCount="-1" />
> {code}
> disconnects the consumer; ActiveMQ logs:
> {code:java}
> 2018-10-19 10:42:33,124 | INFO  | aborting slow consumer: 
> ID:kaka.it.su.se-53364-1539938520009-1:1:1:1 for 
> destination:queue://su.it.linfra.simlu | 
> org.apache.activemq.broker.region.policy.AbortSlowConsumerStrategy | ActiveMQ 
> Broker[localhost] Scheduler
> 2018-10-19 10:42:50,250 | WARN  | no matching consumer, ignoring ack null | 
> org.apache.activemq.broker.TransportConnection | ActiveMQ Transport: 
> tcp:///127.0.0.1:53365@61616
> 2018-10-19 10:42:50,257 | WARN  | Async error occurred: 
> java.lang.IllegalStateException: Cannot remove a consumer that had not been 
> registered: ID:kaka.it.su.se-53364-1539938520009-1:1:1:1 | 
> org.apache.activemq.broker.TransportConnection.Service | ActiveMQ Transport: 
> tcp:///127.0.0.1:53365@61616
> {code}
> Spring Boot logs:
> {code:java}
> 2018-10-19 10:42:00.209  INFO 65846 --- [   main] 
> se.su.it.simlu.esb.App   : Started App in 1.849 seconds (JVM 
> running for 2.386)
> 2018-10-19 10:42:33.129  WARN 65846 --- [0.1:61616@53365] 
> org.apache.activemq.ActiveMQSession  : Closed consumer on Command, 
> ID:kaka.it.su.se-53364-1539938520009-1:1:1:1
> 2018-10-19 10:42:50.247  INFO 65846 --- [enerContainer-1] 
> se.su.it.simlu.esb.Consumer  : Message Received: Enter some text 
> here for the message body...
> 2018-10-19 10:42:50.261  WARN 65846 --- [enerContainer-1] 
> o.s.j.l.DefaultMessageListenerContainer  : Setup of JMS message listener 
> invoker failed for destination 'su.it.linfra.simlu' - trying to recover. 
> Cause: The Consumer is closed
> 2018-10-19 10:42:50.300  INFO 65846 --- [enerContainer-1] 
> o.s.j.l.DefaultMessageListenerContainer  : Successfully refreshed JMS 
> Connection
> {code}
>  
> The order ("Consumer closed" before "Message Received") is weird because I 
> just use a simple Thread.sleep I suspect:
> {code:java}
>   @Transactional
>   @JmsListener(destination = "su.it.linfra.simlu")
>   public void receiveQueue(String text) throws Exception {
> Thread.sleep(5);
> log.info("Message Received: "+text);
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AMQ-7081) After AMQ-7079 abortSlowAckConsumerStrategy aborts connection by default

2018-10-19 Thread Gary Tully (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-7081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Tully resolved AMQ-7081.
-
Resolution: Not A Bug

> After AMQ-7079 abortSlowAckConsumerStrategy aborts connection by default
> 
>
> Key: AMQ-7081
> URL: https://issues.apache.org/jira/browse/AMQ-7081
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: Broker
>Reporter: Simon Lundstrom
>Assignee: Gary Tully
>Priority: Major
> Attachments: AMQ7081Test.java
>
>
> The fix of AMQ-7079 introduced a breaking change bug since the default value 
> of {{maxSlowCount=-1}} was longer enough for {{abortSlowAckConsumerStrategy}} 
> to just configure the slow consumer detection but it started to disconnect 
> the consumer as well.\{{}}
> Setting {{maxSlowDuration="-1"}} doesn't disconnect the consumer though but I 
> don't think we should change the old default behavior.
> Pre AMQ-7079 fix:
> {code:xml}
> <abortSlowAckConsumerStrategy maxSlowCount="-1" />
> {code}
> worked before in just detecting a slow consumer. consumer was *not* 
> disconnected.
> After AMQ-7079 fix:
> {code:xml}
> <abortSlowAckConsumerStrategy maxSlowCount="-1" />
> {code}
> disconnects the consumer; ActiveMQ logs:
> {code:java}
> 2018-10-19 10:42:33,124 | INFO  | aborting slow consumer: 
> ID:kaka.it.su.se-53364-1539938520009-1:1:1:1 for 
> destination:queue://su.it.linfra.simlu | 
> org.apache.activemq.broker.region.policy.AbortSlowConsumerStrategy | ActiveMQ 
> Broker[localhost] Scheduler
> 2018-10-19 10:42:50,250 | WARN  | no matching consumer, ignoring ack null | 
> org.apache.activemq.broker.TransportConnection | ActiveMQ Transport: 
> tcp:///127.0.0.1:53365@61616
> 2018-10-19 10:42:50,257 | WARN  | Async error occurred: 
> java.lang.IllegalStateException: Cannot remove a consumer that had not been 
> registered: ID:kaka.it.su.se-53364-1539938520009-1:1:1:1 | 
> org.apache.activemq.broker.TransportConnection.Service | ActiveMQ Transport: 
> tcp:///127.0.0.1:53365@61616
> {code}
> Spring Boot logs:
> {code:java}
> 2018-10-19 10:42:00.209  INFO 65846 --- [   main] 
> se.su.it.simlu.esb.App   : Started App in 1.849 seconds (JVM 
> running for 2.386)
> 2018-10-19 10:42:33.129  WARN 65846 --- [0.1:61616@53365] 
> org.apache.activemq.ActiveMQSession  : Closed consumer on Command, 
> ID:kaka.it.su.se-53364-1539938520009-1:1:1:1
> 2018-10-19 10:42:50.247  INFO 65846 --- [enerContainer-1] 
> se.su.it.simlu.esb.Consumer  : Message Received: Enter some text 
> here for the message body...
> 2018-10-19 10:42:50.261  WARN 65846 --- [enerContainer-1] 
> o.s.j.l.DefaultMessageListenerContainer  : Setup of JMS message listener 
> invoker failed for destination 'su.it.linfra.simlu' - trying to recover. 
> Cause: The Consumer is closed
> 2018-10-19 10:42:50.300  INFO 65846 --- [enerContainer-1] 
> o.s.j.l.DefaultMessageListenerContainer  : Successfully refreshed JMS 
> Connection
> {code}
>  
> The order ("Consumer closed" before "Message Received") is weird because I 
> just use a simple Thread.sleep I suspect:
> {code:java}
>   @Transactional
>   @JmsListener(destination = "su.it.linfra.simlu")
>   public void receiveQueue(String text) throws Exception {
> Thread.sleep(5);
> log.info("Message Received: "+text);
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AMQ-7082) KahaDB index, recover free pages in parallel with start

2018-10-19 Thread Gary Tully (JIRA)
Gary Tully created AMQ-7082:
---

 Summary: KahaDB index, recover free pages in parallel with start
 Key: AMQ-7082
 URL: https://issues.apache.org/jira/browse/AMQ-7082
 Project: ActiveMQ
  Issue Type: Bug
  Components: KahaDB
Affects Versions: 5.15.0
Reporter: Gary Tully
Assignee: Gary Tully
 Fix For: 5.16.0


AMQ-6590 fixes free page loss through recovery. The recovery process can take a 
long time, which prevents fast failover. Doing recovery on shutdown is 
preferable, but it is still not ideal b/c it will hold onto the kahadb lock. It 
also can stall shutdown unexpectedly.

AMQ-7080 is going to tackle checkpointing the free list. This should help avoid 
the need for recovery, but it may still be necessary. If the perf hit is 
significant this may need to be optional.

There will still be the need to walk the index to find the free list.

It is possible to run with no free list and grow, and we can do that while we 
recover the free list in parallel, then merge the two at a safe point. This we 
can do at startup.

In cases where the disk is the bottleneck this won't help much, but it will 
help failover and it will help shutdown; with a bit of luck the recovery will 
complete before we stop.

 

Initially I thought this would be too complex, but if we concede some growth 
while we recover, ie: start with an empty free list, it should be 
straightforward to merge with a recovered one.
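
A hypothetical sketch of that shape (plain Java, not the store code; all names 
are invented): start with an empty free list, scan the index on a background 
thread, and merge at a safe point, excluding any pages allocated in the 
meantime.

{code:java}
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.function.Supplier;

public class ParallelFreePageRecovery {

    private final Set<Long> liveFreeList = new ConcurrentSkipListSet<>();           // starts empty
    private final Set<Long> allocatedWhileRecovering = new ConcurrentSkipListSet<>();
    private volatile CompletableFuture<Set<Long>> recovery;

    /** Kick off the expensive index walk without blocking startup. */
    public void start(Supplier<Set<Long>> indexScanForFreePages) {
        recovery = CompletableFuture.supplyAsync(indexScanForFreePages);
    }

    /** Record pages handed out while recovery is still running. */
    public void onAllocated(long pageId) {
        allocatedWhileRecovering.add(pageId);
    }

    /** Call at a safe point, e.g. under the index lock during a checkpoint. */
    public void mergeIfDone() {
        CompletableFuture<Set<Long>> pending = recovery;
        if (pending != null && pending.isDone() && !pending.isCompletedExceptionally()) {
            Set<Long> recovered = new ConcurrentSkipListSet<>(pending.join());
            recovered.removeAll(allocatedWhileRecovering); // those pages are no longer free
            liveFreeList.addAll(recovered);
            recovery = null;
        }
    }
}
{code}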



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (AMQ-7081) After AMQ-7079 abortSlowAckConsumerStrategy aborts connection by default

2018-10-19 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656669#comment-16656669
 ] 

Gary Tully edited comment on AMQ-7081 at 10/19/18 11:17 AM:


see the attached test case, leave it running for 30 seconds and note the TRACE 
logging.

 

https://issues.apache.org/jira/secure/attachment/12944710/AMQ7081Test.java


was (Author: gtully):
see the attached test case, leave it running for 30 seconds and note the TRACE 
logging.

> After AMQ-7079 abortSlowAckConsumerStrategy aborts connection by default
> 
>
> Key: AMQ-7081
> URL: https://issues.apache.org/jira/browse/AMQ-7081
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: Broker
>Reporter: Simon Lundstrom
>Assignee: Gary Tully
>Priority: Major
> Attachments: AMQ7081Test.java
>
>
> The fix of AMQ-7079 introduced a breaking change bug since the default value 
> of {{maxSlowCount=-1}} was longer enough for {{abortSlowAckConsumerStrategy}} 
> to just configure the slow consumer detection but it started to disconnect 
> the consumer as well.\{{}}
> Setting {{maxSlowDuration="-1"}} doesn't disconnect the consumer though but I 
> don't think we should change the old default behavior.
> Pre AMQ-7079 fix:
> {code:xml}
> <abortSlowAckConsumerStrategy maxSlowCount="-1" />
> {code}
> worked before in just detecting a slow consumer. consumer was *not* 
> disconnected.
> After AMQ-7079 fix:
> {code:xml}
> <abortSlowAckConsumerStrategy maxSlowCount="-1" />
> {code}
> disconnects the consumer; ActiveMQ logs:
> {code:java}
> 2018-10-19 10:42:33,124 | INFO  | aborting slow consumer: 
> ID:kaka.it.su.se-53364-1539938520009-1:1:1:1 for 
> destination:queue://su.it.linfra.simlu | 
> org.apache.activemq.broker.region.policy.AbortSlowConsumerStrategy | ActiveMQ 
> Broker[localhost] Scheduler
> 2018-10-19 10:42:50,250 | WARN  | no matching consumer, ignoring ack null | 
> org.apache.activemq.broker.TransportConnection | ActiveMQ Transport: 
> tcp:///127.0.0.1:53365@61616
> 2018-10-19 10:42:50,257 | WARN  | Async error occurred: 
> java.lang.IllegalStateException: Cannot remove a consumer that had not been 
> registered: ID:kaka.it.su.se-53364-1539938520009-1:1:1:1 | 
> org.apache.activemq.broker.TransportConnection.Service | ActiveMQ Transport: 
> tcp:///127.0.0.1:53365@61616
> {code}
> Spring Boot logs:
> {code:java}
> 2018-10-19 10:42:00.209  INFO 65846 --- [   main] 
> se.su.it.simlu.esb.App   : Started App in 1.849 seconds (JVM 
> running for 2.386)
> 2018-10-19 10:42:33.129  WARN 65846 --- [0.1:61616@53365] 
> org.apache.activemq.ActiveMQSession  : Closed consumer on Command, 
> ID:kaka.it.su.se-53364-1539938520009-1:1:1:1
> 2018-10-19 10:42:50.247  INFO 65846 --- [enerContainer-1] 
> se.su.it.simlu.esb.Consumer  : Message Received: Enter some text 
> here for the message body...
> 2018-10-19 10:42:50.261  WARN 65846 --- [enerContainer-1] 
> o.s.j.l.DefaultMessageListenerContainer  : Setup of JMS message listener 
> invoker failed for destination 'su.it.linfra.simlu' - trying to recover. 
> Cause: The Consumer is closed
> 2018-10-19 10:42:50.300  INFO 65846 --- [enerContainer-1] 
> o.s.j.l.DefaultMessageListenerContainer  : Successfully refreshed JMS 
> Connection
> {code}
>  
> The order ("Consumer closed" before "Message Received") is weird because I 
> just use a simple Thread.sleep I suspect:
> {code:java}
>   @Transactional
>   @JmsListener(destination = "su.it.linfra.simlu")
>   public void receiveQueue(String text) throws Exception {
> Thread.sleep(5);
> log.info("Message Received: "+text);
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-7081) After AMQ-7079 abortSlowAckConsumerStrategy aborts connection by default

2018-10-19 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656669#comment-16656669
 ] 

Gary Tully commented on AMQ-7081:
-

see the attached test case, leave it running for 30 seconds and note the TRACE 
logging.

> After AMQ-7079 abortSlowAckConsumerStrategy aborts connection by default
> 
>
> Key: AMQ-7081
> URL: https://issues.apache.org/jira/browse/AMQ-7081
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: Broker
>Reporter: Simon Lundstrom
>Assignee: Gary Tully
>Priority: Major
> Attachments: AMQ7081Test.java
>
>
> The fix for AMQ-7079 introduced a breaking change: with the default value of 
> {{maxSlowCount=-1}}, {{abortSlowAckConsumerStrategy}} used to only configure 
> slow consumer detection, but it now disconnects the consumer as well.
> Setting {{maxSlowDuration="-1"}} does avoid the disconnect, but I don't think 
> we should change the old default behavior.
> Pre AMQ-7079 fix:
> {code:xml}
> <abortSlowAckConsumerStrategy maxSlowCount="-1" />
> {code}
> worked before by just detecting a slow consumer; the consumer was *not* 
> disconnected.
> After AMQ-7079 fix:
> {code:xml}
> <abortSlowAckConsumerStrategy maxSlowCount="-1" />
> {code}
> disconnects the consumer; ActiveMQ logs:
> {code:java}
> 2018-10-19 10:42:33,124 | INFO  | aborting slow consumer: 
> ID:kaka.it.su.se-53364-1539938520009-1:1:1:1 for 
> destination:queue://su.it.linfra.simlu | 
> org.apache.activemq.broker.region.policy.AbortSlowConsumerStrategy | ActiveMQ 
> Broker[localhost] Scheduler
> 2018-10-19 10:42:50,250 | WARN  | no matching consumer, ignoring ack null | 
> org.apache.activemq.broker.TransportConnection | ActiveMQ Transport: 
> tcp:///127.0.0.1:53365@61616
> 2018-10-19 10:42:50,257 | WARN  | Async error occurred: 
> java.lang.IllegalStateException: Cannot remove a consumer that had not been 
> registered: ID:kaka.it.su.se-53364-1539938520009-1:1:1:1 | 
> org.apache.activemq.broker.TransportConnection.Service | ActiveMQ Transport: 
> tcp:///127.0.0.1:53365@61616
> {code}
> Spring Boot logs:
> {code:java}
> 2018-10-19 10:42:00.209  INFO 65846 --- [   main] 
> se.su.it.simlu.esb.App   : Started App in 1.849 seconds (JVM 
> running for 2.386)
> 2018-10-19 10:42:33.129  WARN 65846 --- [0.1:61616@53365] 
> org.apache.activemq.ActiveMQSession  : Closed consumer on Command, 
> ID:kaka.it.su.se-53364-1539938520009-1:1:1:1
> 2018-10-19 10:42:50.247  INFO 65846 --- [enerContainer-1] 
> se.su.it.simlu.esb.Consumer  : Message Received: Enter some text 
> here for the message body...
> 2018-10-19 10:42:50.261  WARN 65846 --- [enerContainer-1] 
> o.s.j.l.DefaultMessageListenerContainer  : Setup of JMS message listener 
> invoker failed for destination 'su.it.linfra.simlu' - trying to recover. 
> Cause: The Consumer is closed
> 2018-10-19 10:42:50.300  INFO 65846 --- [enerContainer-1] 
> o.s.j.l.DefaultMessageListenerContainer  : Successfully refreshed JMS 
> Connection
> {code}
>  
> The order ("Consumer closed" before "Message Received") is weird; I suspect 
> it's because I just use a simple Thread.sleep:
> {code:java}
>   @Transactional
>   @JmsListener(destination = "su.it.linfra.simlu")
>   public void receiveQueue(String text) throws Exception {
> Thread.sleep(5);
> log.info("Message Received: "+text);
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AMQ-7081) After AMQ-7079 abortSlowAckConsumerStrategy aborts connection by default

2018-10-19 Thread Gary Tully (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-7081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Tully updated AMQ-7081:

Attachment: AMQ7081Test.java

> After AMQ-7079 abortSlowAckConsumerStrategy aborts connection by default
> 
>
> Key: AMQ-7081
> URL: https://issues.apache.org/jira/browse/AMQ-7081
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: Broker
>Reporter: Simon Lundstrom
>Assignee: Gary Tully
>Priority: Major
> Attachments: AMQ7081Test.java
>
>
> The fix for AMQ-7079 introduced a breaking change: with the default value of 
> {{maxSlowCount=-1}}, {{abortSlowAckConsumerStrategy}} used to only configure 
> slow consumer detection, but it now disconnects the consumer as well.
> Setting {{maxSlowDuration="-1"}} does avoid the disconnect, but I don't think 
> we should change the old default behavior.
> Pre AMQ-7079 fix:
> {code:xml}
> <abortSlowAckConsumerStrategy maxSlowCount="-1" />
> {code}
> worked before by just detecting a slow consumer; the consumer was *not* 
> disconnected.
> After AMQ-7079 fix:
> {code:xml}
> <abortSlowAckConsumerStrategy maxSlowCount="-1" />
> {code}
> disconnects the consumer; ActiveMQ logs:
> {code:java}
> 2018-10-19 10:42:33,124 | INFO  | aborting slow consumer: 
> ID:kaka.it.su.se-53364-1539938520009-1:1:1:1 for 
> destination:queue://su.it.linfra.simlu | 
> org.apache.activemq.broker.region.policy.AbortSlowConsumerStrategy | ActiveMQ 
> Broker[localhost] Scheduler
> 2018-10-19 10:42:50,250 | WARN  | no matching consumer, ignoring ack null | 
> org.apache.activemq.broker.TransportConnection | ActiveMQ Transport: 
> tcp:///127.0.0.1:53365@61616
> 2018-10-19 10:42:50,257 | WARN  | Async error occurred: 
> java.lang.IllegalStateException: Cannot remove a consumer that had not been 
> registered: ID:kaka.it.su.se-53364-1539938520009-1:1:1:1 | 
> org.apache.activemq.broker.TransportConnection.Service | ActiveMQ Transport: 
> tcp:///127.0.0.1:53365@61616
> {code}
> Spring Boot logs:
> {code:java}
> 2018-10-19 10:42:00.209  INFO 65846 --- [   main] 
> se.su.it.simlu.esb.App   : Started App in 1.849 seconds (JVM 
> running for 2.386)
> 2018-10-19 10:42:33.129  WARN 65846 --- [0.1:61616@53365] 
> org.apache.activemq.ActiveMQSession  : Closed consumer on Command, 
> ID:kaka.it.su.se-53364-1539938520009-1:1:1:1
> 2018-10-19 10:42:50.247  INFO 65846 --- [enerContainer-1] 
> se.su.it.simlu.esb.Consumer  : Message Received: Enter some text 
> here for the message body...
> 2018-10-19 10:42:50.261  WARN 65846 --- [enerContainer-1] 
> o.s.j.l.DefaultMessageListenerContainer  : Setup of JMS message listener 
> invoker failed for destination 'su.it.linfra.simlu' - trying to recover. 
> Cause: The Consumer is closed
> 2018-10-19 10:42:50.300  INFO 65846 --- [enerContainer-1] 
> o.s.j.l.DefaultMessageListenerContainer  : Successfully refreshed JMS 
> Connection
> {code}
>  
> The order ("Consumer closed" before "Message Received") is weird; I suspect 
> it's because I just use a simple Thread.sleep:
> {code:java}
>   @Transactional
>   @JmsListener(destination = "su.it.linfra.simlu")
>   public void receiveQueue(String text) throws Exception {
> Thread.sleep(5);
> log.info("Message Received: "+text);
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-7081) After AMQ-7079 abortSlowAckConsumerStrategy aborts connection by default

2018-10-19 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656663#comment-16656663
 ] 

Gary Tully commented on AMQ-7081:
-

This looks OK: maxSlowDuration defaults to 30 seconds, so if the consumer is 
slow for that long it gets kicked with that config.
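For reference, the same policy can be set up programmatically; this is a 
minimal sketch (assuming the standard 5.x policy classes, and using the 
maxSlowDuration="-1" workaround described in the report rather than a 
recommended default):

{code:java}
import org.apache.activemq.broker.BrokerService;
import org.apache.activemq.broker.region.policy.AbortSlowAckConsumerStrategy;
import org.apache.activemq.broker.region.policy.PolicyEntry;
import org.apache.activemq.broker.region.policy.PolicyMap;

public class SlowAckDetectOnlyBroker {
    public static void main(String[] args) throws Exception {
        // Strategy that watches ack frequency; maxSlowDuration=-1 is the
        // workaround mentioned in the report to avoid the abort after 30s.
        AbortSlowAckConsumerStrategy strategy = new AbortSlowAckConsumerStrategy();
        strategy.setMaxSlowCount(-1);      // pre-AMQ-7079 behavior: detection only
        strategy.setMaxSlowDuration(-1);   // never abort on elapsed slow time

        PolicyEntry entry = new PolicyEntry();
        entry.setSlowConsumerStrategy(strategy);

        PolicyMap policyMap = new PolicyMap();
        policyMap.setDefaultEntry(entry);

        BrokerService broker = new BrokerService();
        broker.setDestinationPolicy(policyMap);
        broker.setPersistent(false);
        broker.start();
    }
}
{code}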

> After AMQ-7079 abortSlowAckConsumerStrategy aborts connection by default
> 
>
> Key: AMQ-7081
> URL: https://issues.apache.org/jira/browse/AMQ-7081
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: Broker
>Reporter: Simon Lundstrom
>Assignee: Gary Tully
>Priority: Major
>
> The fix for AMQ-7079 introduced a breaking change: with the default value of 
> {{maxSlowCount=-1}}, {{abortSlowAckConsumerStrategy}} used to only configure 
> slow consumer detection, but it now disconnects the consumer as well.
> Setting {{maxSlowDuration="-1"}} does avoid the disconnect, but I don't think 
> we should change the old default behavior.
> Pre AMQ-7079 fix:
> {code:xml}
> <abortSlowAckConsumerStrategy maxSlowCount="-1" />
> {code}
> worked before by just detecting a slow consumer; the consumer was *not* 
> disconnected.
> After AMQ-7079 fix:
> {code:xml}
> <abortSlowAckConsumerStrategy maxSlowCount="-1" />
> {code}
> disconnects the consumer; ActiveMQ logs:
> {code:java}
> 2018-10-19 10:42:33,124 | INFO  | aborting slow consumer: 
> ID:kaka.it.su.se-53364-1539938520009-1:1:1:1 for 
> destination:queue://su.it.linfra.simlu | 
> org.apache.activemq.broker.region.policy.AbortSlowConsumerStrategy | ActiveMQ 
> Broker[localhost] Scheduler
> 2018-10-19 10:42:50,250 | WARN  | no matching consumer, ignoring ack null | 
> org.apache.activemq.broker.TransportConnection | ActiveMQ Transport: 
> tcp:///127.0.0.1:53365@61616
> 2018-10-19 10:42:50,257 | WARN  | Async error occurred: 
> java.lang.IllegalStateException: Cannot remove a consumer that had not been 
> registered: ID:kaka.it.su.se-53364-1539938520009-1:1:1:1 | 
> org.apache.activemq.broker.TransportConnection.Service | ActiveMQ Transport: 
> tcp:///127.0.0.1:53365@61616
> {code}
> Spring Boot logs:
> {code:java}
> 2018-10-19 10:42:00.209  INFO 65846 --- [   main] 
> se.su.it.simlu.esb.App   : Started App in 1.849 seconds (JVM 
> running for 2.386)
> 2018-10-19 10:42:33.129  WARN 65846 --- [0.1:61616@53365] 
> org.apache.activemq.ActiveMQSession  : Closed consumer on Command, 
> ID:kaka.it.su.se-53364-1539938520009-1:1:1:1
> 2018-10-19 10:42:50.247  INFO 65846 --- [enerContainer-1] 
> se.su.it.simlu.esb.Consumer  : Message Received: Enter some text 
> here for the message body...
> 2018-10-19 10:42:50.261  WARN 65846 --- [enerContainer-1] 
> o.s.j.l.DefaultMessageListenerContainer  : Setup of JMS message listener 
> invoker failed for destination 'su.it.linfra.simlu' - trying to recover. 
> Cause: The Consumer is closed
> 2018-10-19 10:42:50.300  INFO 65846 --- [enerContainer-1] 
> o.s.j.l.DefaultMessageListenerContainer  : Successfully refreshed JMS 
> Connection
> {code}
>  
> The order ("Consumer closed" before "Message Received") is weird; I suspect 
> it's because I just use a simple Thread.sleep:
> {code:java}
>   @Transactional
>   @JmsListener(destination = "su.it.linfra.simlu")
>   public void receiveQueue(String text) throws Exception {
> Thread.sleep(5);
> log.info("Message Received: "+text);
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (AMQ-7081) After AMQ-7079 abortSlowAckConsumerStrategy aborts connection by default

2018-10-19 Thread Gary Tully (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-7081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Tully reassigned AMQ-7081:
---

Assignee: Gary Tully

> After AMQ-7079 abortSlowAckConsumerStrategy aborts connection by default
> 
>
> Key: AMQ-7081
> URL: https://issues.apache.org/jira/browse/AMQ-7081
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: Broker
>Reporter: Simon Lundstrom
>Assignee: Gary Tully
>Priority: Major
>
> The fix for AMQ-7079 introduced a breaking change: with the default value of 
> {{maxSlowCount=-1}}, {{abortSlowAckConsumerStrategy}} used to only configure 
> slow consumer detection, but it now disconnects the consumer as well.
> Setting {{maxSlowDuration="-1"}} does avoid the disconnect, but I don't think 
> we should change the old default behavior.
> Pre AMQ-7079 fix:
> {code:xml}
> <abortSlowAckConsumerStrategy maxSlowCount="-1" />
> {code}
> worked before by just detecting a slow consumer; the consumer was *not* 
> disconnected.
> After AMQ-7079 fix:
> {code:xml}
> <abortSlowAckConsumerStrategy maxSlowCount="-1" />
> {code}
> disconnects the consumer; ActiveMQ logs:
> {code:java}
> 2018-10-19 10:42:33,124 | INFO  | aborting slow consumer: 
> ID:kaka.it.su.se-53364-1539938520009-1:1:1:1 for 
> destination:queue://su.it.linfra.simlu | 
> org.apache.activemq.broker.region.policy.AbortSlowConsumerStrategy | ActiveMQ 
> Broker[localhost] Scheduler
> 2018-10-19 10:42:50,250 | WARN  | no matching consumer, ignoring ack null | 
> org.apache.activemq.broker.TransportConnection | ActiveMQ Transport: 
> tcp:///127.0.0.1:53365@61616
> 2018-10-19 10:42:50,257 | WARN  | Async error occurred: 
> java.lang.IllegalStateException: Cannot remove a consumer that had not been 
> registered: ID:kaka.it.su.se-53364-1539938520009-1:1:1:1 | 
> org.apache.activemq.broker.TransportConnection.Service | ActiveMQ Transport: 
> tcp:///127.0.0.1:53365@61616
> {code}
> Spring Boot logs:
> {code:java}
> 2018-10-19 10:42:00.209  INFO 65846 --- [   main] 
> se.su.it.simlu.esb.App   : Started App in 1.849 seconds (JVM 
> running for 2.386)
> 2018-10-19 10:42:33.129  WARN 65846 --- [0.1:61616@53365] 
> org.apache.activemq.ActiveMQSession  : Closed consumer on Command, 
> ID:kaka.it.su.se-53364-1539938520009-1:1:1:1
> 2018-10-19 10:42:50.247  INFO 65846 --- [enerContainer-1] 
> se.su.it.simlu.esb.Consumer  : Message Received: Enter some text 
> here for the message body...
> 2018-10-19 10:42:50.261  WARN 65846 --- [enerContainer-1] 
> o.s.j.l.DefaultMessageListenerContainer  : Setup of JMS message listener 
> invoker failed for destination 'su.it.linfra.simlu' - trying to recover. 
> Cause: The Consumer is closed
> 2018-10-19 10:42:50.300  INFO 65846 --- [enerContainer-1] 
> o.s.j.l.DefaultMessageListenerContainer  : Successfully refreshed JMS 
> Connection
> {code}
>  
> The order ("Consumer closed" before "Message Received") is weird; I suspect 
> it's because I just use a simple Thread.sleep:
> {code:java}
>   @Transactional
>   @JmsListener(destination = "su.it.linfra.simlu")
>   public void receiveQueue(String text) throws Exception {
> Thread.sleep(5);
> log.info("Message Received: "+text);
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-7080) Keep track of free pages - Update db.free file during checkpoints

2018-10-19 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656627#comment-16656627
 ] 

Gary Tully commented on AMQ-7080:
-

It is more to verify the read. A partial write will be ok in this case, but 
with any corruption or truncation it may be possible to read the fingerprint 
and then bad data. The freelist is fundamental: if it is wrong then the index 
gets borked and the journal needs to be replayed. At the moment the index is 
self contained; it will now depend on another file, so we need to be sure we 
can trust the content of that file.

In addition, it need not change the free list format.

> Keep track of free pages - Update db.free file during checkpoints
> -
>
> Key: AMQ-7080
> URL: https://issues.apache.org/jira/browse/AMQ-7080
> Project: ActiveMQ
>  Issue Type: Improvement
>  Components: KahaDB
>Affects Versions: 5.15.6
>Reporter: Alan Protasio
>Priority: Major
>
> In the event of an unclean shutdown, ActiveMQ loses the information about the 
> free pages in the index. To recover this information, ActiveMQ reads the 
> whole index during shutdown searching for free pages and then saves the 
> db.free file. This operation can take a long time, making the failover 
> slower (during the shutdown, ActiveMQ still holds the lock).
> From http://activemq.apache.org/shared-file-system-master-slave.html
> {quote}"If you have a SAN or shared file system it can be used to provide 
> high availability such that if a broker is killed, another broker can take 
> over immediately."
> {quote}
> It is important to note that if the shutdown takes more than 
> ACTIVEMQ_KILL_MAXSECONDS seconds, any following shutdown will be unclean. The 
> broker will stay in this state unless the index is deleted (this state means 
> that every failover will take more than ACTIVEMQ_KILL_MAXSECONDS, so if you 
> increase this time to 5 minutes, your failover can take more than 5 minutes).
>  
> In order to prevent ActiveMQ reading the whole index file to search for free 
> pages, we can keep track of those on every checkpoint. To do that we need to 
> be sure that db.data and db.free are in sync. To achieve that we can have an 
> attribute in the db.free page that is referenced by the db.data.
> So during the checkpoint we have:
> 1 - Save db.free and give it a freePageUniqueId
> 2 - Save this freePageUniqueId in the db.data (metadata)
> After a crash, we can see if the db.data has the same freePageUniqueId as the 
> db.free. If this is the case we can safely use the free page information 
> contained in the db.free.
> Now, the only way to read the whole index file again is if the crash happens 
> between steps 1 and 2 (which is very unlikely).
> The drawback of this implementation is that we will have to save db.free 
> during the checkpoint, which can increase the checkpoint time.
> It is also important to note that we CAN (and should) have stale data in 
> db.free as it is referencing stale db.data:
> Imagine the timeline:
> T0 -> P1, P2 and P3 are free.
> T1 -> Checkpoint
> T2 -> P1 got occupied.
> T3 -> Crash
> In the current scenario, after the PageFile#load P1 will be free and then the 
> replay will mark P1 as occupied or will occupy another page (now that the 
> recovery of free pages is done on shutdown).
> This change only makes sure that db.data and db.free are in sync and show the 
> reality at T1 (checkpoint); if they are in sync we can trust the db.free.
> This is a really fast draft of what I'm suggesting... If you guys agree, I 
> can create the proper patch after:
> [https://github.com/alanprot/activemq/commit/18036ef7214ef0eaa25c8650f40644dd8b4632a5]
>  
> This is related to https://issues.apache.org/jira/browse/AMQ-6590



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-7080) Keep track of free pages - Update db.free file during checkpoints

2018-10-18 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655866#comment-16655866
 ] 

Gary Tully commented on AMQ-7080:
-

The freelist format is not so lenient.

The writes can already be complete if async.

I think you need a contents checksum in case of a partial write; the read needs 
to be verified. The checksum could be computed on the in-memory SequenceSet to 
keep it fast.

> Keep track of free pages - Update db.free file during checkpoints
> -
>
> Key: AMQ-7080
> URL: https://issues.apache.org/jira/browse/AMQ-7080
> Project: ActiveMQ
>  Issue Type: Improvement
>  Components: KahaDB
>Affects Versions: 5.15.6
>Reporter: Alan Protasio
>Priority: Major
>
> In the event of an unclean shutdown, ActiveMQ loses the information about the 
> free pages in the index. To recover this information, ActiveMQ reads the 
> whole index during shutdown searching for free pages and then saves the 
> db.free file. This operation can take a long time, making the failover 
> slower (during the shutdown, ActiveMQ still holds the lock).
> From http://activemq.apache.org/shared-file-system-master-slave.html
> {quote}"If you have a SAN or shared file system it can be used to provide 
> high availability such that if a broker is killed, another broker can take 
> over immediately."
> {quote}
> It is important to note that if the shutdown takes more than 
> ACTIVEMQ_KILL_MAXSECONDS seconds, any following shutdown will be unclean. The 
> broker will stay in this state unless the index is deleted (this state means 
> that every failover will take more than ACTIVEMQ_KILL_MAXSECONDS, so if you 
> increase this time to 5 minutes, your failover can take more than 5 minutes).
>  
> In order to prevent ActiveMQ reading the whole index file to search for free 
> pages, we can keep track of those on every checkpoint. To do that we need to 
> be sure that db.data and db.free are in sync. To achieve that we can have an 
> attribute in the db.free page that is referenced by the db.data.
> So during the checkpoint we have:
> 1 - Save db.free and give it a freePageUniqueId
> 2 - Save this freePageUniqueId in the db.data (metadata)
> After a crash, we can see if the db.data has the same freePageUniqueId as the 
> db.free. If this is the case we can safely use the free page information 
> contained in the db.free.
> Now, the only way to read the whole index file again is if the crash happens 
> between steps 1 and 2 (which is very unlikely).
> The drawback of this implementation is that we will have to save db.free 
> during the checkpoint, which can increase the checkpoint time.
> It is also important to note that we CAN (and should) have stale data in 
> db.free as it is referencing stale db.data:
> Imagine the timeline:
> T0 -> P1, P2 and P3 are free.
> T1 -> Checkpoint
> T2 -> P1 got occupied.
> T3 -> Crash
> In the current scenario, after the PageFile#load P1 will be free and then the 
> replay will mark P1 as occupied or will occupy another page (now that the 
> recovery of free pages is done on shutdown).
> This change only makes sure that db.data and db.free are in sync and show the 
> reality at T1 (checkpoint); if they are in sync we can trust the db.free.
> This is a really fast draft of what I'm suggesting... If you guys agree, I 
> can create the proper patch after:
> [https://github.com/alanprot/activemq/commit/18036ef7214ef0eaa25c8650f40644dd8b4632a5]
>  
> This is related to https://issues.apache.org/jira/browse/AMQ-6590



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-7080) Keep track of free pages - Update db.free file during checkpoints

2018-10-18 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655819#comment-16655819
 ] 

Gary Tully commented on AMQ-7080:
-

You may be able to do this without any change to the stored metadata format or 
free list format by repurposing the freePageSize long to be the 
freePageCheckSum.

That long can hold the checksum of the serialised sequence set: if it is set, 
load the free page data, and if the checksum matches all is good. If not, 
replay.

In that way it may be possible to be backward compatible or auto migratable.

This won't be free: every writeBatch call will take the additional hit of 
serialising both the freeList and the metadata, but it may be worth it for 
large index files.

It is not sufficient to do this only when the checkPointLatch is non-null, 
because it may be that there are no pending writes at that time. Or maybe that 
is a risk worth taking.

Always generate the checksum and update the metadata if the checksum is 
different, and only write the freeList when a flush/checkpoint has been set.
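One way to picture the checksum idea. This is a sketch only, not the KahaDB 
API: the list of free page ids stands in for the in-memory SequenceSet, and 
the "stored checksum" stands in for the repurposed freePageSize slot in the 
metadata.

{code:java}
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.List;
import java.util.zip.CRC32;

public class FreeListChecksum {

    // Checksum over the serialised free-page set; a stand-in for hashing the
    // in-memory SequenceSet before db.free is written.
    static long checksum(List<Long> freePageIds) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (long pageId : freePageIds) {
                out.writeLong(pageId);
            }
        }
        CRC32 crc = new CRC32();
        crc.update(bytes.toByteArray());
        return crc.getValue();
    }

    // On recovery: trust db.free only if the value recorded in the metadata
    // (here 'storedChecksum', 0 meaning "not set" in this sketch) matches what
    // we recompute from the free list we just read back.
    static boolean canTrustFreeList(long storedChecksum, List<Long> loadedFreePageIds)
            throws IOException {
        return storedChecksum != 0 && storedChecksum == checksum(loadedFreePageIds);
    }
}
{code}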

 

> Keep track of free pages - Update db.free file during checkpoints
> -
>
> Key: AMQ-7080
> URL: https://issues.apache.org/jira/browse/AMQ-7080
> Project: ActiveMQ
>  Issue Type: Improvement
>  Components: KahaDB
>Affects Versions: 5.15.6
>Reporter: Alan Protasio
>Priority: Major
>
> In the event of an unclean shutdown, ActiveMQ loses the information about the 
> free pages in the index. To recover this information, ActiveMQ reads the 
> whole index during shutdown searching for free pages and then saves the 
> db.free file. This operation can take a long time, making the failover 
> slower (during the shutdown, ActiveMQ still holds the lock).
> From http://activemq.apache.org/shared-file-system-master-slave.html
> {quote}"If you have a SAN or shared file system it can be used to provide 
> high availability such that if a broker is killed, another broker can take 
> over immediately."
> {quote}
> It is important to note that if the shutdown takes more than 
> ACTIVEMQ_KILL_MAXSECONDS seconds, any following shutdown will be unclean. The 
> broker will stay in this state unless the index is deleted (this state means 
> that every failover will take more than ACTIVEMQ_KILL_MAXSECONDS, so if you 
> increase this time to 5 minutes, your failover can take more than 5 minutes).
>  
> In order to prevent ActiveMQ reading the whole index file to search for free 
> pages, we can keep track of those on every checkpoint. To do that we need to 
> be sure that db.data and db.free are in sync. To achieve that we can have an 
> attribute in the db.free page that is referenced by the db.data.
> So during the checkpoint we have:
> 1 - Save db.free and give it a freePageUniqueId
> 2 - Save this freePageUniqueId in the db.data (metadata)
> After a crash, we can see if the db.data has the same freePageUniqueId as the 
> db.free. If this is the case we can safely use the free page information 
> contained in the db.free.
> Now, the only way to read the whole index file again is if the crash happens 
> between steps 1 and 2 (which is very unlikely).
> The drawback of this implementation is that we will have to save db.free 
> during the checkpoint, which can increase the checkpoint time.
> It is also important to note that we CAN (and should) have stale data in 
> db.free as it is referencing stale db.data:
> Imagine the timeline:
> T0 -> P1, P2 and P3 are free.
> T1 -> Checkpoint
> T2 -> P1 got occupied.
> T3 -> Crash
> In the current scenario, after the PageFile#load P1 will be free and then the 
> replay will mark P1 as occupied or will occupy another page (now that the 
> recovery of free pages is done on shutdown).
> This change only makes sure that db.data and db.free are in sync and show the 
> reality at T1 (checkpoint); if they are in sync we can trust the db.free.
> This is a really fast draft of what I'm suggesting... If you guys agree, I 
> can create the proper patch after:
> [https://github.com/alanprot/activemq/commit/18036ef7214ef0eaa25c8650f40644dd8b4632a5]
>  
> This is related to https://issues.apache.org/jira/browse/AMQ-6590



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AMQ-7079) Using abortSlowAckConsumerStrategy aborts slow consumer even though it has disconnected

2018-10-18 Thread Gary Tully (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-7079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Tully resolved AMQ-7079.
-
   Resolution: Fixed
 Assignee: Gary Tully
Fix Version/s: 5.16.0

> Using abortSlowAckConsumerStrategy aborts slow consumer even though it has 
> disconnected
> ---
>
> Key: AMQ-7079
> URL: https://issues.apache.org/jira/browse/AMQ-7079
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: Broker
>Reporter: Simon Lundstrom
>Assignee: Gary Tully
>Priority: Minor
> Fix For: 5.16.0
>
>
> When testing AMQ-7077 I noticed that if a consumer gets tagged as a 
> slowConsumer and then disconnects, ActiveMQ will abort the slow consumer 
> (even though it has already disconnected).
>   
>  Somewhere between 13:09:10 and 13:09:42 the consumer disconnects.
> {code:java}
> 2018-10-18 13:09:10,735 | INFO  | sub: 
> ID:kaka.it.su.se-51120-1539860894594-1:1:1:1 is no longer slow | 
> org.apache.activemq.broker.region.policy.AbortSlowAckConsumerStrategy | 
> ActiveMQ Broker[localhost] Scheduler
> 2018-10-18 13:09:42,836 | INFO  | aborting slow consumer: 
> ID:kaka.it.su.se-51120-1539860894594-1:1:1:1 for 
> destination:queue://su.it.linfra.simlu | 
> org.apache.activemq.broker.region.policy.AbortSlowConsumerStrategy | ActiveMQ 
> Broker[localh
> {code}
> Configuration:
> {code:xml}
> […]
> 
>   <slowConsumerStrategy>
>     <abortSlowAckConsumerStrategy maxTimeSinceLastAck="3000" />
>   </slowConsumerStrategy>
> 
> […]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AMQ-7067) KahaDB Recovery can experience a dangling transaction when prepare and commit occur on different data files.

2018-10-17 Thread Gary Tully (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-7067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Tully resolved AMQ-7067.
-
Resolution: Fixed

I think this is now sorted; the ack compaction will still work and not get in 
the way.

> KahaDB Recovery can experience a dangling transaction when prepare and commit 
> occur on different data files.
> 
>
> Key: AMQ-7067
> URL: https://issues.apache.org/jira/browse/AMQ-7067
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB, XA
>Affects Versions: 5.15.6
>Reporter: Jamie goodyear
>Assignee: Gary Tully
>Priority: Critical
> Fix For: 5.16.0, 5.15.7
>
> Attachments: amq7067test.patch
>
>
> KahaDB Recovery can experience a dangling transaction when prepare and commit 
> occur on different data files.
> Scenario:
> An XA transaction is started; a message is prepared and sent into the broker.
> We then send into the broker enough messages to fill the data file (100 
> messages with 512 * 1024 characters in the message payload). This forces a 
> new data file to be created.
> Commit the XA transaction. Commit will land on the new data file.
> Restart the Broker.
> Upon restart a KahaDB recovery is executed.
> The prepare in PageFile 1 is not matched to Commit on PageFile 2, as such, it 
> will appear in recovered message state.
> Looking deeper into this scenario, it appears that the commit message is 
> GC'd, hence the prepare & commit can not be matched.
> The MessageDatabase only checks the following for GC:
> {code:java}
> // Don't GC files referenced by in-progress tx
> if (inProgressTxRange[0] != null) {
>     for (int pendingTx = inProgressTxRange[0].getDataFileId();
>          pendingTx <= inProgressTxRange[1].getDataFileId(); pendingTx++) {
>         gcCandidateSet.remove(pendingTx);
>     }
> }
> {code}
> We need to become aware of where the prepare & commits occur in pagefiles 
> with respect to GCing files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-6590) KahaDB index loses track of free pages on unclean shutdown

2018-10-17 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653873#comment-16653873
 ] 

Gary Tully commented on AMQ-6590:
-

[~cshannon] sure, the recovery penalty needs to be taken at some stage.

The ~2 mins is not ideal. AMQ-7055 helps, but for the really large index cases 
being able to set the index page size may also help; that is currently not 
exposed in KahaDB and defaults to 4k, which means loads of small seeks/reads of 
the index file. Exposing that may help.

 

> KahaDB index loses track of free pages on unclean shutdown
> --
>
> Key: AMQ-6590
> URL: https://issues.apache.org/jira/browse/AMQ-6590
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: Broker
>Affects Versions: 5.14.3
>Reporter: Christopher L. Shannon
>Assignee: Christopher L. Shannon
>Priority: Major
> Fix For: 5.15.0, 5.14.4, 5.16.0, 5.15.7
>
>
> I have discovered an issue with the KahaDB index recovery after an unclean 
> shutdown (OOM error, kill -9, etc) that leads to excessive disk space usage. 
> Normally on clean shutdown the index stores the known set of free pages to 
> db.free and reads that in on start up to know which pages can be re-used.  On 
> an unclean shutdown this is not written to disk so on start up the index is 
> supposed to scan the page file to figure out all of the free pages.
> Unfortunately it turns out that this scan of the page file is being done 
> before the total page count value has been set so when the iterator is 
> created it always thinks there are 0 pages to scan.
> The end result is that every time an unclean shutdown occurs all known free 
> pages are lost and no longer tracked.  This of course means new free pages 
> have to be allocated and all of the existing space is now lost which will 
> lead to excessive index file growth over time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (AMQ-6590) KahaDB index loses track of free pages on unclean shutdown

2018-10-17 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653398#comment-16653398
 ] 

Gary Tully edited comment on AMQ-6590 at 10/17/18 11:40 AM:


[~cshannon] I have moved this fix to the shutdown phase because the recovery 
work can be significant on very large stores. The downside is temporary disk 
space usage by the index, which is a better trade off IMHO. 

I also added an info log message to indicate that some recovery is going on.


was (Author: gtully):
[~cshannon] I have move this fix to the shutdown phase b/c the recovery work 
can be significant on very large stores. The down side is temporary disk space 
usage by the index which is a better trade off IMHO. 

I also added an info log message to indicate that the some recovery is going on.

> KahaDB index loses track of free pages on unclean shutdown
> --
>
> Key: AMQ-6590
> URL: https://issues.apache.org/jira/browse/AMQ-6590
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: Broker
>Affects Versions: 5.14.3
>Reporter: Christopher L. Shannon
>Assignee: Christopher L. Shannon
>Priority: Major
> Fix For: 5.15.0, 5.14.4
>
>
> I have discovered an issue with the KahaDB index recovery after an unclean 
> shutdown (OOM error, kill -9, etc) that leads to excessive disk space usage. 
> Normally on clean shutdown the index stores the known set of free pages to 
> db.free and reads that in on start up to know which pages can be re-used.  On 
> an unclean shutdown this is not written to disk so on start up the index is 
> supposed to scan the page file to figure out all of the free pages.
> Unfortunately it turns out that this scan of the page file is being done 
> before the total page count value has been set so when the iterator is 
> created it always thinks there are 0 pages to scan.
> The end result is that every time an unclean shutdown occurs all known free 
> pages are lost and no longer tracked.  This of course means new free pages 
> have to be allocated and all of the existing space is now lost which will 
> lead to excessive index file growth over time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-6590) KahaDB index loses track of free pages on unclean shutdown

2018-10-17 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653398#comment-16653398
 ] 

Gary Tully commented on AMQ-6590:
-

[~cshannon] I have moved this fix to the shutdown phase because the recovery 
work can be significant on very large stores. The downside is temporary disk 
space usage by the index, which is a better trade off IMHO. 

I also added an info log message to indicate that some recovery is going on.

> KahaDB index loses track of free pages on unclean shutdown
> --
>
> Key: AMQ-6590
> URL: https://issues.apache.org/jira/browse/AMQ-6590
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: Broker
>Affects Versions: 5.14.3
>Reporter: Christopher L. Shannon
>Assignee: Christopher L. Shannon
>Priority: Major
> Fix For: 5.15.0, 5.14.4
>
>
> I have discovered an issue with the KahaDB index recovery after an unclean 
> shutdown (OOM error, kill -9, etc) that leads to excessive disk space usage. 
> Normally on clean shutdown the index stores the known set of free pages to 
> db.free and reads that in on start up to know which pages can be re-used.  On 
> an unclean shutdown this is not written to disk so on start up the index is 
> supposed to scan the page file to figure out all of the free pages.
> Unfortunately it turns out that this scan of the page file is being done 
> before the total page count value has been set so when the iterator is 
> created it always thinks there are 0 pages to scan.
> The end result is that every time an unclean shutdown occurs all known free 
> pages are lost and no longer tracked.  This of course means new free pages 
> have to be allocated and all of the existing space is now lost which will 
> lead to excessive index file growth over time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-6590) KahaDB index loses track of free pages on unclean shutdown

2018-10-17 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653357#comment-16653357
 ] 

Gary Tully commented on AMQ-6590:
-

The checkpoint won't help; it will not be possible to work with a stale free 
list because one of those pages may now be occupied, hence the index file will 
have to grow. However, we can still do the full and necessary free-page 
recovery on shutdown rather than on start. 

> KahaDB index loses track of free pages on unclean shutdown
> --
>
> Key: AMQ-6590
> URL: https://issues.apache.org/jira/browse/AMQ-6590
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: Broker
>Affects Versions: 5.14.3
>Reporter: Christopher L. Shannon
>Assignee: Christopher L. Shannon
>Priority: Major
> Fix For: 5.15.0, 5.14.4
>
>
> I have discovered an issue with the KahaDB index recovery after an unclean 
> shutdown (OOM error, kill -9, etc) that leads to excessive disk space usage. 
> Normally on clean shutdown the index stores the known set of free pages to 
> db.free and reads that in on start up to know which pages can be re-used.  On 
> an unclean shutdown this is not written to disk so on start up the index is 
> supposed to scan the page file to figure out all of the free pages.
> Unfortunately it turns out that this scan of the page file is being done 
> before the total page count value has been set so when the iterator is 
> created it always thinks there are 0 pages to scan.
> The end result is that every time an unclean shutdown occurs all known free 
> pages are lost and no longer tracked.  This of course means new free pages 
> have to be allocated and all of the existing space is now lost which will 
> lead to excessive index file growth over time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-6590) KahaDB index loses track of free pages on unclean shutdown

2018-10-17 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653349#comment-16653349
 ] 

Gary Tully commented on AMQ-6590:
-

It turns out that this change, while good, means that we take a hit on start in 
the normal failover case where the primary dies uncleanly.

There have been reports of more than 2mins to start and the threads are stuck 
in sequence.set add. AMQ-7055 helps a good bit, but the problem is we are 
trading off availability for disk usage and taking the hit during restart.

I am thinking it may be better to do a checkpoint of the freeList when we do 
gc (the cleanup phase), and accept that information on restart. 

If the restart is unclean, we remember that and do a freeList recovery when we 
next do an orderly shutdown. In that way, we can still restart fast, lose some 
disk space to some missed free pages and gracefully recover when we are 
stopping.

[~cshannon] I wonder if that will hold together? The other approach is to have 
the option to do this offline; offline work has always been on the todo list.
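Roughly, the deferred-recovery flow described above could look like the sketch 
below. The names are hypothetical and this is not the actual PageFile code; 
the real logic would live in KahaDB's PageFile/MessageDatabase.

{code:java}
// Remember an unclean start and defer the expensive free-page scan to the
// next orderly shutdown, as suggested in the comment above.
public class DeferredFreeListRecovery {

    private boolean freeListSuspect;   // hypothetical flag, persisted with the index metadata

    void onStart(boolean cleanShutdownMarkerPresent) {
        // Unclean start: accept the possibly stale/missing free list and keep
        // running; the index may grow a little in the meantime.
        freeListSuspect = !cleanShutdownMarkerPresent;
    }

    void onOrderlyShutdown() {
        if (freeListSuspect) {
            recoverFreeListByScanningIndex();  // the full scan, paid while stopping
            freeListSuspect = false;
        }
        writeFreeListAndCleanShutdownMarker();
    }

    void recoverFreeListByScanningIndex() { /* walk all pages, collect the free ones */ }

    void writeFreeListAndCleanShutdownMarker() { /* persist db.free plus a clean marker */ }
}
{code}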

> KahaDB index loses track of free pages on unclean shutdown
> --
>
> Key: AMQ-6590
> URL: https://issues.apache.org/jira/browse/AMQ-6590
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: Broker
>Affects Versions: 5.14.3
>Reporter: Christopher L. Shannon
>Assignee: Christopher L. Shannon
>Priority: Major
> Fix For: 5.15.0, 5.14.4
>
>
> I have discovered an issue with the KahaDB index recovery after an unclean 
> shutdown (OOM error, kill -9, etc) that leads to excessive disk space usage. 
> Normally on clean shutdown the index stores the known set of free pages to 
> db.free and reads that in on start up to know which pages can be re-used.  On 
> an unclean shutdown this is not written to disk so on start up the index is 
> supposed to scan the page file to figure out all of the free pages.
> Unfortunately it turns out that this scan of the page file is being done 
> before the total page count value has been set so when the iterator is 
> created it always thinks there are 0 pages to scan.
> The end result is that every time an unclean shutdown occurs all known free 
> pages are lost and no longer tracked.  This of course means new free pages 
> have to be allocated and all of the existing space is now lost which will 
> lead to excessive index file growth over time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AMQ-6421) AbortSlowAckConsumerStrategy does not produce SlowConsumer advisory message

2018-10-17 Thread Gary Tully (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-6421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Tully resolved AMQ-6421.
-
   Resolution: Fixed
 Assignee: Gary Tully
Fix Version/s: 5.16.0

> AbortSlowAckConsumerStrategy does not produce SlowConsumer advisory message
> ---
>
> Key: AMQ-6421
> URL: https://issues.apache.org/jira/browse/AMQ-6421
> Project: ActiveMQ
>  Issue Type: Improvement
>  Components: Broker
>Affects Versions: 5.14.0
>Reporter: Matt Pavlovich
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0
>
>
> The AbortSlowAckConsumerStrategy should throw an advisory when applicable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AMQ-7077) Queue subscriber view slowConsumer flag set in error

2018-10-17 Thread Gary Tully (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-7077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Tully resolved AMQ-7077.
-
Resolution: Fixed

[~simmel] With the abort slow ack policy and _maxTimeSinceLastAck_ you can now 
control when the advisory and slow flag are set.
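For completeness, a minimal sketch of a client watching for that advisory. It 
assumes slow-consumer advisories are enabled on the destination policy; the 
queue name is just the one from this report.

{code:java}
import javax.jms.Connection;
import javax.jms.Message;
import javax.jms.MessageConsumer;
import javax.jms.Session;
import org.apache.activemq.ActiveMQConnectionFactory;
import org.apache.activemq.advisory.AdvisorySupport;
import org.apache.activemq.command.ActiveMQQueue;

public class SlowConsumerAdvisoryListener {
    public static void main(String[] args) throws Exception {
        Connection connection =
                new ActiveMQConnectionFactory("tcp://localhost:61616").createConnection();
        connection.start();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);

        // Advisory topic the broker publishes to when a consumer on this queue
        // is flagged as slow (requires advisoryForSlowConsumers on the policy).
        ActiveMQQueue watched = new ActiveMQQueue("su.it.linfra.simlu");
        MessageConsumer advisories =
                session.createConsumer(AdvisorySupport.getSlowConsumerAdvisoryTopic(watched));

        Message advisory = advisories.receive();  // blocks until a slow consumer is reported
        System.out.println("Slow consumer advisory: " + advisory);
        connection.close();
    }
}
{code}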

> Queue subscriber view slowConsumer flag set in error
> 
>
> Key: AMQ-7077
> URL: https://issues.apache.org/jira/browse/AMQ-7077
> Project: ActiveMQ
>  Issue Type: Bug
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0
>
>
> For queue subscribers, the broker dispatches prefetch num messages and stops. 
> The subscription is considered full.
> There is currently logic in PrefetchSubscription that flags the consumer as 
> slow on the successful dispatch of the last message in a prefetch batch. 
> This flag remains set until there are more messages to dispatch, which 
> happens on the first ack.
> The consumer is considered slow until it acks (which seems wrong), and in the 
> case that there are no more messages to dispatch, it remains slow.
>  
> Interestingly, there are some tests for this functionality that only validate 
> topics in error.
>  
> From my investigation, it seems that for queue consumers, it is really only 
> possible to gauge slowness due to the frequency of acks, which is what the 
> AbortAckSlowConsumerPolicy does.
> It makes sense that that code flags a consumer as slow once it is detected as 
> such.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-7077) Queue subscriber view slowConsumer flag set in error

2018-10-16 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16651788#comment-16651788
 ] 

Gary Tully commented on AMQ-7077:
-

If the abortSlowAck policy is responsible for setting the flag, it can also 
fire the advisory. https://issues.apache.org/jira/browse/AMQ-6421

> Queue subscriber view slowConsumer flag set in error
> 
>
> Key: AMQ-7077
> URL: https://issues.apache.org/jira/browse/AMQ-7077
> Project: ActiveMQ
>  Issue Type: Bug
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0
>
>
> For queue subscribers, the broker dispatches prefetch num messages and stops. 
> The subscription is considered full.
> There is currently logic in PrefetchSubscription that flags the consumer as 
> slow on the successful dispatch of the last message in a prefetch batch. 
> This flag remains set until there are more messages to dispatch, which 
> happens on the first ack.
> The consumer is considered slow until it acks (which seems wrong), and in the 
> case that there are no more messages to dispatch, it remains slow.
>  
> Interestingly, there are some tests for this functionality that only validate 
> topics in error.
>  
> From my investigation, it seems that for queue consumers, it is really only 
> possible to gauge slowness due to the frequency of acks, which is what the 
> AbortAckSlowConsumerPolicy does.
> It makes sense that that code flags a consumer as slow once it is detected as 
> such.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AMQ-7077) Queue subscriber view slowConsumer flag set in error

2018-10-16 Thread Gary Tully (JIRA)
Gary Tully created AMQ-7077:
---

 Summary: Queue subscriber view slowConsumer flag set in error
 Key: AMQ-7077
 URL: https://issues.apache.org/jira/browse/AMQ-7077
 Project: ActiveMQ
  Issue Type: Bug
Affects Versions: 5.15.0
Reporter: Gary Tully
Assignee: Gary Tully
 Fix For: 5.16.0


For queue subscribers, the broker dispatches prefetch num messages and stops. 
The subscription is considered full.

There is currently logic in PrefetchSubscription that flags the consumer as 
slow on the successful dispatch of the last message in a prefetch batch. This 
flag remains set until there are more messages to dispatch, which happens on 
the first ack.

The consumer is considered slow until it acks (which seems wrong), and in the 
case that there are no more messages to dispatch, it remains slow.

 

Interestingly, there are some tests for this functionality that only validate 
topics in error.

 

From my investigation, it seems that for queue consumers, it is really only 
possible to gauge slowness due to the frequency of acks, which is what the 
AbortAckSlowConsumerPolicy does.

It makes sense that that code flags a consumer as slow once it is detected as 
such.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-7067) KahaDB Recovery can experience a dangling transaction when prepare and commit occur on different data files.

2018-10-12 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647928#comment-16647928
 ] 

Gary Tully commented on AMQ-7067:
-

I pushed an update that goes some of the way; at least there is one test and a 
fix.
The test is just for a prepare record; ack compaction will still move once 
there are outcomes and that will break the reference. That will need a commit 
test in the same vein.
[~jgoodyear] flip: 
[https://github.com/apache/activemq/commit/7c890d477663d91aef518e30d60cf3c13827877a#diff-e3b8fff8c2133dfd70999705bbb558b3R2094]
 to false to see the new test fail, to get sucked in.
I think the solution is to forward commit records, but more tests are needed to 
verify the problem and the fix.

> KahaDB Recovery can experience a dangling transaction when prepare and commit 
> occur on different data files.
> 
>
> Key: AMQ-7067
> URL: https://issues.apache.org/jira/browse/AMQ-7067
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB, XA
>Affects Versions: 5.15.6
>Reporter: Jamie goodyear
>Assignee: Gary Tully
>Priority: Critical
> Fix For: 5.16.0, 5.15.7
>
> Attachments: amq7067test.patch
>
>
> KahaDB Recovery can experience a dangling transaction when prepare and commit 
> occur on different data files.
> Scenario:
> An XA transaction is started; a message is prepared and sent into the broker.
> We then send into the broker enough messages to fill the data file (100 
> messages with 512 * 1024 characters in the message payload). This forces a 
> new data file to be created.
> Commit the XA transaction. Commit will land on the new data file.
> Restart the Broker.
> Upon restart a KahaDB recovery is executed.
> The prepare in PageFile 1 is not matched to Commit on PageFile 2, as such, it 
> will appear in recovered message state.
> Looking deeper into this scenario, it appears that the commit message is 
> GC'd, hence the prepare & commit can not be matched.
> The MessageDatabase only checks the following for GC:
> {code:java}
> // Don't GC files referenced by in-progress tx
> if (inProgressTxRange[0] != null) {
>     for (int pendingTx = inProgressTxRange[0].getDataFileId();
>          pendingTx <= inProgressTxRange[1].getDataFileId(); pendingTx++) {
>         gcCandidateSet.remove(pendingTx);
>     }
> }
> {code}
> We need to become aware of where the prepare & commits occur in pagefiles 
> with respect to GCing files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (AMQ-7070) Priority is not being respected when the cursor cache flips

2018-10-12 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646958#comment-16646958
 ] 

Gary Tully edited comment on AMQ-7070 at 10/12/18 8:18 AM:
---

My bet is that there is little crossover between the tests for priority and 
the cache flip tests.

Check for tests related to setBatch commits and see about adding variants that 
use setPrioritizedMessages=true.

 

[~alanprot]  The point being that setPrioritizedMessages is disabled by 
default, so most of the tests do not validate that code path.

One approach to sanity check would be to flip the default for 
setPrioritizedMessages to true in PolicyEntry and see if anything gives.


was (Author: gtully):
my bet is that there is little crossover between tests for priority and the 
cache flip tests.

check for tests related to setbatch commits and see to add variants that use 
setPrioritizedMessages=true

> Priority is not being respected when the cursor cache flips
> ---
>
> Key: AMQ-7070
> URL: https://issues.apache.org/jira/browse/AMQ-7070
> Project: ActiveMQ
>  Issue Type: Test
>  Components: Broker
>Reporter: Alan Protasio
>Priority: Minor
> Attachments: AMQ7070Test.java
>
>
> Messages are being dispatched with wrong priority when the cache is flipped.
> See: 
> https://github.com/apache/activemq/blob/master/activemq-broker/src/main/java/org/apache/activemq/broker/region/cursors/AbstractStoreCursor.java#L258
> All messages that could get cached in the cursor are dispatched first, even 
> though messages with higher priority are in the cache.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-7070) Priority is not being respected when the cursor cache flips

2018-10-11 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646958#comment-16646958
 ] 

Gary Tully commented on AMQ-7070:
-

My bet is that there is little crossover between the tests for priority and 
the cache flip tests.

Check for tests related to setBatch commits and see about adding variants that 
use setPrioritizedMessages=true.

> Priority is not being respected when the cursor cache flips
> ---
>
> Key: AMQ-7070
> URL: https://issues.apache.org/jira/browse/AMQ-7070
> Project: ActiveMQ
>  Issue Type: Test
>  Components: Broker
>Reporter: Alan Protasio
>Priority: Minor
> Attachments: AMQ7070Test.java
>
>
> Messages are being dispatched with wrong priority when the cache is flipped.
> See: 
> https://github.com/apache/activemq/blob/master/activemq-broker/src/main/java/org/apache/activemq/broker/region/cursors/AbstractStoreCursor.java#L258
> All messages that could get cached in the cursor are dispatched first, even 
> though messages with higher priority are in the cache.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (AMQ-7070) Priority is not being respected when the cursor cache flips

2018-10-11 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646627#comment-16646627
 ] 

Gary Tully edited comment on AMQ-7070 at 10/11/18 4:56 PM:
---

Hi Gary! Thanks again! :D
{quote}the tricky bit is knowing where in the store to start reading from 
again. The existing logic around setBatch works in the knowledge that the 
cursor add order is the same as the store sequence id. Ie: it is a linear 
representation of the store order with the caveat that some writes are pending.

Priority skewes that, but care needs to be taken to ensure the setBatch call is 
inclusive, but not overly so.
{quote}
So, we already have a problem calling setBatch(candidate) with the wrong 
message, as the order can be different from the order in the store when 
messages have priority? Maybe I didn't understand this correctly... 

   -  gtully: the problem is that there are low priority messages in the cache, 
but they are in the right order w.r.t. the queue as the broker sees it. 

I thought that if I can call setBatch with messages that are stored after the 
"last" (candidate message), it would be safe to call with last (which is the 
last message dispatched)... I'm not getting any warning about duplicated 
messages.

  - gtully: you won't get duplicates, but you may not get all of your purged 
messages back again.


was (Author: alanprot):
Hi Gary! Thanks again! :D
{quote}the tricky bit is knowing where in the store to start reading from 
again. The existing logic around setBatch works in the knowledge that the 
cursor add order is the same as the store sequence id. Ie: it is a linear 
representation of the store order with the caveat that some writes are pending.

Priority skewes that, but care needs to be taken to ensure the setBatch call is 
inclusive, but not overly so.
{quote}
So, we already have a problem calling setBatch(candidate) with wrong message as 
the order can be different from the order in the store when messages has 
priority? Maybe I din't understand this correctly... 

   -  the problem is that there are low priority message in the cache, but they 
are in the right order of w.r.t the queue as the broker sees it. 

I thought that if I can call setBatch with messages that are stored after the 
"last" (candidate message), it would be safe to call with last (which is the 
last message dispatched)... I'm not getting any warning about duplicated 
messages.

  - you won't get duplicates, but you may not get all of your purged messages 
back again.

> Priority is not being respected when the cursor cache flips
> ---
>
> Key: AMQ-7070
> URL: https://issues.apache.org/jira/browse/AMQ-7070
> Project: ActiveMQ
>  Issue Type: Test
>  Components: Broker
>Reporter: Alan Protasio
>Priority: Minor
> Attachments: AMQ7070Test.java
>
>
> Messages are being dispatched with wrong priority when the cache is flipped.
> See: 
> https://github.com/apache/activemq/blob/master/activemq-broker/src/main/java/org/apache/activemq/broker/region/cursors/AbstractStoreCursor.java#L258
> All messages that could get cached in the cursor are dispatched first, even 
> though messages with higher priority are in the cache.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (AMQ-7070) Priority is not being respected when the cursor cache flips

2018-10-11 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646627#comment-16646627
 ] 

Gary Tully edited comment on AMQ-7070 at 10/11/18 4:55 PM:
---

Hi Gary! Thanks again! :D
{quote}the tricky bit is knowing where in the store to start reading from 
again. The existing logic around setBatch works in the knowledge that the 
cursor add order is the same as the store sequence id. Ie: it is a linear 
representation of the store order with the caveat that some writes are pending.

Priority skewes that, but care needs to be taken to ensure the setBatch call is 
inclusive, but not overly so.
{quote}
So, we already have a problem calling setBatch(candidate) with the wrong 
message, as the order can be different from the order in the store when 
messages have priority? Maybe I didn't understand this correctly... 

   -  the problem is that there are low priority messages in the cache, but 
they are in the right order w.r.t. the queue as the broker sees it. 

I thought that if I can call setBatch with messages that are stored after the 
"last" (candidate message), it would be safe to call with last (which is the 
last message dispatched)... I'm not getting any warning about duplicated 
messages.

  - you won't get duplicates, but you may not get all of your purged messages 
back again.


was (Author: alanprot):
Hi Gary! Thanks again! :D
{quote}the tricky bit is knowing where in the store to start reading from 
again. The existing logic around setBatch works in the knowledge that the 
cursor add order is the same as the store sequence id. Ie: it is a linear 
representation of the store order with the caveat that some writes are pending.

Priority skewes that, but care needs to be taken to ensure the setBatch call is 
inclusive, but not overly so.
{quote}
So, we already have a problem calling setBatch(candidate) with the wrong message, as 
the order can be different from the order in the store when messages have 
priority? Maybe I didn't understand this correctly... 

I thought that if I can call setBatch with messages that are stored after the 
"last" (candidate message), it would be safe to call with last (which is the 
last message dispatched)... I'm not getting any warning about duplicated 
messages.

 

> Priority is not being respected when the cursor cache flips
> ---
>
> Key: AMQ-7070
> URL: https://issues.apache.org/jira/browse/AMQ-7070
> Project: ActiveMQ
>  Issue Type: Test
>  Components: Broker
>Reporter: Alan Protasio
>Priority: Minor
> Attachments: AMQ7070Test.java
>
>
> Messages are being dispatched with the wrong priority when the cache is flipped.
> See: 
> https://github.com/apache/activemq/blob/master/activemq-broker/src/main/java/org/apache/activemq/broker/region/cursors/AbstractStoreCursor.java#L258
> All messages that could get cached in the cursor are dispatched first, even 
> though messages with higher priority are in the cache.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARTEMIS-2103) VirtualTopic doesn't work correctly with multiple consumers

2018-10-11 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646638#comment-16646638
 ] 

Gary Tully commented on ARTEMIS-2103:
-

[~pawelj] thanks for the confirmation :) - it is great to get feedback.

> VirtualTopic doesn't work correctly with multiple consumers
> ---
>
> Key: ARTEMIS-2103
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2103
> Project: ActiveMQ Artemis
>  Issue Type: Bug
>  Components: Broker, OpenWire
>Affects Versions: 2.6.3
>Reporter: Pawel
>Assignee: Gary Tully
>Priority: Major
> Fix For: 2.7.0
>
>
> It's impossible to subscribe to multiple virtual topics with the same consumer 
> name.
> I've configured the acceptor as described in the documentation:
> {code:java}
> tcp://0.0.0.0:61616?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=CORE,AMQP,STOMP,HORNETQ,MQTT,OPENWIRE;useEpoll=true;amqpCredits=1000;amqpLowCredits=300;virtualTopicConsumerWildcards=Consumer.*.%3E%3B2{code}
> When I'm connecting first consumer:  
> *Consumer.cons1.VirtualTopic.t1*
> proper binding is created.
> Next I'm trying to connect second consumer with the same name:
> *Consumer.cons1.VirtualTopic.t2*
> But no binding is created.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-7070) Priority is not being respected when the cursor cache flips

2018-10-11 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646215#comment-16646215
 ] 

Gary Tully commented on AMQ-7070:
-

[~alanprot] purging the cache seems the only way; it could be selective, in 
that the highest priority messages could remain, but that is an optimisation.

the tricky bit is knowing where in the store to start reading from again. The 
existing logic around setBatch works in the knowledge that the cursor add order 
is the same as the store sequence id. Ie: it is a linear representation of the 
store order with the caveat that some writes are pending.

Priority skews that, and care needs to be taken to ensure the setBatch call is 
inclusive, but not overly so.

that setBatch call needs care, it may not be sufficient b/c it never had to 
deal with a purge before.
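
A minimal sketch of that trade-off (illustrative only, not the actual 
AbstractStoreCursor code; the types and method names are assumptions): after 
purging the cached low priority messages, the cursor has to pick a setBatch 
position no earlier than the last message actually dispatched, otherwise 
duplicates appear, and no later, otherwise the purged messages are never 
re-read from the store.
{code:java}
// Illustrative sketch only - not the real cursor API.
import java.util.ArrayDeque;
import java.util.Deque;

class CursorSketch {
    /** Hypothetical store view: resume reads strictly after this sequence id. */
    interface Store {
        void setBatch(long lastDispatchedSequenceId);
    }

    private final Deque<Long> cache = new ArrayDeque<>(); // cached seq ids, in add order
    private long lastDispatchedSequenceId = -1;

    void onCacheFlipForHigherPriority(Store store) {
        // Drop the cached low priority messages...
        cache.clear();
        // ...and resume from the last dispatched message, NOT the last cached one.
        // Resuming after the last cached id would skip (lose) the purged messages;
        // resuming earlier than the last dispatched id would produce duplicates.
        store.setBatch(lastDispatchedSequenceId);
    }
}
{code}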

 

> Priority is not being respected when the cursor cache flips
> ---
>
> Key: AMQ-7070
> URL: https://issues.apache.org/jira/browse/AMQ-7070
> Project: ActiveMQ
>  Issue Type: Test
>  Components: Broker
>Reporter: Alan Protasio
>Priority: Minor
> Attachments: AMQ7070Test.java
>
>
> Messages are being dispatched with the wrong priority when the cache is flipped.
> See: 
> https://github.com/apache/activemq/blob/master/activemq-broker/src/main/java/org/apache/activemq/broker/region/cursors/AbstractStoreCursor.java#L258
> All messages that could get cached in the cursor are dispatched first, even 
> though messages with higher priority are in the cache.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-7067) KahaDB Recovery can experience a dangling transaction when prepare and commit occur on different data files.

2018-10-09 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643245#comment-16643245
 ] 

Gary Tully commented on AMQ-7067:
-

[~cshannon] We just need the lock for protection, no tx, b/c the index is not 
updated.

On the compaction, I think that is the simplest approach.

Ideally we would have some sort of reference count on the referenced ids in the 
ackMessageFileMap - allowing each ack and outcome to be recorded and having 
some way to know there are just non-ack references remaining.

We don't need to retain everything. i.e.: the prepare record can go if there is 
an outcome, and if the outcome is rollback then both the prepare and rollback 
can go b/c recovery will default to rollback. For commit, both XA and non-XA, 
it needs to be retained as long as the updates (add/ack commands) are.

It will get complex fast, however.

Getting it correct first will suffice, which means retaining the 
transaction-related commands. I will see if I can wrangle a test to reproduce 
to get the full context.
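
A rough sketch of the reference-count idea (hypothetical structure and method 
names, not the actual KahaDB ackMessageFileMap): per data file, count the live 
references by kind so GC can tell when only non-ack references remain.
{code:java}
// Hypothetical sketch of per-file reference counting for GC decisions.
import java.util.HashMap;
import java.util.Map;

class FileReferenceTracker {
    enum RefKind { ACK, PREPARE, OUTCOME }

    // dataFileId -> live reference count per kind
    private final Map<Integer, Map<RefKind, Integer>> refs = new HashMap<>();

    void addReference(int dataFileId, RefKind kind) {
        refs.computeIfAbsent(dataFileId, k -> new HashMap<>()).merge(kind, 1, Integer::sum);
    }

    void removeReference(int dataFileId, RefKind kind) {
        Map<RefKind, Integer> byKind = refs.get(dataFileId);
        if (byKind == null) {
            return;
        }
        byKind.computeIfPresent(kind, (k, v) -> v > 1 ? v - 1 : null); // null drops the entry
        if (byKind.isEmpty()) {
            refs.remove(dataFileId); // nothing references this file any more
        }
    }

    /** A file is a GC candidate only when no record of any kind still references it. */
    boolean isGcCandidate(int dataFileId) {
        return !refs.containsKey(dataFileId);
    }
}
{code}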

> KahaDB Recovery can experience a dangling transaction when prepare and commit 
> occur on different data files.
> 
>
> Key: AMQ-7067
> URL: https://issues.apache.org/jira/browse/AMQ-7067
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB, XA
>Affects Versions: 5.15.6
>Reporter: Jamie goodyear
>Assignee: Gary Tully
>Priority: Critical
> Fix For: 5.16.0, 5.15.7
>
> Attachments: amq7067test.patch
>
>
> KahaDB Recovery can experience a dangling transaction when prepare and commit 
> occur on different data files.
> Scenario:
> An XA transaction is started, a message is prepared and sent into the Broker.
> We then send enough messages into the broker to fill the current data file (100 
> messages with 512 * 1024 characters in the message payload). This forces a new 
> data file to be created.
> Commit the XA transaction. Commit will land on the new data file.
> Restart the Broker.
> Upon restart a KahaDB recovery is executed.
> The prepare in PageFile 1 is not matched to Commit on PageFile 2, as such, it 
> will appear in recovered message state.
> Looking deeper into this scenario, it appears that the commit message is 
> GC'd, hence the prepare & commit can not be matched.
> The MessageDatabase only checks the following for GC:
> {code:java}
> // Don't GC files referenced by in-progress tx
> if (inProgressTxRange[0] != null) {
>     for (int pendingTx = inProgressTxRange[0].getDataFileId();
>          pendingTx <= inProgressTxRange[1].getDataFileId(); pendingTx++) {
>         gcCandidateSet.remove(pendingTx);
>     }
> }
> {code}
> We need to become aware of where the prepare & commits occur in pagefiles 
> with respect to GCing files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (AMQ-7067) KahaDB Recovery can experience a dangling transaction when prepare and commit occur on different data files.

2018-10-09 Thread Gary Tully (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-7067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Tully reassigned AMQ-7067:
---

Assignee: Gary Tully

> KahaDB Recovery can experience a dangling transaction when prepare and commit 
> occur on different data files.
> 
>
> Key: AMQ-7067
> URL: https://issues.apache.org/jira/browse/AMQ-7067
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB, XA
>Affects Versions: 5.15.6
>Reporter: Jamie goodyear
>Assignee: Gary Tully
>Priority: Critical
> Fix For: 5.16.0, 5.15.7
>
> Attachments: amq7067test.patch
>
>
> KahaDB Recovery can experience a dangling transaction when prepare and commit 
> occur on different data files.
> Scenario:
> An XA transaction is started, a message is prepared and sent into the Broker.
> We then send enough messages into the broker to fill the current data file (100 
> messages with 512 * 1024 characters in the message payload). This forces a new 
> data file to be created.
> Commit the XA transaction. Commit will land on the new data file.
> Restart the Broker.
> Upon restart a KahaDB recovery is executed.
> The prepare in PageFile 1 is not matched to Commit on PageFile 2, as such, it 
> will appear in recovered message state.
> Looking deeper into this scenario, it appears that the commit message is 
> GC'd, hence the prepare & commit can not be matched.
> The MessageDatabase only checks the following for GC:
> {code:java}
> // Don't GC files referenced by in-progress tx
> if (inProgressTxRange[0] != null) {
>     for (int pendingTx = inProgressTxRange[0].getDataFileId();
>          pendingTx <= inProgressTxRange[1].getDataFileId(); pendingTx++) {
>         gcCandidateSet.remove(pendingTx);
>     }
> }
> {code}
> We need to become aware of where the prepare & commits occur in pagefiles 
> with respect to GCing files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (AMQ-7067) KahaDB Recovery can experience a dangling transaction when prepare and commit occur on different data files.

2018-10-09 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643158#comment-16643158
 ] 

Gary Tully edited comment on AMQ-7067 at 10/9/18 11:40 AM:
---

[~cshannon] yes, good catch. I should be under that lock. The map does not get 
updated in the pagefile tx, just during checkpoints, but it does need to be 
protected from concurrent access.
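
As a generic illustration of that locking point (not the MessageDatabase code; 
the names below are made up), a read/write lock is enough when the map is only 
mutated at checkpoint time but read concurrently elsewhere:
{code:java}
// Generic illustration: protect a map mutated at checkpoint time, read elsewhere.
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class CheckpointGuardedMap {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<Integer, Set<Integer>> ackMessageFileMap = new HashMap<>();

    /** Checkpoint path: record that referencingFileId still needs referencedFileId. */
    void recordReference(int referencingFileId, int referencedFileId) {
        lock.writeLock().lock();
        try {
            ackMessageFileMap.computeIfAbsent(referencingFileId, k -> new HashSet<>())
                             .add(referencedFileId);
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** GC candidate selection path: query under the read lock. */
    boolean references(int referencingFileId, int referencedFileId) {
        lock.readLock().lock();
        try {
            Set<Integer> referenced = ackMessageFileMap.get(referencingFileId);
            return referenced != null && referenced.contains(referencedFileId);
        } finally {
            lock.readLock().unlock();
        }
    }
}
{code}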


was (Author: gtully):
[~cshannon] yes, good catch. I should be under that lock for all of the above 
reasons.

> KahaDB Recovery can experience a dangling transaction when prepare and commit 
> occur on different data files.
> 
>
> Key: AMQ-7067
> URL: https://issues.apache.org/jira/browse/AMQ-7067
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB, XA
>Affects Versions: 5.15.6
>Reporter: Jamie goodyear
>Priority: Critical
> Fix For: 5.16.0, 5.15.7
>
> Attachments: amq7067test.patch
>
>
> KahaDB Recovery can experience a dangling transaction when prepare and commit 
> occur on different data files.
> Scenario:
> An XA transaction is started, a message is prepared and sent into the Broker.
> We then send enough messages into the broker to fill the current data file (100 
> messages with 512 * 1024 characters in the message payload). This forces a new 
> data file to be created.
> Commit the XA transaction. Commit will land on the new data file.
> Restart the Broker.
> Upon restart a KahaDB recovery is executed.
> The prepare in PageFile 1 is not matched to Commit on PageFile 2, as such, it 
> will appear in recovered message state.
> Looking deeper into this scenario, it appears that the commit message is 
> GC'd, hence the prepare & commit can not be matched.
> The MessageDatabase only checks the following for GC:
> {code:java}
> // Don't GC files referenced by in-progress tx
> if (inProgressTxRange[0] != null) {
>     for (int pendingTx = inProgressTxRange[0].getDataFileId();
>          pendingTx <= inProgressTxRange[1].getDataFileId(); pendingTx++) {
>         gcCandidateSet.remove(pendingTx);
>     }
> }
> {code}
> We need to become aware of where the prepare & commits occur in pagefiles 
> with respect to GCing files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-7067) KahaDB Recovery can experience a dangling transaction when prepare and commit occur on different data files.

2018-10-09 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643158#comment-16643158
 ] 

Gary Tully commented on AMQ-7067:
-

[~cshannon] yes, good catch. I should be under that lock for all of the above 
reasons.

> KahaDB Recovery can experience a dangling transaction when prepare and commit 
> occur on different data files.
> 
>
> Key: AMQ-7067
> URL: https://issues.apache.org/jira/browse/AMQ-7067
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB, XA
>Affects Versions: 5.15.6
>Reporter: Jamie goodyear
>Priority: Critical
> Fix For: 5.16.0, 5.15.7
>
> Attachments: amq7067test.patch
>
>
> KahaDB Recovery can experience a dangling transaction when prepare and commit 
> occur on different data files.
> Scenario:
> An XA transaction is started, a message is prepared and sent into the Broker.
> We then send enough messages into the broker to fill the current data file (100 
> messages with 512 * 1024 characters in the message payload). This forces a new 
> data file to be created.
> Commit the XA transaction. Commit will land on the new data file.
> Restart the Broker.
> Upon restart a KahaDB recovery is executed.
> The prepare in PageFile 1 is not matched to Commit on PageFile 2, as such, it 
> will appear in recovered message state.
> Looking deeper into this scenario, it appears that the commit message is 
> GC'd, hence the prepare & commit can not be matched.
> The MessageDatabase only checks the following for GC:
> {code:java}
> // Don't GC files referenced by in-progress tx
> if (inProgressTxRange[0] != null) {
>     for (int pendingTx = inProgressTxRange[0].getDataFileId();
>          pendingTx <= inProgressTxRange[1].getDataFileId(); pendingTx++) {
>         gcCandidateSet.remove(pendingTx);
>     }
> }
> {code}
> We need to become aware of where the prepare & commits occur in pagefiles 
> with respect to GCing files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AMQ-7067) KahaDB Recovery can experience a dangling transaction when prepare and commit occur on different data files.

2018-10-09 Thread Gary Tully (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-7067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Tully updated AMQ-7067:

Description: 
KahaDB Recovery can experience a dangling transaction when prepare and commit 
occur on different data files.

Scenario:

An XA transaction is started, a message is prepared and sent into the Broker.

We then send enough messages into the broker to fill the current data file (100 
messages with 512 * 1024 characters in the message payload). This forces a new 
data file to be created.

Commit the XA transaction. Commit will land on the new data file.

Restart the Broker.

Upon restart a KahaDB recovery is executed.

The prepare in PageFile 1 is not matched to Commit on PageFile 2, as such, it 
will appear in recovered message state.

Looking deeper into this scenario, it appears that the commit message is GC'd, 
hence the prepare & commit can not be matched.

The MessageDatabase only checks the following for GC:

{code:java}
// Don't GC files referenced by in-progress tx
if (inProgressTxRange[0] != null) {
    for (int pendingTx = inProgressTxRange[0].getDataFileId();
         pendingTx <= inProgressTxRange[1].getDataFileId(); pendingTx++) {
        gcCandidateSet.remove(pendingTx);
    }
}
{code}

We need to become aware of where the prepare & commits occur in pagefiles with 
respect to GCing files.

  was:
KahaDB Recovery can experience a dangling transaction when prepare and commit 
occur on different pagefiles.

Scenario:

An XA transaction is started, a message is prepared and sent into the Broker.

We then send enough messages into the broker to fill the current page file (100 
messages with 512 * 1024 characters in the message payload). This forces a new 
pagefile to be created.

Commit the XA transaction. Commit will land on the new page file.

Restart the Broker.

Upon restart a KahaDB recovery is executed.

The prepare in PageFile 1 is not matched to Commit on PageFile 2, as such, it 
will appear in recovered message state.

Looking deeper into this scenario, it appears that the commit message is GC'd, 
hence the prepare & commit can not be matched. 

The MessageDatabase only checks the following for GC:

{code:java}
// Don't GC files referenced by in-progress tx
if (inProgressTxRange[0] != null) {
    for (int pendingTx = inProgressTxRange[0].getDataFileId();
         pendingTx <= inProgressTxRange[1].getDataFileId(); pendingTx++) {
        gcCandidateSet.remove(pendingTx);
    }
}
{code}

We need to become aware of where the prepare & commits occur in pagefiles with 
respect to GCing files.


> KahaDB Recovery can experience a dangling transaction when prepare and commit 
> occur on different data files.
> 
>
> Key: AMQ-7067
> URL: https://issues.apache.org/jira/browse/AMQ-7067
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB, XA
>Affects Versions: 5.15.6
>Reporter: Jamie goodyear
>Priority: Critical
> Fix For: 5.16.0, 5.15.7
>
> Attachments: amq7067test.patch
>
>
> KahaDB Recovery can experience a dangling transaction when prepare and commit 
> occur on different data files.
> Scenario:
> An XA transaction is started, a message is prepared and sent into the Broker.
> We then send enough messages into the broker to fill the current data file (100 
> messages with 512 * 1024 characters in the message payload). This forces a new 
> data file to be created.
> Commit the XA transaction. Commit will land on the new data file.
> Restart the Broker.
> Upon restart a KahaDB recovery is executed.
> The prepare in PageFile 1 is not matched to Commit on PageFile 2, as such, it 
> will appear in recovered message state.
> Looking deeper into this scenario, it appears that the commit message is 
> GC'd, hence the prepare & commit can not be matched.
> The MessageDatabase only checks the following for GC:
> {code:java}
> // Don't GC files referenced by in-progress tx
> if (inProgressTxRange[0] != null) {
>     for (int pendingTx = inProgressTxRange[0].getDataFileId();
>          pendingTx <= inProgressTxRange[1].getDataFileId(); pendingTx++) {
>         gcCandidateSet.remove(pendingTx);
>     }
> }
> {code}
> We need to become aware of where the prepare & commits occur in pagefiles 
> with respect to GCing files.

[jira] [Commented] (AMQ-7067) KahaDB Recovery can experience a dangling transaction when prepare and commit occur on different data files.

2018-10-09 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643123#comment-16643123
 ] 

Gary Tully commented on AMQ-7067:
-

[~jgoodyear] I have pushed your changes and some additional test tidy up and a 
prepare variant. Thank you.

 

[https://github.com/apache/activemq/commit/57c7939534a927bfc2d1b0454aac7ef8d804532b]

 

I think ack compaction will need some follow-on work b/c it won't be aware that 
the ackMessageFileMap now also has transaction locations; it will only move 
acks, and I think it will leave the journal files as candidates for gc again. 

As it stands, ackCompaction should be disabled for this fix to be effective, 
until that is proven not to be the case or it is fixed.
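
For anyone needing to do that in the meantime, a minimal sketch (assuming the 
enableAckCompaction property on KahaDBPersistenceAdapter, available in recent 
5.x releases) would be:
{code:java}
// Sketch: disable KahaDB ack compaction while this follow-up work is pending.
import org.apache.activemq.broker.BrokerService;
import org.apache.activemq.store.kahadb.KahaDBPersistenceAdapter;

public class DisableAckCompaction {
    public static void main(String[] args) throws Exception {
        KahaDBPersistenceAdapter kahaDB = new KahaDBPersistenceAdapter();
        kahaDB.setEnableAckCompaction(false); // don't let compaction move/lose tx locations

        BrokerService broker = new BrokerService();
        broker.setPersistenceAdapter(kahaDB);
        broker.start();
        broker.waitUntilStopped();
    }
}
{code}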

 

There are some ackCompaction tests that can be combined with the recovery 
checks to validate. 

This issue should remain open till we get a resolution to that.

> KahaDB Recovery can experience a dangling transaction when prepare and commit 
> occur on different data files.
> 
>
> Key: AMQ-7067
> URL: https://issues.apache.org/jira/browse/AMQ-7067
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB, XA
>Affects Versions: 5.15.6
>Reporter: Jamie goodyear
>Priority: Critical
> Fix For: 5.16.0, 5.15.7
>
> Attachments: amq7067test.patch
>
>
> KahaDB Recovery can experience a dangling transaction when prepare and commit 
> occur on different pagefiles.
> Scenario:
> An XA transaction is started, a message is prepared and sent into the Broker.
> We then send enough messages into the broker to fill the current page file (100 
> messages with 512 * 1024 characters in the message payload). This forces a new 
> pagefile to be created.
> Commit the XA transaction. Commit will land on the new page file.
> Restart the Broker.
> Upon restart a KahaDB recovery is executed.
> The prepare in PageFile 1 is not matched to Commit on PageFile 2, as such, it 
> will appear in recovered message state.
> Looking deeper into this scenario, it appears that the commit message is 
> GC'd, hence the prepare & commit can not be matched. 
> The MessageDatabase only checks the following for GC:
> {code:java}
> // Don't GC files referenced by in-progress tx
> if (inProgressTxRange[0] != null) {
>     for (int pendingTx = inProgressTxRange[0].getDataFileId();
>          pendingTx <= inProgressTxRange[1].getDataFileId(); pendingTx++) {
>         gcCandidateSet.remove(pendingTx);
>     }
> }
> {code}
> We need to become aware of where the prepare & commits occur in pagefiles 
> with respect to GCing files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (AMQ-7067) KahaDB Recovery can experience a dangling transaction when prepare and commit occur on different data files.

2018-10-08 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16642151#comment-16642151
 ] 

Gary Tully edited comment on AMQ-7067 at 10/8/18 5:07 PM:
--

I see that losing a prepare record after an outcome is still recoverable, once 
all of the updates are present.

However I think that a prepare record needs to be tracked in case it falls 
outside of the current txRange or current write file.


was (Author: gtully):
I see that losing a prepare record after an outcome is still recoverable, once 
all of the updates are present.

However I think that a prepare record needs to be tracked in case if falls 
outside of the current txRange or current write file.

> KahaDB Recovery can experience a dangling transaction when prepare and commit 
> occur on different data files.
> 
>
> Key: AMQ-7067
> URL: https://issues.apache.org/jira/browse/AMQ-7067
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB, XA
>Affects Versions: 5.15.6
>Reporter: Jamie goodyear
>Priority: Critical
> Fix For: 5.16.0, 5.15.7
>
> Attachments: amq7067test.patch
>
>
> KahaDB Recovery can experience a dangling transaction when prepare and commit 
> occur on different pagefiles.
> Scenario:
> An XA transaction is started, a message is prepared and sent into the Broker.
> We then send enough messages into the broker to fill the current page file (100 
> messages with 512 * 1024 characters in the message payload). This forces a new 
> pagefile to be created.
> Commit the XA transaction. Commit will land on the new page file.
> Restart the Broker.
> Upon restart a KahaDB recovery is executed.
> The prepare in PageFile 1 is not matched to Commit on PageFile 2, as such, it 
> will appear in recovered message state.
> Looking deeper into this scenario, it appears that the commit message is 
> GC'd, hence the prepare & commit can not be matched. 
> The MessageDatabase only checks the following for GC:
> {code:java}
> // Don't GC files referenced by in-progress tx
> if (inProgressTxRange[0] != null) {
>     for (int pendingTx = inProgressTxRange[0].getDataFileId();
>          pendingTx <= inProgressTxRange[1].getDataFileId(); pendingTx++) {
>         gcCandidateSet.remove(pendingTx);
>     }
> }
> {code}
> We need to become aware of where the prepare & commits occur in pagefiles 
> with respect to GCing files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-7067) KahaDB Recovery can experience a dangling transaction when prepare and commit occur on different data files.

2018-10-08 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16642151#comment-16642151
 ] 

Gary Tully commented on AMQ-7067:
-

I see that losing a prepare record after an outcome is still recoverable, once 
all of the updates are present.

However, I think that a prepare record needs to be tracked in case it falls 
outside of the current txRange or current write file.

> KahaDB Recovery can experience a dangling transaction when prepare and commit 
> occur on different data files.
> 
>
> Key: AMQ-7067
> URL: https://issues.apache.org/jira/browse/AMQ-7067
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB, XA
>Affects Versions: 5.15.6
>Reporter: Jamie goodyear
>Priority: Critical
> Fix For: 5.16.0, 5.15.7
>
> Attachments: amq7067test.patch
>
>
> KahaDB Recovery can experience a dangling transaction when prepare and commit 
> occur on different pagefiles.
> Scenario:
> An XA transaction is started, a message is prepared and sent into the Broker.
> We then send enough messages into the broker to fill the current page file (100 
> messages with 512 * 1024 characters in the message payload). This forces a new 
> pagefile to be created.
> Commit the XA transaction. Commit will land on the new page file.
> Restart the Broker.
> Upon restart a KahaDB recovery is executed.
> The prepare in PageFile 1 is not matched to Commit on PageFile 2, as such, it 
> will appear in recovered message state.
> Looking deeper into this scenario, it appears that the commit message is 
> GC'd, hence the prepare & commit can not be matched. 
> The MessageDatabase only checks the following for GC:
> {code:java}
> // Don't GC files referenced by in-progress tx
> if (inProgressTxRange[0] != null) {
>     for (int pendingTx = inProgressTxRange[0].getDataFileId();
>          pendingTx <= inProgressTxRange[1].getDataFileId(); pendingTx++) {
>         gcCandidateSet.remove(pendingTx);
>     }
> }
> {code}
> We need to become aware of where the prepare & commits occur in pagefiles 
> with respect to GCing files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (AMQ-7067) KahaDB Recovery can experience a dangling transaction when prepare and commit occur on different data files.

2018-10-08 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16642107#comment-16642107
 ] 

Gary Tully edited comment on AMQ-7067 at 10/8/18 4:37 PM:
--

In the xa case, I don't think the prepare record location is tracked, so it can 
get lost, leaving updates and a commit/rollback - which will barf. I am 
suggesting tracking the outcome record with the prepare location, and possibly 
with all of the updates.

I had not considered the non xa case, in that case, the commit is all that is 
needed b/c the default will be to rollback. The commit outcome location needs 
to be linked to each of the update commands in turn.


was (Author: gtully):
In the xa case, I don't think the prepare record location is tracked, so it can 
get lost, leaving messages and a commit/rollback - which will barf. I am 
suggesting tracking the outcome record with the prepare location, and possibly 
with all of the updates.

I had not considered the non xa case, in that case, the commit is all that is 
needed b/c the default will be to rollback. The commit outcome location needs 
to be linked to each of the update commands in turn.

> KahaDB Recovery can experience a dangling transaction when prepare and commit 
> occur on different data files.
> 
>
> Key: AMQ-7067
> URL: https://issues.apache.org/jira/browse/AMQ-7067
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB, XA
>Affects Versions: 5.15.6
>Reporter: Jamie goodyear
>Priority: Critical
> Fix For: 5.16.0, 5.15.7
>
> Attachments: amq7067test.patch
>
>
> KahaDB Recovery can experience a dangling transaction when prepare and commit 
> occur on different pagefiles.
> Scenario:
> An XA transaction is started, a message is prepared and sent into the Broker.
> We then send enough messages into the broker to fill the current page file (100 
> messages with 512 * 1024 characters in the message payload). This forces a new 
> pagefile to be created.
> Commit the XA transaction. Commit will land on the new page file.
> Restart the Broker.
> Upon restart a KahaDB recovery is executed.
> The prepare in PageFile 1 is not matched to Commit on PageFile 2, as such, it 
> will appear in recovered message state.
> Looking deeper into this scenario, it appears that the commit message is 
> GC'd, hence the prepare & commit can not be matched. 
> The MessageDatabase only checks the following for GC:
> {code:java}
> // Don't GC files referenced by in-progress tx
> if (inProgressTxRange[0] != null) {
>     for (int pendingTx = inProgressTxRange[0].getDataFileId();
>          pendingTx <= inProgressTxRange[1].getDataFileId(); pendingTx++) {
>         gcCandidateSet.remove(pendingTx);
>     }
> }
> {code}
> We need to become aware of where the prepare & commits occur in pagefiles 
> with respect to GCing files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (AMQ-7067) KahaDB Recovery can experience a dangling transaction when prepare and commit occur on different data files.

2018-10-08 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16642107#comment-16642107
 ] 

Gary Tully edited comment on AMQ-7067 at 10/8/18 4:36 PM:
--

In the xa case, I don't think the prepare record location is tracked, so it can 
get lost, leaving messages and a commit/rollback - which will barf. I am 
suggesting tracking the outcome record with the prepare location, and possibly 
with all of the updates.

I had not considered the non xa case, in that case, the commit is all that is 
needed b/c the default will be to rollback. The commit outcome location needs 
to be linked to each of the update commands in turn.


was (Author: gtully):
In the xa case, I don't think the prepare record location is tracked, so it can 
get lost, leaving messages and a commit/rollback - which will barf. I am 
suggesting tracking the outcome record with the prepare location, and possibly 
with all of the updates.

I had not considered the non xa case, in that case, the commit is all that is 
needed b/c the default will be to rollback. The commit outcome location needs 
to be linked to each of the add commands in turn.

> KahaDB Recovery can experience a dangling transaction when prepare and commit 
> occur on different data files.
> 
>
> Key: AMQ-7067
> URL: https://issues.apache.org/jira/browse/AMQ-7067
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB, XA
>Affects Versions: 5.15.6
>Reporter: Jamie goodyear
>Priority: Critical
> Fix For: 5.16.0, 5.15.7
>
> Attachments: amq7067test.patch
>
>
> KahaDB Recovery can experience a dangling transaction when prepare and commit 
> occur on different pagefiles.
> Scenario:
> An XA transaction is started, a message is prepared and sent into the Broker.
> We then send enough messages into the broker to fill the current page file (100 
> messages with 512 * 1024 characters in the message payload). This forces a new 
> pagefile to be created.
> Commit the XA transaction. Commit will land on the new page file.
> Restart the Broker.
> Upon restart a KahaDB recovery is executed.
> The prepare in PageFile 1 is not matched to Commit on PageFile 2, as such, it 
> will appear in recovered message state.
> Looking deeper into this scenario, it appears that the commit message is 
> GC'd, hence the prepare & commit can not be matched. 
> The MessageDatabase only checks the following for GC:
> {code:java}
> // Don't GC files referenced by in-progress tx
> if (inProgressTxRange[0] != null) {
>     for (int pendingTx = inProgressTxRange[0].getDataFileId();
>          pendingTx <= inProgressTxRange[1].getDataFileId(); pendingTx++) {
>         gcCandidateSet.remove(pendingTx);
>     }
> }
> {code}
> We need to become aware of where the prepare & commits occur in pagefiles 
> with respect to GCing files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-7067) KahaDB Recovery can experience a dangling transaction when prepare and commit occur on different data files.

2018-10-08 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16642107#comment-16642107
 ] 

Gary Tully commented on AMQ-7067:
-

In the xa case, I don't think the prepare record location is tracked, so it can 
get lost, leaving messages and a commit/rollback - which will barf. I am 
suggesting tracking the outcome record with the prepare location, and possibly 
with all of the updates.

I had not considered the non xa case, in that case, the commit is all that is 
needed b/c the default will be to rollback. The commit outcome location needs 
to be linked to each of the add commands in turn.

> KahaDB Recovery can experience a dangling transaction when prepare and commit 
> occur on different data files.
> 
>
> Key: AMQ-7067
> URL: https://issues.apache.org/jira/browse/AMQ-7067
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB, XA
>Affects Versions: 5.15.6
>Reporter: Jamie goodyear
>Priority: Critical
> Fix For: 5.16.0, 5.15.7
>
> Attachments: amq7067test.patch
>
>
> KahaDB Recovery can experience a dangling transaction when prepare and commit 
> occur on different pagefiles.
> Scenario:
> An XA transaction is started, a message is prepared and sent into the Broker.
> We then send enough messages into the broker to fill the current page file (100 
> messages with 512 * 1024 characters in the message payload). This forces a new 
> pagefile to be created.
> Commit the XA transaction. Commit will land on the new page file.
> Restart the Broker.
> Upon restart a KahaDB recovery is executed.
> The prepare in PageFile 1 is not matched to Commit on PageFile 2, as such, it 
> will appear in recovered message state.
> Looking deeper into this scenario, it appears that the commit message is 
> GC'd, hence the prepare & commit can not be matched. 
> The MessageDatabase only checks the following for GC:
> {code:java}
> // Don't GC files referenced by in-progress tx
> if (inProgressTxRange[0] != null) {
>     for (int pendingTx = inProgressTxRange[0].getDataFileId();
>          pendingTx <= inProgressTxRange[1].getDataFileId(); pendingTx++) {
>         gcCandidateSet.remove(pendingTx);
>     }
> }
> {code}
> We need to become aware of where the prepare & commits occur in pagefiles 
> with respect to GCing files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AMQ-7067) KahaDB Recovery can experience a dangling transaction when prepare and commit occur on different data files.

2018-10-08 Thread Gary Tully (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-7067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Tully updated AMQ-7067:

Summary: KahaDB Recovery can experience a dangling transaction when prepare 
and commit occur on different data files.  (was: KahaDB Recovery can experience 
a dangling transaction when prepare and commit occur on different pagefiles.)

> KahaDB Recovery can experience a dangling transaction when prepare and commit 
> occur on different data files.
> 
>
> Key: AMQ-7067
> URL: https://issues.apache.org/jira/browse/AMQ-7067
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB, XA
>Affects Versions: 5.15.6
>Reporter: Jamie goodyear
>Priority: Critical
> Fix For: 5.16.0, 5.15.7
>
> Attachments: amq7067test.patch
>
>
> KahaDB Recovery can experience a dangling transaction when prepare and commit 
> occur on different pagefiles.
> Scenario:
> An XA transaction is started, a message is prepared and sent into the Broker.
> We then send enough messages into the broker to fill the current page file (100 
> messages with 512 * 1024 characters in the message payload). This forces a new 
> pagefile to be created.
> Commit the XA transaction. Commit will land on the new page file.
> Restart the Broker.
> Upon restart a KahaDB recovery is executed.
> The prepare in PageFile 1 is not matched to Commit on PageFile 2, as such, it 
> will appear in recovered message state.
> Looking deeper into this scenario, it appears that the commit message is 
> GC'd, hence the prepare & commit can not be matched. 
> The MessageDatabase only checks the following for GC:
> {code:java}
> // Don't GC files referenced by in-progress tx
> if (inProgressTxRange[0] != null) {
>     for (int pendingTx = inProgressTxRange[0].getDataFileId();
>          pendingTx <= inProgressTxRange[1].getDataFileId(); pendingTx++) {
>         gcCandidateSet.remove(pendingTx);
>     }
> }
> {code}
> We need to become aware of where the prepare & commits occur in pagefiles 
> with respect to GCing files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-7067) KahaDB Recovery can experience a dangling transaction when prepare and commit occur on different pagefiles.

2018-10-08 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16641757#comment-16641757
 ] 

Gary Tully commented on AMQ-7067:
-

[~jgoodyear] I think the patch is on the right track but I think it needs some 
small mod.

 

The ack message file map can do what we need but I think it has to track the 
association between the prepare record location and the outcome 
(commit/rollback) location rather than the actual message or ack commands.

ie: all of the transaction commands could be in one data file, the prepare in 
another and the outcome in a third. Tracking the link between the third and the 
first is not sufficient in this case. It should be possible to validate that 
with another test scenario.

I think the prepare record location needs to be tracked in the in-memory tx 
representation in some way to make it available to the outcome processor.
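
A hypothetical sketch of what that could look like (the class and field names 
are made up for illustration, not the actual KahaDB transaction state):
{code:java}
// Hypothetical: keep the prepare record's journal location with the in-memory tx
// so the commit/rollback processor can link the outcome file to the prepare file.
class InMemoryTx {
    static final class Location {
        final int dataFileId;
        final int offset;
        Location(int dataFileId, int offset) {
            this.dataFileId = dataFileId;
            this.offset = offset;
        }
    }

    interface FileReferenceLinker {
        void link(int referencingFileId, int referencedFileId);
    }

    private Location prepareLocation; // null until the prepare record is written/replayed

    void onPrepare(Location location) {
        this.prepareLocation = location;
    }

    /** Called when the commit/rollback record for this tx is processed. */
    void onOutcome(Location outcomeLocation, FileReferenceLinker linker) {
        if (prepareLocation != null) {
            // Outcome file references the prepare file, so neither is GC'd in isolation.
            linker.link(outcomeLocation.dataFileId, prepareLocation.dataFileId);
        }
    }
}
{code}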

> KahaDB Recovery can experience a dangling transaction when prepare and commit 
> occur on different pagefiles.
> ---
>
> Key: AMQ-7067
> URL: https://issues.apache.org/jira/browse/AMQ-7067
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB, XA
>Affects Versions: 5.15.6
>Reporter: Jamie goodyear
>Priority: Critical
> Fix For: 5.16.0, 5.15.7
>
> Attachments: amq7067test.patch
>
>
> KahaDB Recovery can experience a dangling transaction when prepare and commit 
> occur on different pagefiles.
> Scenario:
> An XA transaction is started, a message is prepared and sent into the Broker.
> We then send enough messages into the broker to fill the current page file (100 
> messages with 512 * 1024 characters in the message payload). This forces a new 
> pagefile to be created.
> Commit the XA transaction. Commit will land on the new page file.
> Restart the Broker.
> Upon restart a KahaDB recovery is executed.
> The prepare in PageFile 1 is not matched to Commit on PageFile 2, as such, it 
> will appear in recovered message state.
> Looking deeper into this scenario, it appears that the commit message is 
> GC'd, hence the prepare & commit can not be matched. 
> The MessageDatabase only checks the following for GC:
> {code:java}
> // Don't GC files referenced by in-progress tx
> if (inProgressTxRange[0] != null) {
>     for (int pendingTx = inProgressTxRange[0].getDataFileId();
>          pendingTx <= inProgressTxRange[1].getDataFileId(); pendingTx++) {
>         gcCandidateSet.remove(pendingTx);
>     }
> }
> {code}
> We need to become aware of where the prepare & commits occur in pagefiles 
> with respect to GCing files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARTEMIS-2110) JDBC LeaseLocker repeated renew or renew after acquire can fail in error

2018-10-04 Thread Gary Tully (JIRA)
Gary Tully created ARTEMIS-2110:
---

 Summary: JDBC LeaseLocker repeated renew or renew after acquire can 
fail in error
 Key: ARTEMIS-2110
 URL: https://issues.apache.org/jira/browse/ARTEMIS-2110
 Project: ActiveMQ Artemis
  Issue Type: Bug
  Components: Broker
Affects Versions: 2.6.3
Reporter: Gary Tully
Assignee: Gary Tully
 Fix For: 2.7.0


There is an intermittent failure in:

org.apache.activemq.artemis.core.server.impl.jdbc.JdbcLeaseLockTest#shouldRenewAcquiredLock

(seems to block pr ci, and reproduced for me 20% of the time)

If the DB currentTime has not moved on, using the same lease time results in a 
failure. A renew immediately after acquiring the lock is a sensible pattern to 
verify the acquire. 

 

The lock renew needs >= in place of > on the WHERE clause to allow an 
'identity' lease to succeed.
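
An illustrative renew statement (not the actual Artemis JDBC lease lock SQL; 
table and column names are made up) showing why the comparison matters:
{code:java}
// With a strict '>' an immediate renew that computes the same expiration time
// (because the DB clock has not ticked) matches zero rows and looks like a
// failure; '>=' lets the 'identity' renew succeed.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;

class LeaseRenewSketch {
    private static final String RENEW_SQL =
        "UPDATE LEASE_LOCK SET EXPIRATION_TIME = ? " +
        "WHERE LOCK_ID = ? AND HOLDER_ID = ? AND ? >= EXPIRATION_TIME";

    boolean renew(Connection db, long lockId, String holderId,
                  Timestamp newExpiration) throws SQLException {
        try (PreparedStatement stmt = db.prepareStatement(RENEW_SQL)) {
            stmt.setTimestamp(1, newExpiration);
            stmt.setLong(2, lockId);
            stmt.setString(3, holderId);
            stmt.setTimestamp(4, newExpiration); // identity renew: may equal the stored value
            return stmt.executeUpdate() == 1;    // renewed only while we still hold the lease
        }
    }
}
{code}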



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARTEMIS-2103) VirtualTopic doesn't work correctly with multiple consumers

2018-10-03 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16636893#comment-16636893
 ] 

Gary Tully commented on ARTEMIS-2103:
-

The mapping to queue name needs to take the full openwire consumer queue name 
into account rather than parsing out the consumer identifier component.
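
A small sketch of that mapping (illustrative only, not the broker code): with 
virtualTopicConsumerWildcards=Consumer.*.>;2 the first two dot-separated 
segments identify the consumer and the remainder is the topic address, and 
keeping the full consumer queue name as the queue name keeps 
Consumer.cons1.VirtualTopic.t1 and Consumer.cons1.VirtualTopic.t2 distinct.
{code:java}
// Illustrative mapping sketch - not the actual OpenWire/Artemis implementation.
import java.util.Arrays;

class VirtualTopicMapping {
    static final class Binding {
        final String address;   // topic the queue is bound to
        final String queueName; // queue that receives copies for this consumer
        Binding(String address, String queueName) {
            this.address = address;
            this.queueName = queueName;
        }
    }

    static Binding map(String consumerQueueName, int prefixSegments) {
        String[] parts = consumerQueueName.split("\\.");
        // Everything after the consumer prefix is the virtual topic address.
        String address = String.join(".",
            Arrays.copyOfRange(parts, prefixSegments, parts.length));
        // Use the FULL consumer queue name, not just the parsed identifier ("cons1").
        return new Binding(address, consumerQueueName);
    }

    public static void main(String[] args) {
        Binding b1 = map("Consumer.cons1.VirtualTopic.t1", 2);
        Binding b2 = map("Consumer.cons1.VirtualTopic.t2", 2);
        System.out.println(b1.address + " <- " + b1.queueName); // VirtualTopic.t1 <- Consumer.cons1.VirtualTopic.t1
        System.out.println(b2.address + " <- " + b2.queueName); // VirtualTopic.t2 <- Consumer.cons1.VirtualTopic.t2
    }
}
{code}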

> VirtualTopic doesn't work correctly with multiple consumers
> ---
>
> Key: ARTEMIS-2103
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2103
> Project: ActiveMQ Artemis
>  Issue Type: Bug
>  Components: Broker, OpenWire
>Affects Versions: 2.6.3
>Reporter: Pawel
>Assignee: Gary Tully
>Priority: Major
>
> It's impossible to subscribe to multiple virtual topics with the same consumer 
> name.
> I've configured the acceptor as described in the documentation:
> {code:java}
> tcp://0.0.0.0:61616?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=CORE,AMQP,STOMP,HORNETQ,MQTT,OPENWIRE;useEpoll=true;amqpCredits=1000;amqpLowCredits=300;virtualTopicConsumerWildcards=Consumer.*.%3E%3B2{code}
> When I'm connecting first consumer:  
> *Consumer.cons1.VirtualTopic.t1*
> proper binding is created.
> Next I'm trying to connect second consumer with the same name:
> *Consumer.cons1.VirtualTopic.t2*
> But no binding is created.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARTEMIS-2103) VirtualTopic doesn't work correctly with multiple consumers

2018-10-03 Thread Gary Tully (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARTEMIS-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Tully reassigned ARTEMIS-2103:
---

Assignee: Gary Tully

> VirtualTopic doesn't work correctly with multiple consumers
> ---
>
> Key: ARTEMIS-2103
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2103
> Project: ActiveMQ Artemis
>  Issue Type: Bug
>  Components: Broker, OpenWire
>Affects Versions: 2.6.3
>Reporter: Pawel
>Assignee: Gary Tully
>Priority: Major
>
> It's impossible to subscribe to multiple virtual topics with the same consumer 
> name.
> I've configured the acceptor as described in the documentation:
> {code:java}
> tcp://0.0.0.0:61616?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=CORE,AMQP,STOMP,HORNETQ,MQTT,OPENWIRE;useEpoll=true;amqpCredits=1000;amqpLowCredits=300;virtualTopicConsumerWildcards=Consumer.*.%3E%3B2{code}
> When I'm connecting first consumer:  
> *Consumer.cons1.VirtualTopic.t1*
> proper binding is created.
> Next I'm trying to connect second consumer with the same name:
> *Consumer.cons1.VirtualTopic.t2*
> But no binding is created.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARTEMIS-2103) VirtualTopic doesn't work correctly with multiple consumers

2018-10-02 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635574#comment-16635574
 ] 

Gary Tully commented on ARTEMIS-2103:
-

Why does the second consumer need the same name?

 

you could use a wildcard to consume from matching topics, 
*Consumer.cons1.VirtualTopic.**
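
For example, with the 5.x OpenWire JMS client something like this would pick up 
every matching virtual topic queue (a sketch; the broker URL and destination 
names are placeholders):
{code:java}
// One consumer on a wildcard queue covers VirtualTopic.t1, VirtualTopic.t2, ...
// instead of needing the same consumer name bound to each topic separately.
import javax.jms.Connection;
import javax.jms.MessageConsumer;
import javax.jms.Queue;
import javax.jms.Session;
import org.apache.activemq.ActiveMQConnectionFactory;

public class WildcardVirtualTopicConsumer {
    public static void main(String[] args) throws Exception {
        ActiveMQConnectionFactory factory =
            new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        connection.start();

        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        // '*' matches a single name segment, e.g. t1 and t2.
        Queue wildcard = session.createQueue("Consumer.cons1.VirtualTopic.*");
        MessageConsumer consumer = session.createConsumer(wildcard);

        System.out.println("received: " + consumer.receive(5000));
        connection.close();
    }
}
{code}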

> VirtualTopic doesn't work correctly with multiple consumers
> ---
>
> Key: ARTEMIS-2103
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2103
> Project: ActiveMQ Artemis
>  Issue Type: Bug
>  Components: Broker, OpenWire
>Affects Versions: 2.6.3
>Reporter: Pawel
>Priority: Major
>
> It's impossible to subscribe to multiple virtual topics with the same consumer 
> name.
> I've configured the acceptor as described in the documentation:
> {code:java}
> tcp://0.0.0.0:61616?tcpSendBufferSize=1048576;tcpReceiveBufferSize=1048576;protocols=CORE,AMQP,STOMP,HORNETQ,MQTT,OPENWIRE;useEpoll=true;amqpCredits=1000;amqpLowCredits=300;virtualTopicConsumerWildcards=Consumer.*.%3E%3B2{code}
> When I'm connecting first consumer:  
> *Consumer.cons1.VirtualTopic.t1*
> proper binding is created.
> Next I'm trying to connect second consumer with the same name:
> *Consumer.cons1.VirtualTopic.t2*
> But no binding is created.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AMQ-7062) RedeliverPlugin can loop on duplicate detection sending to dlq

2018-09-26 Thread Gary Tully (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-7062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Tully resolved AMQ-7062.
-
Resolution: Fixed

> RedeliverPlugin can loop on duplicate detection sending to dlq
> --
>
> Key: AMQ-7062
> URL: https://issues.apache.org/jira/browse/AMQ-7062
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: Broker
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0
>
>
> When brokers "RedeliveryPlugin" is configured with maximumRedeliveries="-1" 
> (deliver forever) and a "duplicate message from store" is detected, the 
> duplicate message never makes it to the DLQ and keeps getting redelivered to 
> the original queue.
> {code:java}
> WARN  | JobScheduler:JMS | AbstractStoreCursor  | 
> gion.cursors.AbstractStoreCursor  116 | 
> org.apache.activemq.broker.region.cursors.QueueStorePrefetch@13044dcb:MYQUEUEXXX,batchResetNeeded=false,size=0,cacheEnabled=true,maxBatchSize:3,hasSpace:true,pendingCachedIds.size:1,lastSyncCachedId:null,lastSyncCachedId-seq:null,lastAsyncCachedId:ID:MYID2XXX-41992-1537343117867-33:1:1:1:31,lastAsyncCachedId-seq:34164,store=permits:,sd=nextSeq:34228,lastRet:MessageOrderCursor:[def:0,
>  low:0, high:0],pending:0 - cursor got duplicate send 
> ID:MYIDXXX1536652818855-25:1:2:1:4 seq: 
> org.apache.activemq.store.kahadb.KahaDBStore$StoreQueueTask$InnerFutureTask@26c9ffc9
> WARN  | JobScheduler:JMS | Queue| 
> mq.broker.region.BaseDestination  853 | duplicate message from store 
> ID:MYIDXXX1536652818855-25:1:2:1:4, redirecting for dlq processing
> TRACE | JobScheduler:JMS | RedeliveryPlugin | 
> emq.broker.util.RedeliveryPlugin  173 | redelivery #31514 of: 
> ID:MYIDXXX1536652818855-25:1:2:1:4 with delay: 1, dest: queue://MYQUEUEXXX
> WARN  | JobScheduler:JMS | AbstractStoreCursor  | 
> gion.cursors.AbstractStoreCursor  116 | 
> org.apache.activemq.broker.region.cursors.QueueStorePrefetch@13044dcb:MYQUEUEXXX,batchResetNeeded=false,size=0,cacheEnabled=true,maxBatchSize:3,hasSpace:true,pendingCachedIds.size:1,lastSyncCachedId:null,lastSyncCachedId-seq:null,lastAsyncCachedId:ID:MYID2XXX-41992-1537343117867-33:1:1:1:31,lastAsyncCachedId-seq:34164,store=permits:,sd=nextSeq:34229,lastRet:MessageOrderCursor:[def:0,
>  low:0, high:0],pending:0 - cursor got duplicate send 
> ID:MYIDXXX1536652818855-25:1:2:1:4 seq: 
> org.apache.activemq.store.kahadb.KahaDBStore$StoreQueueTask$InnerFutureTask@7be7d7d7
> WARN  | JobScheduler:JMS | Queue| 
> mq.broker.region.BaseDestination  853 | duplicate message from store 
> ID:MYIDXXX1536652818855-25:1:2:1:4, redirecting for dlq processing
> TRACE | JobScheduler:JMS | RedeliveryPlugin | 
> emq.broker.util.RedeliveryPlugin  173 | redelivery #31515 of: 
> ID:MYIDXXX1536652818855-25:1:2:1:4 with delay: 1, dest: queue://MYQUEUEXXX
> WARN  | JobScheduler:JMS | AbstractStoreCursor  | 
> gion.cursors.AbstractStoreCursor  116 | 
> org.apache.activemq.broker.region.cursors.QueueStorePrefetch@13044dcb:MYQUEUEXXX,batchResetNeeded=false,size=0,cacheEnabled=true,maxBatchSize:3,hasSpace:true,pendingCachedIds.size:1,lastSyncCachedId:null,lastSyncCachedId-seq:null,lastAsyncCachedId:ID:MYID2XXX-41992-1537343117867-33:1:1:1:31,lastAsyncCachedId-seq:34164,store=permits:,sd=nextSeq:34230,lastRet:MessageOrderCursor:[def:0,
>  low:0, high:0],pending:0 - cursor got duplicate send 
> ID:MYIDXXX1536652818855-25:1:2:1:4 seq: 
> org.apache.activemq.store.kahadb.KahaDBStore$StoreQueueTask$InnerFutureTask@17f4d783
> WARN  | JobScheduler:JMS | Queue| 
> mq.broker.region.BaseDestination  853 | duplicate message from store 
> ID:MYIDXXX1536652818855-25:1:2:1:4, redirecting for dlq processing
> TRACE | JobScheduler:JMS | RedeliveryPlugin | 
> emq.broker.util.RedeliveryPlugin  173 | redelivery #31516 of: 
> ID:MYIDXXX1536652818855-25:1:2:1:4 with delay: 1, dest: queue://MYQUEUEXXX
> {code}
>  
>   



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AMQ-7062) RedeliverPlugin can loop on duplicate detection sending to dlq

2018-09-25 Thread Gary Tully (JIRA)
Gary Tully created AMQ-7062:
---

 Summary: RedeliverPlugin can loop on duplicate detection sending 
to dlq
 Key: AMQ-7062
 URL: https://issues.apache.org/jira/browse/AMQ-7062
 Project: ActiveMQ
  Issue Type: Bug
  Components: Broker
Affects Versions: 5.15.0
Reporter: Gary Tully
Assignee: Gary Tully
 Fix For: 5.16.0


When brokers "RedeliveryPlugin" is configured with maximumRedeliveries="-1" 
(deliver forever) and a "duplicate message from store" is detected, the 
duplicate message never makes it to the DLQ and keeps getting redelivered to 
the original queue.
{code:java}
WARN  | JobScheduler:JMS | AbstractStoreCursor  | 
gion.cursors.AbstractStoreCursor  116 | 
org.apache.activemq.broker.region.cursors.QueueStorePrefetch@13044dcb:MYQUEUEXXX,batchResetNeeded=false,size=0,cacheEnabled=true,maxBatchSize:3,hasSpace:true,pendingCachedIds.size:1,lastSyncCachedId:null,lastSyncCachedId-seq:null,lastAsyncCachedId:ID:MYID2XXX-41992-1537343117867-33:1:1:1:31,lastAsyncCachedId-seq:34164,store=permits:,sd=nextSeq:34228,lastRet:MessageOrderCursor:[def:0,
 low:0, high:0],pending:0 - cursor got duplicate send 
ID:MYIDXXX1536652818855-25:1:2:1:4 seq: 
org.apache.activemq.store.kahadb.KahaDBStore$StoreQueueTask$InnerFutureTask@26c9ffc9
WARN  | JobScheduler:JMS | Queue| 
mq.broker.region.BaseDestination  853 | duplicate message from store 
ID:MYIDXXX1536652818855-25:1:2:1:4, redirecting for dlq processing
TRACE | JobScheduler:JMS | RedeliveryPlugin | 
emq.broker.util.RedeliveryPlugin  173 | redelivery #31514 of: 
ID:MYIDXXX1536652818855-25:1:2:1:4 with delay: 1, dest: queue://MYQUEUEXXX
WARN  | JobScheduler:JMS | AbstractStoreCursor  | 
gion.cursors.AbstractStoreCursor  116 | 
org.apache.activemq.broker.region.cursors.QueueStorePrefetch@13044dcb:MYQUEUEXXX,batchResetNeeded=false,size=0,cacheEnabled=true,maxBatchSize:3,hasSpace:true,pendingCachedIds.size:1,lastSyncCachedId:null,lastSyncCachedId-seq:null,lastAsyncCachedId:ID:MYID2XXX-41992-1537343117867-33:1:1:1:31,lastAsyncCachedId-seq:34164,store=permits:,sd=nextSeq:34229,lastRet:MessageOrderCursor:[def:0,
 low:0, high:0],pending:0 - cursor got duplicate send 
ID:MYIDXXX1536652818855-25:1:2:1:4 seq: 
org.apache.activemq.store.kahadb.KahaDBStore$StoreQueueTask$InnerFutureTask@7be7d7d7
WARN  | JobScheduler:JMS | Queue| 
mq.broker.region.BaseDestination  853 | duplicate message from store 
ID:MYIDXXX1536652818855-25:1:2:1:4, redirecting for dlq processing
TRACE | JobScheduler:JMS | RedeliveryPlugin | 
emq.broker.util.RedeliveryPlugin  173 | redelivery #31515 of: 
ID:MYIDXXX1536652818855-25:1:2:1:4 with delay: 1, dest: queue://MYQUEUEXXX
WARN  | JobScheduler:JMS | AbstractStoreCursor  | 
gion.cursors.AbstractStoreCursor  116 | 
org.apache.activemq.broker.region.cursors.QueueStorePrefetch@13044dcb:MYQUEUEXXX,batchResetNeeded=false,size=0,cacheEnabled=true,maxBatchSize:3,hasSpace:true,pendingCachedIds.size:1,lastSyncCachedId:null,lastSyncCachedId-seq:null,lastAsyncCachedId:ID:MYID2XXX-41992-1537343117867-33:1:1:1:31,lastAsyncCachedId-seq:34164,store=permits:,sd=nextSeq:34230,lastRet:MessageOrderCursor:[def:0,
 low:0, high:0],pending:0 - cursor got duplicate send 
ID:MYIDXXX1536652818855-25:1:2:1:4 seq: 
org.apache.activemq.store.kahadb.KahaDBStore$StoreQueueTask$InnerFutureTask@17f4d783
WARN  | JobScheduler:JMS | Queue| 
mq.broker.region.BaseDestination  853 | duplicate message from store 
ID:MYIDXXX1536652818855-25:1:2:1:4, redirecting for dlq processing
TRACE | JobScheduler:JMS | RedeliveryPlugin | 
emq.broker.util.RedeliveryPlugin  173 | redelivery #31516 of: 
ID:MYIDXXX1536652818855-25:1:2:1:4 with delay: 1, dest: queue://MYQUEUEXXX
{code}
 
  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AMQ-7057) Suppress (optionally) warn logging of EOF or Reset exceptions when remote socket is closed

2018-09-20 Thread Gary Tully (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-7057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Tully resolved AMQ-7057.
-
Resolution: Fixed

> Suppress (optionally) warn logging  of EOF or Reset exceptions when remote 
> socket is closed
> ---
>
> Key: AMQ-7057
> URL: https://issues.apache.org/jira/browse/AMQ-7057
> Project: ActiveMQ
>  Issue Type: Improvement
>  Components: Transport
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Minor
> Fix For: 5.16.0
>
>
> When a load-balancer or health check pings any transport connector endpoint 
> (using socket open/close) to verify that the broker is listening on a port, 
> any subsequent read failure is treated as an error and logged as a WARN.
> This makes sense in general b/c it is indicative of a rogue client.
> However, when it is the norm, ie: from a health check, the logs get filled 
> with these worrying messages that are in fact expected.
> For the stomp transport, where there is no protocol close method, we already 
> suppress EOF and connection reset exceptions.
> This improvement would make that suppression the default behaviour for all 
> tcp transports and allow the warn logging to be re-enabled when required via 
> configuration:
> {code:java}
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-7057) Suppress (optionally) warn logging of EOF or Reset exceptions when remote socket is closed

2018-09-20 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16622087#comment-16622087
 ] 

Gary Tully commented on AMQ-7057:
-

I am defaulting to true to retain the stomp behaviour of not reporting, 
essentially extending that behaviour to openwire. Otherwise stomp users would 
need to flip the bit to get back to the status quo.

> Suppress (optionally) warn logging  of EOF or Reset exceptions when remote 
> socket is closed
> ---
>
> Key: AMQ-7057
> URL: https://issues.apache.org/jira/browse/AMQ-7057
> Project: ActiveMQ
>  Issue Type: Improvement
>  Components: Transport
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Minor
> Fix For: 5.16.0
>
>
> When a load-balancer or health check pings any transport connector endpoint 
> (using socket open/close) to verify that the broker is listening on a port, 
> any subsequent read failure is treated as an error and logged as a WARN.
> This makes sense in general b/c it is indicative of a rogue client.
> However, when it is the norm, ie: from a health check, the logs get filled 
> with these worrying messages that are in fact expected.
> For the stomp transport, where there is no protocol close method, we already 
> suppress EOF and connection reset exceptions.
> This improvement would make that suppression the default behaviour for all 
> tcp transports and allow the warn logging to be re-enabled when required via 
> configuration:
> {code:java}
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AMQ-7057) Suppress (optionally) warn logging of EOF or Reset exceptions when remote socket is closed

2018-09-20 Thread Gary Tully (JIRA)
Gary Tully created AMQ-7057:
---

 Summary: Suppress (optionally) warn logging  of EOF or Reset 
exceptions when remote socket is closed
 Key: AMQ-7057
 URL: https://issues.apache.org/jira/browse/AMQ-7057
 Project: ActiveMQ
  Issue Type: Improvement
  Components: Transport
Affects Versions: 5.15.0
Reporter: Gary Tully
Assignee: Gary Tully
 Fix For: 5.16.0


When a load-balancer or health check pings any transport connector endpoint 
(using socket open/close) to verify that the broker is listening on a port, any 
subsequent read failure is treated as an error and logged as a WARN.
This makes sense in general b/c it is indicative of a rogue client.

However, when it is the norm, ie: from a health check, the logs get filled with 
these worrying messages that are in fact expected.
For the stomp transport, where there is no protocol close method, we already 
suppress EOF and connection reset exceptions.
This improvement would make that suppression the default behaviour for all tcp 
transports and allow the warn logging to be re-enabled when required via 
configuration:
{code:java}
{code}
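(The configuration snippet above appears to have been stripped by the mail 
archive, so it is left empty here.) As a rough, hedged sketch of the behaviour 
being described, using entirely hypothetical names rather than the real 
ActiveMQ option or classes, the idea is a transport exception callback that 
only emits a WARN when suppression of expected remote-close errors is turned 
off:
{code:java}
// Hedged sketch only - not the actual ActiveMQ patch. The option and class
// names (suppressWarnOnRemoteClose, WarnSuppressingTransportListener) are
// hypothetical, chosen purely to illustrate treating EOF/connection-reset
// from a closed remote socket as expected rather than as an error.
import java.io.EOFException;
import java.net.SocketException;

public class WarnSuppressingTransportListener {

    // hypothetical flag; defaults to suppressing, mirroring the stomp behaviour
    private boolean suppressWarnOnRemoteClose = true;

    public void onException(Throwable error) {
        if (suppressWarnOnRemoteClose && isExpectedRemoteClose(error)) {
            // health-check style open/close: log quietly and move on
            System.out.println("DEBUG transport closed by peer: " + error);
        } else {
            System.out.println("WARN transport failure: " + error);
        }
    }

    private boolean isExpectedRemoteClose(Throwable error) {
        return error instanceof EOFException
                || (error instanceof SocketException
                    && String.valueOf(error.getMessage()).contains("reset"));
    }

    public static void main(String[] args) {
        WarnSuppressingTransportListener l = new WarnSuppressingTransportListener();
        l.onException(new EOFException());                          // suppressed
        l.onException(new SocketException("Connection reset"));     // suppressed
        l.onException(new RuntimeException("protocol violation"));  // still a WARN
    }
}
{code}
With suppression defaulting to on, a health check's open/close produces no WARN 
noise, while genuine protocol failures are still reported.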



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AMQ-7055) Small optimization on SequenceSet to prevent iterating through the whole set when a value bigger than the last value is added

2018-09-20 Thread Gary Tully (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Tully resolved AMQ-7055.
-
   Resolution: Fixed
 Assignee: Gary Tully
Fix Version/s: 5.16.0

[~alanprot] thanks for the patch, this is a nice optimisation, great catch!

> Small optimization on SequenceSet to prevent iterating through the whole set 
> when a value bigger than the last value is added
> -
>
> Key: AMQ-7055
> URL: https://issues.apache.org/jira/browse/AMQ-7055
> Project: ActiveMQ
>  Issue Type: Improvement
>  Components: KahaDB
>Affects Versions: 5.15.6
>Reporter: Alan Protasio
>Assignee: Gary Tully
>Priority: Minor
> Fix For: 5.16.0
>
>
> When the index file has a huge number of free pages and the broker is 
> starting up (loading the index), we end up in an O(n^2) loop.
> This was causing the broker to use 100% CPU and fail to start up even after 
> a long time (as I remember we had around 3 million free pages in this case):
> [https://github.com/apache/activemq/blob/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/page/PageFile.java#L428]
> https://github.com/apache/activemq/blob/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/util/SequenceSet.java#L118
> I noticed that almost all the free pages were being added to the end of the 
> SequenceSet... so for every free page the broker had to iterate through the 
> whole set, only to then add the value to the tail anyway.
> With this small change, the broker started up in less than 5 minutes (see 
> the sketch below).
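
As a hedged, simplified sketch of the optimisation described above (not the 
actual SequenceSet code; the class and methods here are illustrative only): 
check the tail before walking the list, so a value at or beyond the current end 
is appended in O(1) instead of after an O(n) scan.
{code:java}
// Simplified, hypothetical sketch of the idea in this ticket - the real
// org.apache.activemq.store.kahadb.disk.util.SequenceSet differs in detail.
import java.util.LinkedList;

public class SimpleSequenceSet {

    static final class Range {
        long first, last;
        Range(long first, long last) { this.first = first; this.last = last; }
    }

    private final LinkedList<Range> ranges = new LinkedList<>();

    public void add(long value) {
        if (ranges.isEmpty()) {
            ranges.add(new Range(value, value));
            return;
        }
        // Fast path: value lands at or beyond the tail, no need to walk the list.
        Range tail = ranges.getLast();
        if (value == tail.last + 1) { tail.last = value; return; }          // extend tail
        if (value > tail.last + 1)  { ranges.addLast(new Range(value, value)); return; }
        // Slow path (unchanged idea): walk the list to find/merge the right range.
        // ... omitted for brevity ...
    }

    public int rangeCount() { return ranges.size(); }

    public static void main(String[] args) {
        SimpleSequenceSet set = new SimpleSequenceSet();
        for (long i = 0; i < 3_000_000; i += 2) {
            set.add(i); // mostly-ascending adds now avoid the O(n) walk per add
        }
        System.out.println("ranges: " + set.rangeCount());
    }
}
{code}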



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AMQ-7056) Fix AMQ3625Test and JaasNetworkTest Tests

2018-09-20 Thread Gary Tully (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Tully resolved AMQ-7056.
-
   Resolution: Fixed
 Assignee: Gary Tully
Fix Version/s: 5.16.0

> Fix AMQ3625Test and JaasNetworkTest Tests
> -
>
> Key: AMQ-7056
> URL: https://issues.apache.org/jira/browse/AMQ-7056
> Project: ActiveMQ
>  Issue Type: Test
>Reporter: Alan Protasio
>Assignee: Gary Tully
>Priority: Minor
> Fix For: 5.16.0
>
>
> Fix AMQ3625Test and JaasNetworkTest Tests.
> Those tests were failing due to:
> {quote}{{Caused by: javax.net.ssl.SSLHandshakeException: 
> java.security.cert.CertificateException: No name matching \{HOST} found}}
> {{ at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)}}
> {{ at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1959)}}
> {{ at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:302)}}
> {{ at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:296)}}
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-7028) Poor performance when concurrentStoreAndDispatchQueues + slow FS + Slow Consumers

2018-09-19 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620375#comment-16620375
 ] 

Gary Tully commented on AMQ-7028:
-

I pulled the mkahadb fix.

On the XA scenario, AMQ-6707 has ~8 commits, all of which are relevant but none 
of which are on the 5.15.x branch. It may be best to include all the fixes 
together, but this is obviously a larger change.


Maybe try to pull in all the relevant changes and open a PR once the branch is 
good.
{code:java}
git log apache/master | grep AMQ-6707

    AMQ-6707 - fix destination filter delegate param, refactor-auto-gen method; 
jees

    AMQ-6707 - ensure entryLocator is used for rollback of prepared add to 
avoid NPE, relates to AMQ-5567

    AMQ-6707 - remove duplicated started state flag

    AMQ-6707 - ensure trace logging does not flip cacheEnabled flag outside 
required sync

    AMQ-6707 - fix trace log reporting in error

    AMQ-6707 - skip tracked ack dependent test for leveldb

    AMQ-6707 - mKahadb, track recovered tx per store for completion, resolve 
test regression

    AMQ-6707 - JDBC XA recovery and completion.{code}

> Poor performance when concurrentStoreAndDispatchQueues + slow FS + Slow 
> Consumers
> -
>
> Key: AMQ-7028
> URL: https://issues.apache.org/jira/browse/AMQ-7028
> Project: ActiveMQ
>  Issue Type: Improvement
>  Components: KahaDB
>Affects Versions: 5.15.4
>Reporter: Alan Protasio
>Priority: Major
>
> Using a high-latency FS (such as NFS) to store KahaDB files and setting 
> concurrentStoreAndDispatchQueues=true may cause poor performance for slow 
> consumers. This happens because this option makes ActiveMQ write the 
> produced messages one by one to the underlying file system (implemented 
> with a single-threaded ExecutorService).
> Let's say that each write to the FS takes 10ms and the queue has slow 
> consumers. In this case, no matter how many concurrent messages the 
> producers try to send to the queue, the maximum throughput we can achieve 
> is 100 TPS. Turning this flag off, we see much better performance for 
> sending messages in parallel, as those messages can be batched to the FS 
> in a single write (throughput increases with the number of concurrent 
> messages being sent in parallel).
> Looking at the ActiveMQ code, we found that there is a flag used by LevelDB 
> to detect whether the queue has fast or slow consumers and decide whether 
> to use concurrentStoreAndDispatch or not:
> https://issues.apache.org/jira/browse/AMQ-3750
> but this flag is not used in the KahaDB implementation.
> We made a code change to receive the flag in the KahaDBStore and use it to 
> decide whether the message will be stored async or not.
> We think there is no reason to try to "StoreAndDispatch" if the destination 
> has slow consumers. This only brings overhead and, in the case of a 
> high-latency FS, really poor performance when the queue has slow consumers.
> For fast consumers, this change has no effect, giving the better of the two 
> options.
> Some Results:
> Original Version:
> Fast Consumers:
> Producer
>  mean rate = 8248.50 calls/second
>  min = 0.42 milliseconds
>  max = 756.61 milliseconds
>  mean = 11.30 milliseconds
>  stddev = 44.05 milliseconds
>  median = 6.02 milliseconds
>  75% <= 9.79 milliseconds
>  95% <= 18.15 milliseconds
>  98% <= 27.71 milliseconds
>  99% <= 123.51 milliseconds
>  99.9% <= 756.61 milliseconds
> Slow consumers:
> Producer
>  mean rate = 84.29 calls/second
>  min = 86.27 milliseconds
>  max = 1467.53 milliseconds
>  mean = 1082.55 milliseconds
>  stddev = 154.04 milliseconds
>  median = 1075.94 milliseconds
>  75% <= 1169.10 milliseconds
>  95% <= 1308.90 milliseconds
>  98% <= 1350.85 milliseconds
>  99% <= 1363.61 milliseconds
>  99.9% <= 1466.67 milliseconds
> Patched Version:
> Fast Consumers:
> Producer
>  count = 890783
>  mean rate = 8099.33 calls/second
>  min = 0.47 milliseconds
>  max = 2259.10 milliseconds
>  mean = 13.90 milliseconds
>  stddev = 84.84 milliseconds
>  median = 5.00 milliseconds
>  75% <= 9.08 milliseconds
>  95% <= 15.66 milliseconds
>  98% <= 32.94 milliseconds
>  99% <= 355.52 milliseconds
>  99.9% <= 731.69 milliseconds
> Slow consumers:
> Producer
>  mean rate = 1732.25 calls/second
>  1-minute rate = 1811.80 calls/second
>  min = 17.52 milliseconds
>  max = 1249.54 milliseconds
>  mean = 50.95 milliseconds
>  stddev = 130.68 milliseconds
>  median = 28.73 milliseconds
>  75% <= 32.51 milliseconds
>  95% <= 57.04 milliseconds
>  98% <= 461.46 milliseconds
>  99% <= 937.87 milliseconds
>  99.9% <= 1249.48 milliseconds
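
As a hedged sketch of the decision described in this ticket, with hypothetical 
names rather than the real KahaDBStore plumbing: use the concurrent (async) 
store path only when the destination does not report slow consumers, otherwise 
store synchronously so parallel sends can be batched by the journal.
{code:java}
// Hypothetical sketch only; the real KahaDBStore/ListenableFuture plumbing is
// more involved. It shows the decision this ticket describes: skip the
// concurrent (async) store when the destination reports slow consumers.
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class StoreDispatchDecision {

    private final boolean concurrentStoreAndDispatchQueues;
    private final ExecutorService storeExecutor = Executors.newSingleThreadExecutor();

    public StoreDispatchDecision(boolean concurrentStoreAndDispatchQueues) {
        this.concurrentStoreAndDispatchQueues = concurrentStoreAndDispatchQueues;
    }

    public CompletableFuture<Void> addMessage(Runnable writeToJournal,
                                              boolean destinationHasSlowConsumers) {
        boolean storeAsync = concurrentStoreAndDispatchQueues && !destinationHasSlowConsumers;
        if (storeAsync) {
            // fast consumers: store in the background so dispatch is not held up
            return CompletableFuture.runAsync(writeToJournal, storeExecutor);
        }
        // slow consumers (or feature off): write synchronously so parallel sends
        // can be batched into fewer, larger journal writes
        writeToJournal.run();
        return CompletableFuture.<Void>completedFuture(null);
    }

    public void shutdown() { storeExecutor.shutdown(); }

    public static void main(String[] args) {
        StoreDispatchDecision store = new StoreDispatchDecision(true);
        store.addMessage(() -> System.out.println("async write"), false).join();
        store.addMessage(() -> System.out.println("sync write"), true);
        store.shutdown();
    }
}
{code}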



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AMQ-7052) Fix tests related to purgeRecoveredXATransactions property on the KahaDB

2018-09-18 Thread Gary Tully (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-7052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Tully resolved AMQ-7052.
-
   Resolution: Fixed
Fix Version/s: (was: 5.x)
   5.16.0

[~alanprot] Thanks for these fixes, much appreciated. 

> Fix tests related to purgeRecoveredXATransactions property on the KahaDB
> 
>
> Key: AMQ-7052
> URL: https://issues.apache.org/jira/browse/AMQ-7052
> Project: ActiveMQ
>  Issue Type: Test
>  Components: Test Cases
>Affects Versions: 5.15.5, 5.15.6
>Reporter: Alan Protasio
>Assignee: Gary Tully
>Priority: Minor
> Fix For: 5.16.0
>
>
> Fix the following tests:
> JdbcXARecoveryBrokerTest>CombinationTestSupport.runBare:107->XARecoveryBrokerTest.testPreparedTransactionRecoveredPurgeRollbackOnRestart:332
>  expected:<0> but was:<4>
> JdbcXARecoveryBrokerTest>CombinationTestSupport.runBare:107->XARecoveryBrokerTest.testPreparedTransactionRecoveredPurgeCommitOnRestart:391
>  null
> mKahaDBXARecoveryBrokerTest>CombinationTestSupport.runBare:107->XARecoveryBrokerTest.testPreparedTransactionRecoveredPurgeRollbackOnRestart:332
>  expected:<0> but was:<4>
> mKahaDBXARecoveryBrokerTest>CombinationTestSupport.runBare:107->XARecoveryBrokerTest.testPreparedTransactionRecoveredPurgeCommitOnRestart:391
>  null
> mLevelDBXARecoveryBrokerTest>CombinationTestSupport.runBare:107->XARecoveryBrokerTest.testPreparedTransactionRecoveredPurgeRollbackOnRestart:332
>  expected:<0> but was:<4>
> Those tests are failing because this feature (purge transactions after 
> reboot) was only implemented on the KahaDBPersistenceAdapter.
> [https://github.com/apache/activemq/commit/ce7498c971b99e2515f07aab36418a1a0f19c03e]
> The tests are failing because the class XARecoveryBrokerTest is used to test 
> multiple adapters that do not all implement the same feature.
> Steps to reproduce:
> > git checkout [https://github.com/apache/activemq/tree/activemq-5.15.x]
> > cd activemq-unit-tests
> > mvn clean install -Dtest=JdbcXARecoveryBrokerTest
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (AMQ-7052) Fix tests related to purgeRecoveredXATransactions property on the KahaDB

2018-09-18 Thread Gary Tully (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-7052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Tully reassigned AMQ-7052:
---

Assignee: Gary Tully

> Fix tests related to purgeRecoveredXATransactions property on the KahaDB
> 
>
> Key: AMQ-7052
> URL: https://issues.apache.org/jira/browse/AMQ-7052
> Project: ActiveMQ
>  Issue Type: Test
>  Components: Test Cases
>Affects Versions: 5.15.5, 5.15.6
>Reporter: Alan Protasio
>Assignee: Gary Tully
>Priority: Minor
> Fix For: 5.x
>
>
> Fix the following tests:
> JdbcXARecoveryBrokerTest>CombinationTestSupport.runBare:107->XARecoveryBrokerTest.testPreparedTransactionRecoveredPurgeRollbackOnRestart:332
>  expected:<0> but was:<4>
> JdbcXARecoveryBrokerTest>CombinationTestSupport.runBare:107->XARecoveryBrokerTest.testPreparedTransactionRecoveredPurgeCommitOnRestart:391
>  null
> mKahaDBXARecoveryBrokerTest>CombinationTestSupport.runBare:107->XARecoveryBrokerTest.testPreparedTransactionRecoveredPurgeRollbackOnRestart:332
>  expected:<0> but was:<4>
> mKahaDBXARecoveryBrokerTest>CombinationTestSupport.runBare:107->XARecoveryBrokerTest.testPreparedTransactionRecoveredPurgeCommitOnRestart:391
>  null
> mLevelDBXARecoveryBrokerTest>CombinationTestSupport.runBare:107->XARecoveryBrokerTest.testPreparedTransactionRecoveredPurgeRollbackOnRestart:332
>  expected:<0> but was:<4>
> Those tests are failing because this feature (purge transactions after 
> reboot) was only implemented on the KahaDBPersistenceAdapter.
> [https://github.com/apache/activemq/commit/ce7498c971b99e2515f07aab36418a1a0f19c03e]
> The tests are failing because the class XARecoveryBrokerTest is used to test 
> multiple adapters that do not all implement the same feature.
> Steps to reproduce:
> > git checkout [https://github.com/apache/activemq/tree/activemq-5.15.x]
> > cd activemq-unit-tests
> > mvn clean install -Dtest=JdbcXARecoveryBrokerTest
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-7028) Poor performance when concurrentStoreAndDispatchQueues + slow FS + Slow Consumers

2018-09-10 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608875#comment-16608875
 ] 

Gary Tully commented on AMQ-7028:
-

The first thing to figure out is whether the tests work ok on their own. Some 
tests fail when run in a bunch for unknown reasons; it can be tricky to get to 
the bottom of those failures. This is a problem, but it will take some work to 
make them reliable.

If a test fails reliably when run in isolation, this is a problem b/c that test 
should have worked at some stage in the past. There is some issue with the test 
or some regression. These failures need to be treated as bugs until they get 
resolved by figuring out the root cause of the failure: an issue with the test 
assertions/environment or with some code change. These need jiras to track.

If the test works when run in isolation, then doing a manual run will suffice. 
This is not ideal, but again, getting to the bottom of such a failure may be 
tricky and will take some time.

In addition, the test may be ok on master, pointing to a problem on the branch.

The general rule is that prior to a release, all the tests should run on the 
mainline, ie: a full test cycle to completion. That *should* include the 
activemq.all profile.

 

To run in isolation:

{{> mvn clean install -Dtest=JdbcXARecoveryBrokerTest}}

 

 

> Poor performance when concurrentStoreAndDispatchQueues + slow FS + Slow 
> Consumers
> -
>
> Key: AMQ-7028
> URL: https://issues.apache.org/jira/browse/AMQ-7028
> Project: ActiveMQ
>  Issue Type: Improvement
>  Components: KahaDB
>Affects Versions: 5.15.4
>Reporter: Alan Protasio
>Priority: Major
>
> Using a high-latency FS (such as NFS) to store KahaDB files and setting 
> concurrentStoreAndDispatchQueues=true may cause poor performance for slow 
> consumers. This happens because this option makes ActiveMQ write the 
> produced messages one by one to the underlying file system (implemented 
> with a single-threaded ExecutorService).
> Let's say that each write to the FS takes 10ms and the queue has slow 
> consumers. In this case, no matter how many concurrent messages the 
> producers try to send to the queue, the maximum throughput we can achieve 
> is 100 TPS. Turning this flag off, we see much better performance for 
> sending messages in parallel, as those messages can be batched to the FS 
> in a single write (throughput increases with the number of concurrent 
> messages being sent in parallel).
> Looking at the ActiveMQ code, we found that there is a flag used by LevelDB 
> to detect whether the queue has fast or slow consumers and decide whether 
> to use concurrentStoreAndDispatch or not:
> https://issues.apache.org/jira/browse/AMQ-3750
> but this flag is not used in the KahaDB implementation.
> We made a code change to receive the flag in the KahaDBStore and use it to 
> decide whether the message will be stored async or not.
> We think there is no reason to try to "StoreAndDispatch" if the destination 
> has slow consumers. This only brings overhead and, in the case of a 
> high-latency FS, really poor performance when the queue has slow consumers.
> For fast consumers, this change has no effect, giving the better of the two 
> options.
> Some Results:
> Original Version:
> Fast Consumers:
> Producer
>  mean rate = 8248.50 calls/second
>  min = 0.42 milliseconds
>  max = 756.61 milliseconds
>  mean = 11.30 milliseconds
>  stddev = 44.05 milliseconds
>  median = 6.02 milliseconds
>  75% <= 9.79 milliseconds
>  95% <= 18.15 milliseconds
>  98% <= 27.71 milliseconds
>  99% <= 123.51 milliseconds
>  99.9% <= 756.61 milliseconds
> Slow consumers:
> Producer
>  mean rate = 84.29 calls/second
>  min = 86.27 milliseconds
>  max = 1467.53 milliseconds
>  mean = 1082.55 milliseconds
>  stddev = 154.04 milliseconds
>  median = 1075.94 milliseconds
>  75% <= 1169.10 milliseconds
>  95% <= 1308.90 milliseconds
>  98% <= 1350.85 milliseconds
>  99% <= 1363.61 milliseconds
>  99.9% <= 1466.67 milliseconds
> Patched Version:
> Fast Consumers:
> Producer
>  count = 890783
>  mean rate = 8099.33 calls/second
>  min = 0.47 milliseconds
>  max = 2259.10 milliseconds
>  mean = 13.90 milliseconds
>  stddev = 84.84 milliseconds
>  median = 5.00 milliseconds
>  75% <= 9.08 milliseconds
>  95% <= 15.66 milliseconds
>  98% <= 32.94 milliseconds
>  99% <= 355.52 milliseconds
>  99.9% <= 731.69 milliseconds
> Slow consumers:
> Producer
>  mean rate = 1732.25 calls/second
>  1-minute rate = 1811.80 calls/second
>  min = 17.52 milliseconds
>  max = 1249.54 milliseconds
>  mean = 50.95 milliseconds
>  stddev = 130.68 milliseconds
>  median = 28.73 milliseconds
>  75% <= 32.51 milliseconds
>  95% <= 57.04 milliseconds
>  98% <= 461.46 milliseconds
>  99% 

[jira] [Commented] (AMQ-7041) Failed to load message under load in GlusterFS setup

2018-08-27 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16593766#comment-16593766
 ] 

Gary Tully commented on AMQ-7041:
-

Maybe check what the default performance options are. The store expects a sync 
write to be visible to another thread doing a read on the same location 
through a different file descriptor.

Any write-behind options would get in the way. In my case, a subsequent read 
would work ok, which pointed to a caching issue.

Using strace on the file I/O may help diagnose.
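
A hedged, standalone probe (not ActiveMQ code; the file name and structure are 
illustrative only) of the visibility expectation described above: a forced 
write through one descriptor should be immediately readable through a second, 
independent descriptor on the same file.
{code:java}
// Hypothetical probe, not ActiveMQ code: write + force() through one channel,
// then read the same offset through a separate descriptor. On a filesystem
// suitable for the KahaDB journal the bytes should match immediately.
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SyncVisibilityProbe {
    public static void main(String[] args) throws Exception {
        File file = new File(args.length > 0 ? args[0] : "probe.dat");
        byte[] payload = "journal-record".getBytes(StandardCharsets.UTF_8);

        try (RandomAccessFile writer = new RandomAccessFile(file, "rw")) {
            FileChannel wc = writer.getChannel();
            wc.write(ByteBuffer.wrap(payload), 0);
            wc.force(true); // sync write, in the spirit of enableJournalDiskSyncs=true

            // independent descriptor, like a dispatch thread re-reading the journal
            try (RandomAccessFile reader = new RandomAccessFile(file, "r")) {
                byte[] readBack = new byte[payload.length];
                reader.readFully(readBack);
                System.out.println(Arrays.equals(payload, readBack)
                        ? "read-after-sync-write is consistent"
                        : "STALE READ - caching/read-ahead is in the way");
            }
        }
    }
}
{code}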

> Failed to load message under load in GlusterFS setup
> 
>
> Key: AMQ-7041
> URL: https://issues.apache.org/jira/browse/AMQ-7041
> Project: ActiveMQ
>  Issue Type: Bug
>Affects Versions: 5.15.2
> Environment: Two nodes master and slave.
> GlusterFS is configured to replicate the journal across nodes. Only one node 
> is active and writing.
>  
> Relevant from activemq.xml
> {noformat}
> 
>     
>     
>       memoryLimit="40mb"
>  producerFlowControl="true"
>  optimizedDispatch="true">
>     
>       memoryLimit="600mb"
>  producerFlowControl="false"
>  optimizedDispatch="true">
>     
>     
>     
>     
> 
>          enableJournalDiskSyncs="true"
>     ignoreMissingJournalfiles="true"
>     checkForCorruptJournalFiles="true"
>     checksumJournalFiles="true">
>     
>     
>     
>     
> {noformat}
>Reporter: Mark
>Priority: Major
>
> Hi,
>  
> I am trying to solve an issue with ActiveMQ in a GlusterFS setup. I tend 
> to blame GlusterFS because without it, ActiveMQ performs well under the same 
> load. However, I would like to understand why it might fail with GlusterFS or 
> whether some parts could be tuned to play nicely. Another point to mention is 
> that GlusterFS doesn't report any errors whatsoever when this issue happens. 
> It would be great to understand what the error is about.
> There is no way to reproduce it with 100% probability; however, there are 
> steps that are likely to cause the issue:
>  # add several (10+) consumers for a queue
>  # add several (100+) async producers to the queue to produce a very high 
> load spike of around 50-100k msg/s
>  
> {noformat}
> 2018-08-23 16:24:08,506 | ERROR | Failed to load message at: 181:32956327 | 
> org.apache.activemq.store.kahadb.KahaDBStore | ActiveMQ BrokerService[AMQ01] 
> Task-236
> java.io.IOException: Unexpected error on journal read at: 181:32956327
>     at 
> org.apache.activemq.util.IOExceptionSupport.create(IOExceptionSupport.java:28)
>     at 
> org.apache.activemq.store.kahadb.KahaDBStore.loadMessage(KahaDBStore.java:1260)
>     at 
> org.apache.activemq.store.kahadb.KahaDBStore$KahaDBMessageStore$4.execute(KahaDBStore.java:594)
>     at 
> org.apache.activemq.store.kahadb.disk.page.Transaction.execute(Transaction.java:779)
>     at 
> org.apache.activemq.store.kahadb.KahaDBStore$KahaDBMessageStore.recoverNextMessages(KahaDBStore.java:583)
>     at 
> org.apache.activemq.store.ProxyMessageStore.recoverNextMessages(ProxyMessageStore.java:110)
>     at 
> org.apache.activemq.broker.region.cursors.QueueStorePrefetch.doFillBatch(QueueStorePrefetch.java:127)
>     at 
> org.apache.activemq.broker.region.cursors.AbstractStoreCursor.fillBatch(AbstractStoreCursor.java:448)
>     at 
> org.apache.activemq.broker.region.cursors.AbstractStoreCursor.reset(AbstractStoreCursor.java:168)
>     at 
> org.apache.activemq.broker.region.cursors.StoreQueueCursor.reset(StoreQueueCursor.java:169)
>     at 
> org.apache.activemq.broker.region.Queue.doPageInForDispatch(Queue.java:1981)
>     at 
> org.apache.activemq.broker.region.Queue.pageInMessages(Queue.java:2210)
>     at org.apache.activemq.broker.region.Queue.iterate(Queue.java:1644)
>     at 
> org.apache.activemq.thread.PooledTaskRunner.runTask(PooledTaskRunner.java:133)
>     at 
> org.apache.activemq.thread.PooledTaskRunner$1.run(PooledTaskRunner.java:48)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Could not load journal record, unexpected 
> command type: KAHA_TRACE_COMMAND at location: 181:32956327
>     at 
> 

[jira] [Commented] (AMQ-7041) Failed to load message under load in GlusterFS setup

2018-08-23 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590433#comment-16590433
 ] 

Gary Tully commented on AMQ-7041:
-

I have seen this happen and it seems it is the gluster performance.read-ahead 
translator that is at fault. When the cache is disabled, dispatch can quickly 
read from a recently updated journal location and get caught with invalid data 
due to read-ahead. Try running with that gluster option disabled and see if it 
is the same issue.

 

> Failed to load message under load in GlusterFS setup
> 
>
> Key: AMQ-7041
> URL: https://issues.apache.org/jira/browse/AMQ-7041
> Project: ActiveMQ
>  Issue Type: Bug
>Affects Versions: 5.15.2
> Environment: Two nodes master and slave.
> GlusterFS is configured to replicate the journal across nodes. Only one node 
> is active and writing.
>  
> Relevant from activemq.xml
> {noformat}
> 
>     
>     
>       memoryLimit="40mb"
>  producerFlowControl="true"
>  optimizedDispatch="true">
>     
>       memoryLimit="600mb"
>  producerFlowControl="false"
>  optimizedDispatch="true">
>     
>     
>     
>     
> 
>          enableJournalDiskSyncs="true"
>     ignoreMissingJournalfiles="true"
>     checkForCorruptJournalFiles="true"
>     checksumJournalFiles="true">
>     
>     
>     
>     
> {noformat}
>Reporter: Mark
>Priority: Major
>
> Hi,
>  
> I am trying to solve an issue with ActiveMQ in a GlusterFS setup. I tend 
> to blame GlusterFS because without it, ActiveMQ performs well under the same 
> load. However, I would like to understand why it might fail with GlusterFS or 
> whether some parts could be tuned to play nicely. Another point to mention is 
> that GlusterFS doesn't report any errors whatsoever when this issue happens. 
> It would be great to understand what the error is about.
> There is no way to reproduce it with 100% probability; however, there are 
> steps that are likely to cause the issue:
>  # add several (10+) consumers for a queue
>  # add several (100+) async producers to the queue to produce a very high 
> load spike of around 50-100k msg/s
>  
> {noformat}
> 2018-08-23 16:24:08,506 | ERROR | Failed to load message at: 181:32956327 | 
> org.apache.activemq.store.kahadb.KahaDBStore | ActiveMQ BrokerService[AMQ01] 
> Task-236
> java.io.IOException: Unexpected error on journal read at: 181:32956327
>     at 
> org.apache.activemq.util.IOExceptionSupport.create(IOExceptionSupport.java:28)
>     at 
> org.apache.activemq.store.kahadb.KahaDBStore.loadMessage(KahaDBStore.java:1260)
>     at 
> org.apache.activemq.store.kahadb.KahaDBStore$KahaDBMessageStore$4.execute(KahaDBStore.java:594)
>     at 
> org.apache.activemq.store.kahadb.disk.page.Transaction.execute(Transaction.java:779)
>     at 
> org.apache.activemq.store.kahadb.KahaDBStore$KahaDBMessageStore.recoverNextMessages(KahaDBStore.java:583)
>     at 
> org.apache.activemq.store.ProxyMessageStore.recoverNextMessages(ProxyMessageStore.java:110)
>     at 
> org.apache.activemq.broker.region.cursors.QueueStorePrefetch.doFillBatch(QueueStorePrefetch.java:127)
>     at 
> org.apache.activemq.broker.region.cursors.AbstractStoreCursor.fillBatch(AbstractStoreCursor.java:448)
>     at 
> org.apache.activemq.broker.region.cursors.AbstractStoreCursor.reset(AbstractStoreCursor.java:168)
>     at 
> org.apache.activemq.broker.region.cursors.StoreQueueCursor.reset(StoreQueueCursor.java:169)
>     at 
> org.apache.activemq.broker.region.Queue.doPageInForDispatch(Queue.java:1981)
>     at 
> org.apache.activemq.broker.region.Queue.pageInMessages(Queue.java:2210)
>     at org.apache.activemq.broker.region.Queue.iterate(Queue.java:1644)
>     at 
> org.apache.activemq.thread.PooledTaskRunner.runTask(PooledTaskRunner.java:133)
>     at 
> org.apache.activemq.thread.PooledTaskRunner$1.run(PooledTaskRunner.java:48)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Could not load journal record, unexpected 
> command type: KAHA_TRACE_COMMAND at location: 181:32956327
>     at 
> org.apache.activemq.store.kahadb.KahaDBStore.loadMessage(KahaDBStore.java:1252)
> 

[jira] [Resolved] (AMQ-7037) Add sslContext attribute to networkConnector to override broker sslContext

2018-08-20 Thread Gary Tully (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-7037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Tully resolved AMQ-7037.
-
Resolution: Fixed

{code:java}


  

 {code}


> Add sslContext attribute to networkConnector to override broker sslContext
> --
>
> Key: AMQ-7037
> URL: https://issues.apache.org/jira/browse/AMQ-7037
> Project: ActiveMQ
>  Issue Type: Improvement
>  Components: Broker, networkbridge
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0
>
>
> Allow each network connector to specify the sslContext it needs in cases 
> where this needs to be different from the broker's sslContext (which is used 
> for acceptors) or from the jvm ssl defaults.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

