[jira] [Created] (ARTEMIS-2165) Not having backup available after restart in colocated configuration

2018-11-06 Thread Dmitry Smeliansky (JIRA)
Dmitry Smeliansky created ARTEMIS-2165:
--

 Summary: Not having backup available after restart in colocated 
configuration
 Key: ARTEMIS-2165
 URL: https://issues.apache.org/jira/browse/ARTEMIS-2165
 Project: ActiveMQ Artemis
  Issue Type: Bug
  Components: Broker
Affects Versions: 2.6.0
Reporter: Dmitry Smeliansky


This issue has been raised multiple times on Stack Overflow, but was never answered.

When using a co-located HA policy, the backups maintained by a node are not 
restored after that node is restarted (neither on that node nor on any other).

[https://stackoverflow.com/questions/52839706/artemis-2-6-0-colocated-ha-issues]

[https://stackoverflow.com/questions/52000662/artemis-colocated-ha-policy]

Our configuration was as follows:
{code:xml}
<ha-policy>
   <replication>
      <colocated>
         <backup-port-offset>100</backup-port-offset>
         <request-backup>true</request-backup>
         <max-backups>2</max-backups>
         <backup-request-retries>-1</backup-request-retries>
         <backup-request-retry-interval>5000</backup-request-retry-interval>
         <master/>
         <slave/>
      </colocated>
   </replication>
</ha-policy>
{code}
Thanks





[jira] [Closed] (ARTEMIS-1850) QueueControl.listDeliveringMessages returns empty result

2018-11-06 Thread Howard Gao (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARTEMIS-1850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Howard Gao closed ARTEMIS-1850.
---
   Resolution: Fixed
Fix Version/s: (was: 2.7.0)
   2.6.4

> QueueControl.listDeliveringMessages returns empty result
> 
>
> Key: ARTEMIS-1850
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1850
> Project: ActiveMQ Artemis
>  Issue Type: Bug
>  Components: AMQP
>Affects Versions: 2.5.0
>Reporter: Howard Gao
>Assignee: Howard Gao
>Priority: Major
> Fix For: 2.6.4
>
>
> With the AMQP protocol, when messages are received in a transaction, calling 
> the JMX method QueueControl.listDeliveringMessages() returns an empty list 
> before the transaction is committed.
>  
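For context, a minimal sketch of the JMX call in question; the JMX URL, default RMI port, and the queue name "exampleQueue" are illustrative assumptions, not details from the issue:

{code:java}
import java.util.Map;

import javax.management.MBeanServerConnection;
import javax.management.MBeanServerInvocationHandler;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

import org.apache.activemq.artemis.api.core.RoutingType;
import org.apache.activemq.artemis.api.core.SimpleString;
import org.apache.activemq.artemis.api.core.management.ObjectNameBuilder;
import org.apache.activemq.artemis.api.core.management.QueueControl;

public class ListDeliveringExample {
   public static void main(String[] args) throws Exception {
      // Assumes a local broker with the JMX/RMI connector on the default port.
      JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi");
      try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
         MBeanServerConnection connection = connector.getMBeanServerConnection();
         ObjectName queueObjectName = ObjectNameBuilder.DEFAULT.getQueueObjectName(
               SimpleString.toSimpleString("exampleQueue"),
               SimpleString.toSimpleString("exampleQueue"),
               RoutingType.ANYCAST);
         QueueControl queueControl = MBeanServerInvocationHandler.newProxyInstance(
               connection, queueObjectName, QueueControl.class, false);
         // Per this issue: with AMQP, messages received inside a not-yet-committed
         // transaction were missing from this map until the commit happened.
         Map<String, Map<String, Object>[]> delivering = queueControl.listDeliveringMessages();
         delivering.forEach((consumer, messages) ->
               System.out.println(consumer + " -> " + messages.length + " message(s)"));
      }
   }
}
{code}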





[jira] [Commented] (ARTEMIS-1850) QueueControl.listDeliveringMessages returns empty result

2018-11-06 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-1850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16677583#comment-16677583
 ] 

ASF subversion and git services commented on ARTEMIS-1850:
--

Commit 72eadb201d870b097c8659497823f27bf2401d6f in activemq-artemis's branch 
refs/heads/master from [~gaohoward]
[ https://git-wip-us.apache.org/repos/asf?p=activemq-artemis.git;h=72eadb2 ]

ARTEMIS-1850 QueueControl.listDeliveringMessages returns empty result

With AMQP protocol when some messages are received in a transaction,
calling JMX QueueControl.listDeliveringMessages() returns empty list
before the transaction is committed.


> QueueControl.listDeliveringMessages returns empty result
> 
>
> Key: ARTEMIS-1850
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1850
> Project: ActiveMQ Artemis
>  Issue Type: Bug
>  Components: AMQP
>Affects Versions: 2.5.0
>Reporter: Howard Gao
>Assignee: Howard Gao
>Priority: Major
> Fix For: 2.7.0
>
>
> With the AMQP protocol, when messages are received in a transaction, calling 
> the JMX method QueueControl.listDeliveringMessages() returns an empty list 
> before the transaction is committed.
>  





[jira] [Commented] (ARTEMIS-1850) QueueControl.listDeliveringMessages returns empty result

2018-11-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-1850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16677584#comment-16677584
 ] 

ASF GitHub Bot commented on ARTEMIS-1850:
-

Github user asfgit closed the pull request at:

https://github.com/apache/activemq-artemis/pull/2074


> QueueControl.listDeliveringMessages returns empty result
> 
>
> Key: ARTEMIS-1850
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1850
> Project: ActiveMQ Artemis
>  Issue Type: Bug
>  Components: AMQP
>Affects Versions: 2.5.0
>Reporter: Howard Gao
>Assignee: Howard Gao
>Priority: Major
> Fix For: 2.7.0
>
>
> With the AMQP protocol, when messages are received in a transaction, calling 
> the JMX method QueueControl.listDeliveringMessages() returns an empty list 
> before the transaction is committed.
>  





[jira] [Comment Edited] (AMQ-7082) KahaDB index, recover free pages in parallel with start

2018-11-06 Thread Alan Protasio (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16677379#comment-16677379
 ] 

Alan Protasio edited comment on AMQ-7082 at 11/7/18 2:15 AM:
-

[~gtully] [~cshannon]

Sorry guys... but there is one more case where this can happen:

Imagine that you have 1000 pages in the file...

T1 -> 1000 occupied pages

T2 -> page 100 is freed (and so added to the freeList)

T3 -> the background thread sees page 100 as free and so adds it to the recoveryList

T3 -> we need one more page and we allocate page 100 (as it was marked as free 
in the freeList)

T4 -> the background thread finishes and marks page 100 as free again...

I did a PR to prevent this, and I did everything in a way that we don't need 
synchronization.

I think now we pretty much got all the cases.
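Purely as an illustration of the guard (all names here are invented, not the actual PageFile code; the real change, per the commit message, only marks pages as free if they were created in a previous execution):

{code:java}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: snapshot which pages existed when recovery started,
// and refuse to re-free a page that was handed out again while the
// background scan was running.
public class FreePageRecoveryGuard {
   private final Set<Long> freeList = ConcurrentHashMap.newKeySet();
   private final Set<Long> allocatedDuringRecovery = ConcurrentHashMap.newKeySet();
   private volatile long pageCountAtRecoveryStart;

   public void recoveryStarted(long currentPageCount) {
      pageCountAtRecoveryStart = currentPageCount;   // snapshot before the scan
   }

   public void pageAllocated(long pageId) {
      freeList.remove(pageId);
      allocatedDuringRecovery.add(pageId);           // T3: page 100 reused
   }

   public void recoveredFreePage(long pageId) {
      // T4 must not re-free page 100: only pages that existed before the scan
      // began AND were not reallocated in the meantime may be marked free.
      if (pageId < pageCountAtRecoveryStart && !allocatedDuringRecovery.contains(pageId)) {
         freeList.add(pageId);
      }
   }
}
{code}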

 


was (Author: alanprot):
[~gtully] [~cshannon]

Sorry guys... but there is one more case where this can happen:

Imagine that you have 1000 pages in the file...

T1 -> 1000 occupied pages

T2 -> page 100 is freed (and so added to the freeList)

T3 -> the background thread sees page 100 as free and so adds it to the recoveryList

T3 -> we need one more page and we allocate page 100 (it's not occupied)

T4 -> the background thread finishes and marks page 100 as free again...

I did a PR to prevent this, and I did everything in a way that we don't need 
synchronization.

 

> KahaDB index, recover free pages in parallel with start
> ---
>
> Key: AMQ-7082
> URL: https://issues.apache.org/jira/browse/AMQ-7082
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0, 5.15.8
>
>
> AMQ-6590 fixes free page loss through recovery. The recovery process can be 
> time-consuming, which prevents fast failover; doing recovery on shutdown is 
> preferable, but it is still not ideal b/c it will hold onto the KahaDB lock. 
> It can also stall shutdown unexpectedly.
> AMQ-7080 is going to tackle checkpointing the free list. This should help 
> avoid the need for recovery, but recovery may still be necessary. If the perf 
> hit is significant, this may need to be optional.
> There will still be the need to walk the index to find the free list.
> It is possible to run with no free list and just grow, and we can do that 
> while we recover the free list in parallel, then merge the two at a safe 
> point. This we can do at startup.
> In cases where the disk is the bottleneck this won't help much, but it will 
> help failover and it will help shutdown; with a bit of luck the recovery will 
> complete before we stop.
>  
> Initially I thought this would be too complex, but if we concede some growth 
> while we recover, i.e. start with an empty free list, it should be 
> straightforward to merge with a recovered one.





[jira] [Commented] (AMQ-7091) O(n) Memory consumption when broker has inactive durable subscribers causing OOM

2018-11-06 Thread Alan Protasio (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16677471#comment-16677471
 ] 

Alan Protasio commented on AMQ-7091:


Hi Guys

 

I changed my PR to use the SequenceSet that already exists in the index...

[https://github.com/apache/activemq/pull/315]

I'm running all the tests now...

 

Thanks

> O(n) Memory consumption when broker has inactive durable subscribers causing 
> OOM
> ---
>
> Key: AMQ-7091
> URL: https://issues.apache.org/jira/browse/AMQ-7091
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.7
>Reporter: Alan Protasio
>Priority: Major
> Attachments: After.png, Before.png, 
> InactiveDurableSubscriberTest.java, memoryAllocation.jpg
>
>
> Hi :D
> One of our brokers was bouncing indefinitely due to OOM even though the load 
> (TPS) was pretty low.
> Looking at the memory dump, I could see that almost 90% of the memory was 
> being used by the 
> [messageReferences|https://github.com/apache/activemq/blob/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/MessageDatabase.java#L2368]
>  TreeMap, which keeps track of which messages have already been acked by all 
> subscribers so that they can be deleted.
> This is a problem because, if the broker has an inactive durable subscriber, 
> the memory footprint increases proportionally (O(n)) with the number of 
> messages sent to the topic in question, causing the broker to die of OOM 
> sooner or later (the high memory footprint persists even after a restart).
> Attached (memoryAllocation.jpg) is a screenshot showing my broker using 90% 
> of its memory to keep track of those messages, making it barely usable.
> Looking at the code, I could change messageReferences to use a BTreeIndex:
> - final TreeMap messageReferences = new TreeMap<>();
> + BTreeIndex messageReferences;
> With this change, the memory allocation of the broker stabilized and the 
> broker didn't run OOM anymore.
> Also attached are the code I used to reproduce this scenario and the memory 
> utilization (heap and GC graphs) before and after this change.
> Before the change the broker died in 5 minutes and I could send 48. After 
> the change the broker was still pretty healthy after 5 minutes and I could 
> send 2265000 messages to the topic (almost 5x more, due to high GC pauses).
>  
> All tests are passing: mvn clean install -Dactivemq.tests=all





[jira] [Commented] (AMQ-7082) KahaDB index, recover free pages in parallel with start

2018-11-06 Thread Alan Protasio (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16677379#comment-16677379
 ] 

Alan Protasio commented on AMQ-7082:


[~gtully] [~cshannon]

Sorry guys... but there is one more case where this can happen:

Imagine that you have 1000 pages in the file...

T1 -> 1000 occupied pages

T2 -> page 100 is freed (and so added to the freeList)

T3 -> the background thread sees page 100 as free and so adds it to the recoveryList

T3 -> we need one more page and we allocate page 100 (it's not occupied)

T4 -> the background thread finishes and marks page 100 as free again...

I did a PR to prevent this, and I did everything in a way that we don't need 
synchronization.

 

> KahaDB index, recover free pages in parallel with start
> ---
>
> Key: AMQ-7082
> URL: https://issues.apache.org/jira/browse/AMQ-7082
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0, 5.15.8
>
>
> AMQ-6590 fixes free page loss through recovery. The recovery process can be 
> time-consuming, which prevents fast failover; doing recovery on shutdown is 
> preferable, but it is still not ideal b/c it will hold onto the KahaDB lock. 
> It can also stall shutdown unexpectedly.
> AMQ-7080 is going to tackle checkpointing the free list. This should help 
> avoid the need for recovery, but recovery may still be necessary. If the perf 
> hit is significant, this may need to be optional.
> There will still be the need to walk the index to find the free list.
> It is possible to run with no free list and just grow, and we can do that 
> while we recover the free list in parallel, then merge the two at a safe 
> point. This we can do at startup.
> In cases where the disk is the bottleneck this won't help much, but it will 
> help failover and it will help shutdown; with a bit of luck the recovery will 
> complete before we stop.
>  
> Initially I thought this would be too complex, but if we concede some growth 
> while we recover, i.e. start with an empty free list, it should be 
> straightforward to merge with a recovered one.





[jira] [Commented] (AMQ-7082) KahaDB index, recover free pages in parallel with start

2018-11-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16677374#comment-16677374
 ] 

ASF GitHub Bot commented on AMQ-7082:
-

GitHub user alanprot opened a pull request:

https://github.com/apache/activemq/pull/317

AMQ-7082 We should make sure that pages that is being recovered cannot be reused

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/alanprot/activemq AMQ-7082-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/activemq/pull/317.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #317


commit af4b52a426f35c3c01058c5f49e40baa9fd6d771
Author: Alan Protasio 
Date:   2018-11-06T22:28:32Z

AMQ-7082 We should make sure that pages that is being recovered cannot be 
reused




> KahaDB index, recover free pages in parallel with start
> ---
>
> Key: AMQ-7082
> URL: https://issues.apache.org/jira/browse/AMQ-7082
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0, 5.15.8
>
>
> AMQ-6590 fixes free page loss through recovery. The recovery process can be 
> time-consuming, which prevents fast failover; doing recovery on shutdown is 
> preferable, but it is still not ideal b/c it will hold onto the KahaDB lock. 
> It can also stall shutdown unexpectedly.
> AMQ-7080 is going to tackle checkpointing the free list. This should help 
> avoid the need for recovery, but recovery may still be necessary. If the perf 
> hit is significant, this may need to be optional.
> There will still be the need to walk the index to find the free list.
> It is possible to run with no free list and just grow, and we can do that 
> while we recover the free list in parallel, then merge the two at a safe 
> point. This we can do at startup.
> In cases where the disk is the bottleneck this won't help much, but it will 
> help failover and it will help shutdown; with a bit of luck the recovery will 
> complete before we stop.
>  
> Initially I thought this would be too complex, but if we concede some growth 
> while we recover, i.e. start with an empty free list, it should be 
> straightforward to merge with a recovered one.





[jira] [Commented] (ARTEMIS-2142) Support JMSXGroupSeq -1 to close/reset Groups

2018-11-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16677319#comment-16677319
 ] 

ASF GitHub Bot commented on ARTEMIS-2142:
-

Github user michaelandrepearce commented on the issue:

https://github.com/apache/activemq-artemis/pull/2418
  
@gemmellr this should sort out the failures you're seeing.


> Support JMSXGroupSeq -1 to close/reset Groups
> -
>
> Key: ARTEMIS-2142
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2142
> Project: ActiveMQ Artemis
>  Issue Type: Sub-task
>Reporter: Michael Andre Pearce
>Assignee: Michael Andre Pearce
>Priority: Major
>
> Support closing groups as per ActiveMQ 5 
> ([http://activemq.apache.org/message-groups.html]), using -1 on the group 
> sequence as the signal.
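For reference, closing a group from a JMS client is just a property convention. A minimal sketch, assuming a broker listening at tcp://localhost:61616 and a queue named "exampleQueue" (both illustrative):

{code:java}
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Session;
import javax.jms.TextMessage;

import org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory;

public class CloseGroupExample {
   public static void main(String[] args) throws Exception {
      ConnectionFactory cf = new ActiveMQConnectionFactory("tcp://localhost:61616");
      try (Connection connection = cf.createConnection()) {
         Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
         MessageProducer producer = session.createProducer(session.createQueue("exampleQueue"));

         // Grouped message: pinned to whichever consumer owns "group-1".
         TextMessage work = session.createTextMessage("work");
         work.setStringProperty("JMSXGroupID", "group-1");
         work.setIntProperty("JMSXGroupSeq", 1);
         producer.send(work);

         // JMSXGroupSeq = -1 signals the broker to close/reset the group,
         // releasing it to be claimed by another consumer (ActiveMQ 5 semantics).
         TextMessage close = session.createTextMessage("close group");
         close.setStringProperty("JMSXGroupID", "group-1");
         close.setIntProperty("JMSXGroupSeq", -1);
         producer.send(close);
      }
   }
}
{code}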





[jira] [Commented] (ARTEMIS-2142) Support JMSXGroupSeq -1 to close/reset Groups

2018-11-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16677314#comment-16677314
 ] 

ASF GitHub Bot commented on ARTEMIS-2142:
-

GitHub user michaelandrepearce opened a pull request:

https://github.com/apache/activemq-artemis/pull/2418

ARTEMIS-2142 Fix ServerJMSMessage

Fix ServerJMSMessage so it correctly transposes JMSX headers.
Push common JMS to Message Interface mappings into MessageUtil

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/michaelandrepearce/activemq-artemis ARTEMIS-2142-EXTRA

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/activemq-artemis/pull/2418.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2418


commit 8adcfd4478f4338bba0799cf4e42a2b935ce9bd1
Author: Michael André Pearce 
Date:   2018-11-06T21:30:52Z

ARTEMIS-2142 Fix ServerJMSMessage

Fix ServerJMSMessage so it correctly transposes JMSX headers.
Push common JMS to Message Interface mappings into MessageUtil




> Support JMSXGroupSeq -1 to close/reset Groups
> -
>
> Key: ARTEMIS-2142
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2142
> Project: ActiveMQ Artemis
>  Issue Type: Sub-task
>Reporter: Michael Andre Pearce
>Assignee: Michael Andre Pearce
>Priority: Major
>
> Support closing groups as per ActiveMQ 5 
> ([http://activemq.apache.org/message-groups.html]), using -1 on the group 
> sequence as the signal.





[jira] [Commented] (ARTEMIS-2142) Support JMSXGroupSeq -1 to close/reset Groups

2018-11-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16677287#comment-16677287
 ] 

ASF GitHub Bot commented on ARTEMIS-2142:
-

Github user michaelandrepearce commented on the issue:

https://github.com/apache/activemq-artemis/pull/2387
  
@gemmellr ok debugging through to see what it could be.


> Support JMSXGroupSeq -1 to close/reset Groups
> -
>
> Key: ARTEMIS-2142
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2142
> Project: ActiveMQ Artemis
>  Issue Type: Sub-task
>Reporter: Michael Andre Pearce
>Assignee: Michael Andre Pearce
>Priority: Major
>
> Support closing groups as per ActiveMQ 5 
> ([http://activemq.apache.org/message-groups.html]), using -1 on the group 
> sequence as the signal.





[jira] [Commented] (ARTEMIS-2142) Support JMSXGroupSeq -1 to close/reset Groups

2018-11-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16677153#comment-16677153
 ] 

ASF GitHub Bot commented on ARTEMIS-2142:
-

Github user gemmellr commented on the issue:

https://github.com/apache/activemq-artemis/pull/2387
  
The 3 tests that failed earlier (in multiple envs) pass with this reverted, 
so it does seem related beyond the original coincidence of changing the same 
bits.


> Support JMSXGroupSeq -1 to close/reset Groups
> -
>
> Key: ARTEMIS-2142
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2142
> Project: ActiveMQ Artemis
>  Issue Type: Sub-task
>Reporter: Michael Andre Pearce
>Assignee: Michael Andre Pearce
>Priority: Major
>
> Support closing groups as per ActiveMQ 5 
> ([http://activemq.apache.org/message-groups.html]), using -1 on the group 
> sequence as the signal.





[jira] [Commented] (AMQ-7091) O(n) Memory consumption when broker has inactive durable subscribers causing OOM

2018-11-06 Thread Alan Protasio (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16677145#comment-16677145
 ] 

Alan Protasio commented on AMQ-7091:


Cool! Thanks guys for looking into that...

I will change my PR! :D

 

Thanks again!!! :D

> O(n) Memory consumption when broker has inactive durable subscribers causing 
> OOM
> ---
>
> Key: AMQ-7091
> URL: https://issues.apache.org/jira/browse/AMQ-7091
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.7
>Reporter: Alan Protasio
>Priority: Major
> Attachments: After.png, Before.png, 
> InactiveDurableSubscriberTest.java, memoryAllocation.jpg
>
>
> Hi :D
> One of our brokers was bouncing indefinitely due to OOM even though the load 
> (TPS) was pretty low.
> Looking at the memory dump, I could see that almost 90% of the memory was 
> being used by the 
> [messageReferences|https://github.com/apache/activemq/blob/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/MessageDatabase.java#L2368]
>  TreeMap, which keeps track of which messages have already been acked by all 
> subscribers so that they can be deleted.
> This is a problem because, if the broker has an inactive durable subscriber, 
> the memory footprint increases proportionally (O(n)) with the number of 
> messages sent to the topic in question, causing the broker to die of OOM 
> sooner or later (the high memory footprint persists even after a restart).
> Attached (memoryAllocation.jpg) is a screenshot showing my broker using 90% 
> of its memory to keep track of those messages, making it barely usable.
> Looking at the code, I could change messageReferences to use a BTreeIndex:
> - final TreeMap messageReferences = new TreeMap<>();
> + BTreeIndex messageReferences;
> With this change, the memory allocation of the broker stabilized and the 
> broker didn't run OOM anymore.
> Also attached are the code I used to reproduce this scenario and the memory 
> utilization (heap and GC graphs) before and after this change.
> Before the change the broker died in 5 minutes and I could send 48. After 
> the change the broker was still pretty healthy after 5 minutes and I could 
> send 2265000 messages to the topic (almost 5x more, due to high GC pauses).
>  
> All tests are passing: mvn clean install -Dactivemq.tests=all





[jira] [Commented] (ARTEMIS-2142) Support JMSXGroupSeq -1 to close/reset Groups

2018-11-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16677076#comment-16677076
 ] 

ASF GitHub Bot commented on ARTEMIS-2142:
-

Github user michaelandrepearce commented on the issue:

https://github.com/apache/activemq-artemis/pull/2387
  
@gemmellr please re-check whether this is actually related to this change. 
These passed when manually run on the branch.


> Support JMSXGroupSeq -1 to close/reset Groups
> -
>
> Key: ARTEMIS-2142
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2142
> Project: ActiveMQ Artemis
>  Issue Type: Sub-task
>Reporter: Michael Andre Pearce
>Assignee: Michael Andre Pearce
>Priority: Major
>
> Support closing groups as per ActiveMQ 5 
> ([http://activemq.apache.org/message-groups.html]), using -1 on the group 
> sequence as the signal.





[jira] [Commented] (ARTEMIS-2142) Support JMSXGroupSeq -1 to close/reset Groups

2018-11-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16677032#comment-16677032
 ] 

ASF GitHub Bot commented on ARTEMIS-2142:
-

Github user gemmellr commented on the issue:

https://github.com/apache/activemq-artemis/pull/2387
  
There are integration test failures occurring on master recently, due to 
some issue or change with JMSXGroupSeq handling; they look to be for 
Openwire-->AMQP tests:

JMSLVQTest.testLVQOpenWireProducerAMQPConsumer
JMSSelectorTest.testJMSSelectorsOpenWireProducerAMQPConsumer
JMSWebSocketConnectionTest.testSendLargeMessageToClientFromOpenWire

It may just be a coincidence that this recent change touched that area; I 
haven't checked further.


> Support JMSXGroupSeq -1 to close/reset Groups
> -
>
> Key: ARTEMIS-2142
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2142
> Project: ActiveMQ Artemis
>  Issue Type: Sub-task
>Reporter: Michael Andre Pearce
>Assignee: Michael Andre Pearce
>Priority: Major
>
> Support closing groups as per ActiveMQ 5 
> ([http://activemq.apache.org/message-groups.html]), using -1 on the group 
> sequence as the signal.





[jira] [Commented] (AMQ-7091) O(n) Memory consumption when broker has inactive durable subscribers causing OOM

2018-11-06 Thread Jamie goodyear (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16677021#comment-16677021
 ] 

Jamie goodyear commented on AMQ-7091:
-

Hi [~alanprot],

After some conversation with the community: the SequenceSet in AMQ is a good 
fit for this scenario, because we can control the sizes and toss them when 
memory is short:
https://github.com/apache/activemq/blob/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/util/SequenceSet.java

Please have a look at the above, and let's update your PR.
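As a quick sketch of why SequenceSet fits here: consecutive ids collapse into ranges, so tracking millions of acked sequence ids costs a handful of range nodes rather than one map entry each (the loop bound is chosen only for illustration):

{code:java}
import org.apache.activemq.store.kahadb.disk.util.SequenceSet;

public class SequenceSetDemo {
   public static void main(String[] args) {
      SequenceSet acked = new SequenceSet();
      // One million consecutive ids are stored as a single [0, 999999]
      // range node, not a million individual entries.
      for (long i = 0; i < 1_000_000; i++) {
         acked.add(i);
      }
      // Removing one value just splits the range in two.
      acked.remove(500_000);
      System.out.println("values tracked: " + acked.rangeSize()); // 999999
   }
}
{code}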

> O(n) Memory consumption when broker has inactive durable subscribers causing 
> OOM
> ---
>
> Key: AMQ-7091
> URL: https://issues.apache.org/jira/browse/AMQ-7091
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.7
>Reporter: Alan Protasio
>Priority: Major
> Attachments: After.png, Before.png, 
> InactiveDurableSubscriberTest.java, memoryAllocation.jpg
>
>
> Hi :D
> One of our brokers was bouncing indefinitely due to OOM even though the load 
> (TPS) was pretty low.
> Looking at the memory dump, I could see that almost 90% of the memory was 
> being used by the 
> [messageReferences|https://github.com/apache/activemq/blob/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/MessageDatabase.java#L2368]
>  TreeMap, which keeps track of which messages have already been acked by all 
> subscribers so that they can be deleted.
> This is a problem because, if the broker has an inactive durable subscriber, 
> the memory footprint increases proportionally (O(n)) with the number of 
> messages sent to the topic in question, causing the broker to die of OOM 
> sooner or later (the high memory footprint persists even after a restart).
> Attached (memoryAllocation.jpg) is a screenshot showing my broker using 90% 
> of its memory to keep track of those messages, making it barely usable.
> Looking at the code, I could change messageReferences to use a BTreeIndex:
> - final TreeMap messageReferences = new TreeMap<>();
> + BTreeIndex messageReferences;
> With this change, the memory allocation of the broker stabilized and the 
> broker didn't run OOM anymore.
> Also attached are the code I used to reproduce this scenario and the memory 
> utilization (heap and GC graphs) before and after this change.
> Before the change the broker died in 5 minutes and I could send 48. After 
> the change the broker was still pretty healthy after 5 minutes and I could 
> send 2265000 messages to the topic (almost 5x more, due to high GC pauses).
>  
> All tests are passing: mvn clean install -Dactivemq.tests=all





[jira] [Commented] (ARTEMIS-1710) Allow for management messages to exceed global-max-size limit

2018-11-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16676958#comment-16676958
 ] 

ASF GitHub Bot commented on ARTEMIS-1710:
-

Github user michaelandrepearce commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/2414#discussion_r231191592
  
--- Diff: 
artemis-server/src/main/java/org/apache/activemq/artemis/core/persistence/impl/journal/AbstractJournalStorageManager.java
 ---
@@ -1244,6 +1244,9 @@ private static PageSubscription 
locateSubscription(final long queueID,
  if (queueInfo != null) {
 SimpleString address = queueInfo.getAddress();
 PagingStore store = pagingManager.getPageStore(address);
--- End diff --

Might be worth adding some logic to clean this up automatically. Otherwise 
lots of users probably won't know, or might miss it if it's only a release note.


> Allow for management messages to exceed global-max-size limit
> -
>
> Key: ARTEMIS-1710
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1710
> Project: ActiveMQ Artemis
>  Issue Type: Bug
>Reporter: Ulf Lilleengen
>Assignee: Francesco Nigro
>Priority: Major
>
> Use case: global-max-size is set to some limit to prevent the broker from 
> falling over. The broker is configured with N queues, which are all blocked 
> by this limit.
> If this limit is reached, however, it is not possible to perform management 
> operations on the broker, so you're stuck.
>  
> It should be possible to have an address like 'activemq.management' bypass 
> this limit so that a broker can be recovered when the global-max-size is 
> reached.
>  
> To work around the problem, an external component needs to ensure that all 
> addresses created have a max-size-bytes set so that, in the worst case, there 
> is some room left for the 'activemq.management' address. In that case the 
> broker configuration needs to be known by the component creating the 
> addresses, which is impractical; it feels like this problem would be easier 
> to solve inside the broker.
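For reference, the workaround described above corresponds to broker.xml address settings along these lines (the match pattern and numbers are illustrative, not from the issue):

{code:xml}
<address-settings>
   <!-- Cap ordinary addresses below global-max-size so that, worst case,
        some headroom remains for the management address. -->
   <address-setting match="#">
      <max-size-bytes>10485760</max-size-bytes>
      <address-full-policy>BLOCK</address-full-policy>
   </address-setting>
</address-settings>
{code}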





[jira] [Commented] (ARTEMIS-2124) Addresses/Queues are not auto-deleted in clusters

2018-11-06 Thread Justin Bertram (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16676918#comment-16676918
 ] 

Justin Bertram commented on ARTEMIS-2124:
-

[~johan1], yes it should.  I modified your test a bit to make it more 
efficient and to fit the way things are done in the Artemis test-suite.  
Everything works as expected (including the removal of the queue on broker B 
after the consumer disconnects).  Here's my modified test:

{code:java}
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.activemq.artemis.tests.integration.client;

import javax.jms.JMSContext;
import java.util.Arrays;

import org.apache.activemq.artemis.api.core.BroadcastGroupConfiguration;
import org.apache.activemq.artemis.api.core.DiscoveryGroupConfiguration;
import org.apache.activemq.artemis.api.core.RoutingType;
import org.apache.activemq.artemis.api.core.SimpleString;
import org.apache.activemq.artemis.api.core.UDPBroadcastEndpointFactory;
import org.apache.activemq.artemis.core.config.ClusterConnectionConfiguration;
import org.apache.activemq.artemis.core.config.impl.ConfigurationImpl;
import org.apache.activemq.artemis.core.server.ActiveMQServer;
import org.apache.activemq.artemis.core.server.ActiveMQServers;
import org.apache.activemq.artemis.core.server.cluster.ClusterConnection;
import org.apache.activemq.artemis.core.server.cluster.impl.MessageLoadBalancingType;
import org.apache.activemq.artemis.core.settings.impl.AddressSettings;
import org.apache.activemq.artemis.junit.Wait;
import org.apache.activemq.artemis.tests.util.ActiveMQTestBase;
import org.apache.qpid.jms.JmsConnectionFactory;
import org.junit.Before;
import org.junit.FixMethodOrder;
import org.junit.Test;
import org.junit.runners.MethodSorters;

@FixMethodOrder(MethodSorters.NAME_ASCENDING)
public class Artemis2124_AutoDeleteTest extends ActiveMQTestBase {

   private ActiveMQServer brokerA;
   private final int brokerAPort = 5671;
   private ActiveMQServer brokerB;
   private final int brokerBPort = 5672;
   private final SimpleString queueName = new SimpleString("myQueue");

   private ActiveMQServer createClusterMember(final int port) throws Exception {
      final ActiveMQServer broker = addServer(ActiveMQServers.newActiveMQServer(new ConfigurationImpl()
         .setPersistenceEnabled(false)
         .setSecurityEnabled(false)
         .addAcceptorConfiguration("test", "tcp://localhost:" + port)
         /* queue config */
         .addAddressesSetting(queueName.toString(), new AddressSettings()
            .setRedistributionDelay(0)
            .setDefaultMaxConsumers(-1)
            .setAutoCreateAddresses(true)
            .setAutoCreateQueues(true)
            .setAutoDeleteAddresses(true)
            .setAutoCreateQueues(true)
            .setDefaultPurgeOnNoConsumers(false)
            .setDefaultAddressRoutingType(RoutingType.ANYCAST)

[jira] [Resolved] (AMQ-7082) KahaDB index, recover free pages in parallel with start

2018-11-06 Thread Gary Tully (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Tully resolved AMQ-7082.
-
Resolution: Fixed

Fix is good, thanks [~alanprot]

> KahaDB index, recover free pages in parallel with start
> ---
>
> Key: AMQ-7082
> URL: https://issues.apache.org/jira/browse/AMQ-7082
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0, 5.15.8
>
>
> AMQ-6590 fixes free page loss through recovery. The recovery process can be 
> time-consuming, which prevents fast failover; doing recovery on shutdown is 
> preferable, but it is still not ideal b/c it will hold onto the KahaDB lock. 
> It can also stall shutdown unexpectedly.
> AMQ-7080 is going to tackle checkpointing the free list. This should help 
> avoid the need for recovery, but recovery may still be necessary. If the perf 
> hit is significant, this may need to be optional.
> There will still be the need to walk the index to find the free list.
> It is possible to run with no free list and just grow, and we can do that 
> while we recover the free list in parallel, then merge the two at a safe 
> point. This we can do at startup.
> In cases where the disk is the bottleneck this won't help much, but it will 
> help failover and it will help shutdown; with a bit of luck the recovery will 
> complete before we stop.
>  
> Initially I thought this would be too complex, but if we concede some growth 
> while we recover, i.e. start with an empty free list, it should be 
> straightforward to merge with a recovered one.





[jira] [Commented] (AMQ-7091) O(n) Memory consumption when broker has inactive durable subscribers causing OOM

2018-11-06 Thread Jamie goodyear (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16676861#comment-16676861
 ] 

Jamie goodyear commented on AMQ-7091:
-

Thanks Gary.

> O(n) Memory consumption when broker has inactive durable subscribers causing 
> OOM
> ---
>
> Key: AMQ-7091
> URL: https://issues.apache.org/jira/browse/AMQ-7091
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.7
>Reporter: Alan Protasio
>Priority: Major
> Attachments: After.png, Before.png, 
> InactiveDurableSubscriberTest.java, memoryAllocation.jpg
>
>
> Hi :D
> One of our brokers was bouncing indefinitely due to OOM even though the load 
> (TPS) was pretty low.
> Looking at the memory dump, I could see that almost 90% of the memory was 
> being used by the 
> [messageReferences|https://github.com/apache/activemq/blob/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/MessageDatabase.java#L2368]
>  TreeMap, which keeps track of which messages have already been acked by all 
> subscribers so that they can be deleted.
> This is a problem because, if the broker has an inactive durable subscriber, 
> the memory footprint increases proportionally (O(n)) with the number of 
> messages sent to the topic in question, causing the broker to die of OOM 
> sooner or later (the high memory footprint persists even after a restart).
> Attached (memoryAllocation.jpg) is a screenshot showing my broker using 90% 
> of its memory to keep track of those messages, making it barely usable.
> Looking at the code, I could change messageReferences to use a BTreeIndex:
> - final TreeMap messageReferences = new TreeMap<>();
> + BTreeIndex messageReferences;
> With this change, the memory allocation of the broker stabilized and the 
> broker didn't run OOM anymore.
> Also attached are the code I used to reproduce this scenario and the memory 
> utilization (heap and GC graphs) before and after this change.
> Before the change the broker died in 5 minutes and I could send 48. After 
> the change the broker was still pretty healthy after 5 minutes and I could 
> send 2265000 messages to the topic (almost 5x more, due to high GC pauses).
>  
> All tests are passing: mvn clean install -Dactivemq.tests=all





[jira] [Commented] (AMQ-7091) O(n) Memory consumption when broker has inactive durable subscribers causing OOM

2018-11-06 Thread Jamie goodyear (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16676857#comment-16676857
 ] 

Jamie goodyear commented on AMQ-7091:
-

[~cshannon], [~gtully]

I've reviewed Alan's patch and it looks good.

The bump in KahaDB version to 7 on Master would open the door to us doing some 
more changes to KahaDB/MessageDatabase (perhaps some gentle refactoring). 

I'd like to know what you think.

> O(n) Memory consumption when broker has inactive durable subscribers causing 
> OOM
> ---
>
> Key: AMQ-7091
> URL: https://issues.apache.org/jira/browse/AMQ-7091
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.7
>Reporter: Alan Protasio
>Priority: Major
> Attachments: After.png, Before.png, 
> InactiveDurableSubscriberTest.java, memoryAllocation.jpg
>
>
> Hi :D
> One of our brokers was bouncing indefinitely due to OOM even though the load 
> (TPS) was pretty low.
> Looking at the memory dump, I could see that almost 90% of the memory was 
> being used by the 
> [messageReferences|https://github.com/apache/activemq/blob/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/MessageDatabase.java#L2368]
>  TreeMap, which keeps track of which messages have already been acked by all 
> subscribers so that they can be deleted.
> This is a problem because, if the broker has an inactive durable subscriber, 
> the memory footprint increases proportionally (O(n)) with the number of 
> messages sent to the topic in question, causing the broker to die of OOM 
> sooner or later (the high memory footprint persists even after a restart).
> Attached (memoryAllocation.jpg) is a screenshot showing my broker using 90% 
> of its memory to keep track of those messages, making it barely usable.
> Looking at the code, I could change messageReferences to use a BTreeIndex:
> - final TreeMap messageReferences = new TreeMap<>();
> + BTreeIndex messageReferences;
> With this change, the memory allocation of the broker stabilized and the 
> broker didn't run OOM anymore.
> Also attached are the code I used to reproduce this scenario and the memory 
> utilization (heap and GC graphs) before and after this change.
> Before the change the broker died in 5 minutes and I could send 48. After 
> the change the broker was still pretty healthy after 5 minutes and I could 
> send 2265000 messages to the topic (almost 5x more, due to high GC pauses).
>  
> All tests are passing: mvn clean install -Dactivemq.tests=all





[jira] [Commented] (AMQ-7091) O(n) Memory consumption when broker has inactive durable subscribers causing OOM

2018-11-06 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16676855#comment-16676855
 ] 

Gary Tully commented on AMQ-7091:
-

That info does not need to be persisted; it is duplicate info. I would like to 
understand the tradeoff - that in-memory reference map was introduced as an 
optimisation via https://issues.apache.org/jira/browse/AMQ-3467 in 2011, but it 
is in memory only to be fast. It may be possible to squash that map into a 
SequenceSet of offsets that share a reference count, which would address the 
memory usage without the need to persist to disk.

The implications of additional page writes on every add/ack may be significant 
for the simple fanout (when users keep up) case. Typically the broker has 
plenty of memory for this sort of thing.
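A sketch of that squashing idea (names invented for illustration, not the MessageDatabase code): bucket message locations by their remaining reference count, with each bucket a SequenceSet so contiguous offsets compress into ranges:

{code:java}
import java.util.HashMap;
import java.util.Map;

import org.apache.activemq.store.kahadb.disk.util.SequenceSet;

public class RefCountBuckets {
   // remaining-subscriber count -> set of message locations with that count
   private final Map<Integer, SequenceSet> byRefCount = new HashMap<>();

   public void track(long location, int refs) {
      byRefCount.computeIfAbsent(refs, k -> new SequenceSet()).add(location);
   }

   // On an ack the location moves from bucket n to bucket n-1;
   // at zero remaining references it is eligible for deletion.
   public void ack(long location, int refs) {
      SequenceSet bucket = byRefCount.get(refs);
      if (bucket != null && bucket.remove(location) && refs > 1) {
         track(location, refs - 1);
      }
   }
}
{code}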

 

> O(n) Memory consumption when broker has inactive durable subscribers causing 
> OOM
> ---
>
> Key: AMQ-7091
> URL: https://issues.apache.org/jira/browse/AMQ-7091
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.7
>Reporter: Alan Protasio
>Priority: Major
> Attachments: After.png, Before.png, 
> InactiveDurableSubscriberTest.java, memoryAllocation.jpg
>
>
> Hi :D
> One of our brokers was bouncing indefinitely due to OOM even though the load 
> (TPS) was pretty low.
> Looking at the memory dump, I could see that almost 90% of the memory was 
> being used by the 
> [messageReferences|https://github.com/apache/activemq/blob/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/MessageDatabase.java#L2368]
>  TreeMap, which keeps track of which messages have already been acked by all 
> subscribers so that they can be deleted.
> This is a problem because, if the broker has an inactive durable subscriber, 
> the memory footprint increases proportionally (O(n)) with the number of 
> messages sent to the topic in question, causing the broker to die of OOM 
> sooner or later (the high memory footprint persists even after a restart).
> Attached (memoryAllocation.jpg) is a screenshot showing my broker using 90% 
> of its memory to keep track of those messages, making it barely usable.
> Looking at the code, I could change messageReferences to use a BTreeIndex:
> - final TreeMap messageReferences = new TreeMap<>();
> + BTreeIndex messageReferences;
> With this change, the memory allocation of the broker stabilized and the 
> broker didn't run OOM anymore.
> Also attached are the code I used to reproduce this scenario and the memory 
> utilization (heap and GC graphs) before and after this change.
> Before the change the broker died in 5 minutes and I could send 48. After 
> the change the broker was still pretty healthy after 5 minutes and I could 
> send 2265000 messages to the topic (almost 5x more, due to high GC pauses).
>  
> All tests are passing: mvn clean install -Dactivemq.tests=all





[jira] [Commented] (AMQ-7091) O(n) Memory consumption when broker has inactive durable subscribers causing OOM

2018-11-06 Thread Jamie goodyear (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16676851#comment-16676851
 ] 

Jamie goodyear commented on AMQ-7091:
-

The patch looks good to me - thank you Alan.

> O(n) Memory consumption when broker has inactive durable subscribers causing 
> OOM
> ---
>
> Key: AMQ-7091
> URL: https://issues.apache.org/jira/browse/AMQ-7091
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.7
>Reporter: Alan Protasio
>Priority: Major
> Attachments: After.png, Before.png, 
> InactiveDurableSubscriberTest.java, memoryAllocation.jpg
>
>
> Hi :D
> One of our brokers was bouncing indefinitely due to OOM even though the load 
> (TPS) was pretty low.
> Looking at the memory dump, I could see that almost 90% of the memory was 
> being used by the 
> [messageReferences|https://github.com/apache/activemq/blob/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/MessageDatabase.java#L2368]
>  TreeMap, which keeps track of which messages have already been acked by all 
> subscribers so that they can be deleted.
> This is a problem because, if the broker has an inactive durable subscriber, 
> the memory footprint increases proportionally (O(n)) with the number of 
> messages sent to the topic in question, causing the broker to die of OOM 
> sooner or later (the high memory footprint persists even after a restart).
> Attached (memoryAllocation.jpg) is a screenshot showing my broker using 90% 
> of its memory to keep track of those messages, making it barely usable.
> Looking at the code, I could change messageReferences to use a BTreeIndex:
> - final TreeMap messageReferences = new TreeMap<>();
> + BTreeIndex messageReferences;
> With this change, the memory allocation of the broker stabilized and the 
> broker didn't run OOM anymore.
> Also attached are the code I used to reproduce this scenario and the memory 
> utilization (heap and GC graphs) before and after this change.
> Before the change the broker died in 5 minutes and I could send 48. After 
> the change the broker was still pretty healthy after 5 minutes and I could 
> send 2265000 messages to the topic (almost 5x more, due to high GC pauses).
>  
> All tests are passing: mvn clean install -Dactivemq.tests=all





[jira] [Commented] (AMQ-7082) KahaDB index, recover free pages in parallel with start

2018-11-06 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16676845#comment-16676845
 ] 

ASF subversion and git services commented on AMQ-7082:
--

Commit 505200b92a0d72666b6665f6026a97a8f215d895 in activemq's branch 
refs/heads/activemq-5.15.x from Christopher L. Shannon (cshannon)
[ https://git-wip-us.apache.org/repos/asf?p=activemq.git;h=505200b ]

AMQ-7082 - fix compilation after merge


> KahaDB index, recover free pages in parallel with start
> ---
>
> Key: AMQ-7082
> URL: https://issues.apache.org/jira/browse/AMQ-7082
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0, 5.15.8
>
>
> AMQ-6590 fixes free page loss through recovery. The recovery process can be 
> time-consuming, which prevents fast failover; doing recovery on shutdown is 
> preferable, but it is still not ideal b/c it will hold onto the KahaDB lock. 
> It can also stall shutdown unexpectedly.
> AMQ-7080 is going to tackle checkpointing the free list. This should help 
> avoid the need for recovery, but recovery may still be necessary. If the perf 
> hit is significant, this may need to be optional.
> There will still be the need to walk the index to find the free list.
> It is possible to run with no free list and just grow, and we can do that 
> while we recover the free list in parallel, then merge the two at a safe 
> point. This we can do at startup.
> In cases where the disk is the bottleneck this won't help much, but it will 
> help failover and it will help shutdown; with a bit of luck the recovery will 
> complete before we stop.
>  
> Initially I thought this would be too complex, but if we concede some growth 
> while we recover, i.e. start with an empty free list, it should be 
> straightforward to merge with a recovered one.





[jira] [Commented] (AMQ-7082) KahaDB index, recover free pages in parallel with start

2018-11-06 Thread Christopher L. Shannon (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16676765#comment-16676765
 ] 

Christopher L. Shannon commented on AMQ-7082:
-

[~gtully] - I merged the fix; can you take a quick look to make sure it looks 
good to you before we close this again?

> KahaDB index, recover free pages in parallel with start
> ---
>
> Key: AMQ-7082
> URL: https://issues.apache.org/jira/browse/AMQ-7082
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0, 5.15.8
>
>
> AMQ-6590 fixes free page loss through recovery. The recovery process can be 
> time-consuming, which prevents fast failover; doing recovery on shutdown is 
> preferable, but it is still not ideal b/c it will hold onto the KahaDB lock. 
> It can also stall shutdown unexpectedly.
> AMQ-7080 is going to tackle checkpointing the free list. This should help 
> avoid the need for recovery, but recovery may still be necessary. If the perf 
> hit is significant, this may need to be optional.
> There will still be the need to walk the index to find the free list.
> It is possible to run with no free list and just grow, and we can do that 
> while we recover the free list in parallel, then merge the two at a safe 
> point. This we can do at startup.
> In cases where the disk is the bottleneck this won't help much, but it will 
> help failover and it will help shutdown; with a bit of luck the recovery will 
> complete before we stop.
>  
> Initially I thought this would be too complex, but if we concede some growth 
> while we recover, i.e. start with an empty free list, it should be 
> straightforward to merge with a recovered one.





[jira] [Commented] (AMQ-7082) KahaDB index, recover free pages in parallel with start

2018-11-06 Thread Alan Protasio (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16676764#comment-16676764
 ] 

Alan Protasio commented on AMQ-7082:


Thanks [~cshannon] :D:D:D

> KahaDB index, recover free pages in parallel with start
> ---
>
> Key: AMQ-7082
> URL: https://issues.apache.org/jira/browse/AMQ-7082
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0, 5.15.8
>
>
> AMQ-6590 fixes free page loss through recovery. The recovery process can be 
> time-consuming, which prevents fast failover; doing recovery on shutdown is 
> preferable, but it is still not ideal b/c it will hold onto the KahaDB lock. 
> It can also stall shutdown unexpectedly.
> AMQ-7080 is going to tackle checkpointing the free list. This should help 
> avoid the need for recovery, but recovery may still be necessary. If the perf 
> hit is significant, this may need to be optional.
> There will still be the need to walk the index to find the free list.
> It is possible to run with no free list and just grow, and we can do that 
> while we recover the free list in parallel, then merge the two at a safe 
> point. This we can do at startup.
> In cases where the disk is the bottleneck this won't help much, but it will 
> help failover and it will help shutdown; with a bit of luck the recovery will 
> complete before we stop.
>  
> Initially I thought this would be too complex, but if we concede some growth 
> while we recover, i.e. start with an empty free list, it should be 
> straightforward to merge with a recovered one.





[jira] [Commented] (AMQ-7082) KahaDB index, recover free pages in parallel with start

2018-11-06 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16676763#comment-16676763
 ] 

ASF subversion and git services commented on AMQ-7082:
--

Commit 45d7676bd95a7e4594a80b025994ea3242c94586 in activemq's branch 
refs/heads/activemq-5.15.x from [~alanprot]
[ https://git-wip-us.apache.org/repos/asf?p=activemq.git;h=45d7676 ]

AMQ-7082 - Make sure that the recovery will only mark pages as free if they 
were created in a previous execution

(cherry picked from commit 0d34338919edee863bd71693ee30999d9d9d6ce9)


> KahaDB index, recover free pages in parallel with start
> ---
>
> Key: AMQ-7082
> URL: https://issues.apache.org/jira/browse/AMQ-7082
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0, 5.15.8
>
>
> AMQ-6590 fixes free page loss through recovery. The recovery process can be 
> time-consuming, which prevents fast failover; doing recovery on shutdown is 
> preferable, but it is still not ideal b/c it will hold onto the KahaDB lock. 
> It can also stall shutdown unexpectedly.
> AMQ-7080 is going to tackle checkpointing the free list. This should help 
> avoid the need for recovery, but recovery may still be necessary. If the perf 
> hit is significant, this may need to be optional.
> There will still be the need to walk the index to find the free list.
> It is possible to run with no free list and just grow, and we can do that 
> while we recover the free list in parallel, then merge the two at a safe 
> point. This we can do at startup.
> In cases where the disk is the bottleneck this won't help much, but it will 
> help failover and it will help shutdown; with a bit of luck the recovery will 
> complete before we stop.
>  
> Initially I thought this would be too complex, but if we concede some growth 
> while we recover, i.e. start with an empty free list, it should be 
> straightforward to merge with a recovered one.





[jira] [Commented] (AMQ-7082) KahaDB index, recover free pages in parallel with start

2018-11-06 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16676759#comment-16676759
 ] 

ASF subversion and git services commented on AMQ-7082:
--

Commit 0d34338919edee863bd71693ee30999d9d9d6ce9 in activemq's branch 
refs/heads/master from [~alanprot]
[ https://git-wip-us.apache.org/repos/asf?p=activemq.git;h=0d34338 ]

AMQ-7082 - Make sure that the recovery will only mark pages as free if they 
were created in a previous execution


> KahaDB index, recover free pages in parallel with start
> ---
>
> Key: AMQ-7082
> URL: https://issues.apache.org/jira/browse/AMQ-7082
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0, 5.15.8
>
>
> AMQ-6590 fixes free page loss through recovery. The recovery process can be 
> time-consuming, which prevents fast failover; doing recovery on shutdown is 
> preferable, but it is still not ideal b/c it will hold onto the KahaDB lock. 
> It can also stall shutdown unexpectedly.
> AMQ-7080 is going to tackle checkpointing the free list. This should help 
> avoid the need for recovery, but recovery may still be necessary. If the perf 
> hit is significant, this may need to be optional.
> There will still be the need to walk the index to find the free list.
> It is possible to run with no free list and just grow, and we can do that 
> while we recover the free list in parallel, then merge the two at a safe 
> point. This we can do at startup.
> In cases where the disk is the bottleneck this won't help much, but it will 
> help failover and it will help shutdown; with a bit of luck the recovery will 
> complete before we stop.
>  
> Initially I thought this would be too complex, but if we concede some growth 
> while we recover, i.e. start with an empty free list, it should be 
> straightforward to merge with a recovered one.





[jira] [Commented] (AMQ-7082) KahaDB index, recover free pages in parallel with start

2018-11-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16676761#comment-16676761
 ] 

ASF GitHub Bot commented on AMQ-7082:
-

Github user asfgit closed the pull request at:

https://github.com/apache/activemq/pull/316


> KahaDB index, recover free pages in parallel with start
> ---
>
> Key: AMQ-7082
> URL: https://issues.apache.org/jira/browse/AMQ-7082
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0, 5.15.8
>
>
> AMQ-6590 fixes free page loss through recovery. The recovery process can be 
> lengthy, which prevents fast failover. Doing recovery on shutdown is 
> preferable, but it is still not ideal because it holds onto the KahaDB lock. 
> It can also stall shutdown unexpectedly.
> AMQ-7080 is going to tackle checkpointing the free list. This should help 
> avoid the need for recovery, but it may still be necessary. If the performance 
> hit is significant, this may need to be optional.
> There will still be a need to walk the index to find the free list.
> It is possible to run with no free list and grow, and we can do that while we 
> recover the free list in parallel, then merge the two at a safe point. This 
> can be done at startup.
> In cases where the disk is the bottleneck this won't help much, but it will 
> help failover and shutdown; with a bit of luck, the recovery will complete 
> before we stop.
>  
> Initially I thought this would be too complex, but if we concede some growth 
> while we recover, i.e., start with an empty free list, it should be 
> straightforward to merge with a recovered one.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-7082) KahaDB index, recover free pages in parallel with start

2018-11-06 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16676760#comment-16676760
 ] 

ASF subversion and git services commented on AMQ-7082:
--

Commit 81062fde88eeb5fa970e70fd851b49d442a7116a in activemq's branch 
refs/heads/master from Christopher L. Shannon (cshannon)
[ https://git-wip-us.apache.org/repos/asf?p=activemq.git;h=81062fd ]

Merge branch 'AMQ-7082'

This closes #316

Thanks to Alan Protasio for the patch


> KahaDB index, recover free pages in parallel with start
> ---
>
> Key: AMQ-7082
> URL: https://issues.apache.org/jira/browse/AMQ-7082
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0, 5.15.8
>
>
> AMQ-6590 fixes free page loss through recovery. The recovery process can be 
> lengthy, which prevents fast failover. Doing recovery on shutdown is 
> preferable, but it is still not ideal because it holds onto the KahaDB lock. 
> It can also stall shutdown unexpectedly.
> AMQ-7080 is going to tackle checkpointing the free list. This should help 
> avoid the need for recovery, but it may still be necessary. If the performance 
> hit is significant, this may need to be optional.
> There will still be a need to walk the index to find the free list.
> It is possible to run with no free list and grow, and we can do that while we 
> recover the free list in parallel, then merge the two at a safe point. This 
> can be done at startup.
> In cases where the disk is the bottleneck this won't help much, but it will 
> help failover and shutdown; with a bit of luck, the recovery will complete 
> before we stop.
>  
> Initially I thought this would be too complex, but if we concede some growth 
> while we recover, i.e., start with an empty free list, it should be 
> straightforward to merge with a recovered one.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-7082) KahaDB index, recover free pages in parallel with start

2018-11-06 Thread Christopher L. Shannon (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16676744#comment-16676744
 ] 

Christopher L. Shannon commented on AMQ-7082:
-

[~alanprot] - this looks good to me, I will merge it, thanks.

> KahaDB index, recover free pages in parallel with start
> ---
>
> Key: AMQ-7082
> URL: https://issues.apache.org/jira/browse/AMQ-7082
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0, 5.15.8
>
>
> AMQ-6590 fixes free page loss through recovery. The recovery process can be 
> lengthy, which prevents fast failover. Doing recovery on shutdown is 
> preferable, but it is still not ideal because it holds onto the KahaDB lock. 
> It can also stall shutdown unexpectedly.
> AMQ-7080 is going to tackle checkpointing the free list. This should help 
> avoid the need for recovery, but it may still be necessary. If the performance 
> hit is significant, this may need to be optional.
> There will still be a need to walk the index to find the free list.
> It is possible to run with no free list and grow, and we can do that while we 
> recover the free list in parallel, then merge the two at a safe point. This 
> can be done at startup.
> In cases where the disk is the bottleneck this won't help much, but it will 
> help failover and shutdown; with a bit of luck, the recovery will complete 
> before we stop.
>  
> Initially I thought this would be too complex, but if we concede some growth 
> while we recover, i.e., start with an empty free list, it should be 
> straightforward to merge with a recovered one.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AMQ-7082) KahaDB index, recover free pages in parallel with start

2018-11-06 Thread Christopher L. Shannon (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christopher L. Shannon updated AMQ-7082:

Fix Version/s: (was: 5.15.7)
   5.15.8

> KahaDB index, recover free pages in parallel with start
> ---
>
> Key: AMQ-7082
> URL: https://issues.apache.org/jira/browse/AMQ-7082
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0, 5.15.8
>
>
> AMQ-6590 fixes free page loss through recovery. The recovery process can be 
> lengthy, which prevents fast failover. Doing recovery on shutdown is 
> preferable, but it is still not ideal because it holds onto the KahaDB lock. 
> It can also stall shutdown unexpectedly.
> AMQ-7080 is going to tackle checkpointing the free list. This should help 
> avoid the need for recovery, but it may still be necessary. If the performance 
> hit is significant, this may need to be optional.
> There will still be a need to walk the index to find the free list.
> It is possible to run with no free list and grow, and we can do that while we 
> recover the free list in parallel, then merge the two at a safe point. This 
> can be done at startup.
> In cases where the disk is the bottleneck this won't help much, but it will 
> help failover and shutdown; with a bit of luck, the recovery will complete 
> before we stop.
>  
> Initially I thought this would be too complex, but if we concede some growth 
> while we recover, i.e., start with an empty free list, it should be 
> straightforward to merge with a recovered one.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-7082) KahaDB index, recover free pages in parallel with start

2018-11-06 Thread Alan Protasio (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16676723#comment-16676723
 ] 

Alan Protasio commented on AMQ-7082:


Hi guys...

I just created a PR with the tests...

Thanks! :D

> KahaDB index, recover free pages in parallel with start
> ---
>
> Key: AMQ-7082
> URL: https://issues.apache.org/jira/browse/AMQ-7082
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0, 5.15.7
>
>
> AMQ-6590 fixes free page loss through recovery. The recovery process can be 
> lengthy, which prevents fast failover. Doing recovery on shutdown is 
> preferable, but it is still not ideal because it holds onto the KahaDB lock. 
> It can also stall shutdown unexpectedly.
> AMQ-7080 is going to tackle checkpointing the free list. This should help 
> avoid the need for recovery, but it may still be necessary. If the performance 
> hit is significant, this may need to be optional.
> There will still be a need to walk the index to find the free list.
> It is possible to run with no free list and grow, and we can do that while we 
> recover the free list in parallel, then merge the two at a safe point. This 
> can be done at startup.
> In cases where the disk is the bottleneck this won't help much, but it will 
> help failover and shutdown; with a bit of luck, the recovery will complete 
> before we stop.
>  
> Initially I thought this would be too complex, but if we concede some growth 
> while we recover, i.e., start with an empty free list, it should be 
> straightforward to merge with a recovered one.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-7082) KahaDB index, recover free pages in parallel with start

2018-11-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16676718#comment-16676718
 ] 

ASF GitHub Bot commented on AMQ-7082:
-

GitHub user alanprot opened a pull request:

https://github.com/apache/activemq/pull/316

AMQ-7082 - Make sure that the recovery will only mark pages as free if they 
were created in a previous execution

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/alanprot/activemq AMQ-7082

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/activemq/pull/316.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #316


commit 0d34338919edee863bd71693ee30999d9d9d6ce9
Author: Alan Protasio 
Date:   2018-11-06T09:13:18Z

AMQ-7082 - Make sure that the recovery will only mark pages as free if they 
were created in a previous execution




> KahaDB index, recover free pages in parallel with start
> ---
>
> Key: AMQ-7082
> URL: https://issues.apache.org/jira/browse/AMQ-7082
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0, 5.15.7
>
>
> AMQ-6590 fixes free page loss through recovery. The recovery process can be 
> lengthy, which prevents fast failover. Doing recovery on shutdown is 
> preferable, but it is still not ideal because it holds onto the KahaDB lock. 
> It can also stall shutdown unexpectedly.
> AMQ-7080 is going to tackle checkpointing the free list. This should help 
> avoid the need for recovery, but it may still be necessary. If the performance 
> hit is significant, this may need to be optional.
> There will still be a need to walk the index to find the free list.
> It is possible to run with no free list and grow, and we can do that while we 
> recover the free list in parallel, then merge the two at a safe point. This 
> can be done at startup.
> In cases where the disk is the bottleneck this won't help much, but it will 
> help failover and shutdown; with a bit of luck, the recovery will complete 
> before we stop.
>  
> Initially I thought this would be too complex, but if we concede some growth 
> while we recover, i.e., start with an empty free list, it should be 
> straightforward to merge with a recovered one.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARTEMIS-1850) QueueControl.listDeliveringMessages returns empty result

2018-11-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-1850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16676711#comment-16676711
 ] 

ASF GitHub Bot commented on ARTEMIS-1850:
-

Github user gaohoward commented on the issue:

https://github.com/apache/activemq-artemis/pull/2074
  
@clebertsuconic Can you merge it? thx


> QueueControl.listDeliveringMessages returns empty result
> 
>
> Key: ARTEMIS-1850
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1850
> Project: ActiveMQ Artemis
>  Issue Type: Bug
>  Components: AMQP
>Affects Versions: 2.5.0
>Reporter: Howard Gao
>Assignee: Howard Gao
>Priority: Major
> Fix For: 2.7.0
>
>
> With AMQP protocol when some messages are received in a transaction, calling 
> JMX QueueControl.listDeliveringMessages() returns empty list before the 
> transaction is committed.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-7082) KahaDB index, recover free pages in parallel with start

2018-11-06 Thread Alvin Lin (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16676687#comment-16676687
 ] 

Alvin Lin commented on AMQ-7082:


[~cshannon] I do believe the issue is kind of critical (thus warrants a 5.15.8 
release soon), as it can lead to a lot of mysterious behavior.

> KahaDB index, recover free pages in parallel with start
> ---
>
> Key: AMQ-7082
> URL: https://issues.apache.org/jira/browse/AMQ-7082
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0, 5.15.7
>
>
> AMQ-6590 fixes free page loss through recovery. The recovery process can be 
> lengthy, which prevents fast failover. Doing recovery on shutdown is 
> preferable, but it is still not ideal because it holds onto the KahaDB lock. 
> It can also stall shutdown unexpectedly.
> AMQ-7080 is going to tackle checkpointing the free list. This should help 
> avoid the need for recovery, but it may still be necessary. If the performance 
> hit is significant, this may need to be optional.
> There will still be a need to walk the index to find the free list.
> It is possible to run with no free list and grow, and we can do that while we 
> recover the free list in parallel, then merge the two at a safe point. This 
> can be done at startup.
> In cases where the disk is the bottleneck this won't help much, but it will 
> help failover and shutdown; with a bit of luck, the recovery will complete 
> before we stop.
>  
> Initially I thought this would be too complex, but if we concede some growth 
> while we recover, i.e., start with an empty free list, it should be 
> straightforward to merge with a recovered one.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-7082) KahaDB index, recover free pages in parallel with start

2018-11-06 Thread Christopher L. Shannon (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16676683#comment-16676683
 ] 

Christopher L. Shannon commented on AMQ-7082:
-

[~gtully] - is the issue that was found critical (i.e., do we have to release 
a 5.15.8 update soon), or could it wait for more fixes before a new release?

> KahaDB index, recover free pages in parallel with start
> ---
>
> Key: AMQ-7082
> URL: https://issues.apache.org/jira/browse/AMQ-7082
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0, 5.15.7
>
>
> AMQ-6590 fixes free page loss through recovery. The recovery process can be 
> lengthy, which prevents fast failover. Doing recovery on shutdown is 
> preferable, but it is still not ideal because it holds onto the KahaDB lock. 
> It can also stall shutdown unexpectedly.
> AMQ-7080 is going to tackle checkpointing the free list. This should help 
> avoid the need for recovery, but it may still be necessary. If the performance 
> hit is significant, this may need to be optional.
> There will still be a need to walk the index to find the free list.
> It is possible to run with no free list and grow, and we can do that while we 
> recover the free list in parallel, then merge the two at a safe point. This 
> can be done at startup.
> In cases where the disk is the bottleneck this won't help much, but it will 
> help failover and shutdown; with a bit of luck, the recovery will complete 
> before we stop.
>  
> Initially I thought this would be too complex, but if we concede some growth 
> while we recover, i.e., start with an empty free list, it should be 
> straightforward to merge with a recovered one.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-7082) KahaDB index, recover free pages in parallel with start

2018-11-06 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16676576#comment-16676576
 ] 

Gary Tully commented on AMQ-7082:
-

The window is small; however, with something like message expiry, where all 
messages expire while the broker is down, there can be a lot of page file 
churn immediately after start. In addition, if the recovery is very long, the 
window gets larger.

> KahaDB index, recover free pages in parallel with start
> ---
>
> Key: AMQ-7082
> URL: https://issues.apache.org/jira/browse/AMQ-7082
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0, 5.15.7
>
>
> AMQ-6590 fixes free page loss through recovery. The recovery process can be 
> lengthy, which prevents fast failover. Doing recovery on shutdown is 
> preferable, but it is still not ideal because it holds onto the KahaDB lock. 
> It can also stall shutdown unexpectedly.
> AMQ-7080 is going to tackle checkpointing the free list. This should help 
> avoid the need for recovery, but it may still be necessary. If the performance 
> hit is significant, this may need to be optional.
> There will still be a need to walk the index to find the free list.
> It is possible to run with no free list and grow, and we can do that while we 
> recover the free list in parallel, then merge the two at a safe point. This 
> can be done at startup.
> In cases where the disk is the bottleneck this won't help much, but it will 
> help failover and shutdown; with a bit of luck, the recovery will complete 
> before we stop.
>  
> Initially I thought this would be too complex, but if we concede some growth 
> while we recover, i.e., start with an empty free list, it should be 
> straightforward to merge with a recovered one.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (AMQ-7082) KahaDB index, recover free pages in parallel with start

2018-11-06 Thread Gary Tully (JIRA)


 [ 
https://issues.apache.org/jira/browse/AMQ-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Tully reopened AMQ-7082:
-

[~alanprot] pointed out an issue that can cause the concurrent free page 
recovery to walk over newly allocated and freed pages, or pages in the process 
of being used.

The recovery processing needs to terminate at the nextFreePage id that exists 
at start. Everything after that is not recoverable; it is in use!
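
The fix, in other words, is to bound the scan at the start-time high-water mark. A rough sketch with hypothetical names (not the actual PageFile code):
{code:java}
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.function.LongPredicate;

// Sketch: recovery may only mark a page as free if it existed in a previous
// execution, i.e. its id is below the nextFreePage id captured at start.
public class BoundedFreePageRecovery {
    public static SortedSet<Long> recover(long nextFreePageAtStart,
                                          LongPredicate isFreeOnDisk) {
        SortedSet<Long> recovered = new TreeSet<>();
        for (long pageId = 0; pageId < nextFreePageAtStart; pageId++) {
            if (isFreeOnDisk.test(pageId)) { // stands in for the index walk
                recovered.add(pageId);
            }
        }
        // Pages at or above nextFreePageAtStart were allocated after start
        // and may be in use right now, so they are never touched here.
        return recovered;
    }
}
{code}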

> KahaDB index, recover free pages in parallel with start
> ---
>
> Key: AMQ-7082
> URL: https://issues.apache.org/jira/browse/AMQ-7082
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: KahaDB
>Affects Versions: 5.15.0
>Reporter: Gary Tully
>Assignee: Gary Tully
>Priority: Major
> Fix For: 5.16.0, 5.15.7
>
>
> AMQ-6590 fixes free page loss through recovery. The recovery process can be 
> lengthy, which prevents fast failover. Doing recovery on shutdown is 
> preferable, but it is still not ideal because it holds onto the KahaDB lock. 
> It can also stall shutdown unexpectedly.
> AMQ-7080 is going to tackle checkpointing the free list. This should help 
> avoid the need for recovery, but it may still be necessary. If the performance 
> hit is significant, this may need to be optional.
> There will still be a need to walk the index to find the free list.
> It is possible to run with no free list and grow, and we can do that while we 
> recover the free list in parallel, then merge the two at a safe point. This 
> can be done at startup.
> In cases where the disk is the bottleneck this won't help much, but it will 
> help failover and shutdown; with a bit of luck, the recovery will complete 
> before we stop.
>  
> Initially I thought this would be too complex, but if we concede some growth 
> while we recover, i.e., start with an empty free list, it should be 
> straightforward to merge with a recovered one.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AMQ-7085) Queue.start() does not call systemUsage.start() so TempStore usage is not handled correctly

2018-11-06 Thread Gary Tully (JIRA)


[ 
https://issues.apache.org/jira/browse/AMQ-7085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16676375#comment-16676375
 ] 

Gary Tully commented on AMQ-7085:
-

[~sits] even for a small change, it is best to include a test case to validate 
the use case and to protect your change in the future. It also makes the change 
easy to integrate, because the bug becomes trivial to understand with the test 
case in code.

> Queue.start() does not call systemUsage.start() so TempStore usage is not 
> handled correctly
> ---
>
> Key: AMQ-7085
> URL: https://issues.apache.org/jira/browse/AMQ-7085
> Project: ActiveMQ
>  Issue Type: Bug
>  Components: Broker
>Affects Versions: 5.15.6
>Reporter: David Sitsky
>Priority: Major
> Attachments: AMQ-7085.patch
>
>
> I have an application using ActiveMQ and have a situation where a producer is 
> blocked with this log message due to the temp usage limit being hit:
> {noformat}
> [ActiveMQ Transport: tcp:///aaa.bbb.ccc.ddd:65119@64759] 82567721 INFO  
> org.apache.activemq.broker.region.Queue - 
> Usage(Main:temp:queue://aaabbb:temp) percentUsage=99%, usage=61771109932, 
> limit=61771104256, percentUsageMinDelta=1%;Parent:Usage(Main:temp) 
> percentUsage=100%, usage=61771109932, limit=61771104256, 
> percentUsageMinDelta=1%: Temp Store is Full (99% of 61771104256). Stopping 
> producer (ID:a-3:1:1:1) to prevent flooding queue://aaabbb. See 
> http://activemq.apache.org/producer-flow-control.html for more info (blocking 
> for: 8512s){noformat}
> In the past I have been able to use JConsole and update the broker's 
> TempLimit value to a higher value to allow things to continue.
>  
> However, on this occasion the messages above, when output again, show that 
> the parent's temp limit has been updated but the child's limit (the queue's) 
> is unchanged. So it seems the broker's TempUsage does not know about the 
> queue's TempUsage. 
>  
> Looking at the code, it seems a child Usage instance has to call start() in 
> order for this parent -> child link to be established and for parent limit 
> changes to be propagated down to children. However, the Queue start() method 
> doesn't call systemUsage.getTempUsage().start() for some reason (or even just 
> systemUsage.start()).
>  
> Is this a bug?
>  
> DestinationView sadly does not expose setTempLimit(), so that wasn't an 
> option either.
>  
> From Queue:
> {code:java}
> @Override
>     public void start() throws Exception {
>         if (started.compareAndSet(false, true)) {
>             if (memoryUsage != null) {
>                 memoryUsage.start();
>             }
>             if (systemUsage.getStoreUsage() != null) {
>                 systemUsage.getStoreUsage().start();
>             }
>             systemUsage.getMemoryUsage().addUsageListener(this);
>             messages.start();
>             if (getExpireMessagesPeriod() > 0) {
>                 scheduler.executePeriodically(expireMessagesTask, 
> getExpireMessagesPeriod());
>             }
>             doPageIn(false);
>         }
>     }{code}
>    
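
A minimal sketch of the kind of fix being suggested (an assumption based on the description above, not the confirmed contents of AMQ-7085.patch) is to start the temp usage alongside the store usage in Queue.start():
{code:java}
@Override
public void start() throws Exception {
    if (started.compareAndSet(false, true)) {
        if (memoryUsage != null) {
            memoryUsage.start();
        }
        if (systemUsage.getStoreUsage() != null) {
            systemUsage.getStoreUsage().start();
        }
        // Hypothetical addition: start the temp usage too, so the
        // parent -> child link is established and parent TempLimit
        // changes propagate down to the queue's TempUsage.
        if (systemUsage.getTempUsage() != null) {
            systemUsage.getTempUsage().start();
        }
        systemUsage.getMemoryUsage().addUsageListener(this);
        messages.start();
        if (getExpireMessagesPeriod() > 0) {
            scheduler.executePeriodically(expireMessagesTask,
                    getExpireMessagesPeriod());
        }
        doPageIn(false);
    }
}
{code}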



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)