[jira] [Commented] (ARTEMIS-2216) Use a specific executor for pageSyncTimer

2019-01-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16734797#comment-16734797
 ] 

ASF GitHub Bot commented on ARTEMIS-2216:
-

Github user qihongxu commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
> @michaelandrepearce @qihongxu Ok, checked: the latest version of this PR 
+ my branch 
https://github.com/franz1981/activemq-artemis/tree/lock-free-live-page-cache is 
fully non-blocking :)

Nice work!!

> Be great to see a final stat in @qihongxu test env

As soon as I'm back on Monday I will try to implement the same tests on both 
versions (this one and franz's).


> Use a specific executor for pageSyncTimer
> -
>
> Key: ARTEMIS-2216
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2216
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Affects Versions: 2.6.3
>Reporter: Qihong Xu
>Priority: Major
> Attachments: contention_MASTER_global.svg, contention_PR_global.svg, 
> contention_PR_single.svg
>
>
> Improving throughput in paging mode is one of our concerns, since our cluster 
> uses paging a lot.
> We found that pageSyncTimer in PagingStoreImpl shares the same executor (taken 
> from the thread pool) with pageCursorProvider. In heavy-load scenarios, such as 
> hundreds of consumers receiving messages simultaneously, it became difficult 
> for pageSyncTimer to get the executor because of this contention. Page syncs 
> were therefore delayed and producers suffered low throughput.
>  
> To achieve higher performance we assign a dedicated executor to pageSyncTimer 
> to avoid the contention, and we ran a small-scale test on a single modified 
> broker.
>  
> Broker: 4C/8G/500G SSD
> Producer: 200 threads, non-transactional send
> Consumer: 200 threads, transactional receive
> Message text size: 100-200 bytes, randomized
> AddressFullPolicy: PAGE
>  
> Test result:
> | |Only Send TPS|Only Receive TPS|Send/Receive TPS (concurrent)|
> |Original ver|38k|33k|3k/30k|
> |Modified ver|38k|34k|30k/12.5k|
>  
> The table above shows that on the modified broker, concurrent send TPS improves 
> from “poor” to “extremely fast”, while receive TPS drops from “extremely fast” 
> to “not bad” under heavy load. Considering that consumer systems usually have a 
> long processing chain after receiving messages, we don’t need extremely fast 
> receive TPS. Instead, we want to guarantee send TPS to cope with traffic peaks 
> and to lower the producer’s delay. Moreover, total send-plus-receive TPS rises 
> from 33k to about 43k. Given all of the above, this trade-off seems beneficial 
> and acceptable.
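For illustration only, a minimal Java sketch of the idea in the description above (the class and method names here are assumptions, not the actual PagingStoreImpl/PageSyncTimer API): the latency-sensitive page-sync work gets its own executor instead of competing with cursor/cleanup work on the shared pool.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustration only: cursor cleanup keeps using the shared broker pool, while
// page syncs run on a dedicated executor so they never queue behind bursty
// consumer-driven work.
class PagingStoreSketch {
   private final ExecutorService sharedPool = Executors.newFixedThreadPool(8);
   private final ExecutorService pageSyncExecutor = Executors.newSingleThreadExecutor();

   void scheduleCursorCleanup(Runnable cleanup) {
      sharedPool.execute(cleanup);        // heavy, bursty work from many consumers
   }

   void schedulePageSync(Runnable syncTask) {
      pageSyncExecutor.execute(syncTask); // latency-sensitive sync of the current page
   }
}
{code}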



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARTEMIS-2216) Use a specific executor for pageSyncTimer

2019-01-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16734656#comment-16734656
 ] 

ASF GitHub Bot commented on ARTEMIS-2216:
-

Github user franz1981 commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
@michaelandrepearce Done, the PR has been sent; now we can just wait for the 
perf results on it :)
I have improved the live page cache behaviour/reliability quite a bit 
(especially in case of OOME), but sadly I see that the most-called method 
`getMessage` cannot be improved any further without making the lock-free code a 
real nightmare.
The original version was O(n) depending on which message was queried, because 
it needs to walk the entire linked list of paged messages. 
In my version I have amortized the cost by using an interesting hybrid 
between an ArrayList and a LinkedList, similar to an unrolled linked list 
(https://en.wikipedia.org/wiki/Unrolled_linked_list), but (very) optimized for 
appends.
I'm mentioning this because I have long wanted to design a single-threaded 
version of this same data structure to be used as the main data structure 
inside QueueImpl.
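For readers unfamiliar with the structure mentioned above, here is a minimal single-threaded Java sketch of the chunk-skipping idea (an illustration only, not the code on the linked branch): elements live in fixed-size array chunks linked together, so appends are cheap and a positional lookup skips whole chunks instead of walking one node per element.

{code:java}
// Single-threaded illustration of an unrolled-linked-list-style structure
// optimized for appends; not the actual LivePageCache implementation.
final class ChunkedAppendList<T> {
   private static final int CHUNK_SIZE = 32;

   private static final class Chunk {
      final Object[] elements = new Object[CHUNK_SIZE];
      int size;
      Chunk next;
   }

   private final Chunk head = new Chunk();
   private Chunk tail = head;
   private int totalSize;

   void add(T element) {
      if (tail.size == CHUNK_SIZE) {    // current chunk is full: link a fresh one
         Chunk newChunk = new Chunk();
         tail.next = newChunk;
         tail = newChunk;
      }
      tail.elements[tail.size++] = element;
      totalSize++;
   }

   @SuppressWarnings("unchecked")
   T get(int index) {
      if (index < 0 || index >= totalSize) {
         throw new IndexOutOfBoundsException("index: " + index);
      }
      Chunk chunk = head;
      int remaining = index;
      while (remaining >= chunk.size) {  // skip whole chunks, not single nodes
         remaining -= chunk.size;
         chunk = chunk.next;
      }
      return (T) chunk.elements[remaining];
   }

   int size() {
      return totalSize;
   }
}
{code}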






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARTEMIS-2220) Fix PageCursorStressTest::testSimpleCursorWithFilter NPE

2019-01-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16734647#comment-16734647
 ] 

ASF GitHub Bot commented on ARTEMIS-2220:
-

GitHub user franz1981 opened a pull request:

https://github.com/apache/activemq-artemis/pull/2489

ARTEMIS-2220 Fix PageCursorStressTest::testSimpleCursorWithFilter NPE

FakeQueue is not correctly setting the queue on its PageSubscription,
leading to fail the test due to NPEs when PageSubscription::getQueue
is being used.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/franz1981/activemq-artemis ARTEMIS-2220

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/activemq-artemis/pull/2489.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2489


commit 32cb5271bb4f61c23c27ed3b7a3cda512e2648fc
Author: Francesco Nigro 
Date:   2019-01-04T22:50:56Z

ARTEMIS-2220 Fix PageCursorStressTest::testSimpleCursorWithFilter NPE

FakeQueue is not correctly setting the queue on its PageSubscription,
leading to fail the test due to NPEs when PageSubscription::getQueue
is being used.




> Fix PageCursorStressTest::testSimpleCursorWithFilter NPE
> 
>
> Key: ARTEMIS-2220
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2220
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Affects Versions: 2.7.0
>Reporter: Francesco Nigro
>Assignee: Francesco Nigro
>Priority: Trivial
>
> FakeQueue used in tests does not correctly set the queue on its 
> PageSubscription, leading the test to fail with NPEs when 
> PageSubscription::getQueue is used.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARTEMIS-2220) Fix PageCursorStressTest::testSimpleCursorWithFilter NPE

2019-01-04 Thread Francesco Nigro (JIRA)
Francesco Nigro created ARTEMIS-2220:


 Summary: Fix PageCursorStressTest::testSimpleCursorWithFilter NPE
 Key: ARTEMIS-2220
 URL: https://issues.apache.org/jira/browse/ARTEMIS-2220
 Project: ActiveMQ Artemis
  Issue Type: Improvement
Affects Versions: 2.7.0
Reporter: Francesco Nigro
Assignee: Francesco Nigro


FakeQueue used in tests does not correctly set the queue on its 
PageSubscription, leading the test to fail with NPEs when 
PageSubscription::getQueue is used.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARTEMIS-2216) Use a specific executor for pageSyncTimer

2019-01-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16734559#comment-16734559
 ] 

ASF GitHub Bot commented on ARTEMIS-2216:
-

Github user franz1981 commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
@michaelandrepearce Good idea! Let me do it now.






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARTEMIS-2216) Use a specific executor for pageSyncTimer

2019-01-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16734552#comment-16734552
 ] 

ASF GitHub Bot commented on ARTEMIS-2216:
-

Github user michaelandrepearce commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
@franz1981 did you send a PR to @qihongxu's branch so he can merge it and 
this PR picks it up?

Be great to see a final stat in @qihongxu test env





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARTEMIS-2216) Use a specific executor for pageSyncTimer

2019-01-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16734507#comment-16734507
 ] 

ASF GitHub Bot commented on ARTEMIS-2216:
-

Github user franz1981 commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
@michaelandrepearce @qihongxu Ok, checked: the latest version of this PR + 
my branch 
https://github.com/franz1981/activemq-artemis/tree/lock-free-live-page-cache is 
never blocked :)





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARTEMIS-2214) ARTEMIS-2214 Cache durable in PagedReference

2019-01-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16734468#comment-16734468
 ] 

ASF GitHub Bot commented on ARTEMIS-2214:
-

Github user franz1981 commented on the issue:

https://github.com/apache/activemq-artemis/pull/2482
  
I will probably do it on the weekend or on Monday ;)


> ARTEMIS-2214 Cache durable in PagedReference
> -
>
> Key: ARTEMIS-2214
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2214
> Project: ActiveMQ Artemis
>  Issue Type: Bug
>  Components: Broker
>Affects Versions: 2.6.3
>Reporter: Qihong Xu
>Priority: Major
> Attachments: stacks.txt
>
>
> We recently performed a test on the Artemis broker and found a severe 
> performance issue.
> When paged messages are being consumed, decrementMetrics in 
> QueuePendingMessageMetrics calls ‘getMessage’ to check whether they are 
> durable or not. This can lock the queue for a long time, because the page may 
> have been GC'd and need to be reloaded entirely. Other operations that rely on 
> the queue are blocked during this time, which causes a significant TPS drop. 
> Detailed stacks are attached below.
> This also happens when a consumer is closed and messages are pushed back to 
> the queue: Artemis will check the priority on return if these messages are 
> paged.
> To solve the issue, durable and priority need to be cached in PagedReference, 
> just like messageID, transactionID and so on. I have applied a patch to fix 
> the issue. Any review is appreciated.
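A hedged Java sketch of the caching idea described above (the names below are illustrative; the real PagedReferenceImpl differs): the durable flag is captured once, while the message is still in memory, and answered from the cached byte afterwards, so metric updates never force a page reload.

{code:java}
// Illustration only: cache the durable flag in the reference so isDurable()
// does not need to reload a possibly-GC'd page.
final class PagedReferenceSketch {
   private static final byte UNDEFINED = -1;
   private static final byte IS_NOT_DURABLE = 0;
   private static final byte IS_DURABLE = 1;

   interface PagedMessage { boolean isDurable(); }
   interface MessageLoader { PagedMessage load(); } // may re-read the page from disk

   private final MessageLoader loader;
   private byte durable = UNDEFINED;

   PagedReferenceSketch(MessageLoader loader, PagedMessage messageIfAvailable) {
      this.loader = loader;
      if (messageIfAvailable != null) {
         // Cheap path: cache the flag while the message is still in memory.
         this.durable = messageIfAvailable.isDurable() ? IS_DURABLE : IS_NOT_DURABLE;
      }
   }

   boolean isDurable() {
      if (durable == UNDEFINED) {
         // Slow path, only taken if the flag was never cached.
         durable = loader.load().isDurable() ? IS_DURABLE : IS_NOT_DURABLE;
      }
      return durable == IS_DURABLE;
   }
}
{code}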



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARTEMIS-2216) Use a specific executor for pageSyncTimer

2019-01-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16734440#comment-16734440
 ] 

ASF GitHub Bot commented on ARTEMIS-2216:
-

Github user franz1981 commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
@michaelandrepearce @qihongxu I have re-implemented the LivePageCache to be 
completely lock-free: 
https://github.com/franz1981/activemq-artemis/tree/lock-free-live-page-cache
Feel free to try it in this PR too, and I can do a PR to your branch.
I will soon provide some charts of the contention state :+1: 
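As a rough illustration of what "lock-free" buys here (an assumption-laden sketch, not the code on the branch above): a single-writer, multi-reader append-only structure can publish new entries with a release store instead of taking any lock.

{code:java}
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Illustration only: single-writer append, lock-free reads. Capacity is fixed
// for brevity; a real cache would grow in chunks.
final class SingleWriterCacheSketch<T> {
   private final AtomicReferenceArray<T> elements;
   private final AtomicInteger size = new AtomicInteger();

   SingleWriterCacheSketch(int capacity) {
      this.elements = new AtomicReferenceArray<>(capacity);
   }

   // Called by the single writer thread only.
   void add(T element) {
      int index = size.get();
      elements.set(index, element); // store the element first...
      size.lazySet(index + 1);      // ...then publish it with release semantics
   }

   // Safe from any reader thread: never observes an unpublished slot.
   T get(int index) {
      return index < size.get() ? elements.get(index) : null;
   }

   int size() {
      return size.get();
   }
}
{code}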





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARTEMIS-2214) ARTEMIS-2214 Cache durable in PagedReference

2019-01-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16734208#comment-16734208
 ] 

ASF GitHub Bot commented on ARTEMIS-2214:
-

Github user michaelandrepearce commented on the issue:

https://github.com/apache/activemq-artemis/pull/2482
  
@franz1981 I'm away in another country without my main computer with my 
Apache git SSH key. Could you merge this?





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARTEMIS-2216) Use a specific executor for pageSyncTimer

2019-01-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16734102#comment-16734102
 ] 

ASF GitHub Bot commented on ARTEMIS-2216:
-

Github user franz1981 commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
> what does the current PR look like in heat maps?

I'm finishing a change to fix `LivePageCacheImpl` too, hopefully today, and I 
will post the charts: implementation-wise this PR seems good to me as well :+1: 






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARTEMIS-2214) ARTEMIS-2214 Cache durable in PagedReference

2019-01-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16734098#comment-16734098
 ] 

ASF GitHub Bot commented on ARTEMIS-2214:
-

Github user michaelandrepearce commented on the issue:

https://github.com/apache/activemq-artemis/pull/2482
  
@qihongxu looks good to me now.





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARTEMIS-2216) Use a specific executor for pageSyncTimer

2019-01-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16734097#comment-16734097
 ] 

ASF GitHub Bot commented on ARTEMIS-2216:
-

Github user michaelandrepearce commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
@qihongxu looks good to me.

@franz1981 what does the current PR look like in heat maps?





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARTEMIS-2216) Use a specific executor for pageSyncTimer

2019-01-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16734069#comment-16734069
 ] 

ASF GitHub Bot commented on ARTEMIS-2216:
-

Github user michaelandrepearce commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
> Em, could you please tell us which issues? We need to verify how they 
affect our cluster.

The big issue I'm referring to, which became a nightmare for my 
organisation, was that under high concurrency (a high-throughput, low-latency 
broker setup) the buffers could get mixed up, causing index-out-of-bounds 
issues.

Fixes were multiple:

https://github.com/apache/activemq-artemis/commit/024db5bd3c1656d265daf60c9e3a362d53b9088b

https://github.com/apache/activemq-artemis/commit/da7fb89037481fb6343c760010d4553ff28ac87e

I'm also aware there have been some other concurrency fixes for smaller 
issues.






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARTEMIS-2216) Use a specific executor for pageSyncTimer

2019-01-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16734030#comment-16734030
 ] 

ASF GitHub Bot commented on ARTEMIS-2216:
-

Github user qihongxu commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
@michaelandrepearce 
Removed the read lock from isPaging().
Also, as @franz1981 suggested, addressFullMessagePolicy is now volatile-loaded 
only once on each call.
Please review and let me know if any more tests are needed :)





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARTEMIS-2216) Use a specific executor for pageSyncTimer

2019-01-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16734023#comment-16734023
 ] 

ASF GitHub Bot commented on ARTEMIS-2216:
-

Github user wy96f commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  

> I'd agree, I'm just cautious as we've been hit a few times with concurrency 
issues that have been a nightmare to find (as recently as last month!).

Em, could you please tell us which issues? We need to verify how they 
affect our cluster.







--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARTEMIS-2216) Use a specific executor for pageSyncTimer

2019-01-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16733994#comment-16733994
 ] 

ASF GitHub Bot commented on ARTEMIS-2216:
-

Github user wy96f commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/2484#discussion_r245252912
  
--- Diff: 
artemis-server/src/main/java/org/apache/activemq/artemis/core/paging/impl/PagingStoreImpl.java
 ---
@@ -278,21 +293,26 @@ public boolean isPaging() {
       lock.readLock().lock();

       try {
-         if (addressFullMessagePolicy == AddressFullMessagePolicy.BLOCK) {
-            return false;
-         }
-         if (addressFullMessagePolicy == AddressFullMessagePolicy.FAIL) {
-            return isFull();
-         }
-         if (addressFullMessagePolicy == AddressFullMessagePolicy.DROP) {
-            return isFull();
-         }
-         return paging;
+         return isPagingDirtyRead();
       } finally {
          lock.readLock().unlock();
       }
    }

+   @Override
+   public boolean isPagingDirtyRead() {
+      if (addressFullMessagePolicy == AddressFullMessagePolicy.BLOCK) {
--- End diff --

> Yes, but we can just volatile-load it once before checking its value 3 
times, on each call of isPagingDirtyRead

Got it, nice catch :+1: 





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARTEMIS-2213) Clock drift causing server halt

2019-01-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16733990#comment-16733990
 ] 

ASF GitHub Bot commented on ARTEMIS-2213:
-

Github user wy96f closed the pull request at:

https://github.com/apache/activemq-artemis/pull/2481


> Clock drift causing server halt
> ---
>
> Key: ARTEMIS-2213
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2213
> Project: ActiveMQ Artemis
>  Issue Type: Bug
>  Components: Broker
>Affects Versions: 2.6.3
>Reporter: yangwei
>Priority: Critical
>
> In our production cluster some brokers crashed. There was nothing unusual in 
> the stack dump. After digging into the code, we found that a component was 
> incorrectly expired: when the clock drifted backwards, the leave time was less 
> than the enter time. If the component was not entered again within the default 
> 12ms, it was considered expired and the server was halted.
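A short sketch of the general remedy for this class of bug (an assumption about the direction of the fix, not the actual Artemis code): measure the enter/leave interval with a monotonic clock such as System.nanoTime(), which cannot move backwards when the wall clock drifts or is corrected.

{code:java}
// Illustration only: expiry check based on a monotonic clock.
final class CriticalComponentSketch {
   private volatile long enterNanos = System.nanoTime();

   void enter() {
      enterNanos = System.nanoTime(); // monotonic: immune to wall-clock drift
   }

   boolean isExpired(long timeoutMillis) {
      long elapsedMillis = (System.nanoTime() - enterNanos) / 1_000_000L;
      return elapsedMillis > timeoutMillis;
   }
}
{code}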



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARTEMIS-2216) Use a specific executor for pageSyncTimer

2019-01-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16733948#comment-16733948
 ] 

ASF GitHub Bot commented on ARTEMIS-2216:
-

Github user michaelandrepearce commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/2484#discussion_r245239811
  
--- Diff: 
artemis-server/src/main/java/org/apache/activemq/artemis/core/paging/impl/PagingStoreImpl.java
 ---
@@ -278,21 +293,26 @@ public boolean isPaging() {
       lock.readLock().lock();

       try {
-         if (addressFullMessagePolicy == AddressFullMessagePolicy.BLOCK) {
-            return false;
-         }
-         if (addressFullMessagePolicy == AddressFullMessagePolicy.FAIL) {
-            return isFull();
-         }
-         if (addressFullMessagePolicy == AddressFullMessagePolicy.DROP) {
-            return isFull();
-         }
-         return paging;
+         return isPagingDirtyRead();
       } finally {
          lock.readLock().unlock();
       }
    }

+   @Override
+   public boolean isPagingDirtyRead() {
+      if (addressFullMessagePolicy == AddressFullMessagePolicy.BLOCK) {
--- End diff --

@wy96f what @franz1981 is trying to say is that we can do the volatile read 
just once, by adding one line, e.g.:

    AddressFullMessagePolicy addressFullMessagePolicy = this.addressFullMessagePolicy;
    if (addressFullMessagePolicy == AddressFullMessagePolicy.BLOCK) {
       return false;
    }
    if (addressFullMessagePolicy == AddressFullMessagePolicy.FAIL) {
       return isFull();
    }
    if (addressFullMessagePolicy == AddressFullMessagePolicy.DROP) {
       return isFull();
    }
    return paging;





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARTEMIS-2216) Use a specific executor for pageSyncTimer

2019-01-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16733943#comment-16733943
 ] 

ASF GitHub Bot commented on ARTEMIS-2216:
-

Github user michaelandrepearce commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
> @michaelandrepearce
> 
> > Do you get this on master or this PR (I mean, is that a typo)?
> 
> I've got that on master!

OK, so I think we don't need to worry about this within the scope of this PR 
(it probably needs looking into, but it doesn't need to be solved in this PR, 
IMO).





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARTEMIS-2216) Use a specific executor for pageSyncTimer

2019-01-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16733942#comment-16733942
 ] 

ASF GitHub Bot commented on ARTEMIS-2216:
-

Github user michaelandrepearce commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
> If the dirty read returns `true` we can just return it, while if we find it 
`false` we could attempt to enter the read lock and validate that it's not 
paging for real.

I've literally gone through every case. What happens is that we call isPaging 
within an if statement and then do some logic afterwards, so any action we take 
within those if statements will be based on stale state anyway. 

I'm starting to think we should just make isPaging not use a read lock (i.e. 
make it dirty), as it's only used in QueueImpl, as mentioned, and in 
QueueControl (i.e. the admin GUI).





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARTEMIS-2216) Use a specific executor for pageSyncTimer

2019-01-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16733928#comment-16733928
 ] 

ASF GitHub Bot commented on ARTEMIS-2216:
-

Github user franz1981 commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
@michaelandrepearce 
> I'm starting to feel like Alice here and we're going to end up going down a 
rabbit hole ;) and will end up with the original isPaging just being dirty.

+1, same exact feeling.
Quoting myself twice for @wy96f:
> LivePageCacheImpl (in violet) is now a major contention point

and 

> Compaction is stealing a lot of CPU and I/O

Just a thought: we can choose where to be less stale.
If the dirty read returns `true` we can just return it, while if we find it 
`false` we could attempt to enter the read lock and validate that it's not 
paging for real.
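A small Java sketch of that asymmetric check (field names are illustrative, not the real PagingStoreImpl): a dirty "true" is returned immediately, while a dirty "false" is re-validated under the read lock before being reported.

{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustration only: trust the lock-free read when it says "paging",
// double-check under the read lock before saying "not paging".
class PagingStateSketch {
   private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
   private volatile boolean paging;

   boolean isPagingDirtyRead() {
      return paging; // no lock: may be slightly stale
   }

   boolean isPaging() {
      if (isPagingDirtyRead()) {
         return true;              // accepted as-is
      }
      lock.readLock().lock();      // validate the "false" answer
      try {
         return paging;
      } finally {
         lock.readLock().unlock();
      }
   }
}
{code}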







--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARTEMIS-2214) ARTEMIS-2214 Cache durable in PagedReference

2019-01-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16733922#comment-16733922
 ] 

ASF GitHub Bot commented on ARTEMIS-2214:
-

Github user michaelandrepearce commented on a diff in the pull request:

https://github.com/apache/activemq-artemis/pull/2482#discussion_r245233815
  
--- Diff: 
artemis-server/src/main/java/org/apache/activemq/artemis/core/paging/cursor/PagedReferenceImpl.java
 ---
@@ -120,14 +126,16 @@ public PagedReferenceImpl(final PagePosition position,
          this.largeMessage = message.getMessage().isLargeMessage() ? IS_LARGE_MESSAGE : IS_NOT_LARGE_MESSAGE;
          this.transactionID = message.getTransactionID();
          this.messageID = message.getMessage().getMessageID();
-
+         this.durable = message.getMessage().isDurable() ? IS_DURABLE : IS_NOT_DURABLE;
+         this.deliveryTime = message.getMessage().getScheduledDeliveryTime();
          //pre-cache the message size so we don't have to reload the message later if it is GC'd
          getPersistentSize();
       } else {
          this.largeMessage = UNDEFINED_IS_LARGE_MESSAGE;
          this.transactionID = -2;
          this.messageID = -1;
          this.messageSize = -1;
+         this.durable = UNDEFINED_IS_DURABLE;
--- End diff --

For completeness (it's a nit): set deliveryTime to its undefined value here as well.





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARTEMIS-2190) Core JMS client leaks temporary destination names

2019-01-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16733926#comment-16733926
 ] 

ASF GitHub Bot commented on ARTEMIS-2190:
-

Github user michaelandrepearce commented on the issue:

https://github.com/apache/activemq-artemis/pull/2477
  
LGTM - I would merge this for you, but I'm currently abroad and don't have 
access to my computer with my Apache git SSH key; feel free to merge.


> Core JMS client leaks temporary destination names
> -
>
> Key: ARTEMIS-2190
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2190
> Project: ActiveMQ Artemis
>  Issue Type: Bug
>Affects Versions: 1.5.5, 2.6.0
>Reporter: Tom Ross
>Assignee: Justin Bertram
>Priority: Major
>
> The core JMS client leaks the {{SimpleString}} identifiers for temporary 
> destinations. When a temporary JMS destination is created it is added to two 
> lists on {{ActiveMQConnection}}, but when it is deleted it is only removed 
> from one list.
> {code:java}
> public void addTemporaryQueue(final SimpleString queueAddress) {
>  tempQueues.add(queueAddress);
>  knownDestinations.add(queueAddress);
> }
> public void removeTemporaryQueue(final SimpleString queueAddress) {
>  tempQueues.remove(queueAddress);
> }{code}
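
A minimal sketch of the kind of fix the description implies: remove the identifier from both collections on deletion. This assumes knownDestinations is an ordinary java.util collection with a matching remove method; it is not necessarily the exact patch that was merged.

{code:java}
public void removeTemporaryQueue(final SimpleString queueAddress) {
   tempQueues.remove(queueAddress);
   // also drop the cached identifier so the connection no longer retains it
   knownDestinations.remove(queueAddress);
}
{code}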



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARTEMIS-2213) Clock drift causing server halt

2019-01-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16733923#comment-16733923
 ] 

ASF GitHub Bot commented on ARTEMIS-2213:
-

Github user michaelandrepearce commented on the issue:

https://github.com/apache/activemq-artemis/pull/2481
  
@wy96f could you close this? Only the person who opened the PR can close it, otherwise 
it will be closed when we merge.


> Clock drift causing server halt
> ---
>
> Key: ARTEMIS-2213
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2213
> Project: ActiveMQ Artemis
>  Issue Type: Bug
>  Components: Broker
>Affects Versions: 2.6.3
>Reporter: yangwei
>Priority: Critical
>
> In our production cluster some brokers crashed, and there was nothing unusual in 
> the stack dump. After digging into the code, we found that a component was being 
> incorrectly expired: when the clock drifted backwards, the recorded leave time was 
> less than the enter time. If the component was then not entered again within the 
> default 12ms, it was treated as expired and the server was halted.
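
For illustration only, not the actual Artemis critical-analyzer code: a sketch of an expiry check driven by a monotonic clock, which is immune to the backwards clock drift described above. All names below are made up.

{code:java}
// Sketch: expiry check based on a monotonic clock instead of wall-clock time.
final class ExpiryCheckSketch {

   private final long timeoutNanos;
   private volatile long enterNanos;

   ExpiryCheckSketch(long timeoutMillis) {
      this.timeoutNanos = timeoutMillis * 1_000_000L;
   }

   void enter() {
      // System.nanoTime() never jumps backwards, unlike System.currentTimeMillis()
      enterNanos = System.nanoTime();
   }

   boolean isExpired() {
      // With a wall clock, an NTP step backwards could make this difference
      // negative or wildly wrong; with nanoTime it is always a real elapsed time.
      return System.nanoTime() - enterNanos > timeoutNanos;
   }
}
{code}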



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARTEMIS-2216) Use a specific executor for pageSyncTimer

2019-01-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16733919#comment-16733919
 ] 

ASF GitHub Bot commented on ARTEMIS-2216:
-

Github user franz1981 commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
@michaelandrepearce 
> Do you get this on master or this PR (I mean, is that a typo)?

I've got that on master!


> Use a specific executor for pageSyncTimer
> -
>
> Key: ARTEMIS-2216
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2216
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Affects Versions: 2.6.3
>Reporter: Qihong Xu
>Priority: Major
> Attachments: contention_MASTER_global.svg, contention_PR_global.svg, 
> contention_PR_single.svg
>
>
> Improving throughput in paging mode is one of our concerns, since our cluster 
> uses paging a lot.
> We found that pageSyncTimer in PagingStoreImpl shared the same thread-pool-backed 
> executor with pageCursorProvider. In heavy-load scenarios, such as hundreds of 
> consumers receiving messages simultaneously, it became difficult for 
> pageSyncTimer to get the executor because of this contention. As a result, page sync 
> was delayed and producers suffered low throughput.
>  
> To achieve higher performance we assign a specific executor to pageSyncTimer 
> to avoid the contention, and we ran a small-scale test on a single modified broker.
>  
> Broker: 4C/8G/500G SSD
> Producer: 200 threads, non-transactional send
> Consumer: 200 threads, transactional receive
> Message text size: 100-200 bytes (random)
> AddressFullPolicy: PAGE
>  
> Test result:
> | |Only Send TPS|Only Receive TPS|Send/Receive TPS (concurrent)|
> |Original ver|38k|33k|3k/30k|
> |Modified ver|38k|34k|30k/12.5k|
>  
> The table above shows that on the modified broker send TPS improves from “poor” 
> to “extremely fast”, while receive TPS drops from “extremely fast” to 
> “not bad” under heavy load. Considering that consumer systems usually have a long 
> processing chain after receiving messages, we don’t need extremely fast receive 
> TPS. Instead, we want to guarantee send TPS to cope with traffic peaks and to 
> lower the producer’s delay time. Moreover, the combined send and receive TPS rises 
> from 33k to about 43k. Given all of the above, this trade-off seems beneficial and 
> acceptable.
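
For illustration only (the class and method names below are simplified and not the actual PagingStoreImpl code): the change described above amounts to giving page-sync work an executor of its own, so its tasks no longer queue behind cursor-provider work on a shared executor.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Simplified sketch of the "dedicated executor" idea described above.
final class PagingStoreSketch {

   // previously both kinds of work were submitted to one shared executor
   private final ExecutorService cursorExecutor = Executors.newSingleThreadExecutor();

   // dedicated executor: page-sync tasks no longer wait behind cursor work
   private final ExecutorService syncExecutor = Executors.newSingleThreadExecutor();

   void scheduleCursorWork(Runnable cleanupOrDepaging) {
      cursorExecutor.execute(cleanupOrDepaging);
   }

   void schedulePageSync(Runnable syncTask) {
      // before the change this would have gone to cursorExecutor as well
      syncExecutor.execute(syncTask);
   }
}
{code}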



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARTEMIS-2216) Use a specific executor for pageSyncTimer

2019-01-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARTEMIS-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16733914#comment-16733914
 ] 

ASF GitHub Bot commented on ARTEMIS-2216:
-

Github user michaelandrepearce commented on the issue:

https://github.com/apache/activemq-artemis/pull/2484
  
> @michaelandrepearce @qihongxu Just a lil OT but I'm getting these warns 
on master:
> 
> ```
> 2019-01-03 17:36:44,408 WARN  [org.apache.activemq.artemis.journal] 
AMQ142007: Can not find record 8,103,136 during compact replay
> 2019-01-03 17:36:44,408 WARN  [org.apache.activemq.artemis.journal] 
AMQ142007: Can not find record 8,103,139 during compact replay
> 2019-01-03 17:36:44,408 WARN  [org.apache.activemq.artemis.journal] 
AMQ142007: Can not find record 8,103,142 during compact replay
> 2019-01-03 17:36:44,408 WARN  [org.apache.activemq.artemis.journal] 
AMQ142007: Can not find record 8,103,145 during compact replay
> 2019-01-03 17:36:44,408 WARN  [org.apache.activemq.artemis.journal] 
AMQ142007: Can not find record 8,103,148 during compact replay
> 2019-01-03 17:36:44,408 WARN  [org.apache.activemq.artemis.journal] 
AMQ142007: Can not find record 8,103,151 during compact replay
> 2019-01-03 17:36:44,408 WARN  [org.apache.activemq.artemis.journal] 
AMQ142007: Can not find record 8,103,154 during compact replay
> ```

Do you get this on master? If not then that is a BIG worry


> Use a specific executor for pageSyncTimer
> -
>
> Key: ARTEMIS-2216
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2216
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Affects Versions: 2.6.3
>Reporter: Qihong Xu
>Priority: Major
> Attachments: contention_MASTER_global.svg, contention_PR_global.svg, 
> contention_PR_single.svg
>
>
> Improving throughput in paging mode is one of our concerns, since our cluster 
> uses paging a lot.
> We found that pageSyncTimer in PagingStoreImpl shared the same thread-pool-backed 
> executor with pageCursorProvider. In heavy-load scenarios, such as hundreds of 
> consumers receiving messages simultaneously, it became difficult for 
> pageSyncTimer to get the executor because of this contention. As a result, page sync 
> was delayed and producers suffered low throughput.
>  
> To achieve higher performance we assign a specific executor to pageSyncTimer 
> to avoid the contention, and we ran a small-scale test on a single modified broker.
>  
> Broker: 4C/8G/500G SSD
> Producer: 200 threads, non-transactional send
> Consumer: 200 threads, transactional receive
> Message text size: 100-200 bytes (random)
> AddressFullPolicy: PAGE
>  
> Test result:
> | |Only Send TPS|Only Receive TPS|Send/Receive TPS (concurrent)|
> |Original ver|38k|33k|3k/30k|
> |Modified ver|38k|34k|30k/12.5k|
>  
> The table above shows that on the modified broker send TPS improves from “poor” 
> to “extremely fast”, while receive TPS drops from “extremely fast” to 
> “not bad” under heavy load. Considering that consumer systems usually have a long 
> processing chain after receiving messages, we don’t need extremely fast receive 
> TPS. Instead, we want to guarantee send TPS to cope with traffic peaks and to 
> lower the producer’s delay time. Moreover, the combined send and receive TPS rises 
> from 33k to about 43k. Given all of the above, this trade-off seems beneficial and 
> acceptable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARTEMIS-2214) ARTEMIS-2214 Cache durable in PagedReference

2019-01-04 Thread Qihong Xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARTEMIS-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qihong Xu updated ARTEMIS-2214:
---
Summary: ARTEMIS-2214 Cache durable in PagedReference  (was: 
Cache durable in PagedReference to avoid blocks in consuming paged 
messages)

> ARTEMIS-2214 Cache durable in PagedReference
> -
>
> Key: ARTEMIS-2214
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2214
> Project: ActiveMQ Artemis
>  Issue Type: Bug
>  Components: Broker
>Affects Versions: 2.6.3
>Reporter: Qihong Xu
>Priority: Major
> Attachments: stacks.txt
>
>
> We recently performed a test on the artemis broker and found a severe performance 
> issue.
> When paged messages are being consumed, decrementMetrics in 
> QueuePendingMessageMetrics will call ‘getMessage’ to check whether they are 
> durable or not. The queue can be locked for a long time this way, because the page 
> may have been GC'd and may need to be reloaded entirely. Other operations that rely 
> on the queue are blocked during this time, which causes a significant TPS drop. 
> Detailed stacks are attached below.
> This also happens when a consumer is closed and its messages are pushed back to the 
> queue: artemis will check the priority on return if these messages are paged.
> To solve the issue, durable and priority need to be cached in PagedReference, 
> just like messageID, transactionID and so on. I have applied a patch to fix 
> the issue. Any review is appreciated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)