[jira] [Created] (ARTEMIS-3061) getting AMQP duplicate property can save many String comparisons and class checks
Francesco Nigro created ARTEMIS-3061: Summary: getting AMQP duplicate property can save many String comparisons and class checks Key: ARTEMIS-3061 URL: https://issues.apache.org/jira/browse/ARTEMIS-3061 Project: ActiveMQ Artemis Issue Type: Improvement Components: AMQP Affects Versions: 2.16.0 Reporter: Francesco Nigro Assignee: Francesco Nigro -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARTEMIS-3059) AMQP message reencoding should save creating Netty heap arenas
[ https://issues.apache.org/jira/browse/ARTEMIS-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-3059: - Summary: AMQP message reencoding should save creating Netty heap arenas (was: AMQP reencoding should save creating Netty heap arenas) > AMQP message reencoding should save creating Netty heap arenas > -- > > Key: ARTEMIS-3059 > URL: https://issues.apache.org/jira/browse/ARTEMIS-3059 > Project: ActiveMQ Artemis > Issue Type: Improvement > Components: AMQP, Broker >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > > AMQP reencoding is using Netty pooled heap buffers to encode the message, > creating heap arenas that would affect the broker heap memory footprint: this > could be saved by using off-heap/direct arenas that are already allocated by > the broker for networking. > > What can cause messages to be re-encoded is sending them across bridges, which > means that cluster connections (which are special bridges) can stealthily affect > the broker memory footprint. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (ARTEMIS-3059) AMQP reencoding should save creating Netty heap arenas
[ https://issues.apache.org/jira/browse/ARTEMIS-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on ARTEMIS-3059 started by Francesco Nigro. > AMQP reencoding should save creating Netty heap arenas > -- > > Key: ARTEMIS-3059 > URL: https://issues.apache.org/jira/browse/ARTEMIS-3059 > Project: ActiveMQ Artemis > Issue Type: Improvement > Components: AMQP, Broker >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > > AMQP reencoding is using Netty pooled heap buffers to encode the message, > creating heap arenas that would affect the broker heap memory footprint: this > could be saved by using off-heap/direct arenas that are already allocated by > the broker for networking. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARTEMIS-3059) AMQP reencoding should save creating Netty heap arenas
Francesco Nigro created ARTEMIS-3059: Summary: AMQP reencoding should save creating Netty heap arenas Key: ARTEMIS-3059 URL: https://issues.apache.org/jira/browse/ARTEMIS-3059 Project: ActiveMQ Artemis Issue Type: Improvement Components: AMQP, Broker Reporter: Francesco Nigro Assignee: Francesco Nigro AMQP reencoding is using Netty pooled heap buffers to encode the message, creating heap arenas that would affect the broker heap memory footprint: this could be saved by using off-heap/direct arenas that are already allocated by the broker for networking. -- This message was sent by Atlassian Jira (v8.3.4#803005)
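The heap-vs-direct distinction the issue relies on can be illustrated with plain JDK buffers. This is only a sketch: the class and method names are hypothetical, and java.nio stands in for Netty's ByteBufAllocator, whose heapBuffer/directBuffer calls make the same choice between a heap arena and the off-heap arenas already used for networking.

```java
import java.nio.ByteBuffer;

// Sketch of the heap vs off-heap distinction behind ARTEMIS-3059.
// A heap buffer is backed by a byte[] counted against the Java heap;
// a direct buffer lives outside the heap, like the arenas the broker
// already allocates for networking.
public class BufferFootprint {
   static ByteBuffer encodeIntoHeap(int size) {
      return ByteBuffer.allocate(size);        // backed by byte[], grows the heap footprint
   }

   static ByteBuffer encodeIntoDirect(int size) {
      return ByteBuffer.allocateDirect(size);  // off-heap, no heap arena created
   }

   public static void main(String[] args) {
      ByteBuffer heap = encodeIntoHeap(1024);
      ByteBuffer direct = encodeIntoDirect(1024);
      System.out.println("heap isDirect=" + heap.isDirect()
            + ", direct isDirect=" + direct.isDirect());
   }
}
```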
[jira] [Work started] (ARTEMIS-3021) OOM due to wrong CORE message memory estimation
[ https://issues.apache.org/jira/browse/ARTEMIS-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on ARTEMIS-3021 started by Francesco Nigro. > OOM due to wrong CORE message memory estimation > --- > > Key: ARTEMIS-3021 > URL: https://issues.apache.org/jira/browse/ARTEMIS-3021 > Project: ActiveMQ Artemis > Issue Type: Bug >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > Durable CORE messages can get their internal buffer enlarged by > encodeHeadersAndProperties while being persisted on the journal, but the > address size memory estimation using the estimated memory of a message is > performed before that, making it less precise. > This badly timed estimation, together with the Netty ByteBuf auto-sizing mechanism, > can cause the broker to underestimate the message footprint, causing it to go > OOM. -- This message was sent by Atlassian Jira (v8.3.4#803005)
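The timing bug above can be sketched in a few lines: the estimate is sampled before the encode step enlarges the buffer, so the accounting misses the growth. Method names and sizes here are illustrative, not the broker's real API.

```java
import java.nio.ByteBuffer;

// Sketch of the ARTEMIS-3021 timing bug: the memory estimate is taken
// from the message buffer *before* encodeHeadersAndProperties enlarges it,
// so the address-size accounting undercounts the real footprint.
public class EstimateTiming {
   static ByteBuffer buffer = ByteBuffer.allocate(64);

   static int estimateBeforePersist() {
      return buffer.capacity();                 // estimate taken too early: 64 bytes
   }

   static int encodeHeadersAndProperties(int extraBytes) {
      // auto-sizing growth, as a Netty ByteBuf would do on a large write
      buffer = ByteBuffer.allocate(buffer.capacity() + extraBytes);
      return buffer.capacity();                 // real footprint after encoding
   }

   public static void main(String[] args) {
      int estimated = estimateBeforePersist();
      int actual = encodeHeadersAndProperties(192);
      // the accounting is off by (actual - estimated) bytes for every such message
      System.out.println("estimated=" + estimated + " actual=" + actual);
   }
}
```

Multiplied across many durable messages, that per-message undercount is what lets the broker run out of heap before paging kicks in.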
[jira] [Updated] (ARTEMIS-3049) Reduce live page lookup cost
[ https://issues.apache.org/jira/browse/ARTEMIS-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-3049: - Description: LivePageCacheImpl::getMessage is performing a linked-list-like lookup that can be rather slow if compared to an array lookup. [https://github.com/apache/activemq-artemis/pull/2494#issuecomment-455086939] clearly shows the issue with the current implementation. The ideal approaches to improve it could be: # to replace the chunked list with a copy-on-write array list # to use a cursor/iterator API over the chunk list, binding one to each consumer, in order to get a linear stride over the live paged messages Sadly, the latter approach seems not doable because the live page cache is accessed for each message lookup in an anonymous way, making it impossible to have a 1:1 binding with the consumers, while the former seems not doable because of the array copy cost on appending. There is still one case that could be improved using the former approach, instead, delivering a huge speedup on lookup cost: while reloading live pages. A reloaded live page already knows the number of loaded live paged messages, making it possible to store them in a simple array, allowing a much faster lookup. was: LivePageCacheImpl::getMessage is performing a linked-list-like lookup that can be rather slow if compared to an array lookup. [https://github.com/apache/activemq-artemis/pull/2494#issuecomment-455086939] clearly shows the issue with the current implementation. 
The ideal approaches to improve it could be: # to replace the chunked list with a copy-on-write array list # to use a cursor/iterator API over the chunk list, binding one to each consumer, in order to get a linear stride over the live paged messages Sadly, the latter approach seems not doable because the live page cache is accessed for each message lookup in an anonymous way, making it impossible to have a 1:1 binding with the consumers, while the former seems not doable because of the array copy cost on appending. There is still one case that could be improved using the former approach, instead, delivering a huge speedup on lookup cost: while reloading live pages. A reloaded live page already knows the number of loaded live paged messages, making it possible to store them in a simple array. > Reduce live page lookup cost > > > Key: ARTEMIS-3049 > URL: https://issues.apache.org/jira/browse/ARTEMIS-3049 > Project: ActiveMQ Artemis > Issue Type: Improvement > Components: Broker >Affects Versions: 2.16.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > LivePageCacheImpl::getMessage is performing a linked-list-like lookup that > can be rather slow if compared to an array lookup. > [https://github.com/apache/activemq-artemis/pull/2494#issuecomment-455086939] > clearly shows the issue with the current implementation. > The ideal approaches to improve it could be: > # to replace the chunked list with a copy-on-write array list > # to use a cursor/iterator API over the chunk list, binding one to each > consumer, in order to get a linear stride over the live paged messages > Sadly, the latter approach seems not doable because the live page cache is > accessed for each message lookup in an anonymous way, making it impossible to > have a 1:1 binding with the consumers, while the former seems not doable > because of the array copy cost on appending. 
> > There is still one case that could be improved using the former approach, > instead, delivering a huge speedup on lookup cost: while reloading live pages. > A reloaded live page already knows the number of loaded live paged > messages, making it possible to store them in a simple array, allowing a much > faster lookup. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
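The lookup-cost gap described above can be sketched as follows. Class and method names are illustrative, not the broker's real API: a LinkedList stands in for the chunked list, and the reload path, where the message count is known up front, copies into a plain array with O(1) indexed access.

```java
import java.util.LinkedList;
import java.util.List;

// Sketch of the lookup-cost difference behind ARTEMIS-3049: walking a
// linked/chunked list to position i is O(i), while a reloaded page whose
// message count is known up front can use a plain array with O(1) lookup.
public class LivePageLookup {
   static String getMessageLinked(LinkedList<String> chunks, int i) {
      return chunks.get(i);   // traverses up to i nodes before returning
   }

   static String getMessageArray(String[] reloaded, int i) {
      return reloaded[i];     // single indexed load
   }

   // reload path: the count is known, so the messages fit a fixed-size array
   static String[] reload(List<String> loaded) {
      return loaded.toArray(new String[0]);
   }

   public static void main(String[] args) {
      LinkedList<String> live = new LinkedList<>(List.of("m0", "m1", "m2"));
      String[] reloaded = reload(live);
      System.out.println(getMessageLinked(live, 2) + " " + getMessageArray(reloaded, 2));
   }
}
```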
[jira] [Updated] (ARTEMIS-3051) Fix MessageReferenceImpl::getMemoryEstimate
[ https://issues.apache.org/jira/browse/ARTEMIS-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-3051: - Description: MessageReferenceImpl::memoryOffset is used by MessageReferenceImpl::getMemoryEstimate: it reports 64 bytes. [https://github.com/openjdk/jol] is reporting 72 bytes for OpenJDK 64 bit using COOPS and 8-byte alignment, which is very common, and that's a more accurate estimated footprint value. To be honest, a full-fat 64 bit JVM would use 112 bytes instead, but that could be improved in a bigger follow-up PR using JOL on the test suite to double check the footprint. The interesting thing is that paging will be positively affected by this change, because the broker won't under-estimate the memory footprint of many references, triggering paging sooner. was: MessageReferenceImpl::memoryOffset is used by MessageReferenceImpl::getMemoryEstimate: it reports 64 bytes. [https://github.com/openjdk/jol] is reporting 72 bytes for OpenJDK 64 bit using COOPS and 8-byte alignment, which is very common, and that's a more accurate estimated footprint value. To be honest, a full-fat 64 bit JVM would use 112 bytes instead, but that could be improved in a bigger follow-up PR. The interesting thing is that paging will be positively affected by this change, because the broker won't under-estimate the memory footprint of many references, triggering paging sooner. > Fix MessageReferenceImpl::getMemoryEstimate > --- > > Key: ARTEMIS-3051 > URL: https://issues.apache.org/jira/browse/ARTEMIS-3051 > Project: ActiveMQ Artemis > Issue Type: Bug >Affects Versions: 2.16.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > MessageReferenceImpl::memoryOffset is used by > MessageReferenceImpl::getMemoryEstimate: it reports 64 bytes. 
> [https://github.com/openjdk/jol] is reporting 72 bytes for OpenJDK 64 bit > using COOPS and 8-byte alignment, which is very common, and that's a more > accurate estimated footprint value. > To be honest, a full-fat 64 bit JVM would use 112 bytes instead, but that > could be improved in a bigger follow-up PR using JOL on the test suite to > double check the footprint. > > The interesting thing is that paging will be positively affected by this > change, because the broker won't under-estimate the memory footprint of many > references, triggering paging sooner. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
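The 64 -> 72 byte correction above comes down to the JVM's 8-byte object alignment, which can be sketched with a one-line rounding helper. The exact raw size used in the example is an assumption for illustration; real layouts come from JOL, as the issue notes.

```java
// Sketch of the 8-byte alignment arithmetic behind the 64 -> 72 byte
// correction in ARTEMIS-3051: the JVM rounds each object's size up to the
// alignment boundary, so any raw header+field sum from 65 to 72 bytes
// occupies 72 bytes. (Authoritative layouts come from JOL.)
public class AlignedSize {
   static final int ALIGNMENT = 8;

   static int alignedSize(int rawBytes) {
      return (rawBytes + ALIGNMENT - 1) & -ALIGNMENT;  // round up to a multiple of 8
   }

   public static void main(String[] args) {
      System.out.println(alignedSize(64)); // 64: already aligned
      System.out.println(alignedSize(66)); // 72: rounded up, matching the JOL report
   }
}
```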
[jira] [Created] (ARTEMIS-3051) Fix MessageReferenceImpl::getMemoryEstimate
Francesco Nigro created ARTEMIS-3051: Summary: Fix MessageReferenceImpl::getMemoryEstimate Key: ARTEMIS-3051 URL: https://issues.apache.org/jira/browse/ARTEMIS-3051 Project: ActiveMQ Artemis Issue Type: Bug Affects Versions: 2.16.0 Reporter: Francesco Nigro Assignee: Francesco Nigro MessageReferenceImpl::memoryOffset is used by MessageReferenceImpl::getMemoryEstimate: it reports 64 bytes. [https://github.com/openjdk/jol] is reporting 72 bytes for OpenJDK 64 bit using COOPS and 8-byte alignment, which is very common, and that's a more accurate estimated footprint value. To be honest, a full-fat 64 bit JVM would use 112 bytes instead, but that could be improved in a bigger follow-up PR. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARTEMIS-3050) Reduce PagedReferenceImpl memory footprint
[ https://issues.apache.org/jira/browse/ARTEMIS-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-3050: - Description: PagedReferenceImpl is never used as a QueueImpl node, hence it doesn't make sense for it to extend the node class; dropping that saves a few bytes of memory footprint, e.g. a COOPS 64 bit JVM gets 88 bytes vs 104 bytes -> ~18% saved memory for each message ref with no semantic impact. Priority: Trivial (was: Major) > Reduce PagedReferenceImpl memory footprint > -- > > Key: ARTEMIS-3050 > URL: https://issues.apache.org/jira/browse/ARTEMIS-3050 > Project: ActiveMQ Artemis > Issue Type: Improvement >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Trivial > > PagedReferenceImpl is never used as a QueueImpl node, hence it doesn't > make sense for it to extend the node class; dropping that saves a few bytes of memory > footprint, e.g. a COOPS 64 bit JVM gets 88 bytes vs 104 bytes -> ~18% saved > memory for each message ref with no semantic impact. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARTEMIS-3050) Reduce PagedReferenceImpl memory footprint
[ https://issues.apache.org/jira/browse/ARTEMIS-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-3050: - Summary: Reduce PagedReferenceImpl memory footprint (was: Reduce PagedMessage memory footprint) > Reduce PagedReferenceImpl memory footprint > -- > > Key: ARTEMIS-3050 > URL: https://issues.apache.org/jira/browse/ARTEMIS-3050 > Project: ActiveMQ Artemis > Issue Type: Improvement >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARTEMIS-3050) Reduce PagedMessage memory footprint
Francesco Nigro created ARTEMIS-3050: Summary: Reduce PagedMessage memory footprint Key: ARTEMIS-3050 URL: https://issues.apache.org/jira/browse/ARTEMIS-3050 Project: ActiveMQ Artemis Issue Type: Improvement Reporter: Francesco Nigro Assignee: Francesco Nigro -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (ARTEMIS-3049) Reduce live page lookup cost
[ https://issues.apache.org/jira/browse/ARTEMIS-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on ARTEMIS-3049 started by Francesco Nigro. > Reduce live page lookup cost > > > Key: ARTEMIS-3049 > URL: https://issues.apache.org/jira/browse/ARTEMIS-3049 > Project: ActiveMQ Artemis > Issue Type: Improvement > Components: Broker >Affects Versions: 2.16.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > LivePageCacheImpl::getMessage is performing a linked-list-like lookup that > can be rather slow if compared to an array lookup. > [https://github.com/apache/activemq-artemis/pull/2494#issuecomment-455086939] > clearly shows the issue with the current implementation. > The ideal approaches to improve it could be: > # to replace the chunked list with a copy-on-write array list > # to use a cursor/iterator API over the chunk list, binding one to each > consumer, in order to get a linear stride over the live paged messages > Sadly, the latter approach seems not doable because the live page cache is > accessed for each message lookup in an anonymous way, making it impossible to > have a 1:1 binding with the consumers, while the former seems not doable > because of the array copy cost on appending. > > There is still one case that could be improved using the former approach, > instead, delivering a huge speedup on lookup cost: reloading of live pages. > A reloaded live page already knows the number of loaded live paged > messages, making it possible to store them in a simple array. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARTEMIS-3049) Reduce live page lookup cost
[ https://issues.apache.org/jira/browse/ARTEMIS-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-3049: - Description: LivePageCacheImpl::getMessage is performing a linked-list-like lookup that can be rather slow if compared to an array lookup. [https://github.com/apache/activemq-artemis/pull/2494#issuecomment-455086939] clearly shows the issue with the current implementation. The ideal approaches to improve it could be: # to replace the chunked list with a copy-on-write array list # to use a cursor/iterator API over the chunk list, binding one to each consumer, in order to get a linear stride over the live paged messages Sadly, the latter approach seems not doable because the live page cache is accessed for each message lookup in an anonymous way, making it impossible to have a 1:1 binding with the consumers, while the former seems not doable because of the array copy cost on appending. There is still one case that could be improved using the former approach, instead, delivering a huge speedup on lookup cost: reloading of live pages. A reloaded live page already knows the number of loaded live paged messages, making it possible to store them in a simple array. was: LivePageCacheImpl::getMessage is performing a linked-list-like lookup that can be rather slow if compared to an O(1) lookup on an ArrayList-like data structure. It's possible to speed it up by: # using a last-accessed buffer cache on the append-only chunked list used in LivePageCacheImpl, to speed up the most recent (& nearest) accesses # using an array with the freshly reloaded paged messages, in case of cache reload https://github.com/apache/activemq-artemis/pull/2494#issuecomment-455086939 clearly shows the issue with the current implementation. 
> Reduce live page lookup cost > > > Key: ARTEMIS-3049 > URL: https://issues.apache.org/jira/browse/ARTEMIS-3049 > Project: ActiveMQ Artemis > Issue Type: Improvement > Components: Broker >Affects Versions: 2.16.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > LivePageCacheImpl::getMessage is performing a linked-list-like lookup that > can be rather slow if compared to an array lookup. > [https://github.com/apache/activemq-artemis/pull/2494#issuecomment-455086939] > clearly shows the issue with the current implementation. > The ideal approaches to improve it could be: > # to replace the chunked list with a copy-on-write array list > # to use a cursor/iterator API over the chunk list, binding one to each > consumer, in order to get a linear stride over the live paged messages > Sadly, the latter approach seems not doable because the live page cache is > accessed for each message lookup in an anonymous way, making it impossible to > have a 1:1 binding with the consumers, while the former seems not doable > because of the array copy cost on appending. > > There is still one case that could be improved using the former approach, > instead, delivering a huge speedup on lookup cost: reloading of live pages. > A reloaded live page already knows the number of loaded live paged > messages, making it possible to store them in a simple array. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARTEMIS-2852) Huge performance decrease between versions 2.2.0 and 2.13.0
[ https://issues.apache.org/jira/browse/ARTEMIS-2852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258743#comment-17258743 ] Francesco Nigro commented on ARTEMIS-2852: -- [~adamw1pl] I've looked at the new post on [https://softwaremill.com/mqperf/] and it seems that the new behaviour shown on: !https://softwaremill.com/user/pages/blog/mqperf/artemis.png?g-d425a3da|width=749,height=484! is different from the one on https://softwaremill.com/mqperf-2017/ !https://softwaremill.com/user/themes/softwaremill/assets/_old-website/uploads/2017/07/mqperf/artemis.png! https://issues.apache.org/jira/browse/ARTEMIS-2877 should have already fixed the scalability issue, but I see that https://issues.apache.org/jira/browse/ARTEMIS-3045 could be a reasonable step forward to improve the current behavior: I still don't get why 2.2.0 should scale better than master itself, but I'll investigate that as part of https://issues.apache.org/jira/browse/ARTEMIS-3045. Is there any chance you could check if [https://github.com/franz1981/activemq-artemis/tree/batching_replication_manager] improves things? I don't know how that works on your side, or whether there is any chance to get the results in the blog post updated at some point (before the next round); let me know... > Huge performance decrease between versions 2.2.0 and 2.13.0 > --- > > Key: ARTEMIS-2852 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2852 > Project: ActiveMQ Artemis > Issue Type: Bug >Reporter: Kasper Kondzielski >Assignee: Francesco Nigro >Priority: Major > Fix For: 2.16.0 > > Attachments: Selection_433.png, Selection_434.png, Selection_440.png, > Selection_441.png, Selection_451.png > > > Hi, > Recently, we started to prepare a new revision of our blog post in which we > test various implementations of replicated queues. 
Previous version can be > found here: [https://softwaremill.com/mqperf/] > We updated artemis binary to 2.13.0, regenerated configuration file and > applied all the performance tricks you told us last time. In particular these > were: > * the {{Xmx}} java parameter bumped to {{16G (now bumped to 48G)}} > * in {{broker.xml}}, the {{global-max-size}} setting changed to {{8G (this > one we forgot to set, but we suspect that it is not the issue)}} > * {{journal-type}} set to {{MAPPED}} > * {{journal-datasync}}, {{journal-sync-non-transactional}} and > {{journal-sync-transactional}} all set to false > Apart from that we changed machines' type we use to r5.2xlarge ( 8 cores, 64 > GIB memory, Network bandwidth Up to 10 Gbps, Storage bandwidth Up to 4,750 > Mbps) and we decided to always run twice as much receivers as senders. > From our tests it looks like version 2.13.0 is not scaling as well, with the > increase of senders and receivers, as version 2.2.0 (previously tested). > Basically is not scaling at all as the throughput stays almost at the same > level, while previously it used to grow linearly. > Here you can find our tests results for both versions: > [https://docs.google.com/spreadsheets/d/1kr9fzSNLD8bOhMkP7K_4axBQiKel1aJtpxsBCOy9ugU/edit?usp=sharing] > We are aware that now there is a dedicated page in documentation about > performance tuning, but we are surprised that same settings as before > performs much worse. > Maybe there is an obvious property which we overlooked which should be turned > on? > All changes between those versions together with the final configuration can > be found on this merged PR: > [https://github.com/softwaremill/mqperf/commit/6bfae489e11a250dc9e6ef59719782f839e8874a] > > Charts showing machines' usage in attachments. Memory consumed by artemis > process didn't exceed ~ 16 GB. Bandwidht and cpu weren't also a bottlenecks. > p.s. 
I wanted to ask this question on mailing list/nabble forum first but it > seems that I don't have permissions to do so even though I registered & > subscribed. Is that intentional? > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARTEMIS-3049) Reduce live page lookup cost
[ https://issues.apache.org/jira/browse/ARTEMIS-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-3049: - Description: LivePageCacheImpl::getMessage is performing a linked-list-like lookup that can be rather slow if compared to a O(1) lookup on ArrayList-like data structure. it's possible to speed it up by: # using a last accessed buffer cache on the append only chunked list used on LivePageCacheImpl, to speedup the most recent (& nearest) accesses # using an array with the fresh reloaded paged messages, in case of cache reload https://github.com/apache/activemq-artemis/pull/2494#issuecomment-455086939 clearly show the issue with the current implementation. was: LivePageCacheImpl::getMessage is performing a linked-list-like lookup that can be rather slow if compared to a O(1) lookup on ArrayList-like data structure. it's possible to speed it up by: # using a last accessed buffer cache on the append only chunked list used on LivePageCacheImpl, to speedup the most recent (& nearest) accesses # using an array with the fresh reloaded paged messages, in case of cache reload https://github.com/apache/activemq-artemis/pull/2494#issuecomment-455086939 explains clearly the issue with the current implementation. > Reduce live page lookup cost > > > Key: ARTEMIS-3049 > URL: https://issues.apache.org/jira/browse/ARTEMIS-3049 > Project: ActiveMQ Artemis > Issue Type: Improvement > Components: Broker >Affects Versions: 2.16.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > > LivePageCacheImpl::getMessage is performing a linked-list-like lookup that > can be rather slow if compared to a O(1) lookup on ArrayList-like data > structure. 
> it's possible to speed it up by: > # using a last accessed buffer cache on the append only chunked list used on > LivePageCacheImpl, to speedup the most recent (& nearest) accesses > # using an array with the fresh reloaded paged messages, in case of cache > reload > https://github.com/apache/activemq-artemis/pull/2494#issuecomment-455086939 > clearly show the issue with the current implementation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARTEMIS-3049) Reduce live page lookup cost
[ https://issues.apache.org/jira/browse/ARTEMIS-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-3049: - Description: LivePageCacheImpl::getMessage is performing a linked-list-like lookup that can be rather slow if compared to a O(1) lookup on ArrayList-like data structure. it's possible to speed it up by: # using a last accessed buffer cache on the append only chunked list used on LivePageCacheImpl, to speedup the most recent (& nearest) accesses # using an array with the fresh reloaded paged messages, in case of cache reload https://github.com/apache/activemq-artemis/pull/2494#issuecomment-455086939 explains clearly the issue with the current implementation. was: LivePageCacheImpl::getMessage is performing a linked-list-like lookup that can be rather slow if compared to a O(1) lookup on ArrayList-like data structure. it's possible to speed it up by: # using a last accessed buffer cache on the append only chunked list used on LivePageCacheImpl, to speedup the most recent (& nearest) accesses # using an array with the fresh reloaded paged messages, in case of cache reload > Reduce live page lookup cost > > > Key: ARTEMIS-3049 > URL: https://issues.apache.org/jira/browse/ARTEMIS-3049 > Project: ActiveMQ Artemis > Issue Type: Improvement > Components: Broker >Affects Versions: 2.16.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > > LivePageCacheImpl::getMessage is performing a linked-list-like lookup that > can be rather slow if compared to a O(1) lookup on ArrayList-like data > structure. 
> it's possible to speed it up by: > # using a last accessed buffer cache on the append only chunked list used on > LivePageCacheImpl, to speedup the most recent (& nearest) accesses > # using an array with the fresh reloaded paged messages, in case of cache > reload > https://github.com/apache/activemq-artemis/pull/2494#issuecomment-455086939 > explains clearly the issue with the current implementation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARTEMIS-3049) Reduce live page lookup cost
[ https://issues.apache.org/jira/browse/ARTEMIS-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-3049: - Description: LivePageCacheImpl::getMessage is performing a linked-list-like lookup that can be rather slow if compared to a O(1) lookup on ArrayList-like data structure. it's possible to speed it up by: # using a last accessed buffer cache on the append only chunked list used on LivePageCacheImpl, to speedup the most recent (& nearest) accesses # using an array with the fresh reloaded paged messages, in case of cache reload was: LivePageCacheImpl::getMessage is performing a linked-list-like lookup that can be rather slow if compared to a O(1) lookup on ArrayList-like data structure. it's possible to speed it up by: # using a last accessed buffer cache on the append only chunked list used on LivePageCacheImpl, to speedup the most recent (& nearest) accesses # using an array with the any fresh reloaded paged messages, in case of cache reload > Reduce live page lookup cost > > > Key: ARTEMIS-3049 > URL: https://issues.apache.org/jira/browse/ARTEMIS-3049 > Project: ActiveMQ Artemis > Issue Type: Improvement > Components: Broker >Affects Versions: 2.16.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > > LivePageCacheImpl::getMessage is performing a linked-list-like lookup that > can be rather slow if compared to a O(1) lookup on ArrayList-like data > structure. > it's possible to speed it up by: > # using a last accessed buffer cache on the append only chunked list used on > LivePageCacheImpl, to speedup the most recent (& nearest) accesses > # using an array with the fresh reloaded paged messages, in case of cache > reload -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARTEMIS-3049) Reduce live page lookup cost
[ https://issues.apache.org/jira/browse/ARTEMIS-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-3049: - Description: LivePageCacheImpl::getMessage is performing a linked-list-like lookup that can be rather slow if compared to a O(1) lookup on ArrayList-like data structure. it's possible to speed it up by: # using a last accessed buffer cache on the append only chunked list used on LivePageCacheImpl, to speedup the most recent (& nearest) accesses # using an array with the any fresh reloaded paged messages, in case of cache reload was: LivePageCacheImpl::getMessage is performing a linked-list-like lookup that can be rather slow if compared to a O(1) lookup on ArrayList-like data structure. it's possible to speed it up by: # using a last accessed buffer cache on the append only chunked list used on LivePageCacheImpl, to speedup nearest accesses (very likely to happen with a single consumer) # using an array with the any fresh reloaded paged messages, in case of cache reload > Reduce live page lookup cost > > > Key: ARTEMIS-3049 > URL: https://issues.apache.org/jira/browse/ARTEMIS-3049 > Project: ActiveMQ Artemis > Issue Type: Improvement > Components: Broker >Affects Versions: 2.16.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > > LivePageCacheImpl::getMessage is performing a linked-list-like lookup that > can be rather slow if compared to a O(1) lookup on ArrayList-like data > structure. > it's possible to speed it up by: > # using a last accessed buffer cache on the append only chunked list used on > LivePageCacheImpl, to speedup the most recent (& nearest) accesses > # using an array with the any fresh reloaded paged messages, in case of cache > reload -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARTEMIS-3049) Reduce live page lookup cost
Francesco Nigro created ARTEMIS-3049: Summary: Reduce live page lookup cost Key: ARTEMIS-3049 URL: https://issues.apache.org/jira/browse/ARTEMIS-3049 Project: ActiveMQ Artemis Issue Type: Improvement Components: Broker Affects Versions: 2.16.0 Reporter: Francesco Nigro Assignee: Francesco Nigro LivePageCacheImpl::getMessage is performing a linked-list-like lookup that can be rather slow if compared to a O(1) lookup on ArrayList-like data structure. it's possible to speed it up by: # using a last accessed buffer cache on the append only chunked list used on LivePageCacheImpl, to speedup nearest accesses (very likely to happen with a single consumer) # using an array with the any fresh reloaded paged messages, in case of cache reload -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARTEMIS-3045) ReplicationManager can batch sent replicated packets
Francesco Nigro created ARTEMIS-3045: Summary: ReplicationManager can batch sent replicated packets Key: ARTEMIS-3045 URL: https://issues.apache.org/jira/browse/ARTEMIS-3045 Project: ActiveMQ Artemis Issue Type: Improvement Reporter: Francesco Nigro Assignee: Francesco Nigro -- This message was sent by Atlassian Jira (v8.3.4#803005)
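The ticket above is only a summary; as a rough illustration of the idea (names are hypothetical, this is not the ReplicationManager API), batching means accumulating encoded replication packets and flushing them to the channel in one write, amortizing the per-packet flush cost:

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

public class BatchingReplicationSketch {
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
    private final List<byte[]> flushedBatches = new ArrayList<>();
    private final int flushThreshold;

    BatchingReplicationSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    // Instead of channel.writeAndFlush per packet, buffer until the threshold.
    void send(byte[] encodedPacket) {
        pending.write(encodedPacket, 0, encodedPacket.length);
        if (pending.size() >= flushThreshold) {
            flush();
        }
    }

    void flush() {
        if (pending.size() == 0) return;
        // Stand-in for a single write+flush of the whole accumulated batch.
        flushedBatches.add(pending.toByteArray());
        pending.reset();
    }

    int batchCount() { return flushedBatches.size(); }

    public static void main(String[] args) {
        BatchingReplicationSketch mgr = new BatchingReplicationSketch(64);
        for (int i = 0; i < 16; i++) mgr.send(new byte[16]); // 256 bytes total
        mgr.flush(); // drain whatever is left below the threshold
        System.out.println("flushes for 16 packets: " + mgr.batchCount());
    }
}
```

A real implementation would also need a time-based flush (so low-rate traffic isn't delayed indefinitely) and would batch on the event loop rather than via a byte-array copy.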
[jira] [Updated] (ARTEMIS-3025) JsonReader char[] leak
[ https://issues.apache.org/jira/browse/ARTEMIS-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-3025: - Description: The default Json provider ie https://github.com/apache/johnzon is using several pools while parsing eg {{org.apache.johnzon.max-string-length}} that wouldn't pool char[] if the JsonReader isn't properly closed. Currently we're not properly closing such readers and that means that we allocate {{org.apache.johnzon.max-string-length}} * 2 bytes on each handled notification ~ 20 MiB. Until https://bugs.openjdk.java.net/browse/JDK-8027959 ie JDK u40, with G1GC the mentioned char[] was (very likely) allocated into the old generation as Humongous Allocation, needing a Full GC to release it. was: The default Json provider ie https://github.com/apache/johnzon is using several pools while parsing eg {{org.apache.johnzon.max-string-length}} that wouldn't pool char[] if the JsonReader isn't properly closed. Currently we're not properly closing such readers and that means that we allocate {{org.apache.johnzon.max-string-length}} * 2 bytes on each handled notification ~ 20 MiB. Until https://bugs.openjdk.java.net/browse/JDK-8027959 ie JDK u40, with G1GC the mentioned char[] was (very likely) performed into the old generation, needing a Full GC to release it. > JsonReader char[] leak > -- > > Key: ARTEMIS-3025 > URL: https://issues.apache.org/jira/browse/ARTEMIS-3025 > Project: ActiveMQ Artemis > Issue Type: Bug >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > The default Json provider ie https://github.com/apache/johnzon is using > several pools while parsing eg {{org.apache.johnzon.max-string-length}} that > wouldn't pool char[] if the JsonReader isn't properly closed. 
> Currently we're not properly closing such readers and that means that we > allocate {{org.apache.johnzon.max-string-length}} * 2 bytes on each handled > notification ~ 20 MiB. > Until https://bugs.openjdk.java.net/browse/JDK-8027959 ie JDK u40, with G1GC > the mentioned char[] was (very likely) allocated into the old generation as > Humongous Allocation, needing a Full GC to release it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
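The fix the issue implies is simply closing the JsonReader so Johnzon can return its pooled char[]. Since javax.json is not part of the JDK, the sketch below uses a stdlib stand-in (a counting AutoCloseable) to show the pattern; the try-with-resources shape is what matters, as it guarantees close() runs even on a parse error:

```java
public class ReaderCloseSketch {
    static int openReaders = 0;

    // Stand-in for a pooling JsonReader: close() must run to return the buffer.
    static class PooledReader implements AutoCloseable {
        PooledReader() { openReaders++; }
        String readObject() { return "{}"; }
        @Override public void close() { openReaders--; }
    }

    // The buggy shape: the reader is never closed, so the pooled buffer leaks.
    static String parseLeaky(String json) {
        return new PooledReader().readObject();
    }

    // The fixed shape: try-with-resources closes the reader on every path.
    static String parseSafely(String json) {
        try (PooledReader reader = new PooledReader()) {
            return reader.readObject();
        }
    }

    public static void main(String[] args) {
        parseLeaky("{}");
        System.out.println("open after leaky parse: " + openReaders);
        openReaders = 0;
        parseSafely("{}");
        System.out.println("open after safe parse: " + openReaders);
    }
}
```

With the real Johnzon reader the same shape applies: `try (JsonReader r = Json.createReader(...)) { ... }`.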
[jira] [Updated] (ARTEMIS-3025) JsonReader char[] leak
[ https://issues.apache.org/jira/browse/ARTEMIS-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-3025: - Description: The default Json provider ie https://github.com/apache/johnzon is using several pools while parsing eg {{org.apache.johnzon.max-string-length}} that wouldn't pool char[] if the JsonReader isn't properly closed. Currently we're not properly closing such readers and that means that we allocate {{org.apache.johnzon.max-string-length}} * 2 bytes on each handled notification ~ 20 MiB. Until https://bugs.openjdk.java.net/browse/JDK-8027959 ie JDK u40, with G1GC the mentioned char[] was (very likely) performed into the old generation, needing a Full GC to release it. was: The default Json provider ie https://github.com/apache/johnzon is using several pools while parsing eg {{org.apache.johnzon.max-string-length}} that would leak char[] if a JsonReader isn't properly closed. Currently we're not properly closing such readers and that means leaking up to {{org.apache.johnzon.max-string-length}} * 2 bytes on each handled notification. > JsonReader char[] leak > -- > > Key: ARTEMIS-3025 > URL: https://issues.apache.org/jira/browse/ARTEMIS-3025 > Project: ActiveMQ Artemis > Issue Type: Bug >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > The default Json provider ie https://github.com/apache/johnzon is using > several pools while parsing eg {{org.apache.johnzon.max-string-length}} that > wouldn't pool char[] if the JsonReader isn't properly closed. > Currently we're not properly closing such readers and that means that we > allocate {{org.apache.johnzon.max-string-length}} * 2 bytes on each handled > notification ~ 20 MiB. > Until https://bugs.openjdk.java.net/browse/JDK-8027959 ie JDK u40, with G1GC > the mentioned char[] was (very likely) performed into the old generation, > needing a Full GC to release it. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARTEMIS-3025) JsonReader char[] leak
Francesco Nigro created ARTEMIS-3025: Summary: JsonReader char[] leak Key: ARTEMIS-3025 URL: https://issues.apache.org/jira/browse/ARTEMIS-3025 Project: ActiveMQ Artemis Issue Type: Bug Reporter: Francesco Nigro Assignee: Francesco Nigro The default Json provider ie https://github.com/apache/johnzon is using several pools while parsing eg {{org.apache.johnzon.max-string-length}} that would leak char[] if a JsonReader isn't properly closed. Currently we're not properly closing such readers and that means leaking up to {{org.apache.johnzon.max-string-length}} * 2 bytes on each handled notification. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARTEMIS-3021) OOM due to wrong CORE message memory estimation
Francesco Nigro created ARTEMIS-3021: Summary: OOM due to wrong CORE message memory estimation Key: ARTEMIS-3021 URL: https://issues.apache.org/jira/browse/ARTEMIS-3021 Project: ActiveMQ Artemis Issue Type: Bug Reporter: Francesco Nigro Assignee: Francesco Nigro Durable CORE messages can get their internal buffer enlarged by encodeHeadersAndProperties while being persisted on the journal, but the address-size memory estimation, which uses the estimated memory of a message, is performed before that, making it less precise. This badly timed estimation, together with Netty's ByteBuf auto-sizing mechanism, can cause the broker to underestimate the message footprint and go OOM. -- This message was sent by Atlassian Jira (v8.3.4#803005)
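A toy sketch of the timing bug described above (hypothetical names, not Artemis code): if the footprint is read before encodeHeadersAndProperties() grows the internal buffer, the accounting misses the growth; reading it after the encode, or applying the delta, closes the gap.

```java
public class MemoryEstimateSketch {
    static class Message {
        byte[] buffer = new byte[512];          // initial internal buffer
        int memoryEstimate() { return buffer.length; }
        void encodeHeadersAndProperties() {
            // stand-in for a Netty ByteBuf auto-sizing its capacity on encode
            buffer = new byte[4096];
        }
    }

    public static void main(String[] args) {
        Message msg = new Message();

        int estimatedBefore = msg.memoryEstimate();  // what gets accounted: 512
        msg.encodeHeadersAndProperties();            // journal persistence path
        int actualAfter = msg.memoryEstimate();      // what the heap pays: 4096

        // The address-size counter is off by this much for every such message,
        // which is how the broker can blow past global-max-size and go OOM.
        System.out.println("under-counted bytes: " + (actualAfter - estimatedBefore));
    }
}
```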
[jira] [Created] (ARTEMIS-3016) Reduce DuplicateIDCache memory footprint
Francesco Nigro created ARTEMIS-3016: Summary: Reduce DuplicateIDCache memory footprint Key: ARTEMIS-3016 URL: https://issues.apache.org/jira/browse/ARTEMIS-3016 Project: ActiveMQ Artemis Issue Type: Improvement Reporter: Francesco Nigro Assignee: Francesco Nigro DuplicateIDCache uses many boxed Long and Integer instances, which makes duplicate ID caches too memory hungry. This could be improved by using better data structures and a pooling mechanism. -- This message was sent by Atlassian Jira (v8.3.4#803005)
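To illustrate the "better data structures" direction (a hypothetical sketch, not the DuplicateIDCache implementation): a `HashMap<Long, Integer>` pays a boxed key, a boxed value, and an entry object per cached id, while parallel primitive arrays in a fixed-size ring pay none of that.

```java
import java.util.Arrays;

public class PrimitiveIdCacheSketch {
    private final long[] ids;        // primitive ids: no boxed Long per entry
    private final int[] positions;   // primitive journal positions: no Integer
    private int next;

    PrimitiveIdCacheSketch(int capacity) {
        ids = new long[capacity];
        Arrays.fill(ids, Long.MIN_VALUE);   // sentinel marking an empty slot
        positions = new int[capacity];
    }

    void put(long id, int position) {
        ids[next] = id;
        positions[next] = position;
        next = (next + 1) % ids.length;     // ring: overwrite the oldest entry
    }

    // Linear scan is fine for small caches; a primitive-keyed open-addressing
    // hash (long -> int) would serve larger capacities.
    boolean contains(long id) {
        for (long candidate : ids) {
            if (candidate == id) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        PrimitiveIdCacheSketch cache = new PrimitiveIdCacheSketch(3);
        cache.put(101L, 0);
        cache.put(102L, 1);
        cache.put(103L, 2);
        cache.put(104L, 3); // capacity 3: evicts 101
        System.out.println("101 still cached: " + cache.contains(101L));
    }
}
```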
[jira] [Commented] (ARTEMIS-2852) Huge performance decrease between versions 2.2.0 and 2.13.0
[ https://issues.apache.org/jira/browse/ARTEMIS-2852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233450#comment-17233450 ] Francesco Nigro commented on ARTEMIS-2852: -- [~adamw1pl] I can help to review the results, if needed, but my primary concern is: are the numbers reasonably close to what was achieved with 2.2.0, assuming the configurations were comparable? > Huge performance decrease between versions 2.2.0 and 2.13.0 > --- > > Key: ARTEMIS-2852 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2852 > Project: ActiveMQ Artemis > Issue Type: Bug >Reporter: Kasper Kondzielski >Assignee: Francesco Nigro >Priority: Major > Fix For: 2.16.0 > > Attachments: Selection_433.png, Selection_434.png, Selection_440.png, > Selection_441.png, Selection_451.png > > > Hi, > Recently, we started to prepare a new revision of our blog-post in which we > test various implementations of replicated queues. Previous version can be > found here: [https://softwaremill.com/mqperf/] > We updated artemis binary to 2.13.0, regenerated configuration file and > applied all the performance tricks you told us last time. In particular these > were: > * the {{Xmx}} java parameter bumped to {{16G (now bumped to 48G)}} > * in {{broker.xml}}, the {{global-max-size}} setting changed to {{8G (this > one we forgot to set, but we suspect that it is not the issue)}} > * {{journal-type}} set to {{MAPPED}} > * {{journal-datasync}}, {{journal-sync-non-transactional}} and > {{journal-sync-transactional}} all set to false > Apart from that we changed machines' type we use to r5.2xlarge ( 8 cores, 64 > GIB memory, Network bandwidth Up to 10 Gbps, Storage bandwidth Up to 4,750 > Mbps) and we decided to always run twice as much receivers as senders. > From our tests it looks like version 2.13.0 is not scaling as well, with the > increase of senders and receivers, as version 2.2.0 (previously tested). 
> Basically is not scaling at all as the throughput stays almost at the same > level, while previously it used to grow linearly. > Here you can find our tests results for both versions: > [https://docs.google.com/spreadsheets/d/1kr9fzSNLD8bOhMkP7K_4axBQiKel1aJtpxsBCOy9ugU/edit?usp=sharing] > We are aware that now there is a dedicated page in documentation about > performance tuning, but we are surprised that same settings as before > performs much worse. > Maybe there is an obvious property which we overlooked which should be turned > on? > All changes between those versions together with the final configuration can > be found on this merged PR: > [https://github.com/softwaremill/mqperf/commit/6bfae489e11a250dc9e6ef59719782f839e8874a] > > Charts showing machines' usage in attachments. Memory consumed by artemis > process didn't exceed ~ 16 GB. Bandwidht and cpu weren't also a bottlenecks. > p.s. I wanted to ask this question on mailing list/nabble forum first but it > seems that I don't have permissions to do so even though I registered & > subscribed. Is that intentional? > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARTEMIS-2852) Huge performance decrease between versions 2.2.0 and 2.13.0
[ https://issues.apache.org/jira/browse/ARTEMIS-2852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233415#comment-17233415 ] Francesco Nigro commented on ARTEMIS-2852: -- [~adamw1pl] any news on the numbers of the new version? :D > Huge performance decrease between versions 2.2.0 and 2.13.0 > --- > > Key: ARTEMIS-2852 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2852 > Project: ActiveMQ Artemis > Issue Type: Bug >Reporter: Kasper Kondzielski >Assignee: Francesco Nigro >Priority: Major > Fix For: 2.16.0 > > Attachments: Selection_433.png, Selection_434.png, Selection_440.png, > Selection_441.png, Selection_451.png > > > Hi, > Recently, we started to prepare a new revision of our blog-post in which we > test various implementations of replicated queues. Previous version can be > found here: [https://softwaremill.com/mqperf/] > We updated artemis binary to 2.13.0, regenerated configuration file and > applied all the performance tricks you told us last time. In particular these > were: > * the {{Xmx}} java parameter bumped to {{16G (now bumped to 48G)}} > * in {{broker.xml}}, the {{global-max-size}} setting changed to {{8G (this > one we forgot to set, but we suspect that it is not the issue)}} > * {{journal-type}} set to {{MAPPED}} > * {{journal-datasync}}, {{journal-sync-non-transactional}} and > {{journal-sync-transactional}} all set to false > Apart from that we changed machines' type we use to r5.2xlarge ( 8 cores, 64 > GIB memory, Network bandwidth Up to 10 Gbps, Storage bandwidth Up to 4,750 > Mbps) and we decided to always run twice as much receivers as senders. > From our tests it looks like version 2.13.0 is not scaling as well, with the > increase of senders and receivers, as version 2.2.0 (previously tested). > Basically is not scaling at all as the throughput stays almost at the same > level, while previously it used to grow linearly. 
> Here you can find our tests results for both versions: > [https://docs.google.com/spreadsheets/d/1kr9fzSNLD8bOhMkP7K_4axBQiKel1aJtpxsBCOy9ugU/edit?usp=sharing] > We are aware that now there is a dedicated page in documentation about > performance tuning, but we are surprised that same settings as before > performs much worse. > Maybe there is an obvious property which we overlooked which should be turned > on? > All changes between those versions together with the final configuration can > be found on this merged PR: > [https://github.com/softwaremill/mqperf/commit/6bfae489e11a250dc9e6ef59719782f839e8874a] > > Charts showing machines' usage in attachments. Memory consumed by artemis > process didn't exceed ~ 16 GB. Bandwidht and cpu weren't also a bottlenecks. > p.s. I wanted to ask this question on mailing list/nabble forum first but it > seems that I don't have permissions to do so even though I registered & > subscribed. Is that intentional? > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARTEMIS-2996) Provide JMH Benchmarks for Artemis
Francesco Nigro created ARTEMIS-2996: Summary: Provide JMH Benchmarks for Artemis Key: ARTEMIS-2996 URL: https://issues.apache.org/jira/browse/ARTEMIS-2996 Project: ActiveMQ Artemis Issue Type: Bug Components: Tests Reporter: Francesco Nigro Assignee: Francesco Nigro In order to reliably measure the performance of many Artemis components, it would be welcome to implement some https://github.com/openjdk/jmh benchmarks to be used for development purposes, i.e. not part of the release. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARTEMIS-2852) Huge performance decrease between versions 2.2.0 and 2.13.0
[ https://issues.apache.org/jira/browse/ARTEMIS-2852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17230761#comment-17230761 ] Francesco Nigro commented on ARTEMIS-2852: -- Thanks [~adamw1pl] for the info as well!! > Huge performance decrease between versions 2.2.0 and 2.13.0 > --- > > Key: ARTEMIS-2852 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2852 > Project: ActiveMQ Artemis > Issue Type: Bug >Reporter: Kasper Kondzielski >Priority: Major > Attachments: Selection_433.png, Selection_434.png, Selection_440.png, > Selection_441.png, Selection_451.png > > > Hi, > Recently, we started to prepare a new revision of our blog-post in which we > test various implementations of replicated queues. Previous version can be > found here: [https://softwaremill.com/mqperf/] > We updated artemis binary to 2.13.0, regenerated configuration file and > applied all the performance tricks you told us last time. In particular these > were: > * the {{Xmx}} java parameter bumped to {{16G (now bumped to 48G)}} > * in {{broker.xml}}, the {{global-max-size}} setting changed to {{8G (this > one we forgot to set, but we suspect that it is not the issue)}} > * {{journal-type}} set to {{MAPPED}} > * {{journal-datasync}}, {{journal-sync-non-transactional}} and > {{journal-sync-transactional}} all set to false > Apart from that we changed machines' type we use to r5.2xlarge ( 8 cores, 64 > GIB memory, Network bandwidth Up to 10 Gbps, Storage bandwidth Up to 4,750 > Mbps) and we decided to always run twice as much receivers as senders. > From our tests it looks like version 2.13.0 is not scaling as well, with the > increase of senders and receivers, as version 2.2.0 (previously tested). > Basically is not scaling at all as the throughput stays almost at the same > level, while previously it used to grow linearly. 
> Here you can find our tests results for both versions: > [https://docs.google.com/spreadsheets/d/1kr9fzSNLD8bOhMkP7K_4axBQiKel1aJtpxsBCOy9ugU/edit?usp=sharing] > We are aware that now there is a dedicated page in documentation about > performance tuning, but we are surprised that same settings as before > performs much worse. > Maybe there is an obvious property which we overlooked which should be turned > on? > All changes between those versions together with the final configuration can > be found on this merged PR: > [https://github.com/softwaremill/mqperf/commit/6bfae489e11a250dc9e6ef59719782f839e8874a] > > Charts showing machines' usage in attachments. Memory consumed by artemis > process didn't exceed ~ 16 GB. Bandwidht and cpu weren't also a bottlenecks. > p.s. I wanted to ask this question on mailing list/nabble forum first but it > seems that I don't have permissions to do so even though I registered & > subscribed. Is that intentional? > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (ARTEMIS-2984) Compressed large messages can leak native resources
[ https://issues.apache.org/jira/browse/ARTEMIS-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on ARTEMIS-2984 started by Francesco Nigro. > Compressed large messages can leak native resources > --- > > Key: ARTEMIS-2984 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2984 > Project: ActiveMQ Artemis > Issue Type: Bug >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Compressed large messages use native resources in the form of Inflater and > Deflater and should release them in a timely manner (instead of relying on > finalization) to prevent OOM from happening (of direct memory, to be precise). > This issue is also a chance to simplify the large message controllers, because much > of the existing code on the controllers (including the compressed one) isn't needed > at runtime but only for testing purposes; a proper fix can move that dead code > there too, avoiding having to maintain the leaky behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARTEMIS-2984) Compressed large messages can leak native resources
[ https://issues.apache.org/jira/browse/ARTEMIS-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2984: - Description: Compressed large messages use native resources in the form of Inflater and Deflater and should release them in a timely manner (instead of relying on finalization) to save OOM to happen (of direct memory, to be precise). This issue has the chance to simplify a lot the large message controller, because much of the existing code on controllers (including compressed one) isn't needed at runtime, but just for testing purposes and a proper fix can move dead code there too, saving leaky behavior to be maintained. > Compressed large messages can leak native resources > --- > > Key: ARTEMIS-2984 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2984 > Project: ActiveMQ Artemis > Issue Type: Bug >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Compressed large messages use native resources in the form of Inflater and > Deflater and should release them in a timely manner (instead of relying on > finalization) to save OOM to happen (of direct memory, to be precise). > This issue has the chance to simplify a lot the large message controller, > because much of the existing code on controllers (including compressed one) > isn't needed at runtime, but just for testing purposes and a proper fix can > move dead code there too, saving leaky behavior to be maintained. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARTEMIS-2984) Compressed large messages can leak native resources
[ https://issues.apache.org/jira/browse/ARTEMIS-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2984: - Description: Compressed large messages use native resources in the form of Inflater and Deflater and should release them in a timely manner (instead of relying on finalization) to prevent an OOM (of direct memory, to be precise). This issue is also a chance to simplify the large message controllers, because much of the existing controller code (including the compressed one) isn't needed at runtime but only for testing purposes; a proper fix can move that dead code there too, avoiding having to maintain leaky behavior. was: Compressed large messages use native resources in the form of Inflater and Deflater and should release them in a timely manner (instead of relying on finalization) to save OOM to happen (of direct memory, to be precise). This issue has the chance to simplify a lot the large message controller, because much of the existing code on controllers (including compressed one) isn't needed at runtime, but just for testing purposes and a proper fix can move dead code there too, saving leaky behavior to be maintained. > Compressed large messages can leak native resources > --- > > Key: ARTEMIS-2984 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2984 > Project: ActiveMQ Artemis > Issue Type: Bug >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Compressed large messages use native resources in the form of Inflater and > Deflater and should release them in a timely manner (instead of relying on > finalization) to prevent an OOM (of direct memory, to be precise). 
> This issue is also a chance to simplify the large message controllers, because > much of the existing controller code (including the compressed one) isn't > needed at runtime but only for testing purposes; a proper fix can move that > dead code there too, avoiding having to maintain leaky behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARTEMIS-2984) Compressed large messages can leak native resources
[ https://issues.apache.org/jira/browse/ARTEMIS-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2984: - Summary: Compressed large messages can leak native resources (was: Compressed large messages retain native resources on errors) > Compressed large messages can leak native resources > --- > > Key: ARTEMIS-2984 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2984 > Project: ActiveMQ Artemis > Issue Type: Bug >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARTEMIS-2984) Compressed large messages retain native resources on errors
Francesco Nigro created ARTEMIS-2984: Summary: Compressed large messages retain native resources on errors Key: ARTEMIS-2984 URL: https://issues.apache.org/jira/browse/ARTEMIS-2984 Project: ActiveMQ Artemis Issue Type: Bug Reporter: Francesco Nigro Assignee: Francesco Nigro -- This message was sent by Atlassian Jira (v8.3.4#803005)
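The timely-release pattern the issue asks for can be sketched as follows. This is an illustrative standalone example, not the Artemis controller code; `TimelyZipRelease` and `roundTrip` are hypothetical names. The key point is that `Deflater.end()` / `Inflater.end()` free the native (direct) memory immediately in a `finally` block, instead of waiting for finalization:

```java
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class TimelyZipRelease {

    // Compresses and decompresses a payload, releasing the native
    // Deflater/Inflater resources deterministically in finally blocks.
    public static byte[] roundTrip(byte[] input) {
        byte[] compressed = new byte[input.length * 2 + 64];
        int compressedLen;
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();
            compressedLen = deflater.deflate(compressed);
        } finally {
            deflater.end(); // frees native memory now, not at finalization
        }

        byte[] output = new byte[input.length];
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed, 0, compressedLen);
            int n = inflater.inflate(output);
            if (n != input.length) {
                throw new IllegalStateException("short inflate: " + n);
            }
        } catch (DataFormatException e) {
            throw new RuntimeException(e);
        } finally {
            inflater.end(); // same for the inflater
        }
        return output;
    }
}
```

Relying on finalization instead would keep the native buffers alive until a GC cycle happens to run, which is exactly the direct-memory OOM risk the issue describes.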
[jira] [Commented] (ARTEMIS-2975) Not able to know if the artemis server is working properly or not.
[ https://issues.apache.org/jira/browse/ARTEMIS-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226546#comment-17226546 ] Francesco Nigro commented on ARTEMIS-2975: -- Hi [~Karanbvp], help us understand what a "shared storage outage" is and in which cases it occurs (e.g. server start, server running, failback, failover, under load, etc.), possibly with a reproducer. AFAIK Artemis fully relies on sane, reliable disks, and it should go into an I/O critical error state before shutting down in case of any I/O error. The sole exception to this seems to be file lock loss, which uses a background thread to validate and recover the lock. > Not able to know if the artemis server is working properly or not. > -- > > Key: ARTEMIS-2975 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2975 > Project: ActiveMQ Artemis > Issue Type: Bug >Reporter: Karan Aggarwal >Priority: Major > > Observed that sometimes, Artemis server is stuck due to shared storage outage. > It is neither in running state nor in stopped state. > Is there a way to get the current state? To identify that there is some issue > in server and restart it programmatically? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARTEMIS-2808) Artemis HA with shared storage strategy does not reconnect with shared storage if reconnection happens at shared storage
[ https://issues.apache.org/jira/browse/ARTEMIS-2808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226495#comment-17226495 ] Francesco Nigro commented on ARTEMIS-2808: -- Hi [~Karanbvp], sadly the restart-allowed feature has proven too dangerous due to memory/resource leaks and has been replaced with something more appropriate for the use case we would like to cover (DBMS connectivity loss on shared-store HA). > Artemis HA with shared storage strategy does not reconnect with shared > storage if reconnection happens at shared storage > > > Key: ARTEMIS-2808 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2808 > Project: ActiveMQ Artemis > Issue Type: Bug >Affects Versions: 2.11.0 > Environment: Windows 10 >Reporter: Karan Aggarwal >Priority: Blocker > Attachments: Scenario_1.zip, Scenario_2.zip > > > We verified the behavior of Artemis HA by bringing down the shared storage > (VM) while run is in progress and here is the observation: > *Scenario:* > * When Artemis services are up and running and run is in progress we > restarted the machine hosting the shared storage > * Shared storage was back up in 5 mins > * Both Artemis master and slave did not connect back to the shared storage > * We tried stopping the Artemis brokers. The slave stopped, but the master > did not stop. We had to kill the process. > * We tried to start the Artemis brokers. The master did not start up at all. > The slave started successfully. > * We restarted the master Artemis server. Server started successfully and > acquired back up. > Shared Storage type: NFS > Impact: The run is stopped and Artemis servers needs to be started again > every time shared storage connection goes down momentarily. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (ARTEMIS-2823) Improve JDBC connection management
[ https://issues.apache.org/jira/browse/ARTEMIS-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on ARTEMIS-2823 started by Francesco Nigro. > Improve JDBC connection management > -- > > Key: ARTEMIS-2823 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2823 > Project: ActiveMQ Artemis > Issue Type: Improvement > Components: Broker >Reporter: Mikko >Assignee: Francesco Nigro >Priority: Major > Time Spent: 16h 20m > Remaining Estimate: 0h > > I have a case where the whole clustering reliability and HA must rely on HA > capabilities of clustered database, and running on top of application server > is not an option. > The current JDBC store implementation is rather bare bones on the connection > management side. JDBC driver is used directly with no management layer. At > startup, the broker just opens couple of direct connections to database and > expects them to be available forever. This is something that cannot be > expected in HA production environment. So, similarly to the discussion linked > below, in our case we lose the db connection after one hour, and all the > brokers need to be restarted to get new connections: > [http://activemq.2283324.n4.nabble.com/Artemis-does-not-reconnect-to-MySQL-after-connection-timeout-td4751956.html] > > This is something that could be resolved by simply using JDBC4 isValid > checks, but proper connection handling and pooling through a datasource would > be preferable. > I have implemented a solution for this by using a DBCP2 datasource. Our test > cluster has been successfully running this forked version since the release > of Artemis 2.13.0. I will prepare a pull request if this is seen as > something that can be useful. -- This message was sent by Atlassian Jira (v8.3.4#803005)
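The JDBC4 `isValid` check the reporter mentions can be sketched as below. This is a minimal illustration with hypothetical names (`ValidatingConnectionHolder` is not an Artemis or DBCP2 class): validate the cached connection before use and transparently reopen it when the database has dropped it.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.function.Supplier;

public class ValidatingConnectionHolder {
    private final Supplier<Connection> factory; // e.g. a DataSource::getConnection call
    private Connection connection;

    public ValidatingConnectionHolder(Supplier<Connection> factory) {
        this.factory = factory;
    }

    // Connection.isValid(timeoutSeconds) pings the server (JDBC 4.0+);
    // any SQLException is treated as "connection is dead".
    private static boolean valid(Connection c) {
        try {
            return c != null && !c.isClosed() && c.isValid(5);
        } catch (SQLException e) {
            return false;
        }
    }

    // Returns the cached connection if it still answers, otherwise
    // replaces it with a fresh one from the factory.
    public synchronized Connection get() {
        if (!valid(connection)) {
            connection = factory.get();
        }
        return connection;
    }
}
```

A pooled datasource such as DBCP2 does this (and more: eviction, sizing, borrow validation) out of the box, which is why the issue prefers it over hand-rolled checks.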
[jira] [Assigned] (ARTEMIS-2926) Scheduled task executions are skipped randomly
[ https://issues.apache.org/jira/browse/ARTEMIS-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro reassigned ARTEMIS-2926: Assignee: Francesco Nigro > Scheduled task executions are skipped randomly > -- > > Key: ARTEMIS-2926 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2926 > Project: ActiveMQ Artemis > Issue Type: Bug > Components: Broker >Affects Versions: 2.13.0 >Reporter: Apache Dev >Assignee: Francesco Nigro >Priority: Major > Time Spent: 2h 20m > Remaining Estimate: 0h > > Scheduled tasks extending {{ActiveMQScheduledComponent}} could randomly skip > an execution, logging: > {code} > Execution ignored due to too many simultaneous executions, probably a > previous delayed execution > {code} > The problem is in the "ActiveMQScheduledComponent#runForExecutor" Runnable. > Times to be compared ({{currentTimeMillis()}} and {{lastTime}}) are taken > inside the runnable execution itself. So, depending on relative execution > times, it could happen that the difference is less than the given period > (e.g. 1 ms), resulting in a skipped execution. -- This message was sent by Atlassian Jira (v8.3.4#803005)
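The race described in ARTEMIS-2926 can be reduced to a few lines. This is a simplified model with hypothetical names, not the actual {{ActiveMQScheduledComponent}} code: because `now` is sampled inside the runnable rather than when the execution was scheduled, a late executor hand-off makes `now - lastTime` smaller than the period and the run is dropped.

```java
public class SkipCheckDemo {
    static long lastTime;

    // Mirrors the guard described in the issue. Returns true if the
    // execution runs, false if it is skipped with the
    // "Execution ignored due to too many simultaneous executions" warning.
    static boolean runGuard(long now, long periodMillis) {
        if (now - lastTime < periodMillis) {
            return false; // skipped: the hand-off arrived "too soon"
        }
        lastTime = now;
        return true; // the task body would run here
    }
}
```

Sampling the timestamp at scheduling time (or comparing against the expected fire time instead of the previous run's wall clock) removes the sensitivity to executor hand-off latency.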
[jira] [Updated] (ARTEMIS-2958) Timed out waiting pool stop on backup restart
[ https://issues.apache.org/jira/browse/ARTEMIS-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2958: - Description: The changes in ARTEMIS-2823 moved the AMQ thread pool stop, on server stop, to after {{callDeActiveCallbacks()}}, while the changes in ARTEMIS-2838, which added {{server.getServer().getManagementService().unregisterHawtioSecurity()}} to {{callDeActiveCallbacks()}}, moved the HawtioSecurity de-registration to happen on server stop. This means that on server restart, if the thread pool is slow to stop, JMX won't be available until a new start. The thread pool stop can block if a long task is still running/blocked in the pool, and the default strategy while stopping the broker is to wait 10 seconds before forcing a shutdown of the pending tasks, i.e. JMX can be unavailable for at least 10 seconds after {{callDeActiveCallbacks()}} and before a new (pre)activation registers it again. This stealthy behaviour has been captured by random failures of {{org.apache.activemq.artemis.tests.smoke.jmxfailback.JmxFailbackTest#testFailbackOnJMX}}, due to some randomly blocked task in the thread pool. The core issue that causes the thread pool to be randomly blocked had been present for a long time, but the JMX unavailability window introduced by the mentioned JIRAs was the change that triggered the bomb. The test checks the availability of the backup JMX connection for 5 seconds during a backup restart (on failback): given that the default thread pool await time is 10 seconds, a longer thread pool stop will make the test fail. 
It seems by a thread dump inspection that the pending task is: {code:java} jmx-failback2-out:"Thread-1 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@1f97cf0d)" Id=43 TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2e2776c1 jmx-failback2-out: at sun.misc.Unsafe.park(Native Method) jmx-failback2-out: - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2e2776c1 jmx-failback2-out: at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) jmx-failback2-out: at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163) jmx-failback2-out: at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:504) jmx-failback2-out: - locked java.lang.Object@607e79a2 jmx-failback2-out: at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:434) jmx-failback2-out: at org.apache.activemq.artemis.core.server.cluster.ClusterControl.authorize(ClusterControl.java:80) jmx-failback2-out: at org.apache.activemq.artemis.core.server.cluster.BackupManager$BackupConnector$1.run(BackupManager.java:271) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$38/327040562.run(Unknown Source) jmx-failback2-out: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) jmx-failback2-out: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) jmx-failback2-out: at 
org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) jmx-failback2-out: jmx-failback2-out: Number of locked synchronizers = 1 jmx-failback2-out: - java.util.concurrent.ThreadPoolExecutor$Worker@6e676dc8 {code} And indeed it seems that both {{BackupManager::stop}} and {{BackupManager::activated}} call {{BackupConnector::close}}, which calls {{closeLocator(backupServerLocator)}} without unblocking {{clusterControl.authorize()}}. A possible fix would be to correctly unblock any blocking call in both cases, so that {{BackupManager}} stops cleanly and the broker thread pool can stop immediately. was: The changes on ARTEMIS-2823 have caused the AMQ thread pool stop on server stop to be moved after {{callDeActiveCallbacks()}}, while the changes on ARTEMIS-2838 to {{server.getServer().getManagementService().unregisterHawtioSecurity()}} on {{callDeActiveCallbacks()}} have moved the HawtioSecurity de-registration to happen on server stop. it means that on server restart, if the thread pool has a slow stop, JMX won't be available until a new start. The thread pool stop can block stopping if there is a long task still running/blocked in the pool and the default strategy while stopping the
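The fix idea sketched in the description can be illustrated with a minimal example. `UnblockableCall` and `awaitUnblock` are hypothetical names, not the Artemis implementation: the point is that a blocking call should park on something a concurrent `close()` can release, so `stop()` never has to wait out the full 10-second pool timeout.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class UnblockableCall {
    private final CountDownLatch closed = new CountDownLatch(1);

    // Stands in for a blocking call like authorize(): parks until either
    // close() releases the latch (returns true) or the timeout elapses
    // (returns false).
    public boolean awaitUnblock(long timeoutMillis) {
        try {
            return closed.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Called from the stop path: wakes any thread parked in awaitUnblock()
    // immediately instead of leaving it to block the pool shutdown.
    public void close() {
        closed.countDown();
    }
}
```

In the issue's terms, {{BackupConnector::close}} closing only the locator leaves the parked {{authorize()}} call behind; signalling it (as `close()` does here) lets the pool drain at once.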
[jira] [Updated] (ARTEMIS-2958) Timed out waiting pool stop on backup restart
[ https://issues.apache.org/jira/browse/ARTEMIS-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2958: - Summary: Timed out waiting pool stop on backup restart (was: Timed out waiting for pool slow down backup restart on failback)
> Timed out waiting pool stop on backup restart
> ---------------------------------------------
>
>                 Key: ARTEMIS-2958
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-2958
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: Broker, JMX
>            Reporter: Francesco Nigro
>            Assignee: Francesco Nigro
>            Priority: Minor
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> The changes on ARTEMIS-2823 moved the await of the AMQ thread pool, on server stop, to after {{callDeActiveCallbacks()}}, while the changes on ARTEMIS-2838, adding {{server.getServer().getManagementService().unregisterHawtioSecurity()}} to {{callDeActiveCallbacks()}}, moved the HawtioSecurity de-registration to happen on server stop.
> It means that on server restart, if the thread pool is slow to stop, JMX won't be available until the next start.
> A slow thread pool stop can happen if any long task is still running/blocked in the pool: the default strategy while stopping the broker is to await 10 seconds before forcing a shutdown of the pending tasks, i.e. JMX can be unavailable for at least 10 seconds after {{callDeActiveCallbacks()}} and before a new (pre)activation registers it again.
> This stealthy behaviour has been captured by random failures of {{org.apache.activemq.artemis.tests.smoke.jmxfailback.JmxFailbackTest#testFailbackOnJMX}}, due to some randomly blocked task in the thread pool.
> The core issue causing the thread pool to be randomly blocked had been present for a long time, but the JMX unavailability window introduced by the mentioned JIRAs is the change that triggered the bomb.
> The test checks for 5 seconds the availability of the backup JMX connection during a backup restart (on failback), i.e. {{Wait.assertTrue(() -> testConnection(url2, objectNameBuilder2), 5_000, 100);}}: given that the default thread pool await time is 10 seconds, a longer thread pool stop makes the test fail.
> It seems, by a thread dump inspection, that the pending task is:
> {code:java}
> jmx-failback2-out:"Thread-1 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@1f97cf0d)" Id=43 TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2e2776c1
> jmx-failback2-out:    at sun.misc.Unsafe.park(Native Method)
> jmx-failback2-out:    - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2e2776c1
> jmx-failback2-out:    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> jmx-failback2-out:    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163)
> jmx-failback2-out:    at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:504)
> jmx-failback2-out:    - locked java.lang.Object@607e79a2
> jmx-failback2-out:    at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:434)
> jmx-failback2-out:    at org.apache.activemq.artemis.core.server.cluster.ClusterControl.authorize(ClusterControl.java:80)
> jmx-failback2-out:    at org.apache.activemq.artemis.core.server.cluster.BackupManager$BackupConnector$1.run(BackupManager.java:271)
> jmx-failback2-out:    at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42)
> jmx-failback2-out:    at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31)
> jmx-failback2-out:    at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65)
> jmx-failback2-out:    at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$38/327040562.run(Unknown Source)
> jmx-failback2-out:    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> jmx-failback2-out:    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> jmx-failback2-out:    at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)
> jmx-failback2-out:
> jmx-failback2-out: Number of locked synchronizers = 1
> jmx-failback2-out:    - java.util.concurrent.ThreadPoolExecutor$Worker@6e676dc8
> {code}
> And indeed it seems that {{BackupManager::stop}}, through {{BackupConnector::close}}, just calls {{closeLocator(backupServerLocator)}}, which won't unblock {{clusterControl.authorize()}}.
> A first fix would be to correctly unblock any blocking call, to cleanly stop {{BackupManager}} and let the broker thread pool stop immediately.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
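For context, the 5-second check mentioned above is a poll-until-timeout assertion. A minimal sketch of that polling shape, assuming a simplified stand-in for Artemis' {{Wait.assertTrue(condition, timeout, sleep)}} ({{WaitSketch}} and {{waitFor}} are hypothetical names, not the real helper):

```java
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for the Wait.assertTrue(condition, timeout, sleep)
// shape used by JmxFailbackTest: poll the condition every sleepMillis until
// it holds or timeoutMillis elapses.
final class WaitSketch {
   static boolean waitFor(BooleanSupplier condition, long timeoutMillis, long sleepMillis)
         throws InterruptedException {
      final long deadline = System.currentTimeMillis() + timeoutMillis;
      while (System.currentTimeMillis() < deadline) {
         if (condition.getAsBoolean()) {
            return true;
         }
         Thread.sleep(sleepMillis);
      }
      // one last check at the deadline before giving up
      return condition.getAsBoolean();
   }
}
```

With a 5,000 ms window and a broker stop that keeps the pool blocked for the full 10-second shutdown wait, the condition never becomes true before the deadline, which is exactly how the blocked task surfaces as a test failure.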
[jira] [Updated] (ARTEMIS-2958) Timed out waiting for pool slow down backup restart on failback
[ https://issues.apache.org/jira/browse/ARTEMIS-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2958: - Description: The changes on ARTEMIS-2823 have caused to await the AMQ thread pool on server stop to be moved after {{callDeActiveCallbacks()}} while the changes on ARTEMIS-2838 to {{server.getServer().getManagementService().unregisterHawtioSecurity()}} on {{callDeActiveCallbacks()}} have moved the HawtioSecurity de-registration to happen on server stop. it means that on server restart, if the thread pool has a slow stop, JMX won't be available until a new start. A slow thread pool stop can happen if there is any long task still running/blocked in the pool: the default strategy while stopping the broker is to await 10 seconds before force a shutdown of the pending tasks ie JMX can be unavailable for at least 10 seconds after {{callDeActiveCallbacks()}} and before a new (pre)activation will register it again. This stealthy behaviour has been captured by random failures on {{org.apache.activemq.artemis.tests.smoke.jmxfailback.JmxFailbackTest#testFailbackOnJMX}}, due to some randomly blocked task in the thread pool. The core issue that cause the thread pool to be randomly blocked was present by long time, but the unavailability time window of JMX introduced by the mentioned JIRAs was the change that has triggered the bomb. The test check for 5 seconds the availability of backup JMX connection during a backup restart (on failback): given that the default thread pool await time is 10 seconds, a longer thread pool stop will make the test to fail. 
It seems, by a thread dump inspection that the pending task is: {code:java} jmx-failback2-out:"Thread-1 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@1f97cf0d)" Id=43 TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2e2776c1 jmx-failback2-out: at sun.misc.Unsafe.park(Native Method) jmx-failback2-out: - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2e2776c1 jmx-failback2-out: at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) jmx-failback2-out: at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163) jmx-failback2-out: at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:504) jmx-failback2-out: - locked java.lang.Object@607e79a2 jmx-failback2-out: at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:434) jmx-failback2-out: at org.apache.activemq.artemis.core.server.cluster.ClusterControl.authorize(ClusterControl.java:80) jmx-failback2-out: at org.apache.activemq.artemis.core.server.cluster.BackupManager$BackupConnector$1.run(BackupManager.java:271) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$38/327040562.run(Unknown Source) jmx-failback2-out: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) jmx-failback2-out: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) jmx-failback2-out: at 
org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) jmx-failback2-out: jmx-failback2-out: Number of locked synchronizers = 1 jmx-failback2-out: - java.util.concurrent.ThreadPoolExecutor$Worker@6e676dc8 {code} And indeed it seems that {{BackupManager::stop}} that's calling {{BackupConnector::close}} just {{closeLocator(backupServerLocator)}} that won't unblock {{clusterControl.authorize()}}. A first fix would be to correctly unblock any blocking call to clean stop {{BackupManager}} and let the broker thread pool to immediately stop. was: The changes on ARTEMIS-2823 have caused to await the AMQ thread pool on server stop to be moved after {{callDeActiveCallbacks()}} while the changes on ARTEMIS-2838 to {{server.getServer().getManagementService().unregisterHawtioSecurity()}} on {{callDeActiveCallbacks()}} have moved the HawtioSecurity de-registration to happen on server stop. it means that on server restart, if the thread pool has a slow stop, JMX won't be available until a new start. A slow thread pool stop can happen if there is any long task still running/blocked in the pool: the default strategy while stopping the broker is to await 10 seconds before force a shutdown of the pending
[jira] [Updated] (ARTEMIS-2958) Timed out waiting for pool slow down backup restart on failback
[ https://issues.apache.org/jira/browse/ARTEMIS-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2958: - Description: The changes on ARTEMIS-2823 have caused to await the AMQ thread pool on server stop to be moved after {{callDeActiveCallbacks()}} while the changes on ARTEMIS-2838 to {{server.getServer().getManagementService().unregisterHawtioSecurity()}} on {{callDeActiveCallbacks()}} have moved the HawtioSecurity de-registration to happen on server stop. it means that on server restart, if the thread pool has a slow stop, JMX won't be available until a new start. A slow thread pool stop can happen if there is any long task still running/blocked in the pool: the default strategy while stopping the broker is to await 10 seconds before force a shutdown of the pending tasks ie JMX can be unavailable for at least 10 seconds after {{callDeActiveCallbacks()}} and before a new (pre)activation will register it again. This stealthy behaviour has been captured by random failures on {{org.apache.activemq.artemis.tests.smoke.jmxfailback.JmxFailbackTest#testFailbackOnJMX}}, due to some randomly blocked task in the thread pool. The core issue that cause the thread pool to be randomly blocked was present by long time, but the unavailability time window of JMX introduced by the mentioned JIRAs was the change that has triggered the bomb. The test check for 5 seconds the availability of backup JMX connection during a backup restart (on failback): given that the default thread pool await time is 10 seconds, a longer thread pool stop will make the test to fail. 
It seems, by a thread dump inspection that the pending task is: {code:java} jmx-failback2-out:"Thread-1 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@1f97cf0d)" Id=43 TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2e2776c1 jmx-failback2-out: at sun.misc.Unsafe.park(Native Method) jmx-failback2-out: - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2e2776c1 jmx-failback2-out: at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) jmx-failback2-out: at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163) jmx-failback2-out: at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:504) jmx-failback2-out: - locked java.lang.Object@607e79a2 jmx-failback2-out: at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:434) jmx-failback2-out: at org.apache.activemq.artemis.core.server.cluster.ClusterControl.authorize(ClusterControl.java:80) jmx-failback2-out: at org.apache.activemq.artemis.core.server.cluster.BackupManager$BackupConnector$1.run(BackupManager.java:271) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$38/327040562.run(Unknown Source) jmx-failback2-out: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) jmx-failback2-out: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) jmx-failback2-out: at 
org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) jmx-failback2-out: jmx-failback2-out: Number of locked synchronizers = 1 jmx-failback2-out: - java.util.concurrent.ThreadPoolExecutor$Worker@6e676dc8 {code} And indeed it seems that {{BackupManager::stop}} that's calling {{BackupConnector::close}} just {{closeLocator(backupServerLocator)}} that won't unblock {{clusterControl.authorize()}}. A first fix would be to correctly unblock any blocking call to clean stop {{BackupManager}} and let the broker thread pool to stop immediately. was: The changes on ARTEMIS-2823 have caused to await the AMQ thread pool on server stop to be moved after {{callDeActiveCallbacks()}} while the changes on ARTEMIS-2838 to {{server.getServer().getManagementService().unregisterHawtioSecurity()}} on {{callDeActiveCallbacks()}} have moved the HawtioSecurity de-registration to happen on server stop. it means that on server restart, if the thread pool has a slow stop, JMX won't be available until a new start. A slow thread pool stop can happen if there is any long task still running/blocked in the pool: the default strategy while stopping the broker is to await 10 seconds before force a shutdown of the pending
[jira] [Updated] (ARTEMIS-2958) Timed out waiting for pool slow down backup restart on failback
[ https://issues.apache.org/jira/browse/ARTEMIS-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2958: - Description: The changes on ARTEMIS-2823 have caused to await the AMQ thread pool on server stop to be moved after {{callDeActiveCallbacks()}} while the changes on ARTEMIS-2838 to {{server.getServer().getManagementService().unregisterHawtioSecurity()}} on {{callDeActiveCallbacks()}} have moved the HawtioSecurity de-registration to happen on server stop. it means that on server restart, if the thread pool has a slow stop, JMX won't be available until a new start. A slow thread pool stop can happen if there is any long task still running/blocked in the pool: the default strategy while stopping the broker is to await 10 seconds before force a shutdown of the pending tasks ie JMX can be unavailable for at least 10 seconds after {{callDeActiveCallbacks()}} and before a new (pre)activation will register it again. This stealthy behaviour has been captured by random failures on {{org.apache.activemq.artemis.tests.smoke.jmxfailback.JmxFailbackTest#testFailbackOnJMX}}, due to some randomly blocked task in the thread pool. The core issue that cause the thread pool to be randomly blocked was present by long time, but the unavailability time window of JMX introduced by the mentioned JIRAs was the change that has triggered the bomb. The test check for 5 seconds the availability of backup JMX connection during a backup restart (on failback): given that the default thread pool await time is 10 seconds, a longer thread pool stop will make the test to fail. 
It seems, by a thread dump inspection that the pending task is: {code:java} jmx-failback2-out:"Thread-1 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@1f97cf0d)" Id=43 TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2e2776c1 jmx-failback2-out: at sun.misc.Unsafe.park(Native Method) jmx-failback2-out: - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2e2776c1 jmx-failback2-out: at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) jmx-failback2-out: at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163) jmx-failback2-out: at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:504) jmx-failback2-out: - locked java.lang.Object@607e79a2 jmx-failback2-out: at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:434) jmx-failback2-out: at org.apache.activemq.artemis.core.server.cluster.ClusterControl.authorize(ClusterControl.java:80) jmx-failback2-out: at org.apache.activemq.artemis.core.server.cluster.BackupManager$BackupConnector$1.run(BackupManager.java:271) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$38/327040562.run(Unknown Source) jmx-failback2-out: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) jmx-failback2-out: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) jmx-failback2-out: at 
org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) jmx-failback2-out: jmx-failback2-out: Number of locked synchronizers = 1 jmx-failback2-out: - java.util.concurrent.ThreadPoolExecutor$Worker@6e676dc8 {code} was: The changes on ARTEMIS-2823 have caused to await the AMQ thread pool on server stop to be moved after {{callDeActiveCallbacks()}} while the changes on ARTEMIS-2838 to {{server.getServer().getManagementService().unregisterHawtioSecurity()}} on {{callDeActiveCallbacks()}} have moved the HawtioSecurity de-registration to happen on server stop. it means that on server restart, if the thread pool has a slow stop, JMX won't be available until a new start. A slow thread pool stop can happen if there is any long task still running/blocked in the pool: the default strategy while stopping the broker is to await 10 seconds before force a shutdown of the pending tasks ie JMX can be unavailable for at least 10 seconds after {{callDeActiveCallbacks()}} and before a new (pre)activation will register it again. This stealthy behaviour has been captured by random failures on {{org.apache.activemq.artemis.tests.smoke.jmxfailback.JmxFailbackTest#testFailbackOnJMX}}, due to some randomly blocked
[jira] [Updated] (ARTEMIS-2958) Timed out waiting for pool slow down backup restart on failback
[ https://issues.apache.org/jira/browse/ARTEMIS-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2958: - Description: The changes on ARTEMIS-2823 have caused to await the AMQ thread pool on server stop to be moved after {{callDeActiveCallbacks()}} while the changes on ARTEMIS-2838 to {{server.getServer().getManagementService().unregisterHawtioSecurity()}} on {{callDeActiveCallbacks()}} have moved the HawtioSecurity de-registration to happen on server stop. it means that on server restart, if the thread pool has a slow stop, JMX won't be available until a new start. A slow thread pool stop can happen if there is any long task still running/blocked in the pool: the default strategy while stopping the broker is to await 10 seconds before force a shutdown of the pending tasks ie JMX can be unavailable for at least 10 seconds after {{callDeActiveCallbacks()}} and before a new (pre)activation will register it again. This stealthy behaviour has been captured by random failures on {{org.apache.activemq.artemis.tests.smoke.jmxfailback.JmxFailbackTest#testFailbackOnJMX}}, due to some randomly blocked task in the thread pool. The core issue that cause the thread pool to be randomly blocked was present by long time, but the unavailability time window of JMX introduced by the mentioned JIRAs was the change that has triggered the bomb. This test check for 5 seconds the availability of backup JMX connection during a backup restart (on failback): given that the default thread pool await time is 10 seconds, a longer thread pool stop will make the test to fail. 
It seems, by a thread dump inspection that the pending task is: {code:java} jmx-failback2-out:"Thread-1 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@1f97cf0d)" Id=43 TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2e2776c1 jmx-failback2-out: at sun.misc.Unsafe.park(Native Method) jmx-failback2-out: - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2e2776c1 jmx-failback2-out: at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) jmx-failback2-out: at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163) jmx-failback2-out: at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:504) jmx-failback2-out: - locked java.lang.Object@607e79a2 jmx-failback2-out: at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:434) jmx-failback2-out: at org.apache.activemq.artemis.core.server.cluster.ClusterControl.authorize(ClusterControl.java:80) jmx-failback2-out: at org.apache.activemq.artemis.core.server.cluster.BackupManager$BackupConnector$1.run(BackupManager.java:271) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$38/327040562.run(Unknown Source) jmx-failback2-out: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) jmx-failback2-out: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) jmx-failback2-out: at 
org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) jmx-failback2-out: jmx-failback2-out: Number of locked synchronizers = 1 jmx-failback2-out: - java.util.concurrent.ThreadPoolExecutor$Worker@6e676dc8 {code} was: The changes on ARTEMIS-2823 have caused to await the AMQ thread pool on server stop to be moved after {{callDeActiveCallbacks()}} while the changes on ARTEMIS-2838 to {{server.getServer().getManagementService().unregisterHawtioSecurity()}} on {{callDeActiveCallbacks()}} have moved the HawtioSecurity de-registration to happen on server stop. it means that on server restart, if the thread pool has a slow stop, JMX won't be available until a new start. A slow thread pool stop can happen if there is any long task still running/blocked in the pool: the default strategy while stopping the broker is to await 10 seconds before force a shutdown of the pending tasks ie JMX can be unavailable for at least 10 seconds after {{callDeActiveCallbacks()}} and before a new (pre)activation will register it again. This stealthy behaviour has been captured by random failures on {{org.apache.activemq.artemis.tests.smoke.jmxfailback.JmxFailbackTest#testFailbackOnJMX}}, due to some randomly
[jira] [Updated] (ARTEMIS-2958) Timed out waiting for pool slow down backup restart on failback
[ https://issues.apache.org/jira/browse/ARTEMIS-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2958: - Description: The changes on ARTEMIS-2823 have caused to await the AMQ thread pool on server stop to be moved after {{callDeActiveCallbacks()}} while the changes on ARTEMIS-2838 to {{server.getServer().getManagementService().unregisterHawtioSecurity()}} on {{callDeActiveCallbacks()}} have moved the HawtioSecurity de-registration to happen on server stop. it means that on server restart, if the thread pool has a slow stop, JMX won't be available until a new start. A slow thread pool stop can happen if there is any long task still running/blocked in the pool: the default strategy while stopping the broker is to await 10 seconds before force a shutdown of the pending tasks ie JMX can be unavailable for at least 10 seconds after {{callDeActiveCallbacks()}} and before a new (pre)activation will register it again. This stealthy behaviour has been captured by random failures on {{org.apache.activemq.artemis.tests.smoke.jmxfailback.JmxFailbackTest#testFailbackOnJMX}}, due to some randomly blocked task in the thread pool. The core issue that cause the thread pool to be randomly blocked was present by long time, but the unavailability time window of JMX introduced by the mentioned JIRAs was that change that has triggered the bomb. This test check for 5 seconds the availability of backup JMX connection during a backup restart (on failback): given that the default thread pool await time is 10 seconds, a longer thread pool stop will make the test to fail. 
It seems, by a thread dump inspection that the pending task is: {code:java} jmx-failback2-out:"Thread-1 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@1f97cf0d)" Id=43 TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2e2776c1 jmx-failback2-out: at sun.misc.Unsafe.park(Native Method) jmx-failback2-out: - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2e2776c1 jmx-failback2-out: at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) jmx-failback2-out: at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163) jmx-failback2-out: at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:504) jmx-failback2-out: - locked java.lang.Object@607e79a2 jmx-failback2-out: at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:434) jmx-failback2-out: at org.apache.activemq.artemis.core.server.cluster.ClusterControl.authorize(ClusterControl.java:80) jmx-failback2-out: at org.apache.activemq.artemis.core.server.cluster.BackupManager$BackupConnector$1.run(BackupManager.java:271) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$38/327040562.run(Unknown Source) jmx-failback2-out: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) jmx-failback2-out: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) jmx-failback2-out: at 
org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) jmx-failback2-out: jmx-failback2-out: Number of locked synchronizers = 1 jmx-failback2-out: - java.util.concurrent.ThreadPoolExecutor$Worker@6e676dc8 {code} was: The changes on ARTEMIS-2823 have caused to await the AMQ thread pool on server stop to be moved after {{callDeActiveCallbacks()}} while the changes on ARTEMIS-2838 to {{server.getServer().getManagementService().unregisterHawtioSecurity()}} on {{callDeActiveCallbacks()}} have moved the HawtioSecurity de-registration to happen on server stop. it means that on server restart, if the thread pool has a slow stop, JMX won't be available until a new start. A slow thread pool stop can happen if there is any long task still running/blocked in the pool: the default strategy of the broker is to await 10 seconds before force a shutdown of the pending tasks ie JMX can be unavailable for at least 10 seconds after {{callDeActiveCallbacks()}} and before a new (pre)activation will register it again. This stealthy behaviour has been captured by random failures on {{org.apache.activemq.artemis.tests.smoke.jmxfailback.JmxFailbackTest#testFailbackOnJMX}}, due to some randomly blocked task in
[jira] [Updated] (ARTEMIS-2958) Timed out waiting for pool slow down backup restart on failback
[ https://issues.apache.org/jira/browse/ARTEMIS-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2958: - Description: The changes in ARTEMIS-2823 moved the await of the AMQ thread pool on server stop to after {{callDeActiveCallbacks()}}, while the changes in ARTEMIS-2838 added {{server.getServer().getManagementService().unregisterHawtioSecurity()}} to {{callDeActiveCallbacks()}}, moving the HawtioSecurity de-registration to server stop. This means that on server restart, if the thread pool stops slowly, JMX won't be available until the next start. A slow thread pool stop can happen whenever a long task is still running or blocked in the pool: the broker's default strategy is to wait 10 seconds before forcing a shutdown of the pending tasks, i.e. JMX can be unavailable for at least 10 seconds after {{callDeActiveCallbacks()}} and before a new (pre)activation registers it again. This stealthy behaviour was exposed by random failures of {{org.apache.activemq.artemis.tests.smoke.jmxfailback.JmxFailbackTest#testFailbackOnJMX}}, due to a randomly blocked task in the thread pool. The core issue that blocks the thread pool had been present for a long time, but the JMX unavailability window introduced by the JIRAs above is the change that triggered it. The test checks for 5 seconds that the backup JMX connection is available during a backup restart (on failback), i.e. {{Wait.assertTrue(() -> testConnection(url2, objectNameBuilder2), 5_000, 100);}}: given that the default thread pool wait time is 10 seconds, a slower thread pool stop makes the test fail.
A thread dump inspection suggests the pending task is:
{code:java}
jmx-failback2-out:"Thread-1 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@1f97cf0d)" Id=43 TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2e2776c1
jmx-failback2-out:	at sun.misc.Unsafe.park(Native Method)
jmx-failback2-out:	- waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2e2776c1
jmx-failback2-out:	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
jmx-failback2-out:	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163)
jmx-failback2-out:	at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:504)
jmx-failback2-out:	- locked java.lang.Object@607e79a2
jmx-failback2-out:	at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:434)
jmx-failback2-out:	at org.apache.activemq.artemis.core.server.cluster.ClusterControl.authorize(ClusterControl.java:80)
jmx-failback2-out:	at org.apache.activemq.artemis.core.server.cluster.BackupManager$BackupConnector$1.run(BackupManager.java:271)
jmx-failback2-out:	at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42)
jmx-failback2-out:	at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31)
jmx-failback2-out:	at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65)
jmx-failback2-out:	at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$38/327040562.run(Unknown Source)
jmx-failback2-out:	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
jmx-failback2-out:	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
jmx-failback2-out:	at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)
jmx-failback2-out:
jmx-failback2-out:	Number of locked synchronizers = 1
jmx-failback2-out:	- java.util.concurrent.ThreadPoolExecutor$Worker@6e676dc8
{code}
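The "wait 10 seconds before forcing a shutdown" strategy described above is the standard graceful-then-forced executor stop. The sketch below is not Artemis code, just a minimal illustration of the mechanism (with a 200 ms grace period instead of the broker's 10 s): a task blocked the way {{ClusterControl.authorize()}} is in the dump keeps the pool from terminating within the grace period, and everything gated on the stop, such as JMX re-registration, stays unavailable until {{shutdownNow()}} interrupts it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PoolStopDemo {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch neverReleased = new CountDownLatch(1);
        // A task that never completes on its own, standing in for the
        // blocked sendBlocking()/authorize() call in the thread dump.
        pool.execute(() -> {
            try {
                neverReleased.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // forced shutdown interrupts us
            }
        });
        pool.shutdown();
        // Grace period (200 ms here; the broker default is 10 s). While this
        // wait is pending, anything sequenced after the pool stop is delayed.
        boolean stoppedInTime = pool.awaitTermination(200, TimeUnit.MILLISECONDS);
        System.out.println("stopped within grace period: " + stoppedInTime);
        if (!stoppedInTime) {
            pool.shutdownNow(); // force: interrupts the blocked task
        }
        System.out.println("terminated after force: "
                + pool.awaitTermination(1, TimeUnit.SECONDS));
    }
}
```

Run as-is it prints `stopped within grace period: false`, then `terminated after force: true`.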
[jira] [Updated] (ARTEMIS-2958) Timed out waiting for pool slow down backup restart on failback
[ https://issues.apache.org/jira/browse/ARTEMIS-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2958: - Description: The changes on ARTEMIS-2823 have caused to await the AMQ thread pool on server stop to be moved after {{callDeActiveCallbacks()}} while the changes on ARTEMIS-2838 to {{server.getServer().getManagementService().unregisterHawtioSecurity()}} on {{callDeActiveCallbacks()}} have moved the HawtioSecurity de-registration to happen on server stop. it means that on server restart, if the thread pool has a slow stop, JMX won't be available until a new start. A slow thread pool stop can happen if there is any long task still running/blocked in the pool: the default strategy of the broker is to await 10 seconds before force a shutdown of the pending tasks ie JMX can be unavailable for at least 10 seconds after {{callDeActiveCallbacks()}} and before a new (pre)activation will register it again. The issue that was causing the thread pool to be blocked randomly awaiting an executing task was present by long time, but the unavailability of JMX introduced by the mentioned JIRAs has caused some random failures on {{org.apache.activemq.artemis.tests.smoke.jmxfailback.JmxFailbackTest#testFailbackOnJMX}}. This test check for 5 seconds the availability of backup JMX connection during a backup restart (on failback)) ie {{Wait.assertTrue(() -> testConnection(url2, objectNameBuilder2), 5_000, 100);}}: given that the default thread pool await time is 10 seconds, a longer thread pool stop will make the test to fail. 
It seems, by a thread dump inspection that the pending task is: {code:java} jmx-failback2-out:"Thread-1 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@1f97cf0d)" Id=43 TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2e2776c1 jmx-failback2-out: at sun.misc.Unsafe.park(Native Method) jmx-failback2-out: - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2e2776c1 jmx-failback2-out: at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) jmx-failback2-out: at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163) jmx-failback2-out: at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:504) jmx-failback2-out: - locked java.lang.Object@607e79a2 jmx-failback2-out: at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:434) jmx-failback2-out: at org.apache.activemq.artemis.core.server.cluster.ClusterControl.authorize(ClusterControl.java:80) jmx-failback2-out: at org.apache.activemq.artemis.core.server.cluster.BackupManager$BackupConnector$1.run(BackupManager.java:271) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65) jmx-failback2-out: at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$38/327040562.run(Unknown Source) jmx-failback2-out: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) jmx-failback2-out: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) jmx-failback2-out: at 
org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) jmx-failback2-out: jmx-failback2-out: Number of locked synchronizers = 1 jmx-failback2-out: - java.util.concurrent.ThreadPoolExecutor$Worker@6e676dc8 {code} was: The changes on ARTEMIS-2823 have caused to await the AMQ thread pool on server stop to be moved after {{callDeActiveCallbacks()}} while the changes on ARTEMIS-2838 to {{server.getServer().getManagementService().unregisterHawtioSecurity()}} on {{callDeActiveCallbacks()}} have moved the HawtioSecurity de-registration to happen on server stop. it means that on server restart, if any task is pending on the thread pool stop, the thread pool wouldn't let a start to activate JMX again for the default 10 seconds required to force a pool shutdown. The issue that was causing the thread pool to be blocked randomly awaiting an executing task was present by long time, but the unavailability of JMX introduced by the mentioned JIRAs has caused some random failures on {{org.apache.activemq.artemis.tests.smoke.jmxfailback.JmxFailbackTest#testFailbackOnJMX}}. This test check for 5 seconds the availability of backup JMX connection during a backup restart (on failback)) ie {{Wait.assertTrue(() -> testConnection(url2,
[jira] [Updated] (ARTEMIS-2958) Timed out waiting for pool slow down backup restart on failback
[ https://issues.apache.org/jira/browse/ARTEMIS-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2958: - Component/s: JMX, Broker -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARTEMIS-2958) Timed out waiting for pool slow down backup restart on failback
Francesco Nigro created ARTEMIS-2958: Summary: Timed out waiting for pool slow down backup restart on failback Key: ARTEMIS-2958 URL: https://issues.apache.org/jira/browse/ARTEMIS-2958 Project: ActiveMQ Artemis Issue Type: Bug Reporter: Francesco Nigro Assignee: Francesco Nigro The changes in ARTEMIS-2823 moved the await of the AMQ thread pool on server stop to after {{callDeActiveCallbacks()}}, and the changes in ARTEMIS-2838 added {{server.getServer().getManagementService().unregisterHawtioSecurity()}} to {{callDeActiveCallbacks()}}: this means that if any task is pending when the thread pool stops, JMX is left unavailable for at least the default 10 seconds required to force a pool shutdown. The issue that randomly blocks the thread pool awaiting an executing task had been present for a long time, but the JMX unavailability introduced by the mentioned JIRAs has caused random failures of {{org.apache.activemq.artemis.tests.smoke.jmxfailback.JmxFailbackTest#testFailbackOnJMX}}, because {{Wait.assertTrue(() -> testConnection(url2, objectNameBuilder2), 5_000, 100);}} waits only 5 seconds for the JMX connection to become available on backup restart: given that the default thread pool wait time is 10 seconds, that was the primary cause of the failure. It's important to investigate what causes the global thread pool of the backup server to not stop immediately on failback. -- This message was sent by Atlassian Jira (v8.3.4#803005)
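The {{Wait.assertTrue(condition, 5_000, 100)}} call quoted above is a poll-until-true helper: retry the condition every 100 ms for up to 5 seconds. A minimal re-implementation (a hypothetical sketch, not the Artemis {{Wait}} utility itself) shows why a condition that needs the pool's 10-second force-shutdown delay cannot pass under a shorter budget:

```java
import java.util.function.BooleanSupplier;

public class WaitDemo {
    // Sketch of a Wait.assertTrue-style helper: poll `condition` every
    // `pauseMillis` until it holds or `timeoutMillis` elapses.
    static boolean waitFor(BooleanSupplier condition, long timeoutMillis, long pauseMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(pauseMillis);
        }
        return condition.getAsBoolean(); // one last check at the deadline
    }

    public static void main(String[] args) throws InterruptedException {
        // Condition flips to true after ~150 ms: well within a 5 s budget.
        long soon = System.currentTimeMillis() + 150;
        System.out.println("within budget: "
                + waitFor(() -> System.currentTimeMillis() >= soon, 5_000, 100));
        // A condition that needs ~10 s (the pool's force-shutdown delay)
        // cannot be met with a 300 ms budget: the shape of the test failure.
        long tooLate = System.currentTimeMillis() + 10_000;
        System.out.println("short budget: "
                + waitFor(() -> System.currentTimeMillis() >= tooLate, 300, 100));
    }
}
```

This prints `within budget: true` and `short budget: false`, mirroring a JMX connection that only becomes reachable after the 10-second pool stop while the test polls for just 5 seconds.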
[jira] [Updated] (ARTEMIS-2957) ManagementContext is started twice
[ https://issues.apache.org/jira/browse/ARTEMIS-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2957: - Description: ManagementContext doesn't guard against being started twice. A recent change in [ARTEMIS-2838|https://issues.apache.org/jira/browse/ARTEMIS-2838] introduced an {{ActivateCallback::preActivate}} that (re)starts the {{ManagementContext}}. Just guarding against consecutive starts fixes this, but we should probably start it only once. The test failing due to this was {{org.apache.activemq.artemis.tests.smoke.jmxfailback.JmxFailbackTest#testFailbackOnJMX}}, with {code:java} jmx-failback1-out:2020-10-20 22:48:46,515 WARN [org.apache.activemq.artemis.core.server] AMQ97: Unable to start Management Context, RBAC not available: java.rmi.server.ExportException: internal error: ObjID already in use jmx-failback1-out: at sun.rmi.transport.ObjectTable.putTarget(ObjectTable.java:186) [rt.jar:1.8.0_66] jmx-failback1-out: at sun.rmi.transport.Transport.exportObject(Transport.java:106) [rt.jar:1.8.0_66] jmx-failback1-out: at sun.rmi.transport.tcp.TCPTransport.exportObject(TCPTransport.java:260) [rt.jar:1.8.0_66] jmx-failback1-out: at sun.rmi.transport.tcp.TCPEndpoint.exportObject(TCPEndpoint.java:411) [rt.jar:1.8.0_66] jmx-failback1-out: at sun.rmi.transport.LiveRef.exportObject(LiveRef.java:147) [rt.jar:1.8.0_66] jmx-failback1-out: at sun.rmi.server.UnicastServerRef.exportObject(UnicastServerRef.java:208) [rt.jar:1.8.0_66] jmx-failback1-out: at sun.rmi.registry.RegistryImpl.setup(RegistryImpl.java:152) [rt.jar:1.8.0_66] jmx-failback1-out: at sun.rmi.registry.RegistryImpl.<init>(RegistryImpl.java:137) [rt.jar:1.8.0_66] jmx-failback1-out: at java.rmi.registry.LocateRegistry.createRegistry(LocateRegistry.java:203) [rt.jar:1.8.0_66] jmx-failback1-out: at org.apache.activemq.artemis.core.server.management.RmiRegistryFactory.init(RmiRegistryFactory.java:48) [artemis-server-2.16.0-SNAPSHOT.jar:2.16.0-SNAPSHOT] 
jmx-failback1-out: at org.apache.activemq.artemis.core.server.management.ManagementConnector.start(ManagementConnector.java:54) [artemis-server-2.16.0-SNAPSHOT.jar:2.16.0-SNAPSHOT] jmx-failback1-out: at org.apache.activemq.artemis.core.server.management.ManagementContext.start(ManagementContext.java:50) [artemis-server-2.16.0-SNAPSHOT.jar:2.16.0-SNAPSHOT] jmx-failback1-out: at org.apache.activemq.artemis.cli.commands.Run$1.preActivate(Run.java:88) [artemis-cli-2.16.0-SNAPSHOT.jar:2.16.0-SNAPSHOT] jmx-failback1-out: at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.callPreActiveCallbacks(ActiveMQServerImpl.java:2840) [artemis-server-2.16.0-SNAPSHOT.jar:2.16.0-SNAPSHOT] jmx-failback1-out: at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.initialisePart1(ActiveMQServerImpl.java:2964) [artemis-server-2.16.0-SNAPSHOT.jar:2.16.0-SNAPSHOT] jmx-failback1-out: at org.apache.activemq.artemis.core.server.impl.SharedStoreLiveActivation.run(SharedStoreLiveActivation.java:66) [artemis-server-2.16.0-SNAPSHOT.jar:2.16.0-SNAPSHOT] jmx-failback1-out: at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.internalStart(ActiveMQServerImpl.java:626) [artemis-server-2.16.0-SNAPSHOT.jar:2.16.0-SNAPSHOT] jmx-failback1-out: at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.start(ActiveMQServerImpl.java:550) [artemis-server-2.16.0-SNAPSHOT.jar:2.16.0-SNAPSHOT] jmx-failback1-out: at org.apache.activemq.artemis.integration.FileBroker.start(FileBroker.java:64) [artemis-cli-2.16.0-SNAPSHOT.jar:2.16.0-SNAPSHOT] jmx-failback1-out: at org.apache.activemq.artemis.cli.commands.Run.execute(Run.java:116) [artemis-cli-2.16.0-SNAPSHOT.jar:2.16.0-SNAPSHOT] jmx-failback1-out: at org.apache.activemq.artemis.cli.Artemis.internalExecute(Artemis.java:153) [artemis-cli-2.16.0-SNAPSHOT.jar:2.16.0-SNAPSHOT] jmx-failback1-out: at org.apache.activemq.artemis.cli.Artemis.execute(Artemis.java:101) [artemis-cli-2.16.0-SNAPSHOT.jar:2.16.0-SNAPSHOT] 
jmx-failback1-out: at org.apache.activemq.artemis.cli.Artemis.execute(Artemis.java:128) [artemis-cli-2.16.0-SNAPSHOT.jar:2.16.0-SNAPSHOT] jmx-failback1-out: at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.8.0_66] jmx-failback1-out: at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) [rt.jar:1.8.0_66] jmx-failback1-out: at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_66] jmx-failback1-out: at java.lang.reflect.Method.invoke(Method.java:497) [rt.jar:1.8.0_66] jmx-failback1-out: at org.apache.activemq.artemis.boot.Artemis.execute(Artemis.java:134) [artemis-boot.jar:2.16.0-SNAPSHOT] jmx-failback1-out: at
[jira] [Updated] (ARTEMIS-2957) ManagementContext is started twice
[ https://issues.apache.org/jira/browse/ARTEMIS-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2957: - Description: ManagementContext doesn't guard against being started twice. A recent change in [ARTEMIS-2838|https://issues.apache.org/jira/browse/ARTEMIS-2838] introduced an ActivateCallback::preActivate that (re)starts the managementContext. Just guarding against consecutive starts fixes this, but we should probably start it only once. The test failing due to this was org.apache.activemq.artemis.tests.smoke.jmxfailback.JmxFailbackTest#testFailbackOnJMX was: ManagementContext isn't guarding from being started twice. A recent change on [ARTEMIS-2838|https://issues.apache.org/jira/browse/ARTEMIS-2838] has introduced a ActivateCallback::preActivate that's (re)starting the managementContext. Just guarding against consecutive starts is fixing this, but probably we should start it just once. > ManagementContext is started twice > -- > > Key: ARTEMIS-2957 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2957 > Project: ActiveMQ Artemis > Issue Type: Bug > Components: JMX >Affects Versions: 2.16.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > > ManagementContext doesn't guard against being started twice. > A recent change in > [ARTEMIS-2838|https://issues.apache.org/jira/browse/ARTEMIS-2838] > introduced an ActivateCallback::preActivate that (re)starts the > managementContext. > Just guarding against consecutive starts fixes this, but we should > probably start it only once. > The test failing due to this was > org.apache.activemq.artemis.tests.smoke.jmxfailback.JmxFailbackTest#testFailbackOnJMX -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (ARTEMIS-2957) ManagementContext is started twice
[ https://issues.apache.org/jira/browse/ARTEMIS-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on ARTEMIS-2957 started by Francesco Nigro. > ManagementContext is started twice > -- > > Key: ARTEMIS-2957 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2957 > Project: ActiveMQ Artemis > Issue Type: Bug > Components: JMX >Affects Versions: 2.16.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > > ManagementContext doesn't guard against being started twice. > A recent change in > [ARTEMIS-2838|https://issues.apache.org/jira/browse/ARTEMIS-2838] > introduced an ActivateCallback::preActivate that (re)starts the > managementContext. > Just guarding against consecutive starts fixes this, but we should > probably start it only once. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARTEMIS-2957) ManagementContext is started twice
Francesco Nigro created ARTEMIS-2957: Summary: ManagementContext is started twice Key: ARTEMIS-2957 URL: https://issues.apache.org/jira/browse/ARTEMIS-2957 Project: ActiveMQ Artemis Issue Type: Bug Components: JMX Affects Versions: 2.16.0 Reporter: Francesco Nigro Assignee: Francesco Nigro ManagementContext doesn't guard against being started twice. A recent change in [ARTEMIS-2838|https://issues.apache.org/jira/browse/ARTEMIS-2838] introduced an ActivateCallback::preActivate that (re)starts the managementContext. Just guarding against consecutive starts fixes this, but we should probably start it only once. -- This message was sent by Atlassian Jira (v8.3.4#803005)
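The guard this issue proposes can be sketched with an AtomicBoolean. This is a hypothetical illustration, not the actual ManagementContext fix: making start() idempotent means a second preActivate call becomes a no-op instead of re-exporting the RMI registry (the "ObjID already in use" failure in the stack trace above).

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class StartOnceSketch {
    private final AtomicBoolean started = new AtomicBoolean(false);
    private int startCount = 0;   // visible side effect, for the demo only

    public void start() {
        // compareAndSet succeeds only for the first caller; later calls return early.
        if (!started.compareAndSet(false, true)) {
            return;
        }
        startCount++;             // stands in for exporting the RMI registry etc.
    }

    public void stop() {
        started.set(false);       // a clean stop allows a later restart
    }

    public static void main(String[] args) {
        StartOnceSketch ctx = new StartOnceSketch();
        ctx.start();
        ctx.start();              // second start is silently ignored, not an error
        System.out.println(ctx.startCount);
    }
}
```

The atomic compare-and-set also makes the guard safe if two activation callbacks race to start the context from different threads.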
[jira] [Resolved] (ARTEMIS-2955) commons-dbcp2 performance issue with Derby Embedded DBMS
[ https://issues.apache.org/jira/browse/ARTEMIS-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro resolved ARTEMIS-2955. -- Fix Version/s: 2.16.0 Resolution: Fixed > commons-dbcp2 performance issue with Derby Embedded DBMS > > > Key: ARTEMIS-2955 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2955 > Project: ActiveMQ Artemis > Issue Type: Bug > Components: Broker, Tests >Affects Versions: 2.16.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > Fix For: 2.16.0 > > Attachments: image-2020-10-20-09-08-45-390.png, > image-2020-10-20-09-10-10-644.png, pooling_off.svg, pooling_on.svg, > screenshot-1.png, screenshot-2.png > > Time Spent: 1h 20m > Remaining Estimate: 0h > > The test suite has shown an increase in duration of 30 minutes, going from > 2:30 to 3 hours: it seems related to integration paging tests running on > Embedded Derby with the changes of > [ARTEMIS-2823|https://issues.apache.org/jira/browse/ARTEMIS-2823]. > After some profiling sessions on > org/apache/activemq/artemis/tests/integration/paging/PagingTest.testQueueRemoveAll > it seems that using commons-dbcp2 with Embedded Derby isn't working as > expected: > !image-2020-10-20-09-08-45-390.png! > while, if we switch to > [EmbeddedDataSource|https://db.apache.org/derby/docs/10.13/publishedapi/org/apache/derby/jdbc/EmbeddedDataSource.html] > we have > !image-2020-10-20-09-10-10-644.png! > By not using commons-dbcp2 we get roughly a ~10X improvement in performance: > it seems that commons-dbcp2 forces the prepared statement to be set up from > scratch each time, while EmbeddedDataSource does not. Specifically it seems > related to Derby GenericActivationHolder. > I suggest disabling commons-dbcp2 for Derby and investigating whether it could > happen in a real broker too. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARTEMIS-2955) commons-dbcp2 performance issue with Derby Embedded DBMS
[ https://issues.apache.org/jira/browse/ARTEMIS-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217404#comment-17217404 ] Francesco Nigro edited comment on ARTEMIS-2955 at 10/20/20, 8:27 AM: - [~gtully] has noted that poolPreparedStatements is false by default on commons-dbcp2: let me check if this can be improved by setting it to true by default (that makes sense to me for a real broker too). was (Author: nigrofranz): [~gtully] hos noted that poolPreparedStatements is false by default on commons-dbcp2: let me check if this can be improved by setting it as true by default (that makes sense to me for a real broker too). > commons-dbcp2 performance issue with Derby Embedded DBMS > > > Key: ARTEMIS-2955 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2955 > Project: ActiveMQ Artemis > Issue Type: Bug > Components: Broker, Tests >Affects Versions: 2.16.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > Attachments: image-2020-10-20-09-08-45-390.png, > image-2020-10-20-09-10-10-644.png, pooling_off.svg, pooling_on.svg, > screenshot-1.png, screenshot-2.png > > > The test suite has shown an increase in duration of 30 minutes, going from > 2:30 to 3 hours: it seems related to integration paging tests running on > Embedded Derby with the changes of > [ARTEMIS-2823|https://issues.apache.org/jira/browse/ARTEMIS-2823]. > After some profiling sessions on > org/apache/activemq/artemis/tests/integration/paging/PagingTest.testQueueRemoveAll > it seems that using commons-dbcp2 with Embedded Derby isn't working as > expected: > !image-2020-10-20-09-08-45-390.png! > while, if we switch to > [EmbeddedDataSource|https://db.apache.org/derby/docs/10.13/publishedapi/org/apache/derby/jdbc/EmbeddedDataSource.html] > we have > !image-2020-10-20-09-10-10-644.png! 
> By not using commons-dbcp2 we get roughly a ~10X improvement in performance: > it seems that commons-dbcp2 forces the prepared statement to be set up from > scratch each time, while EmbeddedDataSource does not. Specifically it seems > related to Derby GenericActivationHolder. > I suggest disabling commons-dbcp2 for Derby and investigating whether it could > happen in a real broker too. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARTEMIS-2955) commons-dbcp2 performance issue with Derby Embedded DBMS
[ https://issues.apache.org/jira/browse/ARTEMIS-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217432#comment-17217432 ] Francesco Nigro edited comment on ARTEMIS-2955 at 10/20/20, 8:35 AM: - Thanks to the great suggestion from [~gtully], this is going to fix a "likely" performance issue on the broker, not just in the test: going to update the JIRA. See the resulting flamegraph !screenshot-2.png! There are no more traces of reflective access or prepared statement creation, and performance is even better than with EmbeddedDataSource, given that we no longer allocate a new connection each time. Please [~mikkommku] take a look at my PR (will send it soon) so I know I'm not impacting anything for you as well: it's very likely I'm going to turn on poolPreparedStatements by default, and users can change it to false manually if they prefer. One question about {quote}The reason I did not enable it on my pull request was that it's not something universal that's available on alternative datasources, and couldn't quite figure out whether implementation changes were needed to prepare for use of other datasources.{quote} I'm just using {code:java} addDataSourceProperty("poolPreparedStatements", "true"); {code} if dataSourceProperties.isEmpty() but I see a {code:java} addDataSourceProperty("maxTotal", "-1"); {code} Probably we should check whether the data source class used is the default one before setting both of these defaults; I don't think "maxTotal" is available in other data source solutions either. was (Author: nigrofranz): Thanks to the great suggestion of [~gtully], this is going to fix a "likely" performance issue on the broker, not just on the test: going to update the JIRA. See the resulting flamegraph !screenshot-2.png! 
There is no more traces of reflective access nor creation of prepared statements and perfs are even better then using EmbeddedDataSource given that we don't allocate anymore a new connection each time. Please [~mikkommku] take a look to my PR (will send it soon) so I know I'm not impacting anything for you as well: it's very likely I'm going to turn on poolPreparedStatements by default and users can choose if they prefer to change it to false by manually setting it, in case. > commons-dbcp2 performance issue with Derby Embedded DBMS > > > Key: ARTEMIS-2955 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2955 > Project: ActiveMQ Artemis > Issue Type: Bug > Components: Broker, Tests >Affects Versions: 2.16.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > Attachments: image-2020-10-20-09-08-45-390.png, > image-2020-10-20-09-10-10-644.png, pooling_off.svg, pooling_on.svg, > screenshot-1.png, screenshot-2.png > > > The test suite has shown an increase in duration of 30 minutes, going from > 2:30 to 3 hours: it seems related to integration paging tests running on > Embedded Derby with the changes of > [ARTEMIS-2823|https://issues.apache.org/jira/browse/ARTEMIS-2823]. > After some profiling sessions on > org/apache/activemq/artemis/tests/integration/paging/PagingTest.testQueueRemoveAll > it seems that using commons-dbcp2 with Embedded Derby isn't working as > expected: > !image-2020-10-20-09-08-45-390.png! > while, if we switch to > [EmbeddedDataSource|https://db.apache.org/derby/docs/10.13/publishedapi/org/apache/derby/jdbc/EmbeddedDataSource.html] > we have > !image-2020-10-20-09-10-10-644.png! > By not using commons-dbcp2 we get roughly a ~10X improvement in performance: > it seems that commons-dbcp2 forces the prepared statement to be set up from > scratch each time, while EmbeddedDataSource does not. Specifically it seems > related to Derby GenericActivationHolder. > I suggest disabling commons-dbcp2 for Derby and investigating whether it could > happen in a real broker too. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
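The gain from poolPreparedStatements can be illustrated with a tiny stand-in. This is an illustrative sketch, not dbcp2 internals: statement pooling amounts to a per-connection cache keyed by SQL text, so each statement is "prepared" (parsed and planned) once instead of being rebuilt on every execution, which is the cost the flamegraphs attribute to Derby's GenericActivationHolder.

```java
import java.util.HashMap;
import java.util.Map;

public class StatementCacheSketch {
    static int prepareCalls = 0;   // counts how often the expensive path runs

    static final Map<String, String> cache = new HashMap<>();

    // Stand-in for Connection.prepareStatement: costly on a miss, free on a hit.
    static String prepare(String sql) {
        return cache.computeIfAbsent(sql, s -> {
            prepareCalls++;        // parse/plan work would happen here
            return "compiled:" + s;
        });
    }

    public static void main(String[] args) {
        // The same statement executed 1000 times, as a paging test loop might do.
        for (int i = 0; i < 1_000; i++) {
            prepare("SELECT * FROM MESSAGES WHERE ID = ?");
        }
        System.out.println(prepareCalls);
    }
}
```

With the cache the statement is prepared exactly once across all 1000 executions; without it, the prepare cost is paid every iteration, which matches the ~10X difference observed in the profiles.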
[jira] [Commented] (ARTEMIS-2955) commons-dbcp2 performance issue with Derby Embedded DBMS
[ https://issues.apache.org/jira/browse/ARTEMIS-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217432#comment-17217432 ] Francesco Nigro commented on ARTEMIS-2955: -- Thanks to the great suggestion from [~gtully], this is going to fix a "likely" performance issue on the broker, not just in the test: going to update the JIRA. See the resulting flamegraph !screenshot-2.png! There are no more traces of reflective access or prepared statement creation, and performance is even better than with EmbeddedDataSource, given that we no longer allocate a new connection each time. Please [~mikkommku] take a look at my PR (will send it soon) so I know I'm not impacting anything for you as well: it's very likely I'm going to turn on poolPreparedStatements by default, and users can change it to false manually if they prefer. > commons-dbcp2 performance issue with Derby Embedded DBMS > > > Key: ARTEMIS-2955 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2955 > Project: ActiveMQ Artemis > Issue Type: Bug > Components: Broker, Tests >Affects Versions: 2.16.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > Attachments: image-2020-10-20-09-08-45-390.png, > image-2020-10-20-09-10-10-644.png, pooling_off.svg, pooling_on.svg, > screenshot-1.png, screenshot-2.png > > > The test suite has shown an increase in duration of 30 minutes, going from > 2:30 to 3 hours: it seems related to integration paging tests running on > Embedded Derby with the changes of > [ARTEMIS-2823|https://issues.apache.org/jira/browse/ARTEMIS-2823]. > After some profiling sessions on > org/apache/activemq/artemis/tests/integration/paging/PagingTest.testQueueRemoveAll > it seems that using commons-dbcp2 with Embedded Derby isn't working as > expected: > !image-2020-10-20-09-08-45-390.png! 
> while, if we switch to > [EmbeddedDataSource|https://db.apache.org/derby/docs/10.13/publishedapi/org/apache/derby/jdbc/EmbeddedDataSource.html] > we have > !image-2020-10-20-09-10-10-644.png! > By not using commons-dbcp2 we get roughly a ~10X improvement in performance: > it seems that commons-dbcp2 forces the prepared statement to be set up from > scratch each time, while EmbeddedDataSource does not. Specifically it seems > related to Derby GenericActivationHolder. > I suggest disabling commons-dbcp2 for Derby and investigating whether it could > happen in a real broker too. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARTEMIS-2955) commons-dbcp2 performance issue with Derby Embedded DBMS
[ https://issues.apache.org/jira/browse/ARTEMIS-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2955: - Attachment: screenshot-2.png > commons-dbcp2 performance issue with Derby Embedded DBMS > > > Key: ARTEMIS-2955 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2955 > Project: ActiveMQ Artemis > Issue Type: Bug > Components: Broker, Tests >Affects Versions: 2.16.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > Attachments: image-2020-10-20-09-08-45-390.png, > image-2020-10-20-09-10-10-644.png, pooling_off.svg, pooling_on.svg, > screenshot-1.png, screenshot-2.png > > > The test suite has shown an increase in duration of 30 minutes, going from > 2:30 to 3 hours: it seems related to integration paging tests running on > Embedded Derby with the changes of > [ARTEMIS-2823|https://issues.apache.org/jira/browse/ARTEMIS-2823]. > After some profiling sessions on > org/apache/activemq/artemis/tests/integration/paging/PagingTest.testQueueRemoveAll > it seems that using commons-dbcp2 with Embedded Derby isn't working as > expected: > !image-2020-10-20-09-08-45-390.png! > while, if we switch to > [EmbeddedDataSource|https://db.apache.org/derby/docs/10.13/publishedapi/org/apache/derby/jdbc/EmbeddedDataSource.html] > we have > !image-2020-10-20-09-10-10-644.png! > By not using commons-dbcp2 we get roughly a ~10X improvement in performance: > it seems that commons-dbcp2 forces the prepared statement to be set up from > scratch each time, while EmbeddedDataSource does not. Specifically it seems > related to Derby GenericActivationHolder. > I suggest disabling commons-dbcp2 for Derby and investigating whether it could > happen in a real broker too. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARTEMIS-2955) commons-dbcp2 performance issue with Derby Embedded DBMS
[ https://issues.apache.org/jira/browse/ARTEMIS-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217404#comment-17217404 ] Francesco Nigro edited comment on ARTEMIS-2955 at 10/20/20, 8:05 AM: - [~gtully] has noted that poolPreparedStatements is false by default on commons-dbcp2: let me check if this can be improved by setting it to true by default (that makes sense to me for a real broker too). was (Author: nigrofranz): [~gtully] Node that poolPreparedStatements is false by default on commons-dbcp2: let me check if this can be improved by setting it as true by default (that makes sense to me for a real broker too). > commons-dbcp2 performance issue with Derby Embedded DBMS > > > Key: ARTEMIS-2955 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2955 > Project: ActiveMQ Artemis > Issue Type: Bug > Components: Broker, Tests >Affects Versions: 2.16.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > Attachments: image-2020-10-20-09-08-45-390.png, > image-2020-10-20-09-10-10-644.png, pooling_off.svg, pooling_on.svg, > screenshot-1.png > > > The test suite has shown an increase in duration of 30 minutes, going from > 2:30 to 3 hours: it seems related to integration paging tests running on > Embedded Derby with the changes of > [ARTEMIS-2823|https://issues.apache.org/jira/browse/ARTEMIS-2823]. > After some profiling sessions on > org/apache/activemq/artemis/tests/integration/paging/PagingTest.testQueueRemoveAll > it seems that using commons-dbcp2 with Embedded Derby isn't working as > expected: > !image-2020-10-20-09-08-45-390.png! > while, if we switch to > [EmbeddedDataSource|https://db.apache.org/derby/docs/10.13/publishedapi/org/apache/derby/jdbc/EmbeddedDataSource.html] > we have > !image-2020-10-20-09-10-10-644.png! 
[jira] [Commented] (ARTEMIS-2955) commons-dbcp2 performance issue with Derby Embedded DBMS
[ https://issues.apache.org/jira/browse/ARTEMIS-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217404#comment-17217404 ] Francesco Nigro commented on ARTEMIS-2955: -- [~gtully] Noted that poolPreparedStatements is false by default on commons-dbcp2: let me check whether this can be improved by setting it to true by default (that makes sense to me for a real broker too).
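What poolPreparedStatements buys can be sketched with a toy cache (this is an illustration of the idea only, not commons-dbcp2's internals; the class and field names below are made up): with statement pooling on, a statement is prepared once per SQL string and reused, instead of being set up from scratch on every execution as profiled above.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of prepared-statement pooling: the cache stands in for what
// poolPreparedStatements=true enables, while prepareCount models the
// expensive per-statement setup (plan compilation, Derby activation, ...).
class StatementCache {
    int prepareCount = 0;
    private final Map<String, String> cache = new HashMap<>();

    String prepare(String sql) {
        return cache.computeIfAbsent(sql, s -> {
            prepareCount++;                 // expensive work, only paid on a cache miss
            return "prepared:" + s;
        });
    }
}
```

With commons-dbcp2 itself the corresponding switch is BasicDataSource#setPoolPreparedStatements(true), which is what the comment above proposes enabling by default.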
[jira] [Updated] (ARTEMIS-2955) commons-dbcp2 performance issue with Derby Embedded DBMS
[ https://issues.apache.org/jira/browse/ARTEMIS-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2955: - Description: The test suite has shown an increase in duration of 30 minutes, going from 2:30 to 3 hours: it seems related to the integration paging tests running on Embedded Derby with the changes of [ARTEMIS-2823|https://issues.apache.org/jira/browse/ARTEMIS-2823]. After some profiling sessions on org/apache/activemq/artemis/tests/integration/paging/PagingTest.testQueueRemoveAll it seems that using commons-dbcp2 with Embedded Derby isn't working as expected: !image-2020-10-20-09-08-45-390.png! while, if we switch to [EmbeddedDataSource|https://db.apache.org/derby/docs/10.13/publishedapi/org/apache/derby/jdbc/EmbeddedDataSource.html] we have !image-2020-10-20-09-10-10-644.png! By not using commons-dbcp2 we get roughly a ~10X improvement in performance: it seems that commons-dbcp2 forces the prepared statement to be set up from scratch each time, while EmbeddedDataSource does not. Specifically, it seems related to Derby's GenericActivationHolder. I suggest disabling commons-dbcp2 for Derby and investigating whether this could happen in a real broker too. was: The test suite has shown an increase in duration of 30 minutes, going from 2:30 to 3 hours: it seems related to the integration paging tests running on Embedded Derby with the changes of [ARTEMIS-2823|https://issues.apache.org/jira/browse/ARTEMIS-2823]. After some profiling sessions on org/apache/activemq/artemis/tests/integration/paging/PagingTest.testQueueRemoveAll it seems that using commons-dbcp2 with Embedded Derby isn't working as expected: !image-2020-10-20-09-08-45-390.png! while, if we switch to [EmbeddedDataSource|https://db.apache.org/derby/docs/10.13/publishedapi/org/apache/derby/jdbc/EmbeddedDataSource.html] we have !image-2020-10-20-09-10-10-644.png! By not using commons-dbcp2 we get roughly a ~10X improvement in performance: it seems that commons-dbcp2 forces the prepared statement to be set up from scratch each time, while EmbeddedDataSource does not. Just for reference, in violet, that's the amount of "heavy" work performed by EmbeddedDataSource if we use commons-dbcp2: !screenshot-1.png! It's clear that the update row is performing a huge additional amount of work... I suggest disabling commons-dbcp2 for Derby and investigating whether this could happen in a real broker too.
[jira] [Updated] (ARTEMIS-2955) commons-dbcp2 performance issue with Derby Embedded DBMS
[ https://issues.apache.org/jira/browse/ARTEMIS-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2955: - Description: The test suite has shown an increase in duration of 30 minutes, going from 2:30 to 3 hours: it seems related to the integration paging tests running on Embedded Derby with the changes of [ARTEMIS-2823|https://issues.apache.org/jira/browse/ARTEMIS-2823]. After some profiling sessions on org/apache/activemq/artemis/tests/integration/paging/PagingTest.testQueueRemoveAll it seems that using commons-dbcp2 with Embedded Derby isn't working as expected: !image-2020-10-20-09-08-45-390.png! while, if we switch to [EmbeddedDataSource|https://db.apache.org/derby/docs/10.13/publishedapi/org/apache/derby/jdbc/EmbeddedDataSource.html] we have !image-2020-10-20-09-10-10-644.png! By not using commons-dbcp2 we get roughly a ~10X improvement in performance: it seems that commons-dbcp2 forces the prepared statement to be set up from scratch each time, while EmbeddedDataSource does not. Just for reference, in violet, that's the amount of "heavy" work performed by EmbeddedDataSource if we use commons-dbcp2: !screenshot-1.png! It's clear that the update row is performing a huge additional amount of work... I suggest disabling commons-dbcp2 for Derby and investigating whether this could happen in a real broker too. was: The test suite has shown an increase in duration of 30 minutes, going from 2:30 to 3 hours: it seems related to the integration paging tests running on Embedded Derby with the changes of [ARTEMIS-2823|https://issues.apache.org/jira/browse/ARTEMIS-2823]. After some profiling sessions on org/apache/activemq/artemis/tests/integration/paging/PagingTest.testQueueRemoveAll it seems that using commons-dbcp2 with Embedded Derby isn't working as expected: !image-2020-10-20-09-08-45-390.png! while, if we switch to [EmbeddedDataSource|https://db.apache.org/derby/docs/10.13/publishedapi/org/apache/derby/jdbc/EmbeddedDataSource.html] we have !image-2020-10-20-09-10-10-644.png! By not using commons-dbcp2 we get roughly a ~10X improvement in performance: it seems that commons-dbcp2 forces the prepared statement to be set up from scratch each time, while EmbeddedDataSource does not. I suggest disabling commons-dbcp2 for Derby and investigating whether this could happen in a real broker too.
[jira] [Updated] (ARTEMIS-2955) commons-dbcp2 performance issue with Derby Embedded DBMS
[ https://issues.apache.org/jira/browse/ARTEMIS-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2955: - Attachment: screenshot-1.png
[jira] [Commented] (ARTEMIS-2955) commons-dbcp2 performance issue with Derby Embedded DBMS
[ https://issues.apache.org/jira/browse/ARTEMIS-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217377#comment-17217377 ] Francesco Nigro commented on ARTEMIS-2955: -- [~mikkommku] Is this something you've noted in the test environment while using commons-dbcp2?
[jira] [Updated] (ARTEMIS-2955) commons-dbcp2 performance issue with Derby Embedded DBMS
[ https://issues.apache.org/jira/browse/ARTEMIS-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2955: - Description: The test suite has shown an increase in duration of 30 minutes, going from 2:30 to 3 hours: it seems related to the integration paging tests running on Embedded Derby with the changes of [ARTEMIS-2823 Improve JDBC connection management|https://issues.apache.org/jira/browse/ARTEMIS-2823]. After some profiling sessions on org/apache/activemq/artemis/tests/integration/paging/PagingTest.testQueueRemoveAll it seems that using commons-dbcp2 with Embedded Derby isn't working as expected: !image-2020-10-20-09-08-45-390.png! while, if we switch to [EmbeddedDataSource|https://db.apache.org/derby/docs/10.13/publishedapi/org/apache/derby/jdbc/EmbeddedDataSource.html] we have !image-2020-10-20-09-10-10-644.png! By not using commons-dbcp2 we get roughly a ~10X improvement in performance: it seems that commons-dbcp2 forces the prepared statement to be set up from scratch each time, while EmbeddedDataSource does not. I suggest disabling commons-dbcp2 for Derby and investigating whether this could happen in a real broker too. was: The test suite has shown an increase in duration of 30 minutes, going from 2:30 to 3 hours: it seems related to the integration paging tests running on Embedded Derby with the changes of [ARTEMIS-2823 Improve JDBC connection management|https://issues.apache.org/jira/browse/ARTEMIS-2823]. After some profiling sessions on org/apache/activemq/artemis/tests/integration/paging/PagingTest.testQueueRemoveAll it seems that using commons-dbcp2 with Embedded Derby isn't working as expected: !image-2020-10-20-09-08-45-390.png! while, if we switch to [EmbeddedDataSource.|https://db.apache.org/derby/docs/10.13/publishedapi/org/apache/derby/jdbc/EmbeddedDataSource.html] we have !image-2020-10-20-09-10-10-644.png! By not using commons-dbcp2 we get roughly a ~10X improvement in performance: it seems that commons-dbcp2 forces the prepared statement to be set up from scratch each time, while EmbeddedDataSource does not. I suggest disabling commons-dbcp2 for Derby and investigating whether this could happen in a real broker too.
[jira] [Updated] (ARTEMIS-2955) commons-dbcp2 performance issue with Derby Embedded DBMS
[ https://issues.apache.org/jira/browse/ARTEMIS-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2955: - Description: The test suite has shown an increase in duration of 30 minutes, going from 2:30 to 3 hours: it seems related to the integration paging tests running on Embedded Derby with the changes of [ARTEMIS-2823|https://issues.apache.org/jira/browse/ARTEMIS-2823]. After some profiling sessions on org/apache/activemq/artemis/tests/integration/paging/PagingTest.testQueueRemoveAll it seems that using commons-dbcp2 with Embedded Derby isn't working as expected: !image-2020-10-20-09-08-45-390.png! while, if we switch to [EmbeddedDataSource|https://db.apache.org/derby/docs/10.13/publishedapi/org/apache/derby/jdbc/EmbeddedDataSource.html] we have !image-2020-10-20-09-10-10-644.png! By not using commons-dbcp2 we get roughly a ~10X improvement in performance: it seems that commons-dbcp2 forces the prepared statement to be set up from scratch each time, while EmbeddedDataSource does not. I suggest disabling commons-dbcp2 for Derby and investigating whether this could happen in a real broker too. was: The test suite has shown an increase in duration of 30 minutes, going from 2:30 to 3 hours: it seems related to the integration paging tests running on Embedded Derby with the changes of [ARTEMIS-2823 Improve JDBC connection management|https://issues.apache.org/jira/browse/ARTEMIS-2823]. After some profiling sessions on org/apache/activemq/artemis/tests/integration/paging/PagingTest.testQueueRemoveAll it seems that using commons-dbcp2 with Embedded Derby isn't working as expected: !image-2020-10-20-09-08-45-390.png! while, if we switch to [EmbeddedDataSource|https://db.apache.org/derby/docs/10.13/publishedapi/org/apache/derby/jdbc/EmbeddedDataSource.html] we have !image-2020-10-20-09-10-10-644.png! By not using commons-dbcp2 we get roughly a ~10X improvement in performance: it seems that commons-dbcp2 forces the prepared statement to be set up from scratch each time, while EmbeddedDataSource does not. I suggest disabling commons-dbcp2 for Derby and investigating whether this could happen in a real broker too.
[jira] [Updated] (ARTEMIS-2955) commons-dbcp2 performance issue with Derby Embedded DBMS
[ https://issues.apache.org/jira/browse/ARTEMIS-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2955: - Description: The test suite has shown an increase in duration of 30 minutes, going from 2:30 to 3 hours: it seems related to the integration paging tests running on Embedded Derby with the changes of [ARTEMIS-2823|https://issues.apache.org/jira/browse/ARTEMIS-2823]. After some profiling sessions on org/apache/activemq/artemis/tests/integration/paging/PagingTest.testQueueRemoveAll it seems that using commons-dbcp2 with Embedded Derby isn't working as expected: !image-2020-10-20-09-08-45-390.png! while, if we switch to [EmbeddedDataSource|https://db.apache.org/derby/docs/10.13/publishedapi/org/apache/derby/jdbc/EmbeddedDataSource.html] we have !image-2020-10-20-09-10-10-644.png! By not using commons-dbcp2 we get roughly a ~10X improvement in performance: it seems that commons-dbcp2 forces the prepared statement to be set up from scratch each time, while EmbeddedDataSource does not. I suggest disabling commons-dbcp2 for Derby and investigating whether this could happen in a real broker too. was: The test suite has shown an increase in duration of 30 minutes, going from 2:30 to 3 hours: it seems related to the integration paging tests running on Embedded Derby with the changes of [ARTEMIS-2823|https://issues.apache.org/jira/browse/ARTEMIS-2823]. After some profiling sessions on org/apache/activemq/artemis/tests/integration/paging/PagingTest.testQueueRemoveAll it seems that using commons-dbcp2 with Embedded Derby isn't working as expected: !image-2020-10-20-09-08-45-390.png! while, if we switch to [EmbeddedDataSource|https://db.apache.org/derby/docs/10.13/publishedapi/org/apache/derby/jdbc/EmbeddedDataSource.html] we have !image-2020-10-20-09-10-10-644.png! By not using commons-dbcp2 we get roughly a ~10X improvement in performance: it seems that commons-dbcp2 forces the prepared statement to be set up from scratch each time, while EmbeddedDataSource does not. I suggest disabling commons-dbcp2 for Derby and investigating whether this could happen in a real broker too.
[jira] [Updated] (ARTEMIS-2955) commons-dbcp2 performance issue with Derby Embedded DBMS
[ https://issues.apache.org/jira/browse/ARTEMIS-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2955: - Description: The test suite has shown an increase in duration of 30 minutes, going from 2:30 to 3 hours: it seems related to the integration paging tests running on Embedded Derby with the changes of [ARTEMIS-2823 Improve JDBC connection management|https://issues.apache.org/jira/browse/ARTEMIS-2823]. After some profiling sessions on org/apache/activemq/artemis/tests/integration/paging/PagingTest.testQueueRemoveAll it seems that using commons-dbcp2 with Embedded Derby isn't working as expected: !image-2020-10-20-09-08-45-390.png! while, if we switch to [EmbeddedDataSource.|https://db.apache.org/derby/docs/10.13/publishedapi/org/apache/derby/jdbc/EmbeddedDataSource.html] we have !image-2020-10-20-09-10-10-644.png! By not using commons-dbcp2 we get roughly a ~10X improvement in performance: it seems that commons-dbcp2 forces the prepared statement to be set up from scratch each time, while EmbeddedDataSource does not. I suggest disabling commons-dbcp2 for Derby and investigating whether this could happen in a real broker too. was: The test suite has shown an increase in duration of 30 minutes, going from 2:30 to 3 hours: it seems related to the integration paging tests running on Embedded Derby with the changes of [ARTEMIS-2823 Improve JDBC connection management|https://issues.apache.org/jira/browse/ARTEMIS-2823]. After some profiling sessions on org/apache/activemq/artemis/tests/integration/paging/PagingTest.testQueueRemoveAll it seems that using commons-dbcp2 with Embedded Derby isn't working as expected: !image-2020-10-20-09-08-45-390.png! while, if we switch to https://db.apache.org/derby/docs/10.13/publishedapi/org/apache/derby/jdbc/EmbeddedDataSource.html we have !image-2020-10-20-09-10-10-644.png! By not using commons-dbcp2 we get roughly a ~10X improvement in performance: it seems that commons-dbcp2 forces the prepared statement to be set up from scratch each time, while EmbeddedDataSource does not. I suggest disabling commons-dbcp2 for Derby and investigating whether this could happen in a real broker too.
[jira] [Created] (ARTEMIS-2955) commons-dbcp2 performance issue with Derby Embedded DBMS
Francesco Nigro created ARTEMIS-2955: Summary: commons-dbcp2 performance issue with Derby Embedded DBMS Key: ARTEMIS-2955 URL: https://issues.apache.org/jira/browse/ARTEMIS-2955 Project: ActiveMQ Artemis Issue Type: Bug Components: Broker, Tests Affects Versions: 2.16.0 Reporter: Francesco Nigro Assignee: Francesco Nigro Attachments: image-2020-10-20-09-08-45-390.png, image-2020-10-20-09-10-10-644.png, pooling_off.svg, pooling_on.svg The test suite has shown an increase in duration of 30 minutes, going from 2:30 to 3 hours: it seems related to the integration paging tests running on Embedded Derby with the changes of [ARTEMIS-2823 Improve JDBC connection management|https://issues.apache.org/jira/browse/ARTEMIS-2823]. After some profiling sessions on org/apache/activemq/artemis/tests/integration/paging/PagingTest.testQueueRemoveAll it seems that using commons-dbcp2 with Embedded Derby isn't working as expected: !image-2020-10-20-09-08-45-390.png! while, if we switch to https://db.apache.org/derby/docs/10.13/publishedapi/org/apache/derby/jdbc/EmbeddedDataSource.html we have !image-2020-10-20-09-10-10-644.png! By not using commons-dbcp2 we get roughly a ~10X improvement in performance: it seems that commons-dbcp2 forces the prepared statement to be set up from scratch each time, while EmbeddedDataSource does not. I suggest disabling commons-dbcp2 for Derby and investigating whether this could happen in a real broker too.
[jira] [Created] (ARTEMIS-2949) Reduce GC on OperationContext::checkTasks
Francesco Nigro created ARTEMIS-2949: Summary: Reduce GC on OperationContext::checkTasks Key: ARTEMIS-2949 URL: https://issues.apache.org/jira/browse/ARTEMIS-2949 Project: ActiveMQ Artemis Issue Type: Improvement Reporter: Francesco Nigro Assignee: Francesco Nigro OperationContext::checkTasks is allocating Iterators that do not seem to be scalar-replaced; this could be avoided by using the Queue API of LinkedList. Similarly, store-only tasks can use a reduced (in terms of footprint) task holder to lower the allocation pressure.
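The idea can be sketched as follows (hypothetical names, not Artemis's actual checkTasks code): iterating a LinkedList allocates an Iterator per call, while the Queue API (peek/poll) inspects and removes the head without allocating anything.

```java
import java.util.LinkedList;

// Hypothetical stand-in for a store task holder, not the real Artemis class.
class StoreTask {
    final Runnable action;
    boolean done;
    StoreTask(Runnable action, boolean done) { this.action = action; this.done = done; }
}

class CheckTasks {
    // An Iterator-based scan would allocate an Iterator on every call; using
    // the LinkedList Queue API (peek/poll) walks completed head tasks with
    // zero allocation.
    static int drainCompleted(LinkedList<StoreTask> tasks) {
        int executed = 0;
        StoreTask t;
        while ((t = tasks.peek()) != null && t.done) {
            tasks.poll();          // drop the completed head, no Iterator needed
            t.action.run();
            executed++;
        }
        return executed;
    }
}
```

Since tasks complete in order, only the head of the queue ever needs checking, which is what makes the peek/poll loop equivalent to the iterator scan here.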
[jira] [Closed] (ARTEMIS-2941) Improve JDBC HA connection resiliency
[ https://issues.apache.org/jira/browse/ARTEMIS-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro closed ARTEMIS-2941. Resolution: Won't Fix > Improve JDBC HA connection resiliency > - > > Key: ARTEMIS-2941 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2941 > Project: ActiveMQ Artemis > Issue Type: Improvement > Components: Broker >Affects Versions: 2.15.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > This aims to replace the restart enhancement feature of > https://issues.apache.org/jira/browse/ARTEMIS-2918, because that feature is > too dangerous due to the numerous potential leaks that a server in production > could hit by allowing it to restart while keeping the Java process around. > Currently, JDBC HA uses an expiration time on locks that marks the time by > which a server instance is allowed to keep a specific role, dependent on the > owned lock (live or backup). > Right now, the first failed attempt to renew such an expiration time forces a > broker to shut down immediately, while it could be more "relaxed" and just > keep retrying until the very end, i.e. when the expiration time is about to > elapse. > > The only concern with this feature is the relation between the broker > wall-clock time and the DBMS one, which is used to set the expiration time > and should stay within certain margins. > For this last part I'm aware that classic ActiveMQ lease locks use some > configuration parameters to set the magnitude of the allowed difference (and > to compute some base offset too). > > Right now this feature seems more risk-free and appealing than > https://issues.apache.org/jira/browse/ARTEMIS-2918, given it narrows its > scope to the very core issue, i.e. a more resilient behaviour on lost JDBC > connectivity. 
> > To understand the implications of such a change, consider a shared store HA > pair configured with 60 seconds of expiration time: > # DBMS goes down > # an in-flight persistent operation on the live data store causes the live > broker to kill itself immediately, because no reliable storage is connected > # backup is unable to renew its backup lease lock > # DBMS comes back up in time, before the backup lock's local expiration time > has elapsed > # backup is able to renew its backup lease lock, retrieve the very last > state of the live (that was live) and, if no script is configured to restart > the live, to fail over and take its role > # backup is now live and able to serve clients > > > There are two legitimate questions regarding potential improvements here: > # why can't the live keep retrying I/O (on the journal, paging or large > messages) until its local expiration time ends? > # why isn't the live just returning an I/O error to the clients? > > The former is complex: the main problem I see is from the resource > utilization point of view; keeping an accumulating backlog of pending > requests, blocked awaiting the last one for an arbitrarily long time, will > probably cause the broker memory to blow up, not to mention that clients > will time out too. > The latter seems more appealing, because it would allow clients to fail > fast, but it would affect the current semantics we use on the broker storage > operations and I need more investigation to understand how to implement it. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
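The retry-until-local-expiration behaviour proposed in the issue can be sketched as follows. This is a minimal sketch under stated assumptions: the method name, the injected clock, and the renewal callback are all hypothetical, not the broker's actual lock API.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

public class LeaseRenewSketch {
    // Instead of shutting down on the first failed renewal, keep retrying the
    // renewal until the locally tracked expiration instant is reached.
    // clock and renewAttempt are injected so the logic is deterministic to test;
    // a real implementation would also back off between attempts.
    public static boolean renewUntilExpired(BooleanSupplier renewAttempt,
                                            LongSupplier clock,
                                            long localExpirationMillis) {
        while (clock.getAsLong() < localExpirationMillis) {
            if (renewAttempt.getAsBoolean()) {
                return true; // lease renewed: the broker keeps its role
            }
            // renewal failed (e.g. DBMS temporarily down): retry instead of
            // killing the broker immediately
        }
        return false; // local expiration reached with no successful renewal
    }
}
```

The key design point is that the decision to give up is tied to the lease's own expiration time, not to a single failed round-trip to the DBMS.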
[jira] [Reopened] (ARTEMIS-2941) Improve JDBC HA connection resiliency
[ https://issues.apache.org/jira/browse/ARTEMIS-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro reopened ARTEMIS-2941: -- > Improve JDBC HA connection resiliency > - > > Key: ARTEMIS-2941 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2941 > Project: ActiveMQ Artemis > Issue Type: Improvement > Components: Broker >Affects Versions: 2.15.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > This is aiming to replace the restart enhancement feature of > https://issues.apache.org/jira/browse/ARTEMIS-2918 because this last one is > too dangerous due to the numerous potential leaks that a server in production > could hit by allowing it to restart while keeping the Java process around. > Currently, JDBC HA uses an expiration time on locks that mark the time by > which a server instance is allowed to keep a specific role, dependent by the > owned lock (live or backup). > Right now, the first failed attempt to renew such expiration time force a > broker to shutdown immediately, while it could be more "relaxed" and just > keep retry until the very end ie when the expiration time is approaching to > end. > > The only concern of this feature is related to the relation between the > broker wall-clock time and the DBMS one, that's used to set the expiration > time and that should be within certain margins. > For this last part I'm aware that classic ActiveMQ lease locks use some > configuration parameter to set the magnitude of the allowed difference (and > to compute some base offset too). > > Right now this feature seems more risk-free and appealing then > https://issues.apache.org/jira/browse/ARTEMIS-2918, given it narrows the > scope of it to what's the very core issue ie a more resilient behaviour on > JDBC lost connectivity. 
> > To understand the implications of such change, consider a shared store HA > pair with configured 60 seconds of expiration time: > # DBMS goes down > # an in-flight persistent operation on the live data store cause the live > broker to kill itself immediately, because no reliable storage is connected > # backup is unable to renew its backup lease lock > # DBMS goes up in time, before the backup lock local expiration time is ended > # backup is able to renew its backup lease lock and retrieve the very last > state of live (that was live) and, if no script is configured to restart the > live, to failover and take its role > # backup is now live and able to serve clients > > > There are 2 legit questions re potential improvements on this: > # why the live cannot keep re-trying I/O (on the journal, paging or large > messages) until its local expiration time end? > # why the live isn't just returning back an I/O error to the clients? > > The former is complex: the main problem I see is from the resource > utilization point of view; keeping an accumulating backlog of pending > requests, blocked awaiting the last one for an arbitrary long time will > probably cause the broker memory to blown up, to not mention that clients > will timed out too. > The latter seems more appealing, because will allow clients to fail fast, but > it would affect the current semantic we use on the broker storage operations > and I need more investigation to understand how to implement it. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARTEMIS-2945) Artemis native JNI code can be replaced by Java
[ https://issues.apache.org/jira/browse/ARTEMIS-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2945: - Summary: Artemis native JNI code can be replaced by Java (was: Artemis native JNI code code be replaced by Java) > Artemis native JNI code can be replaced by Java > --- > > Key: ARTEMIS-2945 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2945 > Project: ActiveMQ Artemis > Issue Type: Improvement > Components: Broker >Affects Versions: 2.15.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > > LibAIO JNI code could be rewritten in Java, while keeping the LibaioContext > and LibaioFile APIs the same. > > There are a few benefits from this: > # simplification of the C code logic, easing its maintenance > # quicker development process (implement-try-test-debug cycle) for non-C > programmers, including simpler integration with Java test suites > # easier monitoring/telemetry integration > > As demonstrations/proofs of such benefits I would introduce several changes > into the Java version: > # using the libaio async fdatasync feature to allow the LibaioContext duty > cycle loop to free CPU resources in order to handle compaction reads without > being slowed down by an in-progress fdatasync > # use a lock-free high performance data structure to reuse iocbs instead of > a locked (mutex-based) one > # expose in-flight callbacks to allow future PRs to introduce Java-only > latency telemetry per-request or just error check/kill of "slow" in-flight > requests > > The possible drawbacks are: > # slower performance due to GC barriers cost (see the notes on the PR code) > # slower performance in case of a JVM without Unsafe (which means very few) > The latter issue could be addressed by using the new/proper VarHandle > features when the Artemis minimum supported version moves on from Java 8, or > using the same approach as other projects relying on it, e.g. Netty. 
> A note about how to correctly benchmark this, due to how I've implemented > async fdatasync: in order to avoid having both LibaioContext and TimedBuffer > perform fdatasync batching of writes, I've preferred to simplify the > LibaioContext. It means that the buffer timeout to be used in broker.xml > should be obtained by ./artemis perf-journal --sync-writes, i.e. batching > writes at the speed of the measured fdatasync RTT latency. > This last behaviour could be changed using a more apples-to-apples approach, > although I still think that the beauty of using Java is exactly to bring > new features/logic in with a shorter development cycle :) > We're not in a hurry to get this done, so, performance-wise, this feature > could even improve on the original version, if possible. > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
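Point 2 of the proposed changes (a lock-free structure to reuse iocbs instead of a mutex-guarded one) could look roughly like this minimal Java sketch; the pool type, class name, and the fixed iocb size are assumptions for illustration, not the actual PR code:

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedQueue;

public class IocbPoolSketch {
    // sizeof(struct iocb) is 64 bytes on Linux x86-64; illustrative here
    private static final int IOCB_SIZE = 64;
    private final ConcurrentLinkedQueue<ByteBuffer> pool = new ConcurrentLinkedQueue<>();

    // Borrow a pooled iocb buffer, allocating only on a pool miss.
    // poll() is lock-free: no mutex is taken on the submission path.
    public ByteBuffer acquire() {
        ByteBuffer iocb = pool.poll();
        return iocb != null ? iocb : ByteBuffer.allocateDirect(IOCB_SIZE);
    }

    // Return the buffer for reuse; offer() is likewise lock-free, so the
    // completion (poller) thread never contends on a mutex with submitters.
    public void release(ByteBuffer iocb) {
        iocb.clear();
        pool.offer(iocb);
    }
}
```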
[jira] [Comment Edited] (ARTEMIS-2926) Scheduled task executions are skipped randomly
[ https://issues.apache.org/jira/browse/ARTEMIS-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212950#comment-17212950 ] Francesco Nigro edited comment on ARTEMIS-2926 at 10/13/20, 9:25 AM: - That's a good point, similar to what [~robbie] has exposed on https://github.com/apache/activemq-artemis/pull/3287#issuecomment-707028383: agree, we should not measure time on the executor in order to drop executions and, better, probably we shouldn't drop executions according to that metric... And related to the original issue with JDBC: the idea would be to allow starting as soon as possible without dropping a "too early" execution, instead allowing duplicate ones too. Regardless of the bug, probably ActiveMQScheduledComponent isn't a good fit for that feature as it is. was (Author: nigrofranz): That's a good point similar to what [~robbie] has exposed on https://github.com/apache/activemq-artemis/pull/3287#issuecomment-707028383: agree, we should not measure time on the executor in order to drop executions and, better, probably we shouldn't drop execution according to that metric... > Scheduled task executions are skipped randomly > -- > > Key: ARTEMIS-2926 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2926 > Project: ActiveMQ Artemis > Issue Type: Bug > Components: Broker >Affects Versions: 2.13.0 >Reporter: Apache Dev >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Scheduled tasks extending {{ActiveMQScheduledComponent}} could randomly skip > an execution, logging: > {code} > Execution ignored due to too many simultaneous executions, probably a > previous delayed execution > {code} > The problem is in the "ActiveMQScheduledComponent#runForExecutor" Runnable. > Times to be compared ({{currentTimeMillis()}} and {{lastTime}}) are taken > inside the runnable execution itself. So, depending on relative execution > times, it could happen that the difference is less than the given period > (e.g. 
1 ms), resulting in a skipped execution. -- This message was sent by Atlassian Jira (v8.3.4#803005)
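One way to avoid the race described in the issue is to sample the timestamp at the scheduled instant, before handing the task to the executor, rather than inside the runnable, so executor latency cannot make two runs look "too close". A minimal sketch under stated assumptions: the class name and the injected timestamps are hypothetical, not the actual ActiveMQScheduledComponent fix.

```java
import java.util.concurrent.atomic.AtomicLong;

public class ScheduledRunSketch {
    private final long periodMillis;
    private final AtomicLong lastScheduledAt = new AtomicLong(Long.MIN_VALUE);

    public ScheduledRunSketch(long periodMillis) {
        this.periodMillis = periodMillis;
    }

    // Called by the timer thread at the scheduled instant. Because the
    // comparison uses schedule times (not times sampled inside the runnable),
    // a delayed execution can no longer shrink the observed interval below
    // the period and cause a spurious skip.
    public boolean shouldRun(long scheduledAtMillis) {
        long last = lastScheduledAt.get();
        if (last != Long.MIN_VALUE && scheduledAtMillis - last < periodMillis) {
            return false; // a genuinely early duplicate firing
        }
        lastScheduledAt.set(scheduledAtMillis);
        return true;
    }
}
```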
[jira] [Comment Edited] (ARTEMIS-2926) Scheduled task executions are skipped randomly
[ https://issues.apache.org/jira/browse/ARTEMIS-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212950#comment-17212950 ] Francesco Nigro edited comment on ARTEMIS-2926 at 10/13/20, 9:14 AM: - That's a good point similar to what [~robbie] has exposed on https://github.com/apache/activemq-artemis/pull/3287#issuecomment-707028383: agree, we should not measure time on the executor in order to drop executions and, better, probably we shouldn't drop execution according to that metric... was (Author: nigrofranz): That's a good point similar to what [~robbie] has exposed on https://github.com/apache/activemq-artemis/pull/3287#issuecomment-707028383: agree, we should not measure time on the executor but still, that's the only one that guarantees program order aka single threaded-like execution. Let me think about it a bit more if there is a better choice here or do you already have a solution?:) > Scheduled task executions are skipped randomly > -- > > Key: ARTEMIS-2926 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2926 > Project: ActiveMQ Artemis > Issue Type: Bug > Components: Broker >Affects Versions: 2.13.0 >Reporter: Apache Dev >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Scheduled tasks extending {{ActiveMQScheduledComponent}} could randomly skip > an execution, logging: > {code} > Execution ignored due to too many simultaneous executions, probably a > previous delayed execution > {code} > The problem is in the "ActiveMQScheduledComponent#runForExecutor" Runnable. > Times to be compared ({{currentTimeMillis()}} and {{lastTime}}) are taken > inside the runnable execution itself. So, depending on relative execution > times, it could happen that the difference is less than the given period > (e.g. 1 ms), resulting in a skipped execution. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARTEMIS-2926) Scheduled task executions are skipped randomly
[ https://issues.apache.org/jira/browse/ARTEMIS-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212950#comment-17212950 ] Francesco Nigro commented on ARTEMIS-2926: -- That's a good point similar to what [~robbie] has exposed on https://github.com/apache/activemq-artemis/pull/3287#issuecomment-707028383: agree, we should not measure time on the executor but still, that's the only one that guarantees program order aka single threaded-like execution. Let me think about it a bit more if there is a better choice here or do you already have a solution?:) > Scheduled task executions are skipped randomly > -- > > Key: ARTEMIS-2926 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2926 > Project: ActiveMQ Artemis > Issue Type: Bug > Components: Broker >Affects Versions: 2.13.0 >Reporter: Apache Dev >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Scheduled tasks extending {{ActiveMQScheduledComponent}} could randomly skip > an execution, logging: > {code} > Execution ignored due to too many simultaneous executions, probably a > previous delayed execution > {code} > The problem is in the "ActiveMQScheduledComponent#runForExecutor" Runnable. > Times to be compared ({{currentTimeMillis()}} and {{lastTime}}) are taken > inside the runnable execution itself. So, depending on relative execution > times, it could happen that the difference is less than the given period > (e.g. 1 ms), resulting in a skipped execution. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARTEMIS-2941) Improve JDBC HA connection resiliency
[ https://issues.apache.org/jira/browse/ARTEMIS-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2941: - Summary: Improve JDBC HA connection resiliency (was: Improve JDBC connection resiliency) > Improve JDBC HA connection resiliency > - > > Key: ARTEMIS-2941 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2941 > Project: ActiveMQ Artemis > Issue Type: Improvement > Components: Broker >Affects Versions: 2.15.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > > This is aiming to replace the restart enhancement feature of > https://issues.apache.org/jira/browse/ARTEMIS-2918 because this last one is > too dangerous due to the numerous potential leaks that a server in production > could hit by allowing it to restart while keeping the Java process around. > Currently, JDBC HA uses an expiration time on locks that mark the time by > which a server instance is allowed to keep a specific role, dependent by the > owned lock (live or backup). > Right now, the first failed attempt to renew such expiration time force a > broker to shutdown immediately, while it could be more "relaxed" and just > keep retry until the very end ie when the expiration time is approaching to > end. > > The only concern of this feature is related to the relation between the > broker wall-clock time and the DBMS one, that's used to set the expiration > time and that should be within certain margins. > For this last part I'm aware that classic ActiveMQ lease locks use some > configuration parameter to set the magnitude of the allowed difference (and > to compute some base offset too). > > Right now this feature seems more risk-free and appealing then > https://issues.apache.org/jira/browse/ARTEMIS-2918, given it narrows the > scope of it to what's the very core issue ie a more resilient behaviour on > JDBC lost connectivity. 
> > To understand the implications of such change, consider a shared store HA > pair with configured 60 seconds of expiration time: > # DBMS goes down > # an in-flight persistent operation on the live data store cause the live > broker to kill itself immediately, because no reliable storage is connected > # backup is unable to renew its backup lease lock > # DBMS goes up in time, before the backup lock local expiration time is ended > # backup is able to renew its backup lease lock and retrieve the very last > state of live (that was live) and, if no script is configured to restart the > live, to failover and take its role > # backup is now live and able to serve clients > > > There are 2 legit questions re potential improvements on this: > # why the live cannot keep re-trying I/O (on the journal, paging or large > messages) until its local expiration time end? > # why the live isn't just returning back an I/O error to the clients? > > The former is complex: the main problem I see is from the resource > utilization point of view; keeping an accumulating backlog of pending > requests, blocked awaiting the last one for an arbitrary long time will > probably cause the broker memory to blown up, to not mention that clients > will timed out too. > The latter seems more appealing, because will allow clients to fail fast, but > it would affect the current semantic we use on the broker storage operations > and I need more investigation to understand how to implement it. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARTEMIS-2945) Artemis native JNI code code be replaced by Java
[ https://issues.apache.org/jira/browse/ARTEMIS-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212640#comment-17212640 ] Francesco Nigro commented on ARTEMIS-2945: -- The async fdatasync change isn't mandatory; given how simple the Java code is to change, it could easily be changed into
{code:java}
if (res >= 0) {
    final int fd = IoCb.aioFildes(pooledIOCB.bytes);
    if (fd != dumbFD) {
        // issue a blocking fdatasync only when the completed request
        // targets a different file from the previous one
        if (useFdatasync) {
            if (lastFile != fd) {
                lastFile = fd;
                fdatasync(fd);
            }
        }
    } else {
        stop = true;
    }
}
{code}
using the fdatasync blocking behaviour to backpressure and batch bursts of writes. The downside of this approach would be preventing read operations from being scheduled while the poller thread is kept busy awaiting an in-flight fdatasync. > Artemis native JNI code code be replaced by Java > > > Key: ARTEMIS-2945 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2945 > Project: ActiveMQ Artemis > Issue Type: Improvement > Components: Broker >Affects Versions: 2.15.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > > LibAIO JNI code could be rewritten in Java, while keeping the LibaioContext > and LibaioFile APIs the same. 
> > There are few benefits from this: > # simplification of C code logic to ease maintain it > # quicker development process (implement-try-test-debug cycle) for non-C > programmers, including simpler integration with Java test suites > # easier monitoring/telemetry integration > > As demonstrations/proofs of such benefits I would introduce several changes > into the Java version: > # using the libaio async fdatasync feature to allow the LibaioContext duty > cycle loop to free CPU resources in order to handle compaction reads without > being slowed down by an in-progress fdatasync > # use a lock-free high performance data structure to reuse iocbs instead of > a locked (using a mutex) one > # expose in-flight callbacks to allow future PRs to introduce Java-only > latency telemetry per-request or just error check/kill of "slow" in-flight > requests > > The possible drawbacks are: > # slower performance due to GC barriers cost (see the notes on the PR code) > # slower performance in case of JVM without Unsafe (that means very few) > The latter issue could be addressed by using the new/proper VarHandle > features when the Artemis min supported version will move from Java 8 or > using the same approach on other projects relying on it eg Netty. > A note about how to correctly benchmark this due to how I've implemented > async fdatasync: in order to save both LibaioContext and TimedBuffer to > perform fdatasync batching of writes, I've preferred to simplify the > LibaioContext: it means that the buffer timeout to be used on broker.xml > should be obtained by ./artemis perf-journal --sync-writes ie batching writes > at the speed of the measured fdatasync RTT latency. 
> This last behaviour could be changed used a more Apple-to-Apple approach > although I still think that the beauty of using is Java is exactly to bring > new features/logics in with a shorter development cycle :) > We're not in hurry to get this done so that perf-wise this feature could be > implemented improving performance over the original version too, if possible. > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARTEMIS-2945) Artemis native JNI code code be replaced by Java
[ https://issues.apache.org/jira/browse/ARTEMIS-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2945: - Attachment: (was: new_1000_bis.svg) > Artemis native JNI code code be replaced by Java > > > Key: ARTEMIS-2945 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2945 > Project: ActiveMQ Artemis > Issue Type: Improvement > Components: Broker >Affects Versions: 2.15.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > > LibAIO JNI code could be rewritten in Java, while keeping the LibaioContext > and LibaioFile APIs the same. > > There are few benefits from this: > # simplification of C code logic to ease maintain it > # quicker development process (implement-try-test-debug cycle) for non-C > programmers, including simpler integration with Java test suites > # easier monitoring/telemetry integration > > As demonstrations/proofs of such benefits I would introduce several changes > into the Java version: > # using the libaio async fdatasync feature to allow the LibaioContext duty > cycle loop to free CPU resources in order to handle compaction reads without > being slowed down by an in-progress fdatasync > # use a lock-free high performance data structure to reuse iocbs instead of > a locked (using a mutex) one > # expose in-flight callbacks to allow future PRs to introduce Java-only > latency telemetry per-request or just error check/kill of "slow" in-flight > requests > > The possible drawbacks are: > # slower performance due to GC barriers cost (see the notes on the PR code) > # slower performance in case of JVM without Unsafe (that means very few) > The latter issue could be addressed by using the new/proper VarHandle > features when the Artemis min supported version will move from Java 8 or > using the same approach on other projects relying on it eg Netty. 
> A note about how to correctly benchmark this due to how I've implemented > async fdatasync: in order to save both LibaioContext and TimedBuffer to > perform fdatasync batching of writes, I've preferred to simplify the > LibaioContext: it means that the buffer timeout to be used on broker.xml > should be obtained by ./artemis perf-journal --sync-writes ie batching writes > at the speed of the measured fdatasync RTT latency. > This last behaviour could be changed used a more Apple-to-Apple approach > although I still think that the beauty of using is Java is exactly to bring > new features/logics in with a shorter development cycle :) > We're not in hurry to get this done so that perf-wise this feature could be > implemented improving performance over the original version too, if possible. > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARTEMIS-2945) Artemis native JNI code code be replaced by Java
[ https://issues.apache.org/jira/browse/ARTEMIS-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2945: - Attachment: (was: old_1000_bis.svg) > Artemis native JNI code code be replaced by Java > > > Key: ARTEMIS-2945 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2945 > Project: ActiveMQ Artemis > Issue Type: Improvement > Components: Broker >Affects Versions: 2.15.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > > LibAIO JNI code could be rewritten in Java, while keeping the LibaioContext > and LibaioFile APIs the same. > > There are few benefits from this: > # simplification of C code logic to ease maintain it > # quicker development process (implement-try-test-debug cycle) for non-C > programmers, including simpler integration with Java test suites > # easier monitoring/telemetry integration > > As demonstrations/proofs of such benefits I would introduce several changes > into the Java version: > # using the libaio async fdatasync feature to allow the LibaioContext duty > cycle loop to free CPU resources in order to handle compaction reads without > being slowed down by an in-progress fdatasync > # use a lock-free high performance data structure to reuse iocbs instead of > a locked (using a mutex) one > # expose in-flight callbacks to allow future PRs to introduce Java-only > latency telemetry per-request or just error check/kill of "slow" in-flight > requests > > The possible drawbacks are: > # slower performance due to GC barriers cost (see the notes on the PR code) > # slower performance in case of JVM without Unsafe (that means very few) > The latter issue could be addressed by using the new/proper VarHandle > features when the Artemis min supported version will move from Java 8 or > using the same approach on other projects relying on it eg Netty. 
> A note about how to correctly benchmark this due to how I've implemented > async fdatasync: in order to save both LibaioContext and TimedBuffer to > perform fdatasync batching of writes, I've preferred to simplify the > LibaioContext: it means that the buffer timeout to be used on broker.xml > should be obtained by ./artemis perf-journal --sync-writes ie batching writes > at the speed of the measured fdatasync RTT latency. > This last behaviour could be changed used a more Apple-to-Apple approach > although I still think that the beauty of using is Java is exactly to bring > new features/logics in with a shorter development cycle :) > We're not in hurry to get this done so that perf-wise this feature could be > implemented improving performance over the original version too, if possible. > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARTEMIS-2945) Artemis native JNI code code be replaced by Java
[ https://issues.apache.org/jira/browse/ARTEMIS-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2945: - Attachment: new_1000_bis.svg old_1000_bis.svg > Artemis native JNI code code be replaced by Java > > > Key: ARTEMIS-2945 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2945 > Project: ActiveMQ Artemis > Issue Type: Improvement > Components: Broker >Affects Versions: 2.15.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > Attachments: new_1000_bis.svg, old_1000_bis.svg > > > LibAIO JNI code could be rewritten in Java, while keeping the LibaioContext > and LibaioFile APIs the same. > > There are few benefits from this: > # simplification of C code logic to ease maintain it > # quicker development process (implement-try-test-debug cycle) for non-C > programmers, including simpler integration with Java test suites > # easier monitoring/telemetry integration > > As demonstrations/proofs of such benefits I would introduce several changes > into the Java version: > # using the libaio async fdatasync feature to allow the LibaioContext duty > cycle loop to free CPU resources in order to handle compaction reads without > being slowed down by an in-progress fdatasync > # use a lock-free high performance data structure to reuse iocbs instead of > a locked (using a mutex) one > # expose in-flight callbacks to allow future PRs to introduce Java-only > latency telemetry per-request or just error check/kill of "slow" in-flight > requests > > The possible drawbacks are: > # slower performance due to GC barriers cost (see the notes on the PR code) > # slower performance in case of JVM without Unsafe (that means very few) > The latter issue could be addressed by using the new/proper VarHandle > features when the Artemis min supported version will move from Java 8 or > using the same approach on other projects relying on it eg Netty. 
> A note about how to correctly benchmark this due to how I've implemented > async fdatasync: in order to save both LibaioContext and TimedBuffer to > perform fdatasync batching of writes, I've preferred to simplify the > LibaioContext: it means that the buffer timeout to be used on broker.xml > should be obtained by ./artemis perf-journal --sync-writes ie batching writes > at the speed of the measured fdatasync RTT latency. > This last behaviour could be changed used a more Apple-to-Apple approach > although I still think that the beauty of using is Java is exactly to bring > new features/logics in with a shorter development cycle :) > We're not in hurry to get this done so that perf-wise this feature could be > implemented improving performance over the original version too, if possible. > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARTEMIS-2946) Increase MaxInlineLevel to 15 on JVM configuration
[ https://issues.apache.org/jira/browse/ARTEMIS-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2946: - Summary: Increase MaxInlineLevel to 15 on JVM configuration (was: Increase MaxInlineLevel on JVM configuration) > Increase MaxInlineLevel to 15 on JVM configuration > -- > > Key: ARTEMIS-2946 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2946 > Project: ActiveMQ Artemis > Issue Type: Improvement >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > > According to [https://bugs.openjdk.java.net/browse/JDK-8234863] there are > applications that can benefit from an increased MaxInlineLevel: many > Netty-based ones have shown a clear improvement in performance, especially > ones with long stack traces and/or wide hierarchies. > > See [https://github.com/netty/netty/pull/10368#issuecomment-648174201] for > more information about the potential improvements on encoding/decoding. -- This message was sent by Atlassian Jira (v8.3.4#803005)
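For reference, one way the flag could be applied to a broker instance is by extending the JVM arguments in the instance profile; this is a sketch assuming the usual etc/artemis.profile layout with a JAVA_ARGS variable (adjust path and variable to your install). Note that JDK-8234863 raised the default to 15 in newer JDKs, so the flag mainly matters on older releases.

```shell
# Append to the existing JVM arguments in etc/artemis.profile
# (path/variable assumed; check your broker instance layout).
JAVA_ARGS="$JAVA_ARGS -XX:MaxInlineLevel=15"
```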
[jira] [Updated] (ARTEMIS-2946) Increase MaxInlineLevel on JVM configuration
[ https://issues.apache.org/jira/browse/ARTEMIS-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francesco Nigro updated ARTEMIS-2946: - Description: According to [https://bugs.openjdk.java.net/browse/JDK-8234863] there are applications that can benefit from an increased MaxInlineLevel: many Netty-based ones have shown a clear improvement in performance, especially ones with deep call stacks and/or wide hierarchies. See [https://github.com/netty/netty/pull/10368#issuecomment-648174201] for more information about the potential improvements on encoding/decoding. was: According to [https://bugs.openjdk.java.net/browse/JDK-8234863] there are applications that can benefit from an increased MaxInlineLevel: many Netty-based ones have shown a clear improvement in performance, especially ones with deep call stacks and/or wide hierarchies. > Increase MaxInlineLevel on JVM configuration > > > Key: ARTEMIS-2946 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2946 > Project: ActiveMQ Artemis > Issue Type: Improvement >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > > According to [https://bugs.openjdk.java.net/browse/JDK-8234863] there are > applications that can benefit from an increased MaxInlineLevel: many > Netty-based ones have shown a clear improvement in performance, especially ones > with deep call stacks and/or wide hierarchies. > > See [https://github.com/netty/netty/pull/10368#issuecomment-648174201] for > more information about the potential improvements on encoding/decoding. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARTEMIS-2946) Increase MaxInlineLevel on JVM configuration
Francesco Nigro created ARTEMIS-2946: Summary: Increase MaxInlineLevel on JVM configuration Key: ARTEMIS-2946 URL: https://issues.apache.org/jira/browse/ARTEMIS-2946 Project: ActiveMQ Artemis Issue Type: Improvement Reporter: Francesco Nigro Assignee: Francesco Nigro According to [https://bugs.openjdk.java.net/browse/JDK-8234863] there are applications that can benefit from an increased MaxInlineLevel: many Netty-based ones have shown a clear improvement in performance, especially ones with deep call stacks and/or wide hierarchies. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARTEMIS-2945) Artemis native JNI code could be replaced by Java
[ https://issues.apache.org/jira/browse/ARTEMIS-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17212551#comment-17212551 ] Francesco Nigro commented on ARTEMIS-2945: -- I've avoided changing the API in order to keep the Artemis broker code happy, but there are a lot of improvements that could be made if we allow it, e.g. pooling ByteBuffers to perform read/write. This would avoid referencing objects belonging to the old generation or to different heap regions (for region-based GCs like Shenandoah/G1/ZGC). > Artemis native JNI code could be replaced by Java > > > Key: ARTEMIS-2945 > URL: https://issues.apache.org/jira/browse/ARTEMIS-2945 > Project: ActiveMQ Artemis > Issue Type: Improvement > Components: Broker >Affects Versions: 2.15.0 >Reporter: Francesco Nigro >Assignee: Francesco Nigro >Priority: Major > > The LibAIO JNI code could be rewritten in Java, while keeping the LibaioContext > and LibaioFile APIs the same. > > There are a few benefits from this: > # simplification of the C code logic, to ease maintaining it > # a quicker development process (the implement-try-test-debug cycle) for non-C > programmers, including simpler integration with Java test suites > # easier monitoring/telemetry integration > > As demonstrations/proofs of such benefits I would introduce several changes > into the Java version: > # using the libaio async fdatasync feature to allow the LibaioContext duty > cycle loop to free CPU resources in order to handle compaction reads without > being slowed down by an in-progress fdatasync > # using a lock-free, high-performance data structure to reuse iocbs instead of > a locked (mutex-guarded) one > # exposing in-flight callbacks to allow future PRs to introduce Java-only > per-request latency telemetry, or just error checking/killing of "slow" in-flight > requests > > The possible drawbacks are: > # slower performance due to the cost of GC barriers (see the notes on the PR code) > # slower performance in the case of a JVM without Unsafe (which means very few) > The latter issue could be addressed by using the new/proper VarHandle > features once the Artemis minimum supported version moves past Java 8, or by > using the same approach as other projects relying on it, e.g. Netty. > A note about how to correctly benchmark this, due to how I've implemented > async fdatasync: to avoid having both LibaioContext and TimedBuffer > perform fdatasync batching of writes, I've preferred to simplify the > LibaioContext: it means that the buffer timeout to be used in broker.xml > should be obtained via ./artemis perf-journal --sync-writes, i.e. batching writes > at the speed of the measured fdatasync RTT latency. > This last behaviour could be changed using a more apples-to-apples approach, > although I still think that the beauty of using Java is exactly that it brings > new features/logic in with a shorter development cycle :) > We're not in a hurry to get this done, so perf-wise this feature could also be > implemented to improve performance over the original version, if possible. -- This message was sent by Atlassian Jira (v8.3.4#803005)
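The ByteBuffer-pooling idea from the comment can be sketched as follows. This is a minimal sketch under assumptions, not Artemis code: `ThreadLocalBufferPool` and the 64 KiB capacity are invented for illustration. The point is that a thread-confined, pre-allocated direct buffer is off-heap, so reusing it avoids creating references from long-lived pool objects into short-lived heap buffers, which is what costs GC-barrier work on region-based collectors.

```java
import java.nio.ByteBuffer;

// Illustrative sketch: reuse one direct ByteBuffer per thread for
// read/write calls instead of allocating a fresh buffer per call.
final class ThreadLocalBufferPool {
    // Hypothetical fixed capacity; a real pool would size this to the
    // journal's maximum I/O size.
    private static final int CAPACITY = 64 * 1024;

    private static final ThreadLocal<ByteBuffer> POOL =
            ThreadLocal.withInitial(() -> ByteBuffer.allocateDirect(CAPACITY));

    // Returns the calling thread's buffer, cleared and ready for use.
    // The buffer's storage is off-heap, so the GC never has to trace
    // old-gen-to-young-gen (or cross-region) references through it.
    static ByteBuffer acquire() {
        ByteBuffer buf = POOL.get();
        buf.clear(); // reset position/limit; does not zero the contents
        return buf;
    }
}
```

A caller would `acquire()` the buffer, fill it, hand it to the read/write path, and never retain it past the call, keeping the buffer strictly thread-confined.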